At approximately 1:08PM EDT we received an alert that a stratus node was offline. We began investigating and found that AWS was reporting all status checks on the instance were failing. At this point we issued a reboot to the downed instance. While waiting for the node to come back up a second node reported down. Both nodes then became responsive minutes later and we then began recovery of containerized services of those nodes.
A support case was opened with AWS and they confirmed that there was an issue with the underlying server which they had automatically rebooted. AWS also confirmed that both of these instances resided on that same physical hardware. The report from AWS and the timeline of how the nodes were rebooted at different times indicates this was a fault at the hypervisor level and not hardware. Systems have been stable since the nodes have been recovered.