Node unresponsive in us-east
Incident Report for Webscale STRATUS
Resolved
After more than 12 hours of monitoring, we consider this issue resolved. A particular instance was becoming overloaded, which caused some cascading failures related to autoscaling. We have added checks and safeguards to prevent this from recurring. For context, the underlying EC2 instances can reboot when they are overloaded and suffer a kernel panic.
Posted Jan 05, 2019 - 11:20 EST
Monitoring
This appears resolved but we are still monitoring all metrics at this time.
Posted Jan 04, 2019 - 17:53 EST
Identified
We've identified a root cause that is affecting additional stores, and we are resolving it now.
Posted Jan 04, 2019 - 12:51 EST
Monitoring
Some stores in us-east have relaunched following the failure of a particular node. We see no errors on our side, and we're working with AWS support to get more information.

Sites are already back online or will be shortly.
Posted Jan 04, 2019 - 11:15 EST
This incident affected: Webscale STRATUS - Northern Virginia.