After more than 12 hours of monitoring, the issue is fixed. A particular instance was becoming overloaded, which caused cascading failures related to autoscaling. Additional checks and safeguards have been added to prevent this from happening again. The underlying EC2 instances can reboot if they become overloaded and suffer a kernel panic.
Posted Jan 05, 2019 - 11:20 EST
This appears to be resolved, but we are still monitoring all metrics.
Posted Jan 04, 2019 - 17:53 EST
We've identified a root cause that is affecting additional stores, and we are resolving it now.
Posted Jan 04, 2019 - 12:51 EST
Some stores in us-east have relaunched after a particular node failed. We see no errors on our side, and we're working with AWS support to get more information.
Sites are already back online or will be shortly.
Posted Jan 04, 2019 - 11:15 EST
This incident affected: Mojo Stratus - Northern Virginia.