After more than 12 hours of monitoring, we consider the issue fixed. One instance became overloaded and caused cascading failures related to autoscaling. The underlying EC2 instances can reboot if they are overloaded and suffer a kernel panic. We have added checks and safeguards to prevent this from recurring.
Posted 17 days ago. Jan 05, 2019 - 11:20 EST
This appears resolved, but we are still monitoring all metrics.
Posted 18 days ago. Jan 04, 2019 - 17:53 EST
We've identified a root cause that is affecting additional stores, and we are resolving it now.
Posted 18 days ago. Jan 04, 2019 - 12:51 EST
Some stores in us-east have relaunched because of a particular node failing. We see no errors on our side, and we're working with AWS support to get more information.
Sites are already back online or will be shortly.
Posted 18 days ago. Jan 04, 2019 - 11:15 EST
This incident affected: Mojo Stratus - Northern Virginia.