Network Disruption
Incident Report for Webscale STRATUS
Resolved
During routine maintenance of the Stratus platform where a node was being rebooted a temporary disruption of internal dns resolution occurred for some requests. On the node that was being rebooted there was one of the services that run coredns. This service is redundant, however it took the system a few minutes to figure out this pod was no longer available, during which time some requests were still being routed to it and failing. We are currently evaluating how to disable core dns requests from going to nodes under maintenance.
Posted Jun 12, 2019 - 04:00 EDT