
Outage in Microsoft Azure

Service Management Operation Errors Across Azure Services in East US 2

Resolved Minor

April 08, 2022 - Started over 3 years ago - Lasted about 7 hours

Incident Report

Impact Statement: Starting at approximately 12:25 UTC on 08 Apr 2022, customers running services in the East US 2 region may be experiencing service management errors, delays, and/or timeouts. We are investigating an underlying issue causing GET and PUT errors impacting the Azure portal itself, as well as services including Azure Virtual Machines (VMs), Virtual Machine Scale Sets (VMSS), and additional downstream services. Customers may see errors including “The network connectivity issue encountered for Microsoft.Compute cannot fulfill the request.” Finally, for some downstream services that have auto-scale enabled, this service management issue may cause data plane impact.

Current Status: The series of mitigation efforts described in earlier incident updates is still making progress in improving error rates. Internal services continue to report significant improvements in the proportion of requests that are succeeding. While mitigation is still being applied, the investigation into what is causing this incident has determined that the Compute Resource Provider (CRP) gateways in East US 2 are being overwhelmed with requests for compute resources.

Mitigation workstreams continue to focus on how to prevent CRP gateways from becoming unhealthy. While the combination of restarts, scaling out, and traffic reduction initially helped some gateway nodes to return to a healthy state, and stay healthy, other gateway nodes are routinely being overloaded by request volume. To resolve this, two mitigation workstreams are running in parallel. In the short term, we are investigating automation to restart gateway nodes on a regular basis so that they do not reach an unhealthy state. In the long term, we are investigating a CRP gateway hotfix that will obviate the need for restarts and prevent each node from becoming unhealthy. Both workstreams are making good progress.

At this stage, we believe that we have eliminated impact to most of the downstream services and are working with each team to confirm mitigation. We are also working with the last couple of services to mitigate them.

Although we believe that external customers and partners are continuing to see improvements, as mentioned we are not declaring mitigation until error rates return to pre-incident levels. While mitigation efforts continue, we will continue to provide hourly updates to ensure that all impacted customers and partners are informed of progress. The next update will be provided by 03:00 UTC, April 9th, or as soon as we have an update to share.
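
The bar stated above for declaring mitigation, error rates returning to pre-incident levels, can be illustrated with a minimal sketch. This is not Microsoft's monitoring logic; the `Window` type, the `tolerance` margin, and the requirement of several consecutive healthy windows are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Window:
    """Request counts for one monitoring window (hypothetical shape)."""
    total: int
    failed: int

    @property
    def error_rate(self) -> float:
        # An empty window carries no evidence of errors; treat it as 0.0.
        return self.failed / self.total if self.total else 0.0


def back_to_baseline(recent: Sequence[Window], baseline_rate: float,
                     tolerance: float = 0.0, required_windows: int = 3) -> bool:
    """Return True only if each of the last `required_windows` windows
    has an error rate at or below `baseline_rate + tolerance`."""
    if len(recent) < required_windows:
        return False  # not enough data to call it
    tail = recent[-required_windows:]
    return all(w.error_rate <= baseline_rate + tolerance for w in tail)


# Illustrative numbers only: error rates falling from 12% toward 0.1%.
windows = [Window(1000, 120), Window(1000, 40), Window(1000, 2),
           Window(1000, 1), Window(1000, 1)]
print(back_to_baseline(windows, baseline_rate=0.002, tolerance=0.001))
```

Requiring several consecutive windows rather than a single good reading guards against declaring mitigation during a brief dip, which matters when some gateway nodes recover and then become overloaded again.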
