Outage in Azure

Mitigated – Networking reduced availability in East US

Resolved Minor
March 18, 2025 - Lasted about 18 hours

Outage Details

What happened?

Between 13:09 UTC and 18:51 UTC on 18 March 2025, a platform issue impacted a subset of Azure customers in the East US region. Customers may have experienced intermittent connectivity loss and increased network latency when sending traffic within, into, and out of the East US region.

At 23:21 UTC on 18 March 2025, a second impact to network capacity occurred during recovery of the underlying fiber. Customers may have experienced the same intermittent connectivity loss and increased latency when sending traffic within, to, and from the East US region.

This incident is now mitigated.

What do we know so far?

We identified multiple fiber cuts affecting a subset of datacenters in the East US region at 13:09 UTC on 18 March 2025. The fiber cuts impacted capacity to those datacenters, increasing utilization of the remaining capacity serving them. At 13:55 UTC on 18 March 2025, we began mitigating the impact by load balancing traffic and restoring some of the impacted capacity; customers should have started to see service recover from this time. Traffic restoration was fully completed by 18:51 UTC on 18 March 2025 and the issue was mitigated.

At 23:20 UTC on 18 March 2025, another impact was observed during the capacity repair process. It was caused by a tooling failure during recovery that started adding traffic back into the network before the underlying capacity was ready. The impact was mitigated at 00:30 UTC on 19 March after isolating the capacity affected by the tooling failure.

At 01:52 UTC on 19 March, the underlying fiber cut was fully restored. We continued to test and restore all capacity to pre-incident levels; these tasks were completed at 06:50 UTC on 19 March.

How did we respond?

13:09 UTC on 18 March 2025 - Fiber cut in East US that caused packet drops. Our monitoring systems identified the impact.
13:55 UTC on 18 March 2025 - Mitigation efforts began with identifying the impacted datacenters and redirecting traffic to healthier routes.
15:07 UTC on 18 March 2025 - Outage declared; all East US customers notified of potential impact.
18:51 UTC on 18 March 2025 - Mitigation efforts were successfully completed. All devices affected by the fiber cut were isolated.
23:20 UTC on 18 March 2025 - An additional impact due to a tooling failure was noted during the capacity repair process for the previous incident. It had been anticipated that the capacity repair process would not impact customers.
00:28 UTC on 19 March 2025 - The second impact was mitigated after isolating the capacity resources affected by the tooling failure. At this stage most customers and services would have seen full mitigation.
01:52 UTC on 19 March 2025 - The underlying fiber cut was fully restored. We continued to monitor our capacity during the recovery process.
06:50 UTC on 19 March 2025 - All restoration efforts were completed. Incident mitigation was confirmed and declared.

What happens next?

Our team will be completing an internal retrospective to understand the incident in more detail. We will publish a Preliminary Post Incident Review (PIR) within approximately 72 hours to share more details on what happened and how we responded. After our internal retrospective is completed, generally within 14 days, we will publish a Final Post Incident Review with any additional details and learnings.

To get notified when that happens, and to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts. These can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts

For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs

The impact times above represent the full incident duration, so they are not specific to any individual customer. Actual impact to service availability may vary between customers and resources. For guidance on implementing monitoring to understand granular impact, refer to https://aka.ms/AzPIR/Monitoring

Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness
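
As a rough cross-check of the "about 18 hours" figure, the timestamps in the "How did we respond?" timeline can be turned into phase durations. The following is a minimal sketch, not part of the official report: the event labels and the helper names `utc`, `minutes_between`, and `format_minutes` are made up for illustration, and the timeline values (23:20 and 00:28) are used where they differ slightly from the summary text.

```python
from datetime import datetime, timezone


def utc(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' string as a UTC timestamp."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


# Timestamps copied from the incident timeline; the labels are illustrative
# shorthand, not wording taken from the report.
TIMELINE = [
    ("fiber cut detected", utc("2025-03-18 13:09")),
    ("mitigation begins", utc("2025-03-18 13:55")),
    ("first impact mitigated", utc("2025-03-18 18:51")),
    ("second impact observed", utc("2025-03-18 23:20")),
    ("second impact mitigated", utc("2025-03-19 00:28")),
    ("fiber restored", utc("2025-03-19 01:52")),
    ("all restoration complete", utc("2025-03-19 06:50")),
]


def minutes_between(start_label: str, end_label: str, timeline=TIMELINE) -> int:
    """Return whole minutes elapsed between two labeled timeline events."""
    events = dict(timeline)
    delta = events[end_label] - events[start_label]
    return int(delta.total_seconds() // 60)


def format_minutes(total: int) -> str:
    """Render a minute count as 'Hh MMm'."""
    hours, minutes = divmod(total, 60)
    return f"{hours}h {minutes:02d}m"


if __name__ == "__main__":
    pairs = [
        ("fiber cut detected", "first impact mitigated"),
        ("second impact observed", "second impact mitigated"),
        ("fiber cut detected", "all restoration complete"),
    ]
    for start, end in pairs:
        print(f"{start} -> {end}: {format_minutes(minutes_between(start, end))}")
```

Run as a script, this reports 5h 42m for the first impact window, 1h 08m for the second, and 17h 41m from the initial fiber cut to the end of restoration, which is consistent with the "about 18 hours" duration shown at the top of the page.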
