Outage in Azure

Mitigated - Storage latency, timeouts, or HTTP 500 errors in South Central US

Resolved Minor
December 26, 2024 - Started 7 days ago - Lasted 1 day

Outage Details

What happened?

Between 18:44 UTC on 26 December 2024 and 19:30 UTC on 27 December 2024, a power incident in a single availability zone in the South Central US region impacted the availability of multiple Azure services, including: Service Bus, Log Analytics, Logic Apps, Azure Firewall, Azure Storage, Azure Application Gateway, Virtual Machines, Azure Cosmos DB, SQL DB, Postgres SQL, Azure Synapse Analytics, Azure Data Factory, Azure IoT Hub and App Services. Customers who had these services configured across multiple availability zones in the region were not impacted by this incident.

What went wrong and why?

We identified that the incident was caused by a power failure in one of the colos in a single availability zone in the South Central US region, which led to a temporary loss of service availability for several Azure services with dependencies on the impacted infrastructure. The underlying power issue was identified and then isolated so that power restoration could begin. Power was restored at 20:43 UTC on 26 December 2024. As part of this process, additional cycles were required to fully restore power to a small number of the originally impacted nodes whose hardware was affected by the power incident. During this incident, we also identified two related issues that impacted a set of dependent services: an impact to a set of hardware network devices in the region, and an impact to a networking service that resulted in timeouts for a set of Virtual Machines. After the remaining impacted services were recovered, the incident was marked as mitigated at 19:30 UTC on 27 December 2024.

How did we respond?

18:44 UTC on 26 December 2024 - Incident auto-detected with cause identified.
20:30 UTC on 26 December 2024 - Power issue identified and isolated. Service recovery process underway.
20:43 UTC on 26 December 2024 - Power was restored. Service recovery process underway.
22:30 UTC on 26 December 2024 - Set of network devices impacted due to incident. Some dependent services are recovering.
00:30 UTC on 27 December 2024 - Network devices continue recovery.
02:30 UTC on 27 December 2024 - Network device recovered; an additional, related networking issue identified, impacting dependent services.
08:30 UTC on 27 December 2024 - Ongoing mitigation of additionally impacted services.
13:00 UTC on 27 December 2024 - Mitigation to most affected services confirmed.
19:30 UTC on 27 December 2024 - Mitigation has been confirmed. Telemetry shows load under normal operational thresholds. If you have a pre-configured failover option and failed over to a different Availability Zone or Region, it is now safe to fail back to this region.

What happens next?

We are still recovering some impacted instances of the affected services; affected customers will continue to receive communications through the Azure portal (https://aka.ms/ash-alerts).

Our team will be completing an internal retrospective to understand the incident in more detail. After our internal retrospective is completed, generally within 14 days, we will publish a Final Post Incident Review with any additional details and learnings.

To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts, which can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts

For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs

Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness
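For readers who want the elapsed times behind the response timeline, the following is a minimal Python sketch that derives them from four of the timestamps reported above. The dictionary keys and the helper names (`utc`, `elapsed_since_detection`) are illustrative choices, not part of the Azure report, and only the timestamps themselves come from the incident summary.

```python
from datetime import datetime, timezone


def utc(text: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM' string as a UTC timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)


# Timestamps copied from the incident timeline; the key names are hypothetical labels.
TIMELINE = {
    "detected": utc("2024-12-26 18:44"),
    "power_restored": utc("2024-12-26 20:43"),
    "most_services_mitigated": utc("2024-12-27 13:00"),
    "mitigated": utc("2024-12-27 19:30"),
}


def elapsed_since_detection(milestone: str) -> str:
    """Return the time between detection and the given milestone as 'Hh MMm'."""
    delta = TIMELINE[milestone] - TIMELINE["detected"]
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


if __name__ == "__main__":
    for name in TIMELINE:
        print(f"{name}: +{elapsed_since_detection(name)}")
```

Under these timestamps, power restoration falls just under two hours after detection, while full mitigation takes a little over a day, which matches the "Lasted 1 day" summary at the top of the page.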
IsDown: Possible outage indicated by user reports. Reports started about 1 hour before the official outage was reported.
