Multiple services recovering after power/cooling issue - Australia East
Resolved
Minor
August 30, 2023 - Lasted 1 day
Outage Details
Impact Statement: Starting at approximately 08:30 UTC on 30 August 2023, a utility power surge in the Australia East region tripped a subset of the cooling units offline in one datacenter, within one of the Availability Zones. While we worked to restore cooling, temperatures in the datacenter rose, so we proactively powered down a small subset of selected compute and storage scale units to avoid damage to hardware. Multiple downstream services were impacted, and targeted communications were distributed via Azure Service Health.

Current Status: Storage infrastructure has recovered. A subset of services still experiencing residual impact is on the path to mitigation.

Mitigation: We worked to recover the failed cooling units and reduce the overall temperature within the impacted area. Once temperatures were within operational thresholds, we began restoring power to the affected infrastructure and started a phased process to bring it back online. After storage infrastructure was fully restored, the dependent compute scale units were restored to operation. As the underlying compute and storage scale units became healthy, compute and other dependent Azure services recovered. While we have broadly recovered, a small subset of services are still completing post-recovery checks, and we are closely monitoring datacenter metrics for storage and compute resources to ensure they remain healthy. Customers whose services are still in the recovery process will be contacted directly through Service Health in the Azure portal, which also triggers Service Health alerts.