Outage in Microsoft Azure

Mitigated - Managed Identity - Australia East, Australia Southeast

Resolved Minor

July 12, 2024 - Started over 1 year ago - Lasted about 7 hours

Incident Report

What happened?

Between 00:55 UTC on 12 Jul 2024 and 06:28 UTC on 12 Jul 2024, you were identified among a subset of customers using Managed Service Identity (MSI) for Azure resources who may have experienced failures when requesting tokens for managed identities associated with Virtual Machines or Virtual Machine Scale Sets, Windows Virtual Desktop, Azure Databricks, and any other Azure service that relies on MSI.

What do we know so far?

We identified that a configuration change introduced in a recent deployment caused this issue. We had to roll back the change and restart on the last known good build.

How did we respond?

00:55 UTC on 12 July 2024 – Customer impact began.
01:10 UTC on 12 July 2024 – Service monitoring detected decreasing availability on some storage scale units in the region.
01:14 UTC on 12 July 2024 – Our team engaged and started the investigation.
02:58 UTC on 12 July 2024 – The recent configuration change was identified, and we started a deployment to roll it back.
06:05 UTC on 12 July 2024 – We completed the rollback on one Availability Zone (AZ), verified that telemetry looked healthy on this AZ, and started on the other AZs. We also failed over the other AZs where we were seeing signs of impact.
06:23 UTC on 12 July 2024 – Service started to recover, and customers should have started seeing recovery at this point. We continued applying recovery operations and monitoring recovery.
06:28 UTC on 12 July 2024 – Rollback completed, and the service showed full recovery from the platform side. Customers who are not fully mitigated may benefit from recycling their services.

What happens next?

Our team will be completing an internal retrospective to understand the incident in more detail. We will publish a Preliminary Post Incident Review (PIR) within approximately 72 hours, to share more details on what happened and how we responded. After our internal retrospective is completed, generally within 14 days, we will publish a Final Post Incident Review with any additional details and learnings.

The impact times above represent the full incident duration, so they are not specific to any individual customer. Actual impact to service availability varied between customers and resources. For guidance on implementing monitoring to understand granular impact: https://aka.ms/AzPIR/Monitoring

To get notified when that happens, and/or to stay informed about future Azure service issues, make sure that you configure and maintain Azure Service Health alerts. These can trigger emails, SMS, push notifications, webhooks, and more: https://aka.ms/ash-alerts

For more information on Post Incident Reviews, refer to https://aka.ms/AzurePIRs

Finally, for broader guidance on preparing for cloud incidents, refer to https://aka.ms/incidentreadiness
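
As a reading aid, the timestamps in the incident report can be converted into elapsed time since customer impact began. The sketch below is illustrative only: the event labels are shortened paraphrases of the report's timeline entries, and the names `TIMELINE`, `elapsed_since_start`, and `format_delta` are hypothetical helpers, not part of any Azure tooling.

```python
from datetime import datetime, timedelta

# Timestamps (UTC) copied from the incident report; labels are paraphrased.
TIMELINE = [
    ("impact began", "2024-07-12 00:55"),
    ("monitoring detected", "2024-07-12 01:10"),
    ("team engaged", "2024-07-12 01:14"),
    ("rollback started", "2024-07-12 02:58"),
    ("first AZ rolled back", "2024-07-12 06:05"),
    ("recovery began", "2024-07-12 06:23"),
    ("rollback completed", "2024-07-12 06:28"),
]


def elapsed_since_start(timeline):
    """Return (label, timedelta since the first event) for each event."""
    parsed = [
        (label, datetime.strptime(stamp, "%Y-%m-%d %H:%M"))
        for label, stamp in timeline
    ]
    start = parsed[0][1]
    return [(label, when - start) for label, when in parsed]


def format_delta(delta: timedelta) -> str:
    """Format a timedelta as hours and minutes, e.g. 2h03m."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


if __name__ == "__main__":
    for label, delta in elapsed_since_start(TIMELINE):
        print(f"{format_delta(delta):>6}  {label}")
```

Based on the reported timestamps, monitoring detected the problem 15 minutes after impact began, the rollback deployment started about 2 hours in, and the full impact window ran 5 hours 33 minutes.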

Components affected

Microsoft Azure Australia East: Virtual Machines, Virtual Machine Scale Sets, Azure Databricks
Microsoft Azure Australia Southeast: Virtual Machines, Virtual Machine Scale Sets, Azure Databricks
