
Azure Databricks Outage History

Every past Azure Databricks outage tracked by IsDown, with detection times, duration, and resolution details.

There have been 175 Azure Databricks outages since January 2023. The 78 outages from the last 12 months are summarized below, with incident details, durations, and resolution information.

Minor May 13, 2026

May 2026: ES-1916375

Detected May 13, 2026 8:12 PM EDT · Resolved May 14, 2026 4:25 AM EDT · Duration about 8 hours

Azure Databricks experienced a platform infrastructure issue in the West Central US region lasting 8.2 hours, affecting multiple services including Databricks SQL, compute clusters, jobs, Unity Catalog, and workspace management operations. Customers reported cluster creation failures, SQL warehouse query failures, job execution issues, Unity Catalog errors, and unresponsive workspace UI. The engineering team identified the root cause and applied mitigations, with most services recovering by the end of the incident.
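
During a broad platform incident like this, a lightweight probe against the workspace API is a quick way to distinguish a service-side outage from a local misconfiguration. Below is a minimal sketch in Python using the standard Clusters API list endpoint; the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are assumptions about how credentials are supplied.

```python
# Minimal health probe for a Databricks workspace during an incident.
# Assumes DATABRICKS_HOST (e.g. https://adb-<id>.azuredatabricks.net)
# and DATABRICKS_TOKEN are set in the environment; the endpoint is the
# standard Clusters API 2.0 list call.
import os
import requests

def workspace_healthy(timeout: float = 10.0) -> bool:
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    token = os.environ["DATABRICKS_TOKEN"]
    try:
        resp = requests.get(
            f"{host}/api/2.0/clusters/list",
            headers={"Authorization": f"Bearer {token}"},
            timeout=timeout,
        )
        return resp.status_code == 200
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("healthy" if workspace_healthy() else "degraded or unreachable")
```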

Minor May 5, 2026

May 2026: ES-1895876

Detected May 5, 2026 3:37 AM EDT · Resolved May 5, 2026 6:54 AM EDT · Duration about 3 hours

Azure Databricks Classic Compute clusters failed to start, or took much longer than usual to initialize, when init scripts were configured, affecting multiple regions from 03:33 UTC on May 5, 2026. The cause was an outage at an external Ubuntu package repository that clusters depend on during initialization. The external provider restored service by 08:06 UTC, ending the 3.3-hour incident after some intermittent periods of recovery.
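
"Extended initialization times" surface as clusters stuck in the PENDING state. A watchdog along the lines of the hedged sketch below, built on the standard Clusters API get endpoint, can flag a cluster that has not reached RUNNING within a deadline; the 15-minute deadline, polling interval, and credential environment variables are assumptions.

```python
# Watchdog: flag a cluster stuck in PENDING past a deadline, as seen
# when init scripts hang on an unreachable package repository.
# DATABRICKS_HOST / DATABRICKS_TOKEN are assumed environment variables.
import os
import time
import requests

def wait_until_running(cluster_id: str, deadline_s: int = 900) -> bool:
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    start = time.monotonic()
    while time.monotonic() - start < deadline_s:
        state = requests.get(
            f"{host}/api/2.0/clusters/get",
            headers=headers,
            params={"cluster_id": cluster_id},
            timeout=10,
        ).json().get("state")
        if state == "RUNNING":
            return True
        if state in ("TERMINATED", "ERROR"):
            return False  # startup failed outright
        time.sleep(30)  # still PENDING; poll again
    return False  # stuck in initialization past the deadline
```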

Minor May 3, 2026

May 2026: ES-1893276

Detected May 3, 2026 6:21 AM EDT · Resolved May 3, 2026 6:45 AM EDT · Duration 24 minutes

Azure Databricks experienced an issue with the Lakebase service that caused delays in project startup. The incident was classified as minor and lasted 24 minutes; the service team actively investigated and posted regular updates until resolution.

Minor April 28, 2026

April 2026: ES-1885183

Detected Apr 28, 2026 1:48 PM EDT · Resolved Apr 28, 2026 2:32 PM EDT · Duration 44 minutes

Azure Databricks experienced a 44-minute service incident affecting the Jobs Service, specifically impacting Lakeflow Jobs and Lakeflow Spark Declarative Pipelines in Azure Government regions. Users experienced job runs failing to start or remaining pending, pipeline updates failing to execute, and unexpected interruptions to active workloads. The incident began at 17:21 UTC on April 28, 2026, with Databricks engineering investigating until the issue was resolved.
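
Runs that remain pending can be spotted from the Jobs API by listing active runs and checking how long each has sat in the PENDING life-cycle state. A hedged sketch follows; the ten-minute threshold and credential environment variables are assumptions, while the runs/list endpoint and its fields are standard Jobs API 2.1.

```python
# List active job runs and report any stuck in PENDING too long.
# DATABRICKS_HOST and DATABRICKS_TOKEN are assumed to hold credentials.
import os
import time
import requests

def stuck_pending_runs(threshold_s: int = 600) -> list[int]:
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    resp = requests.get(
        f"{host}/api/2.1/jobs/runs/list",
        headers=headers,
        params={"active_only": "true"},
        timeout=10,
    )
    resp.raise_for_status()
    now_ms = time.time() * 1000
    stuck = []
    for run in resp.json().get("runs", []):
        pending = run["state"]["life_cycle_state"] == "PENDING"
        age_s = (now_ms - run["start_time"]) / 1000  # start_time is epoch ms
        if pending and age_s > threshold_s:
            stuck.append(run["run_id"])
    return stuck
```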

Minor April 27, 2026

April 2026: ES-1852958

Detected Apr 27, 2026 4:04 PM EDT · Resolved Apr 27, 2026 7:17 PM EDT · Duration about 3 hours

Azure Databricks experienced a 3.2-hour incident affecting Serverless Compute resources in the East US 2 region, starting at 18:43 UTC on April 27, 2026. Customers observed serverless workloads failing to start or execute, compute requests returning errors or timing out, and degraded performance for notebooks and pipelines using Serverless Compute. The root cause was identified and remediation was coordinated with the cloud provider to restore service.

Major April 27, 2026

April 2026: ES-1882222

Detected Apr 27, 2026 1:40 PM EDT · Resolved Apr 27, 2026 2:40 PM EDT · Duration about 1 hour

Azure Databricks experienced a major incident starting at 17:20 UTC on April 27, 2026, affecting Classic Compute and Serverless Compute clusters across multiple regions for approximately 1 hour. Customers encountered cluster start failures, unresponsive existing clusters, and inability to connect notebooks, jobs, and SQL workloads to their clusters. The root cause was identified and remediation efforts were underway as of the last update.
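
In a short-lived incident like this one, cluster start requests that fail transiently often succeed once mitigation lands, so retry-with-backoff around the standard Clusters API start call is a reasonable client-side guard. A hedged sketch; the attempt count and delays are assumptions.

```python
# Retry a cluster start with exponential backoff, useful when start
# requests fail transiently during a platform incident.
# DATABRICKS_HOST / DATABRICKS_TOKEN are assumed environment variables.
import os
import time
import requests

def start_cluster_with_retry(cluster_id: str, attempts: int = 5) -> bool:
    host = os.environ["DATABRICKS_HOST"].rstrip("/")
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
    delay = 30.0
    for _ in range(attempts):
        try:
            resp = requests.post(
                f"{host}/api/2.0/clusters/start",
                headers=headers,
                json={"cluster_id": cluster_id},
                timeout=10,
            )
            if resp.ok:
                return True
        except requests.RequestException:
            pass  # network-level failure; treat like a failed attempt
        time.sleep(delay)
        delay = min(delay * 2, 300)  # cap backoff at five minutes
    return False
```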

Minor April 24, 2026

April 2026: ES-1877104

Detected Apr 24, 2026 9:57 AM EDT · Resolved Apr 24, 2026 8:50 PM EDT · Duration about 11 hours

Azure Databricks experienced compute service failures in the East US region for 10.9 hours, affecting Classic Compute, Jobs Compute, and later Serverless Compute workloads. Customers encountered cluster launch failures, job run failures during startup, and Serverless workloads failing to start. The incident was caused by an underlying Azure cloud provider issue; Databricks engineering worked directly with Azure to implement mitigations, achieving partial recovery before the incident was fully resolved.
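
For job runs that fail during cluster startup, the Jobs API's per-task retry settings can ride out intermittent provider problems without manual resubmission. The fragment below is illustrative only; the task name and notebook path are hypothetical, the retry values are assumptions, and the field names are the standard Jobs API 2.1 task settings.

```python
# Fragment of a Jobs API 2.1 task definition with automatic retries,
# so runs that fail during cluster startup are retried rather than
# needing manual resubmission. All values here are illustrative.
task_settings = {
    "task_key": "nightly_etl",              # hypothetical task name
    "notebook_task": {"notebook_path": "/Repos/etl/nightly"},  # hypothetical
    "max_retries": 3,                       # retry failed runs up to 3 times
    "min_retry_interval_millis": 300_000,   # wait 5 minutes between tries
    "retry_on_timeout": True,
}
```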

Minor April 19, 2026

April 2026: ES-1862841

Detected Apr 19, 2026 6:23 AM EDT · Resolved Apr 19, 2026 11:12 AM EDT · Duration about 5 hours

Azure Databricks customers using Declarative Automation Bundles experienced CI/CD pipeline failures and deployment issues for 4.8 hours due to an expired GPG key from HashiCorp that prevented proper Terraform binary verification. The incident affected multiple components including the User Interface, Databricks SQL, Account Console, Compute Service, and Jobs Service, causing widespread inability to perform automated deployments. The issue was resolved by releasing a fix in Databricks CLI version 0.297.2 that incorporates a new public key to bypass the expired GPG key check.
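
Given that the fix shipped in Databricks CLI 0.297.2, one practical guard is to have CI fail fast when an older CLI is on the path. A hedged sketch; it assumes `databricks --version` prints a semantic version string that a simple regex can extract.

```python
# CI guard: fail early if the installed Databricks CLI predates the
# 0.297.2 release that carries the new Terraform-verification key.
import re
import subprocess

MIN_VERSION = (0, 297, 2)

def cli_version():
    out = subprocess.run(
        ["databricks", "--version"],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", out)
    if match is None:
        raise RuntimeError(f"could not parse CLI version from {out!r}")
    return tuple(int(g) for g in match.groups())

if cli_version() < MIN_VERSION:
    raise SystemExit("Databricks CLI too old; upgrade to 0.297.2 or later")
```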

Minor April 17, 2026

April 2026: ES-1856982

Detected Apr 17, 2026 2:44 AM EDT · Resolved Apr 17, 2026 3:48 AM EDT · Duration about 1 hour

Azure Databricks customers in multiple regions experienced delays or failures when launching clusters and running jobs whose init scripts depended on Ubuntu package repositories (archive.ubuntu.com and security.ubuntu.com), starting at 01:15 UTC on April 17, 2026. The issue affected the Compute Service, causing cluster launch failures and job execution delays for workloads relying on these repositories. The incident lasted 1.1 hours; the team investigated and monitored for full recovery as the issue stabilized.
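
Because the affected repositories are public and named in the incident, a pre-flight reachability check can defer cluster launches while the mirrors are down. A minimal sketch; the mirror list comes from this incident, while the HEAD-request approach and timeout are assumptions.

```python
# Pre-flight check: verify the Ubuntu package mirrors that init
# scripts depend on are reachable before launching clusters.
import requests

MIRRORS = [
    "http://archive.ubuntu.com/ubuntu/",
    "http://security.ubuntu.com/ubuntu/",
]

def mirrors_reachable(timeout: float = 5.0) -> bool:
    for url in MIRRORS:
        try:
            resp = requests.head(url, timeout=timeout, allow_redirects=True)
            if resp.status_code >= 500:
                return False  # mirror up but erroring server-side
        except requests.RequestException:
            return False  # unreachable or timing out
    return True

if __name__ == "__main__":
    if not mirrors_reachable():
        raise SystemExit("Ubuntu mirrors unreachable; defer cluster launches")
```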

Minor April 16, 2026

April 2026: ES-1856982

Detected Apr 16, 2026 3:45 AM EDT · Resolved Apr 16, 2026 5:17 AM EDT · Duration about 2 hours

Azure Databricks experienced a compute service issue lasting 1.5 hours in which customers across multiple regions encountered cluster launch delays and failures starting at 05:54 UTC on April 16, 2026. Jobs dependent on the affected clusters also failed or were delayed as a result. The issue appeared to be stabilizing by the end of the incident, with the team monitoring for full recovery.