
Azure Databricks Outage History

Every past Azure Databricks outage tracked by IsDown, with detection times, duration, and resolution details.

There have been 180 Azure Databricks outages since January 2023. The 80 outages from the last 12 months are summarized below, with incident details, duration, and resolution information.

Minor June 5, 2026

June 2026: ES-1964129

Detected Jun 5, 2026 3:30 PM EDT · Resolved Jun 5, 2026 4:10 PM EDT · Duration 40 minutes

Azure Databricks experienced a service incident affecting workspace management operations: customers encountered failures and errors when attempting to create, update, or delete Databricks workspaces. The issue impacted the User Interface component, and workspace CRUD operations either failed outright or returned error responses. The incident lasted 40 minutes and was resolved by the engineering team without requiring any customer action.

Minor May 29, 2026

May 2026: ES-1946417

Detected May 29, 2026 5:12 PM EDT · Resolved May 29, 2026 5:44 PM EDT · Duration 32 minutes

Azure Databricks experienced a minor compute service issue that lasted 32 minutes and affected Databricks compute functionality. The service team investigated the problem; the available information does not specify how it was resolved.

Minor May 29, 2026

May 2026: ES-1945162

Detected May 29, 2026 2:22 AM EDT · Resolved May 29, 2026 2:20 PM EDT · Duration about 12 hours

Azure Databricks experienced a 12-hour service disruption starting at 05:25 UTC on May 29, 2026, caused by underlying Azure cloud provider infrastructure issues. The incident affected Classic Compute, Serverless Compute, Databricks Apps, Unity Catalog, and Jobs Service, with customers experiencing cluster launch failures, job execution failures, and degraded application availability. Services showed gradual recovery throughout the incident as Databricks worked with Azure to resolve the underlying infrastructure problems.
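The entry above gives detection and resolution times in EDT while the summary cites a UTC start time, so it can help to put both on the same clock. The following is a minimal sketch, not part of IsDown or Databricks tooling: the function names `parse_edt` and `incident_window` are hypothetical, and it assumes that "EDT" on this page means a fixed UTC-4 offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "EDT" on this page is a fixed UTC-4 offset.
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_edt(stamp: str) -> datetime:
    """Parse a page timestamp such as 'May 29, 2026 2:22 AM EDT'."""
    text, zone = stamp.rsplit(" ", 1)
    if zone != "EDT":
        raise ValueError(f"unsupported time zone label: {zone}")
    naive = datetime.strptime(text, "%b %d, %Y %I:%M %p")
    return naive.replace(tzinfo=EDT)


def incident_window(detected: str, resolved: str):
    """Return (detected in UTC, resolved in UTC, elapsed time)."""
    start = parse_edt(detected)
    end = parse_edt(resolved)
    return (
        start.astimezone(timezone.utc),
        end.astimezone(timezone.utc),
        end - start,
    )


# Timestamps copied from the ES-1945162 entry above.
start_utc, end_utc, duration = incident_window(
    "May 29, 2026 2:22 AM EDT",
    "May 29, 2026 2:20 PM EDT",
)
print(start_utc.isoformat(), end_utc.isoformat(), duration)
```

Under that assumption, detection falls at 06:22 UTC and the detected-to-resolved window is 11 hours 58 minutes, consistent with the listed "about 12 hours". The 05:25 UTC start cited in the summary is earlier than the detection time, so the two figures measure different things.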

Minor May 26, 2026

May 2026: ES-1940071

Detected May 26, 2026 3:05 PM EDT · Resolved May 26, 2026 5:13 PM EDT · Duration about 2 hours

Azure Databricks experienced an issue with the Jobs Service affecting Lakeflow Spark Declarative Pipelines, causing pipeline start and restart failures along with pipeline updates terminating shortly after launch. The incident lasted 2.1 hours and was classified as minor severity. Engineering teams actively investigated the root cause while recommending that customers manually stop and restart pipelines as a workaround.

Minor May 21, 2026

May 2026: ES-1931399

Detected May 21, 2026 3:00 AM EDT · Resolved May 21, 2026 4:20 AM EDT · Duration about 1 hour

Azure Databricks experienced a service issue affecting the Compute Service for 1.3 hours, causing cluster start failures, job run failures, and degraded availability across Classic Compute, Serverless Compute, and Jobs. Customers encountered errors when launching or running workloads, with compute provisioning failures preventing normal operations. The cause was identified and recovery was being monitored across all impacted services.

Minor May 13, 2026

May 2026: ES-1916375

Detected May 13, 2026 8:12 PM EDT · Resolved May 14, 2026 4:25 AM EDT · Duration about 8 hours

Azure Databricks experienced a platform infrastructure issue in the West Central US region lasting 8.2 hours, affecting multiple services including Databricks SQL, compute clusters, jobs, Unity Catalog, and workspace management operations. Customers reported cluster creation failures, SQL warehouse query failures, job execution issues, Unity Catalog errors, and unresponsive workspace UI. The engineering team identified the root cause and applied mitigations, with most services recovering by the end of the incident.

Minor May 5, 2026

May 2026: ES-1895876

Detected May 5, 2026 3:37 AM EDT · Resolved May 5, 2026 6:54 AM EDT · Duration about 3 hours

Azure Databricks Classic Compute clusters failed to start or experienced extended initialization times when init scripts were configured, affecting multiple regions from 03:33 UTC on May 5, 2026. The issue was caused by an outage at an external Ubuntu package repository that clusters depend on for initialization. The external provider restored service by 08:06 UTC, resolving the 3.3-hour incident after some intermittent recovery periods.

Minor May 3, 2026

May 2026: ES-1893276

Detected May 3, 2026 6:21 AM EDT · Resolved May 3, 2026 6:45 AM EDT · Duration 24 minutes

Azure Databricks experienced an issue with the Lakebase service that caused delays in project startup. The incident was classified as minor and lasted 24 minutes. The service team actively investigated the issue and provided regular updates during the resolution process.

Minor April 28, 2026

April 2026: ES-1885183

Detected Apr 28, 2026 1:48 PM EDT · Resolved Apr 28, 2026 2:32 PM EDT · Duration 44 minutes

Azure Databricks experienced a 44-minute service incident affecting the Jobs Service, specifically impacting Lakeflow Jobs and Lakeflow Spark Declarative Pipelines in Azure Government regions. Users experienced job runs failing to start or remaining pending, pipeline updates failing to execute, and unexpected interruptions to active workloads. The incident began at 17:21 UTC on April 28, 2026, and Databricks engineering actively investigated the issue.

Minor April 27, 2026

April 2026: ES-1852958

Detected Apr 27, 2026 4:04 PM EDT · Resolved Apr 27, 2026 7:17 PM EDT · Duration about 3 hours

Azure Databricks experienced a 3.2-hour incident affecting Serverless Compute resources in the East US 2 region, starting at 18:43 UTC on April 27, 2026. Customers observed serverless workloads failing to start or execute, compute requests returning errors or timing out, and degraded performance for notebooks and pipelines using Serverless Compute. The root cause was identified and remediation was coordinated with the cloud provider to restore service.