Azure Databricks Outage History

Every past Azure Databricks outage tracked by IsDown, with detection times, duration, and resolution details.

There have been 180 Azure Databricks outages since January 2023. The 80 outages from the last 12 months are summarized below, with incident details, duration, and resolution information.

Major April 27, 2026

April 2026: ES-1882222

Detected Apr 27, 2026 1:40 PM EDT · Resolved Apr 27, 2026 2:40 PM EDT · Duration about 1 hour

Azure Databricks experienced a major incident starting at 17:20 UTC on April 27, 2026, affecting Classic Compute and Serverless Compute clusters across multiple regions for approximately 1 hour. Customers encountered cluster start failures, unresponsive existing clusters, and inability to connect notebooks, jobs, and SQL workloads to their clusters. The root cause was identified and remediation efforts were underway as of the last update.

Minor April 24, 2026

April 2026: ES-1877104

Detected Apr 24, 2026 9:57 AM EDT · Resolved Apr 24, 2026 8:50 PM EDT · Duration about 11 hours

Azure Databricks experienced compute service failures in the East US region for 10.9 hours, affecting Classic Compute, Jobs Compute, and later Serverless Compute workloads. Customers encountered cluster launch failures, job run failures during startup, and Serverless workloads failing to start. The incident was caused by an underlying Azure cloud provider issue, with Databricks engineering working directly with Azure to implement mitigations and achieve partial recovery.
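The "10.9 hours" figure above is consistent with the Detected and Resolved timestamps listed for this incident. The following is a minimal sketch of that arithmetic, assuming the timestamps are in EDT (UTC-4) as labeled; the helper names `parse_edt` and `duration_hours` are hypothetical and not part of any IsDown or Databricks API.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EDT is modeled as a fixed UTC-4 offset, which avoids
# depending on a time zone database for this small example.
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_edt(stamp: str) -> datetime:
    """Parse a timestamp such as 'Apr 24, 2026 9:57 AM' as an EDT datetime."""
    return datetime.strptime(stamp, "%b %d, %Y %I:%M %p").replace(tzinfo=EDT)


def duration_hours(detected: str, resolved: str) -> float:
    """Return the elapsed time between two EDT timestamps in hours, to one decimal."""
    delta = parse_edt(resolved) - parse_edt(detected)
    return round(delta.total_seconds() / 3600, 1)


detected = "Apr 24, 2026 9:57 AM"
resolved = "Apr 24, 2026 8:50 PM"

# 9:57 AM to 8:50 PM is 10 h 53 min, which rounds to 10.9 hours.
print(duration_hours(detected, resolved))

# The same detection time expressed in UTC, for comparison with the
# UTC start times quoted in other incident summaries.
print(parse_edt(detected).astimezone(timezone.utc).isoformat())
```

The same calculation handles incidents that cross midnight, because the subtraction operates on full dates rather than clock times alone.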

Minor April 19, 2026

April 2026: ES-1862841

Detected Apr 19, 2026 6:23 AM EDT · Resolved Apr 19, 2026 11:12 AM EDT · Duration about 5 hours

Azure Databricks customers using Declarative Automation Bundles experienced CI/CD pipeline failures and deployment issues for 4.8 hours due to an expired GPG key from HashiCorp that prevented proper Terraform binary verification. The incident affected multiple components including the User Interface, Databricks SQL, Account Console, Compute Service, and Jobs Service, causing widespread inability to perform automated deployments. The issue was resolved by releasing a fix in Databricks CLI version 0.297.2 that incorporates a new public key to bypass the expired GPG key check.

Minor April 17, 2026

April 2026: ES-1856982

Detected Apr 17, 2026 2:44 AM EDT · Resolved Apr 17, 2026 3:48 AM EDT · Duration about 1 hour

Starting at 01:15 UTC on April 17, 2026, Azure Databricks customers in multiple regions experienced delays or failures when launching clusters and running jobs that used init scripts dependent on Ubuntu package repositories (archive.ubuntu.com and security.ubuntu.com). The issue affected the Compute Service. The incident lasted 1.1 hours; as of the last update, the issue appeared to be stabilizing and the team was monitoring for full recovery.

Minor April 16, 2026

April 2026: ES-1856982

Detected Apr 16, 2026 3:45 AM EDT · Resolved Apr 16, 2026 5:17 AM EDT · Duration about 2 hours

Azure Databricks experienced a compute service issue starting at 05:54 UTC on April 16, 2026 and lasting 1.6 hours, during which customers across multiple regions encountered cluster launch delays and failures. Jobs dependent on the affected clusters also failed or were delayed. As of the last update, the issue appeared to be stabilizing and the team was monitoring for full recovery.

Minor April 2, 2026

April 2026: ES-1822769

Detected Apr 2, 2026 5:51 PM EDT · Resolved Apr 2, 2026 6:23 PM EDT · Duration 32 minutes

Azure Databricks experienced a 32-minute incident affecting Serverless Compute in the westcentralus region, where customers encountered failures when starting or provisioning compute resources and errors when running workloads. The engineering team identified the root cause and worked with the cloud provider to restore service functionality.

Minor March 28, 2026

March 2026: ES-1810870

Detected Mar 28, 2026 4:39 PM EDT · Resolved Mar 28, 2026 5:11 PM EDT · Duration 32 minutes

Azure Databricks experienced a service incident affecting the Compute Service, Jobs Service, Databricks SQL, and Account Console components. The incident lasted 32 minutes and was classified as minor severity. The service team actively investigated the platform issue, though the specific resolution details were not provided in the available updates.

Minor March 24, 2026

March 2026: ES-1799855

Detected Mar 24, 2026 8:41 PM EDT · Resolved Mar 24, 2026 11:33 PM EDT · Duration about 3 hours

Azure Databricks experienced a compute service outage in the West US 3 region lasting 2.9 hours, affecting Serverless Compute and SQL Warehouses. Customers encountered failures when starting compute clusters, connection timeouts, and SQL Warehouse query execution failures. A mitigation was applied and service levels improved, with the team continuing to monitor for full restoration.

Minor March 24, 2026

March 2026: ES-1798419

Detected Mar 24, 2026 7:46 AM EDT · Resolved Mar 24, 2026 10:39 AM EDT · Duration about 3 hours

Azure Databricks experienced degraded Serverless Compute availability starting at 11:00 UTC on March 24, 2026, lasting 2.9 hours. Customers encountered cluster launch failures with driver unreachable errors, job execution failures with DRIVER_UNRESPONSIVE errors, clusters stuck in restart loops, and new clusters failing during spin-up. The incident was classified as minor, and the team was still investigating the compute service issues as of the last update.

Minor March 12, 2026

March 2026: ES-1778037

Detected Mar 12, 2026 11:57 PM EDT · Resolved Mar 13, 2026 2:49 AM EDT · Duration about 3 hours

Azure Databricks experienced a compute service issue affecting Classic Compute in the Azure US Gov Virginia region, where customers encountered failures when creating or starting clusters. The incident lasted 2.9 hours and was linked to an underlying cloud provider issue. The team was investigating the cluster failures as of the last update, and no customer action was required during the incident.