
AWS Databricks Outage History

Every past AWS Databricks outage tracked by IsDown, with detection times, duration, and resolution details.

There have been 99 AWS Databricks outages since January 2023. The 42 outages from the last 12 months are summarized below, with incident details, duration, and resolution information.

Minor May 8, 2026

May 2026: ES-1903956

Detected May 8, 2026 3:23 AM +03 · Resolved May 9, 2026 5:34 AM +03 · Duration about 1 day

AWS Databricks experienced a 7.2-hour service disruption in the us-east-1 region starting May 7, 2026 at 23:59 UTC, caused by an underlying AWS cloud provider incident affecting an availability zone. The outage impacted Unity Catalog operations, Classic Compute clusters, Lakeflow Jobs, Databricks SQL, and Spark Declarative Pipelines, with customers experiencing general service unavailability and job failures. Databricks engineering teams resolved the issue by moving services out of the impacted availability zone. Services gradually returned to normal operation, though some customers continued to experience increased latency and intermittent failures while the region stabilized.

Minor May 5, 2026

May 2026: ES-1895876

Detected May 5, 2026 10:41 AM +03 · Resolved May 5, 2026 1:58 PM +03 · Duration about 3 hours

AWS Databricks Classic Compute experienced cluster startup failures across multiple regions for 3.3 hours starting at 03:33 UTC on May 5, 2026, specifically affecting clusters with configured init scripts. The root cause was an outage in an external Ubuntu package repository, which was restored by the external provider at 08:06 UTC. Customers experienced clusters failing to initialize, not starting at all, or extended startup times during the incident.

Major April 27, 2026

April 2026: ES-1882222

Detected Apr 27, 2026 8:40 PM +03 · Resolved Apr 27, 2026 9:44 PM +03 · Duration about 1 hour

AWS Databricks experienced a major incident affecting Classic Compute and Serverless Compute clusters across multiple regions starting at 17:20 UTC on April 27, 2026. Customers encountered cluster start failures, unresponsive existing clusters, and inability to connect notebooks, jobs, and SQL workloads to their clusters. The root cause was identified and remediation efforts were underway as of the last update.

Major April 23, 2026

April 2026: ES-1873238

Detected Apr 23, 2026 12:17 AM +03 · Resolved Apr 23, 2026 1:25 AM +03 · Duration about 1 hour

AWS Databricks experienced a major service incident affecting Unity Catalog in multiple regions (us-west-2, us-west-1, and us-east-1) starting at 20:47 UTC on April 22, 2026. Customers encountered failures when accessing Unity Catalog resources and errors when querying or managing data governed by Unity Catalog, along with workload failures for services dependent on Unity Catalog. The incident lasted 1.1 hours; Databricks actively investigated the degradation, though no resolution details were provided in the available updates.

Major April 21, 2026

April 2026: ES-1868544

Detected Apr 21, 2026 11:50 PM +03 · Resolved Apr 22, 2026 12:18 AM +03 · Duration 28 minutes

AWS Databricks experienced a major service incident affecting Unity Catalog in the ca-central-1 region, where requests failed to process starting at 20:10 UTC on April 21, 2026. Users encountered errors and timeouts when accessing Unity Catalog and failures when managing or querying catalog objects, tables, schemas, or permissions. The engineering team identified the root cause and was actively working on restoration, with the incident lasting 28 minutes.

Minor April 20, 2026

April 2026: ES-1864534

Detected Apr 20, 2026 6:02 PM +03 · Resolved Apr 20, 2026 8:38 PM +03 · Duration about 3 hours

AWS Databricks experienced an issue affecting Lakeflow Spark Declarative Pipelines on Serverless Compute in the eu-west-1 region, starting at 12:13 UTC on April 20, 2026, with error rates significantly increasing at 14:40 UTC. Customers experienced pipeline launch failures, timeouts, and increased launch times for new pipeline runs. The engineering team was actively investigating and implementing remediation steps, with the incident lasting approximately 2.6 hours.

Minor April 19, 2026

April 2026: ES-1862841

Detected Apr 19, 2026 1:19 PM +03 · Resolved Apr 19, 2026 6:12 PM +03 · Duration about 5 hours

AWS Databricks experienced a 4.9-hour incident affecting Declarative Automation Bundles for CI/CD deployments, caused by an expired HashiCorp GPG key used for Terraform binary verification. Customers experienced failures in CI/CD pipelines and were unable to perform automated deployments due to GPG signature verification errors. The issue was resolved by releasing a fix in Databricks CLI version 0.297.2 that incorporates the new public key, with patches for older CLI versions in active development.
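Since the fix shipped in a specific CLI release, pipelines can guard against re-running on an affected version. A minimal sketch, assuming a POSIX shell with GNU `sort -V`; the `databricks -v` output format shown in the comment is an assumption:

```shell
# Sketch: fail a CI/CD run early if the installed Databricks CLI predates
# the patched release (0.297.2, per the incident notes above).
# version_ge A B returns true when version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# In CI (illustrative; the `databricks -v` output format is an assumption):
#   CLI_VERSION=$(databricks -v | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)
#   version_ge "$CLI_VERSION" "0.297.2" || { echo "upgrade the Databricks CLI"; exit 1; }
```

The `sort -V` comparison avoids the classic pitfall of lexicographic version checks, where `0.298.0` would sort before `0.30.0`.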

Minor April 17, 2026

April 2026: ES-1856982

Detected Apr 17, 2026 9:56 AM +03 · Resolved Apr 17, 2026 10:48 AM +03 · Duration about 1 hour

AWS Databricks experienced a 52-minute service incident where customers across multiple regions encountered delays or failures when launching clusters and running jobs that used init scripts dependent on Ubuntu package repositories (archive.ubuntu.com and security.ubuntu.com). The issue affected the Compute Service, causing cluster launch failures and job execution problems for workloads relying on these package dependencies. The incident appeared to stabilize as the team monitored for full recovery, and customers were advised to temporarily use alternative mirror repositories as a workaround.
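The advised mirror workaround can be scripted inside a cluster init script. A hedged sketch: the fallback mirror URL is only an example, and the sources file path assumes a stock Ubuntu image:

```shell
# Hypothetical init-script helper: point apt at a fallback Ubuntu mirror
# so package installs survive an archive.ubuntu.com / security.ubuntu.com
# outage. MIRROR is an example; substitute any healthy Ubuntu mirror.
MIRROR="http://mirrors.edge.kernel.org/ubuntu/"

swap_mirror() {
  # Rewrite the default archive and security hosts in the given sources file.
  sed -i \
    -e "s|http://archive.ubuntu.com/ubuntu/|${MIRROR}|g" \
    -e "s|http://security.ubuntu.com/ubuntu/|${MIRROR}|g" \
    "$1"
}

# In a real init script (assumption: stock Ubuntu sources file):
#   swap_mirror /etc/apt/sources.list && apt-get update -y
```

Keeping the rewrite in a function makes it easy to revert the init script once the upstream repositories recover.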

Minor April 16, 2026

April 2026: ES-1856982

Detected Apr 16, 2026 10:49 AM +03 · Resolved Apr 16, 2026 12:17 PM +03 · Duration about 1.5 hours

AWS Databricks experienced a compute service issue starting at 05:54 UTC on April 16, 2026, causing cluster launch delays and failures across multiple regions for 1.5 hours. Jobs dependent on affected clusters also failed or experienced delays during this period. The issue appeared to be stabilizing by the end of the incident with teams monitoring for full recovery.

Minor April 14, 2026

April 2026: ES-1852558

Detected Apr 14, 2026 9:52 PM +03 · Resolved Apr 14, 2026 11:32 PM +03 · Duration about 2 hours

AWS Databricks experienced compute service failures in the us-gov-west-1 region, affecting Classic Compute and Jobs Compute from 17:50 to 19:21 UTC and Serverless Compute starting at 19:06 UTC. Customers encountered cluster startup failures and job execution issues due to clusters being unable to start. Classic Compute was fully recovered by 19:21 UTC, while mitigation of the Serverless Compute issues was still in progress after the root cause had been identified.