
AWS Databricks Outage History

Every past AWS Databricks outage tracked by IsDown, with detection times, duration, and resolution details.

There have been 103 AWS Databricks outages since January 2023. The 44 outages from the last 12 months are summarized below, with incident details, duration, and resolution information.

Minor May 31, 2026

May 2026: ES-1948471

Detected May 31, 2026 2:03 AM UTC · Resolved May 31, 2026 7:40 AM UTC · Duration about 6 hours

AWS Databricks experienced a 5.6-hour service disruption starting at 00:00 UTC on May 31, 2026, affecting Declarative Pipelines and Lakebase functionality within the Jobs Service. Customers encountered pipeline run failures, connectivity errors when querying Lakebase instances, and synced table pipelines that failed to complete. The engineering team identified the root cause and worked on a resolution; as a temporary workaround, restarting Lakebase instances restored connectivity.
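The 5.6-hour figure in this entry lines up with the gap between the detection and resolution timestamps (2:03 AM to 7:40 AM UTC) rather than with the 00:00 UTC start time. A minimal sketch of that arithmetic follows. The helper names `parse_utc` and `duration_hours` are hypothetical, the timestamp format is assumed from how this page prints dates, and the parsing assumes an English locale for month names and AM/PM.

```python
from datetime import datetime, timezone

# Assumed format of the timestamps shown on this page, e.g. "May 31, 2026 2:03 AM UTC".
STAMP_FORMAT = "%b %d, %Y %I:%M %p"


def parse_utc(stamp: str) -> datetime:
    """Parse a page-style timestamp and mark it as UTC."""
    text = stamp.removesuffix(" UTC")  # requires Python 3.9+
    return datetime.strptime(text, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def duration_hours(detected: str, resolved: str) -> float:
    """Return the detected-to-resolved span in hours, rounded to one decimal."""
    delta = parse_utc(resolved) - parse_utc(detected)
    return round(delta.total_seconds() / 3600, 1)


hours = duration_hours("May 31, 2026 2:03 AM UTC", "May 31, 2026 7:40 AM UTC")
print(hours)  # 5 h 37 min, which rounds to 5.6
```

The same helper can be applied to any other entry's "Detected" and "Resolved" values to compare them with the duration quoted in that entry's summary.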

Minor May 26, 2026

May 2026: ES-1940071

Detected May 26, 2026 7:01 PM UTC · Resolved May 26, 2026 9:13 PM UTC · Duration about 2 hours

AWS Databricks experienced an issue with its Jobs Service affecting Lakeflow Spark Declarative Pipelines: pipelines failed to start or restart, terminating shortly after launch. The incident lasted 2.2 hours and was classified as minor severity. Customers could work around the issue by manually stopping and restarting their pipelines.

Minor May 19, 2026

May 2026: ES-1924741

Detected May 19, 2026 12:57 AM UTC · Resolved May 19, 2026 3:22 AM UTC · Duration about 2 hours

AWS Databricks experienced a service disruption starting at approximately 00:34 UTC on May 19, 2026, affecting Serverless Compute workloads for a subset of customers. Users encountered failures starting or running Serverless Compute jobs, cluster allocation timeouts, and unexpected workload terminations. The incident lasted 56 minutes and affected both the Compute Service and Jobs Service components.

Major May 14, 2026

May 2026: ES-1917659

Detected May 14, 2026 6:05 AM UTC · Resolved May 14, 2026 8:05 AM UTC · Duration about 2 hours

AWS Databricks experienced a major service incident in the us-west-2 region lasting 2 hours, affecting the Workspace UI, Classic Compute, Databricks SQL, AI/BI Dashboards, and MLflow services. Customers encountered errors when opening notebooks, browsing workspace folders, starting clusters, running jobs, loading SQL queries and dashboards, and calling APIs. The cause was identified and mitigation efforts were underway at the time of the last update.

Minor May 8, 2026

May 2026: ES-1903956

Detected May 8, 2026 12:23 AM UTC · Resolved May 9, 2026 2:34 AM UTC · Duration 1 day

AWS Databricks experienced a 7.2-hour service disruption in the us-east-1 region starting May 7, 2026 at 23:59 UTC, caused by an underlying AWS cloud provider incident affecting an availability zone. The outage impacted Unity Catalog operations, Classic Compute clusters, Lakeflow Jobs, Databricks SQL, and Spark Declarative Pipelines, with customers experiencing general service unavailability and job failures. Databricks engineering teams resolved the issue by moving services out of the impacted availability zone. Services gradually returned to normal operation, though some customers continued to experience increased latency and intermittent failures while the region stabilized.

Minor May 5, 2026

May 2026: ES-1895876

Detected May 5, 2026 7:41 AM UTC · Resolved May 5, 2026 10:58 AM UTC · Duration about 3 hours

AWS Databricks Classic Compute experienced cluster startup failures across multiple regions for 3.3 hours starting at 03:33 UTC on May 5, 2026, specifically affecting clusters with configured init scripts. The root cause was an outage in an external Ubuntu package repository, which was restored by the external provider at 08:06 UTC. Customers experienced clusters failing to initialize, not starting at all, or extended startup times during the incident.

Major April 27, 2026

April 2026: ES-1882222

Detected Apr 27, 2026 5:40 PM UTC · Resolved Apr 27, 2026 6:44 PM UTC · Duration about 1 hour

AWS Databricks experienced a major incident affecting Classic Compute and Serverless Compute clusters across multiple regions starting at 17:20 UTC on April 27, 2026. Customers encountered cluster start failures, unresponsive existing clusters, and inability to connect notebooks, jobs, and SQL workloads to their clusters. The root cause was identified and remediation efforts were underway as of the last update.

Major April 22, 2026

April 2026: ES-1873238

Detected Apr 22, 2026 9:17 PM UTC · Resolved Apr 22, 2026 10:25 PM UTC · Duration about 1 hour

AWS Databricks experienced a major service incident affecting Unity Catalog in multiple regions (us-west-2, us-west-1, and us-east-1) starting at 20:47 UTC on April 22, 2026. Customers encountered failures when accessing Unity Catalog resources, errors when querying or managing data governed by Unity Catalog, and workload failures in services that depend on Unity Catalog. The incident lasted 1.1 hours. Databricks was actively investigating the degradation, and no resolution details were provided in the available updates.

Major April 21, 2026

April 2026: ES-1868544

Detected Apr 21, 2026 8:50 PM UTC · Resolved Apr 21, 2026 9:18 PM UTC · Duration 28 minutes

AWS Databricks experienced a major service incident affecting Unity Catalog in the CA-Central-1 region, where requests failed to process starting at 20:10 UTC on April 21, 2026. Users encountered errors and timeouts when accessing Unity Catalog and failures when managing or querying catalog objects, tables, schemas, or permissions. The engineering team identified the root cause and was actively working on restoration, with the incident lasting 28 minutes.

Minor April 20, 2026

April 2026: ES-1864534

Detected Apr 20, 2026 3:02 PM UTC · Resolved Apr 20, 2026 5:38 PM UTC · Duration about 3 hours

AWS Databricks experienced an issue affecting Lakeflow Spark Declarative Pipelines on Serverless Compute in the EU-West-1 region, starting at 12:13 UTC on April 20, 2026, with error rates significantly increasing at 14:40 UTC. Customers experienced pipeline launch failures, timeouts, and increased launch times for new pipeline runs. The engineering team was actively investigating and implementing remediation steps, with the incident lasting approximately 2.6 hours.