Outage in AWS Databricks

ES-1059099

Resolved Minor
February 28, 2024 - Started over 1 year ago - Lasted about 4 hours

Incident Report

We are investigating an issue with one of the Databricks services.

Incident Details:
- Workspace authentication requests may fail or time out.
- Cluster start/resize/termination requests may fail or time out.
- Jobs relying on cluster start/resize/termination may not execute.
- Jobs submitted through APIs/Schedulers may not execute.
- UI and Databricks SQL queries may time out.
- Users may experience failures launching Databricks Serverless SQL Warehouses.
- Users may not be able to access UC APIs.

Incident Start Time: 18:36 UTC February 28 2024

We will provide an update in the next hour, or as soon as the issue has been identified.

Latest Updates (oldest first)
over 1 year ago - at 02/28/2024 06:54PM

We are investigating an issue with one of the Databricks services.

Incident Details:
- Workspace authentication requests may fail or time out.
- Cluster start/resize/termination requests may fail or time out.
- Jobs relying on cluster start/resize/termination may not execute.
- Jobs submitted through APIs/Schedulers may not execute.
- UI and Databricks SQL queries may time out.
- Users may experience failures launching Databricks Serverless SQL Warehouses.
- Users may not be able to access UC APIs.

Incident Start Time: 18:36 UTC February 28 2024

We will provide an update in the next hour, or as soon as the issue has been identified.

over 1 year ago - at 02/28/2024 07:07PM

We have identified the problem with the Databricks service. Our team is working on a mitigation.

Incident Details:
- Workspace authentication requests may fail or time out.
- Cluster start/resize/termination requests may fail or time out.
- Jobs relying on cluster start/resize/termination may not execute.
- Jobs submitted through APIs/Schedulers may not execute.
- UI and Databricks SQL queries may time out.
- Users may experience failures launching Databricks Serverless SQL Warehouses.
- Users may not be able to access UC APIs.

Incident Start Time: 18:36 UTC February 28 2024

We will provide an update in the next hour, or as soon as the issue has been mitigated.

over 1 year ago - at 02/28/2024 07:42PM

We are seeing signs of recovery. Our team is actively monitoring the system to ensure full mitigation and is working to maintain this positive trajectory. Thank you for your patience.

Incident Details:
- Workspace authentication requests may fail or time out.
- Cluster start/resize/termination requests may fail or time out.
- Jobs relying on cluster start/resize/termination may not execute.
- Jobs submitted through APIs/Schedulers may not execute.
- UI and Databricks SQL queries may time out.
- Users may experience failures launching Databricks Serverless SQL Warehouses.
- Users may not be able to access UC APIs.

Incident Start Time: 18:36 UTC February 28 2024

We will provide an update in the next hour, or as soon as the issue has been mitigated.

over 1 year ago - at 02/28/2024 09:25PM

Mitigation has been applied and the issue has been successfully mitigated, although you may notice some latency. This latency does not impact production services. Our team continues to monitor the situation closely to ensure optimal performance.

Incident Details:
- Workspace authentication requests may fail or time out.
- Cluster start/resize/termination requests may fail or time out.
- Jobs relying on cluster start/resize/termination may not execute.
- Jobs submitted through APIs/Schedulers may not execute.
- UI and Databricks SQL queries may time out.
- Users may experience failures launching Databricks Serverless SQL Warehouses.
- Users may not be able to access UC APIs.

Incident Start Time: 18:36 UTC February 28 2024
Incident End Time: 19:08 UTC February 28 2024

We will continue to monitor for stability and provide a final update in the next two hours.
