Outage in Scale AI

ML Pod Startup Failures

Resolved · Minor
March 30, 2023 - Started over 2 years ago

Incident Report

A service that manages model metadata was down from 7:06am PT to 7:50am PT (44 minutes), which prevented model services from starting new pods. The impact fell on ML services that received security updates, deployed a new version, or attempted to scale up during that window. After the affected ML services were restored, service remained degraded for an additional 8 to 12 minutes while they caught up on missed requests.

Latest Updates (most recent first)
RESOLVED over 2 years ago - at 03/30/2023 05:13PM

Latest Scale AI outages

Degraded performance - 10 months ago
Site Outage - over 1 year ago
Nucleus Degraded Performance - over 1 year ago
Cloudflare Outage - almost 2 years ago
Donovan Web Application Outage - almost 2 years ago