Outage in Google Cloud

Vertex AI custom training jobs failing if using more than 2GB ephemeral storage

Resolved Minor
August 16, 2024 - Lasted about 4 hours

Incident Report

Summary: Vertex AI custom training jobs failing if using more than 2GB ephemeral storage

Description: Mitigation work is currently underway by our engineering team. We do not have an ETA for mitigation at this point. We will provide more information by Friday, 2024-08-16 17:00 US/Pacific.

Diagnosis: Custom Vertex AI training jobs running on GKE and using more than 2GB of ephemeral storage may fail with the error "Pod ephemeral local storage usage exceeds the total limit of containers 2Gi."

Workaround: None at this time.
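For anyone triaging failed jobs against this incident, the sketch below shows one way to recognize the reported error in a log line and extract the limit it names. It is only an illustration: the pattern is built from the error text quoted in the diagnosis, real log lines may be worded differently, and the function name `parse_ephemeral_limit` is hypothetical rather than part of any Google Cloud or Kubernetes API. It detects the failure; it is not a workaround.

```python
import re
from typing import Optional

# Assumption: the pattern mirrors the error text quoted in the incident
# report. Actual log lines may carry extra prefixes or different wording.
_EVICTION_RE = re.compile(
    r"Pod ephemeral local storage usage exceeds the total limit of "
    r"containers\s+(\d+)(Ki|Mi|Gi|Ti)"
)

# Binary unit suffixes as used in Kubernetes resource quantities.
_UNIT_BYTES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_ephemeral_limit(message: str) -> Optional[int]:
    """Return the limit in bytes if the message matches the reported error.

    Returns None for any message that does not contain the error text.
    """
    match = _EVICTION_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_BYTES[match.group(2)]


if __name__ == "__main__":
    line = (
        "Pod ephemeral local storage usage exceeds the total limit "
        "of containers 2Gi."
    )
    print(parse_ephemeral_limit(line))
```

Note that the limit in the error is given as 2Gi (2 × 2^30 bytes), which is slightly larger than 2GB measured in decimal units.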
