Outage in Grafana

Metrics Write Outage in Multiple Cells

Resolved Major
November 17, 2025 - Started 1 day ago - Lasted about 3 hours

Incident Report

We are investigating a partial write outage affecting multiple metrics cells, beginning around 19:30 UTC. Some customers may see intermittent write failures or delays. Most requests should succeed after retries, though recent metrics may appear late as a result. Querying previously ingested data remains unaffected. Engineering is continuing to investigate and will provide further updates as more information becomes available.
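
The report notes that most writes should succeed after retries. As a rough illustration of what that can look like on the client side, here is a minimal Python sketch of retrying with exponential backoff and full jitter. The names `write_with_retries`, `TransientWriteError`, and the `send` callable are hypothetical and are not part of any Grafana or Mimir API; the attempt count and delay values are assumptions chosen for illustration, not recommendations from the vendor.

```python
import random
import time


class TransientWriteError(Exception):
    """Hypothetical error type standing in for a retryable write failure."""


def write_with_retries(send, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    send is any zero-argument callable that raises TransientWriteError
    on a retryable failure. Delays follow exponential backoff with full
    jitter: a random wait between 0 and min(max_delay, base_delay * 2**n).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientWriteError:
            if attempt == max_attempts:
                # Out of attempts: surface the last failure to the caller.
                raise
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

The jitter spreads retries from many clients over time, so that they do not all hit a recovering cell at the same moment. Samples that arrive only after a retry would show up late, which matches the delayed metrics described above.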


Latest Updates (most recent first)
MONITORING 1 day ago - at 11/17/2025 09:22PM

We are seeing improvement on the metrics side, with write performance recovering.

We continue to investigate the remaining impact to Synthetic Monitoring and are working to determine the underlying cause.

Monitoring will continue as recovery progresses, and we’ll provide further updates as we learn more.

IDENTIFIED 1 day ago - at 11/17/2025 08:50PM

Our teams have been alerted that Synthetic Monitoring will also be affected by this outage.

Users may see gaps in their Synthetic Monitoring metrics as well as missed alerts as a result of this.

We continue to investigate and will provide further updates as they become available.

IDENTIFIED 1 day ago - at 11/17/2025 08:34PM

We’ve re-evaluated the situation and this issue is still ongoing. Although we initially observed signs of recovery, write errors continue to occur in the affected cells.

Mitigation work is still in progress, and we’re treating the incident as identified again while we work toward a sustained resolution. We’ll provide further updates as we confirm stabilization.

MONITORING 1 day ago - at 11/17/2025 08:11PM

Mitigation has been applied and Mimir write performance is beginning to recover in the affected cells.

prod-us-central-0.cortex-prod-10 appears to have recovered as of 19:52 UTC, and prod-us-central-5.cortex-dedicated-06 is showing signs of recovery as of 20:00 UTC.

We are continuing to monitor both cells closely to ensure the mitigation is effective and that the systems remain stable.

INVESTIGATING 1 day ago - at 11/17/2025 08:02PM

We are investigating a partial write outage affecting multiple metrics cells, beginning around 19:30 UTC. Some customers may see intermittent write failures or delays. Most requests should succeed after retries, though recent metrics may appear late as a result.

Querying previously ingested data remains unaffected. Engineering is continuing to investigate and will provide further updates as more information becomes available.
