Outage in Grafana

Grafana Cloud Metrics - Intermittent Write Latency in prod-us-central-0, prod-us-central-5, and prod-eu-west-0

Resolved Minor

February 25, 2026 - Started about 2 months ago - Lasted 20 days

Incident Report

Summary AI Generated

Grafana Cloud Metrics experienced intermittent write latency spikes in the prod-us-central-0, prod-us-central-5, and prod-eu-west-0 regions due to communication issues with a backend cloud service provider. The issue affected ingestion components and caused delayed write operations for some customers, though not all traffic was impacted. The incident was resolved by changing the connection strategy to the backend cloud service and migrating all tenants back to multi-zone write paths using the more reliable connectivity method.

Since February 19, we have been investigating an intermittent issue causing increased write latency in the prod-us-central-0 and prod-us-central-5 regions. The issue does not affect all traffic but may result in delayed write operations for some customers. Our engineering team is actively working to identify the root cause and stabilize performance. We will share additional updates as progress is made.

Components affected

Grafana GCP Belgium - prod-eu-west-0: Ingestion

Grafana EU-WEST: Ingestion

Grafana GCP US Central - prod-us-central-0: Ingestion

Grafana US-CENTRAL: Ingestion

Latest Updates (sorted most recent first)

RESOLVED about 1 month ago - at 03/17/2026 06:22PM

This incident is now resolved.

During the incident, the Cloud Metrics platform experienced intermittent latency spikes when communicating with a backend cloud service in the prod-us-central-0 and prod-us-central-5 regions, and the internal CSP-facing issue was escalated to a P1. After determining that the latency spikes were limited to one availability zone, the team mitigated the situation by migrating all write traffic to the single, nearly unaffected availability zone.

As the CSP service team attempted to remedy the situation, it worsened and began affecting the previously unaffected zone, so another mitigation path was needed. We deployed a change to all environments that switched Cloud Metrics to a different connection strategy. This stabilized the write path again: the new connection method proved more reliable and was not affected by the latency increases.

We have migrated all tenants back to multi-zone write paths and are confident in the current method of connectivity to the backend cloud service, which is the one we migrated to during the incident. We have no plans to return to the previous, problematic connectivity method in the foreseeable future.

MONITORING about 1 month ago - at 03/06/2026 09:44PM

We are rolling out a mitigation across the environments in these regions, and preemptively in other environments where possible, to ensure the issue doesn’t spread elsewhere.

MONITORING about 1 month ago - at 03/06/2026 08:53PM

We have seen an increase in latency in our cloud provider's services and are rolling out a change to mitigate the issue. We are monitoring.

MONITORING about 1 month ago - at 03/05/2026 10:22PM

We are continuing to investigate this issue alongside the CSP, and have taken steps to escalate through the appropriate channels. The mitigation in place continues to work as expected, and any notable updates will continue to be shared here for tracking.

MONITORING about 2 months ago - at 02/27/2026 10:05PM

We are continuing to investigate this issue alongside the CSP. Any notable updates will continue to be shared here for tracking.

MONITORING about 2 months ago - at 02/27/2026 02:55PM

We've put a mitigation in place and are continuing to monitor and investigate this issue.

INVESTIGATING about 2 months ago - at 02/26/2026 04:23PM

We have begun rolling out mitigation steps to reduce write latency in the prod-us-central-0 and prod-us-central-5 regions. While these measures are expected to improve performance, we are continuing to investigate the underlying root cause of the issue. We will provide additional updates as more information becomes available.

INVESTIGATING about 2 months ago - at 02/25/2026 07:54PM
