Grafana Outage History

Minor April 1, 2026

April 2026: AWS integration Degraded Performance

Detected Apr 1, 2026 4:17 PM EDT · Resolved Apr 1, 2026 5:05 PM EDT · Duration 48 minutes

Grafana's AWS integration experienced degraded performance starting around 18:15 UTC, causing scrapes to hit rate limits and resulting in missing data points for serverless integrations. The issue intermittently affected all customers using the AWS integration across all regions. The incident was resolved 48 minutes after detection.

Minor April 1, 2026

April 2026: Query degradation and possible rule evaluation failure on prod-eu-west-0.cortex-prod-01

Detected Apr 1, 2026 5:56 AM EDT · Resolved Apr 1, 2026 5:13 PM EDT · Duration about 11 hours

Grafana's prod-eu-west-0.cortex-prod-01 metrics cell experienced data ingestion delays that caused partial query results and failed rule evaluations, affecting querying and ingestion services in the EU-WEST region and GCP Belgium. The incident lasted 11.3 hours and was resolved once a fix was implemented. During this period, users in the affected regions may have seen incomplete or failed metric queries and failed alerting rule evaluations.

Major March 31, 2026

March 2026: Synthetic Monitoring: Some Check Creations & Updates Might be Blocked.

Detected Mar 31, 2026 10:01 AM EDT · Resolved Mar 31, 2026 10:29 AM EDT · Duration 28 minutes

Grafana's Synthetic Monitoring service experienced a major incident in which creating and updating scripted and browser checks through the plugin app was blocked across multiple global probe locations. The issue was isolated to the plugin app interface; Terraform and direct API access remained functional. The incident was resolved after 28 minutes, once the team identified the root cause and implemented a fix.

Major March 31, 2026

March 2026: Some of the CloudWatch queries are failing

Detected Mar 31, 2026 5:48 AM EDT · Resolved Mar 31, 2026 6:25 AM EDT · Duration 37 minutes

Grafana experienced a major incident in which CloudWatch queries failed across multiple regions, including US, EU, Asia-Pacific, and Brazil deployments. The issue affected 26 components globally, impacting users' ability to retrieve monitoring data from AWS CloudWatch. The incident was resolved after 37 minutes, and the team continued monitoring to confirm stability.

Major March 27, 2026

March 2026: Some Grafana Instances Unavailable

Detected Mar 27, 2026 9:36 AM EDT · Resolved Mar 27, 2026 4:50 PM EDT · Duration about 7 hours

Grafana experienced a major outage lasting 7.2 hours that primarily affected Free tier users across all global regions. Impacted users saw an indefinite "your Grafana instance is loading" message when trying to access their instances. The incident affected numerous AWS, Azure, and GCP regions worldwide, as well as the play.grafana.org service. The team identified the root cause after several hours of investigation, implemented a fix, and fully resolved the issue.

Minor March 24, 2026

March 2026: Prometheus writes, Logs, and Synthetic Monitoring in prod-eu-west-3 are degraded

Detected Mar 24, 2026 5:08 AM EDT · Resolved Mar 25, 2026 8:52 AM EDT · Duration 1 day

Grafana experienced degraded Prometheus writes in the prod-eu-west-3 region starting at 08:45 UTC, and the impact later expanded to the Logs and Synthetic Monitoring services. The incident affected ingestion, API, and public probes. For Synthetic Monitoring it caused errors in check execution metrics and potentially missed alerts; for Logs it caused gaps in recording rules due to delayed remote writes to Mimir. The issue was resolved after 27.7 hours with a fix implemented by the engineering team.

Major March 23, 2026

March 2026: Grafana Assistant Unavailable in prod-us-east-0

Detected Mar 23, 2026 1:03 PM EDT · Resolved Mar 23, 2026 2:49 PM EDT · Duration about 2 hours

Grafana Assistant was completely unavailable in the prod-us-east-0 region for 1.8 hours, with users encountering a Terms of Service acceptance prompt that failed to function properly. The engineering team identified the root cause and implemented a fix to restore full service functionality.

Major March 20, 2026

March 2026: Authentication API Database Down in prod-eu-west-2 and prod-eu-west-4

Detected Mar 20, 2026 11:00 AM EDT · Resolved Mar 20, 2026 11:41 AM EDT · Duration 40 minutes

Grafana's Authentication API database experienced failures in the prod-eu-west-2 and prod-eu-west-4 regions, with database writes failing while reads remained operational. The incident affected AWS Germany regions and impacted users' ability to perform authentication-related operations that required database writes. The issue was resolved after 40 minutes.

Major March 19, 2026

March 2026: Various Datasource Issues

Detected Mar 19, 2026 12:46 PM EDT · Resolved Mar 19, 2026 2:45 PM EDT · Duration about 2 hours

Grafana Cloud experienced a major incident affecting multiple datasources, including CloudWatch, Aurora, OpenSearch, X-Ray, Timestream, Redshift, and SiteWise, causing failures in the Integrations component. The issue was identified and a fix was rolled out, with recovery observed for affected datasources. The incident was resolved after 2 hours of monitoring and remediation efforts.

Major March 19, 2026

March 2026: Degraded performance of Grafana Cloud k6 test runs

Detected Mar 19, 2026 7:17 AM EDT · Resolved Mar 19, 2026 2:13 PM EDT · Duration about 7 hours

Grafana Cloud k6 experienced degraded performance and errors on certain v6 API endpoints for 6.9 hours, affecting customers who used those endpoints during the incident. The service team investigated and resolved the API endpoint problems.