
Grafana Outage History

Every past Grafana outage tracked by IsDown, with detection times, duration, and resolution details.

There have been 707 Grafana outages since July 2018. The 193 outages from the last 12 months are summarized below, with incident details, duration, and resolution information.

Minor March 7, 2026

March 2026: Outage for prod-eu-central-0 due to AWS S3 outage.

Detected Mar 7, 2026 3:07 PM EST · Resolved Mar 9, 2026 5:00 AM EDT · Duration 1 day

Grafana services in the prod-eu-central-0 region experienced elevated error rates and outages for 36.9 hours due to an AWS S3 outage in that region. Multiple Grafana services were affected by the underlying AWS infrastructure failure. The incident was resolved after AWS S3 recovered and Grafana services returned to normal operation.

Major March 6, 2026

March 2026: Some Grafana Instances Unavailable

Detected Mar 6, 2026 10:03 AM EST · Resolved Mar 6, 2026 11:34 AM EST · Duration about 2 hours

A major issue caused some Grafana instances to become unavailable across multiple regions including US, EU, Australia, Brazil, Canada, Germany, India, Singapore, Sweden, Netherlands, UK, and Belgium, affecting both cloud providers (AWS, Azure, GCP) and the play.grafana.org demo site. The incident lasted approximately 1.5 hours before being resolved by the engineering team.

Major March 5, 2026

March 2026: Write failures in prod-eu-west-0

Detected Mar 5, 2026 5:27 PM EST · Resolved Mar 5, 2026 6:37 PM EST · Duration about 1 hour

A major incident occurred in Grafana's prod-eu-west-0 region starting at 21:05 UTC on March 5, 2026, affecting the data read path and rule execution systems. Customers in the EU-West region experienced write failures and delays in rule evaluation, with both ingestion and querying services impacted across GCP Belgium infrastructure. The incident lasted about an hour, with engineering teams actively investigating and resolving the issue.

Minor March 4, 2026

March 2026: Elevated rate of errors for Fleet Management in prod-us-central-0

Detected Mar 4, 2026 2:47 AM EST · Resolved Mar 4, 2026 4:31 AM EST · Duration about 2 hours

Grafana's Fleet Management service in the prod-us-central-0 region experienced elevated error rates when users attempted to fetch configurations. The incident lasted 1.7 hours and was resolved after implementing a fix and monitoring the results.

Minor March 3, 2026

March 2026: Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)

Detected Mar 3, 2026 7:07 AM EST · Resolved Mar 5, 2026 1:33 PM EST · Duration 2 days

Grafana Cloud Logs experienced degraded log ingestion in the Azure Netherlands (eu-west-3) cluster starting at 11:55 UTC, with additional impact to Faro performance in the same region. The incident lasted 54.4 hours, during which the engineering team worked with their cloud service provider on mitigation efforts that gradually reduced the impact to slight intermittency before full service was restored.

Major March 2, 2026

March 2026: Write outage for logs in prod-eu-west-3

Detected Mar 2, 2026 2:37 AM EST · Resolved Mar 2, 2026 10:49 AM EST · Duration about 8 hours

Grafana experienced a write outage for logs in the prod-eu-west-3 region (Azure Netherlands), which escalated from increased write latency to a complete write outage. The incident affected log ingestion capabilities for users in that region for 8.2 hours before being resolved.

Minor February 27, 2026

February 2026: Trace querying issue in all Tempo clusters

Detected Feb 27, 2026 8:46 AM EST · Resolved Feb 27, 2026 6:38 PM EST · Duration about 10 hours

Grafana experienced a trace querying issue across all Tempo clusters where portions of data became temporarily unretrievable, affecting a small percentage of tenants globally. The issue impacted querying functionality across all major cloud regions including AWS, Azure, and GCP deployments. The incident was resolved after 9.9 hours once the team identified the root cause and implemented a fix.

Minor February 27, 2026

February 2026: Incorrect pipeline assignment after custom attributes are assigned

Detected Feb 27, 2026 7:57 AM EST · Resolved Feb 27, 2026 10:28 AM EST · Duration about 3 hours

Grafana experienced issues with incorrect pipeline assignment occurring after custom attributes were assigned to the service. The incident lasted 2.5 hours, during which users likely encountered data processing errors or misrouted information through incorrect pipelines. The issue was identified and resolved with a fix deployed to correct the pipeline assignment logic.

Minor February 26, 2026

February 2026: Grafana Cloud Faro slowness of listing and uploading sourcemaps in all regions

Detected Feb 26, 2026 8:00 AM EST · Resolved Feb 26, 2026 9:52 PM EST · Duration about 14 hours

Grafana Cloud Faro experienced slowness when uploading and listing sourcemaps across all regions, with the issue particularly impacting users with large sourcemap files. The incident affected 19 regions across AWS, Azure, and GCP platforms for 13.9 hours. The team identified and resolved the root cause, with uploads restored first followed by a complete resolution of the listing timeout issues.

Minor February 25, 2026

February 2026: Grafana Cloud Metrics - Intermittent Write Latency in prod-us-central-0, prod-us-central-5, and prod-eu-west-0

Detected Feb 25, 2026 2:54 PM EST · Resolved Mar 17, 2026 2:23 PM EDT · Duration 20 days

Grafana Cloud Metrics experienced intermittent write latency spikes in the prod-us-central-0, prod-us-central-5, and prod-eu-west-0 regions due to communication issues with a backend cloud service provider. The issue affected ingestion components and caused delayed write operations for some customers, though not all traffic was impacted. The incident was resolved by changing the connection strategy to the backend cloud service and migrating all tenants back to multi-zone write paths using the more reliable connectivity method.