Grafana Outage History

Minor March 13, 2026

March 2026: Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)

Detected Mar 13, 2026 6:28 AM EDT · Resolved Mar 18, 2026 3:16 AM EDT · Duration 5 days

Grafana Cloud Logs experienced degraded log ingestion in the Azure Netherlands (eu-west-3) cluster due to issues with the Loki write path. The incident affected customers' ability to ingest logs into that regional cluster for approximately 117 hours. The issue was resolved after the engineering team worked with its cloud service provider to restore stability and confirmed normal operations.

Major March 13, 2026

March 2026: Increased number of Aborted-by-Systems with k6 binary building errors

Detected Mar 13, 2026 3:41 AM EDT · Resolved Mar 13, 2026 2:14 PM EDT · Duration about 11 hours

Grafana experienced an increased number of "Aborted-by-Systems" errors related to k6 binary building, which blocked some customers from using the service. The issue, which first occurred on March 9, was identified and resolved after 10.6 hours, with the fix monitored before full resolution.

Major March 11, 2026

March 2026: Rule Evaluation Outage in prod-us-west-0

Detected Mar 11, 2026 1:10 PM EDT · Resolved Mar 13, 2026 2:18 PM EDT · Duration 2 days

Grafana experienced a major outage affecting rule evaluation for customers in the prod-us-west-0 (AWS US West) region that lasted 49.1 hours. During the incident, affected customers could not reliably evaluate their monitoring rules and alerts. The issue was resolved after a fix was implemented and the results were monitored over approximately 2 days.

Minor March 11, 2026

March 2026: Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)

Detected Mar 11, 2026 4:31 AM EDT · Resolved Mar 12, 2026 12:32 PM EDT · Duration 1 day

Grafana Cloud Logs experienced write path issues in the Azure Netherlands (eu-west-3) cluster, causing degraded log ingestion and Faro performance impact in that region. The incident affected the prod-eu-west-3 cluster components and lasted 32 hours before being resolved. The engineering team worked to restore service throughout the incident period.

Major March 10, 2026

March 2026: Various Issues with HG Pages

Detected Mar 10, 2026 2:06 PM EDT · Resolved Mar 10, 2026 3:21 PM EDT · Duration about 1 hour

Grafana experienced a major service incident affecting HG pages across all regions and cloud providers globally, including US, EU, Australia, Brazil, Canada, Germany, India, Singapore, Sweden, Netherlands, Belgium, and UK deployments. The issues impacted multiple components spanning AWS, Azure, and GCP infrastructure along with the play.grafana.org service. The incident was resolved after 1.2 hours of investigation and remediation by the engineering team.

Major March 10, 2026

March 2026: Some Write Failures in prod-eu-west-3

Detected Mar 10, 2026 2:00 PM EDT · Resolved Mar 11, 2026 5:51 PM EDT · Duration 1 day

Grafana experienced elevated write failures affecting a subset of users in the prod-eu-west-3 (Azure Netherlands) region, with data ingestion impacted while read operations remained unaffected. The incident lasted 27.8 hours, during which users experienced intermittent write failures. The issue was resolved after a fix was implemented and monitoring safeguards were put in place to prevent recurrence.

Minor March 10, 2026

March 2026: Service degradation on Logs Read path in AWS US West (us-west-0)

Detected Mar 10, 2026 11:26 AM EDT · Resolved Mar 10, 2026 4:41 PM EDT · Duration about 5 hours

Grafana's Loki log reading service experienced degradation in the AWS US West region, causing customers to encounter timeouts and 5xx errors when querying logs. The issue was a recurrence of earlier problems that began around 17:15 UTC on March 9. This occurrence lasted approximately 5.3 hours and was resolved on March 10 at 20:39 UTC.

Minor March 9, 2026

March 2026: Metrics write path outage in prod-us-central-0 and prod-us-central-5

Detected Mar 9, 2026 2:03 PM EDT · Resolved Mar 10, 2026 5:17 PM EDT · Duration 1 day

Grafana's metrics write path experienced elevated latency and error rates in the prod-us-central-0 and prod-us-central-5 regions during two separate periods: 15:30-15:45 UTC and 16:53-17:03 UTC on March 9, 2026. The incident affected the GCP US Central prod-us-central-0 component and was resolved on March 10, 2026 at 21:17 UTC.

Minor March 9, 2026

March 2026: Fleet Management Elevated Rate of Errors

Detected Mar 9, 2026 10:20 AM EDT · Resolved Mar 10, 2026 4:57 PM EDT · Duration 1 day

Grafana's Fleet Management service experienced elevated error rates when users attempted to fetch configurations in the prod-us-central-0 region. The issue affected some users' ability to retrieve configurations properly during the 30.6-hour incident period. Engineering teams investigated and resolved the problem, restoring normal service functionality.

Minor March 8, 2026

March 2026: Service degradation on Logs Read path in AWS US West (us-west-0)

Detected Mar 8, 2026 10:17 AM EDT · Resolved Mar 8, 2026 4:31 PM EDT · Duration about 6 hours

Grafana experienced service degradation on the Loki logs read path in AWS US West starting around 13:25 UTC, causing timeouts and 5xx errors when customers attempted to query logs. The incident affected the prod-us-west-0 cluster and lasted approximately 6.2 hours before being resolved. Services began stabilizing around 16:35 UTC with continued monitoring, and the issue was considered fully resolved by 20:31 UTC after observing sustained stability.
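
The entries above mix EDT timestamps in the "Detected/Resolved" lines with UTC times and decimal-hour durations in the summaries. As a minimal sketch of how the two relate, the Python below parses the EDT stamps, computes the duration in hours, and converts to UTC. It assumes EDT is a fixed UTC-4 offset for every entry listed here; the helper names (`parse_edt`, `duration_hours`, `to_utc`) are hypothetical and not part of any Grafana tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumption: every timestamp in this history is EDT, i.e. a fixed UTC-4 offset.
EDT = timezone(timedelta(hours=-4), "EDT")
STAMP_FORMAT = "%b %d, %Y %I:%M %p"  # e.g. "Mar 8, 2026 10:17 AM"


def parse_edt(stamp: str) -> datetime:
    """Parse a 'Detected'/'Resolved' stamp and attach the EDT offset."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=EDT)


def duration_hours(detected: str, resolved: str) -> float:
    """Return the elapsed time between two EDT stamps in hours."""
    delta = parse_edt(resolved) - parse_edt(detected)
    return delta.total_seconds() / 3600


def to_utc(stamp: str) -> str:
    """Convert an EDT stamp to a UTC string for comparison with the summaries."""
    return parse_edt(stamp).astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


if __name__ == "__main__":
    detected, resolved = "Mar 8, 2026 10:17 AM", "Mar 8, 2026 4:31 PM"
    print(round(duration_hours(detected, resolved), 1))
    print(to_utc(resolved))
```

Applied to the March 8 entry, this reproduces the roughly 6.2-hour duration and the 20:31 UTC resolution time quoted in its summary.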