Grafana experienced degraded Prometheus writes in the prod-eu-west-3 region starting at 08:45Z, which later expanded to impact Logs and Synthetic Monitoring services. The incident affected ingestion, API, and public probes, causing errors in check execution metrics, potential missed alerts for Synthetic Monitoring, and gaps in recording rules for Logs due to delayed remote writes to Mimir. The issue was resolved after 27.7 hours with a fix implemented by the engineering team.
This incident has been resolved.
This is also now impacting Logs and Synthetic Monitoring in prod-eu-west-3.
For Synthetic Monitoring, users might observe errors pushing check execution metrics, and this can eventually lead to missing data.
In addition, users might observe errors when Synthetic Monitoring provisioned alert rules are evaluated, which can lead to missed alerts.
For Logs, there is no immediate impact on alerts. However, remote writes to Mimir are delayed, which means users may see gaps in their recording rules.
We are moving this back to 'Investigating' as we are now observing a substantial drop in successful ingestion, an increase in write path errors, and elevated rule evaluation latency and errors. Reads are mostly fine. Our Engineering team is actively investigating this and we will provide further updates as our investigation progresses.
We have not observed any recent errors, but we will continue to monitor while we work with our CSP.
A fix has been implemented and we are monitoring the results.
We have been experiencing degraded writes for mimir-prod-22 in prod-eu-west-3 since 08:45Z.