We're still looking into some spikes in query patterns that have been impacting ClickHouse. We've adjusted some retry behavior in our applications, which has alleviated some of the impact. We will continue monitoring until we have a clearer picture.
We're still seeing intermittent spikes and query failures, and we are continuing to investigate the root cause. We're monitoring closely, and engineers from multiple teams are working together to stabilize performance.
Problem: A sharp spike in query volume caused a surge in failed queries and high load on database hosts, leading to errors when loading dashboards and running queries.
Impact: Some customers in the US region could not load dashboards or run queries. Event ingestion lag also built up during the incident.
Cause: Still under investigation.
Steps to resolve: We restarted the affected service, which restored query and dashboard functionality. System metrics show recovery in progress, but we're keeping an eye on it and continuing to investigate.
We’re investigating reports of some queries and dashboards failing to load.
Event ingestion lag is also accumulating. Operators are currently investigating the root cause.