PostHog experienced event ingestion delays for 7.6 hours due to increased latency in the ingestion event queue, which degraded throughput and allowed 30-40 minutes of lag to accumulate. The issue was resolved by upgrading the underlying compute for the queueing system to handle the increased event throughput. No data was lost during the incident, and full catch-up was expected within 3-4 hours of throughput being restored.
Ingestion throughput has recovered and the system is currently working through the backlog.
Current throughput estimates suggest we will be fully caught up within the next 3-4 hours.
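A catch-up estimate like this follows from simple arithmetic: the backlog drains at the rate by which processing now outpaces new ingestion. The sketch below illustrates the calculation; the rates and lag figures are illustrative assumptions, not numbers from this incident.

```python
def catchup_hours(backlog_minutes: float,
                  process_rate: float,
                  ingest_rate: float) -> float:
    """Hours to drain a backlog once processing outpaces ingestion.

    backlog_minutes: accumulated lag, in minutes of incoming events
    process_rate:    events/sec the recovered pipeline can process
    ingest_rate:     events/sec of newly arriving events
    """
    if process_rate <= ingest_rate:
        raise ValueError("backlog never drains unless processing outpaces ingestion")
    backlog_events = backlog_minutes * 60 * ingest_rate
    drain_rate = process_rate - ingest_rate  # net events/sec removed from backlog
    return backlog_events / drain_rate / 3600

# Illustrative: 40 minutes of lag at 10,000 events/s incoming, with the
# pipeline now processing 11,900 events/s, drains in roughly 3.5 hours.
eta = catchup_hours(backlog_minutes=40, process_rate=11_900, ingest_rate=10_000)
```

Note that the estimate is sensitive to the incoming rate: if traffic rises while catching up, the net drain rate shrinks and the ETA stretches accordingly.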
Operators will be monitoring the system as it fully recovers.
The underlying compute for our queueing system is being upgraded to handle the increased event throughput the system is experiencing. We expect this process to take another two hours. Ingestion performance will remain degraded until the upgrade completes.
No data has been lost; our systems are just running 30-40 minutes behind.
Latency producing to and consuming from the ingestion event queue has increased, resulting in degraded throughput of the ingestion pipeline. Lag is accumulating, and operators are investigating the cause. There is no data loss.
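The "lag" reported here is the standard metric for a partitioned, Kafka-style event queue: per partition, the latest produced offset minus the consumer group's committed offset. The sketch below shows the computation; the partition names and offset values are made up for illustration, and this is not PostHog's actual monitoring code.

```python
def consumer_lag(end_offsets: dict[str, int],
                 committed: dict[str, int]) -> dict[str, int]:
    """Per-partition lag: messages produced but not yet consumed.

    end_offsets: latest offset written by producers, per partition
    committed:   last offset acknowledged by the consumer group
    """
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

# Hypothetical snapshot of two partitions of an ingestion topic:
end_offsets = {"events-0": 1_250_000, "events-1": 1_248_500}
committed = {"events-0": 1_100_000, "events-1": 1_120_000}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
```

Because producers keep appending while consumers fall behind, a positive lag that grows between successive snapshots is exactly the "lag is accumulating" condition described above.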