This incident is now resolved, and normal operations have resumed across all affected services:
- Routing: Fully operational; all data syncing to cloud data lakes/warehouses in real-time
- Third-Party Data & Audience Imports: Fully operational
- Audience & Campaign Insights: Fully operational
- Audience Planning Insights, Live Audience Sizes, and AI Recommendations: Fully operational
- Server-Side Integrations: Fully operational
Most queued data has been successfully processed; the remainder will be processed over the next few hours.
No data was lost during this incident, but we appreciate that customers experienced delayed data delivery for some of the impacted services.
Client-side segmentation and activation continued to operate fully throughout the incident.
We will continue to monitor this infrastructure closely over the coming days as a precaution. We appreciate your patience throughout this incident.
We are continuing to work on resolving this issue.
Our engineering team is performing additional infrastructure work to improve stability.
To allow the infrastructure to stabilise, some services have been temporarily scaled down again. This is expected and part of the recovery process.
Current impact:
- Routing - A large number of integrations have caught up, but ingestion has been paused again; data delays expected
- Third-Party Data & Audience Imports - Delays in data ingestion; some processing paused
- Audience & Campaign Insights - Available and fully operational
- Audience Planning Insights, LAS & AI Recommendations - Experiencing some volatility in performance. Some features may be temporarily unavailable while the platform stabilises
- Server-Side Integrations - Delays in segment activation data being sent to partners
We will continue to provide updates as the situation progresses.
We have made significant progress overnight. Infrastructure load has dropped substantially, and under-replicated data is back in sync.
Routing is now fully back online and processing its backlog. We are bringing the remaining services back online in a phased approach throughout the day, monitoring system health at each stage.
Topic retention has been extended to ensure no data loss during recovery.
We will provide further updates as recovery progresses.
We are continuing to work on resolving this issue. Our engineering team has taken additional steps to accelerate infrastructure recovery:
- Deployed changes to redirect traffic away from the impacted infrastructure
- Initiated partition leader reassignment across the cluster (currently ~50% complete)
- Applied replication throttling to reduce load on affected nodes
Affected systems and current impact:
- Routing - Severe delays in data delivery to cloud data lakes/warehouses
- Third-Party Data & Audience Imports - Delays in data ingestion and processing
- Live Audience Size (LAS) & Insights - Unreliable numbers in the dashboard due to ingestion delays, including event counts and latest events samples
- Server-Side Integrations - Delays in segment activation data being sent to partners
No data has been lost during this incident. All data is being queued and will be processed once systems have recovered. We are working to restore normal processing as quickly as possible.
We will provide further updates as the situation develops.
Our engineering team has identified the root cause as certain infrastructure components reaching capacity limits.
We have brought additional infrastructure online and are actively migrating data to distribute the load more effectively.
We are seeing improvements on some affected systems and expect continued progress as migrations complete.
We will provide further updates as the situation develops.
We appreciate your patience and apologise for the ongoing disruption.
We are continuing to work on a fix for this issue.
Permutive Routing is currently experiencing severe delays. Customers who use Routing to sync data to their cloud data lake or warehouse have been experiencing delays in data delivery since 9:46 PM UTC on 18th January.
Relatedly, customers are also experiencing delays in third-party data imports.
We are working to process all data and restore real-time delivery.