We implemented a mitigation strategy to relieve pressure on the data synchronization queues. We observed that performance returned to normal levels as of 16:56 UTC. We are currently validating a permanent fix and will continue to monitor the service to ensure stability.
We'll provide an update as soon as additional information becomes available.
The platform’s limited capacity to process the exceptionally high volume of inbound traffic remains the primary constraint, with the Retry Queue identified as the main bottleneck. The previously planned code change to reduce retry attempts has been halted after performance validation revealed that it could place excessive load on the messaging service and risk further instability.
To address this, we are implementing a revised mitigation approach. A new code change is being prepared that selectively routes affected customer messages from the pending queues directly to the unused queue, preserving audit requirements while allowing the primary delivery queues to drain more effectively. In parallel, we are assessing the safety and feasibility of temporarily pausing the Retry Service to prevent reprocessing pressure as backlog reduction continues.
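The rerouting step described above can be illustrated with a minimal sketch: messages for affected customers are diverted from the pending queue into a side ("unused") queue so the primary delivery queue can drain, while no message is dropped. The message shape, the `AFFECTED_CUSTOMERS` set, and all names here are assumptions for illustration, not the vendor's actual implementation.

```python
import queue

# Assumed identifier format; in practice this would come from the incident scope.
AFFECTED_CUSTOMERS = {"acct_123", "acct_456"}

def reroute_pending(pending: queue.Queue, side: queue.Queue,
                    delivery: queue.Queue) -> None:
    """Drain `pending`, diverting affected customers' messages to `side`.

    Messages are moved, never discarded, so audit requirements are preserved.
    """
    while True:
        try:
            msg = pending.get_nowait()
        except queue.Empty:
            break
        if msg["customer_id"] in AFFECTED_CUSTOMERS:
            side.put(msg)       # park for later replay / audit
        else:
            delivery.put(msg)   # keep the primary delivery path draining
```

The key design point is that the side queue acts like a manual dead-letter queue: it removes pressure from the hot path without losing messages.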
Additional platform optimizations, scaling adjustments, and coordination with third-party messaging service providers are in progress.
We'll provide an update in 120 minutes or sooner if additional information becomes available.
Here’s a brief summary of the incident so far:
We continue to address the degradation impacting the WhatsApp messaging feature in multiple instances. A subset of customers may experience issues sending messages or initiating new messaging sessions.
A specific service struggled to process a sudden surge in session status events, resulting in downstream pressure and an increased message backlog. We are accelerating recovery by migrating messages to an unused queue, which helps drain the backlog faster while maintaining audit and compliance requirements. Initial testing confirmed this approach is feasible.
In parallel, we are validating code changes that reduce message retry attempts, with deployment targeted within the next two hours following successful testing. An additional change is also being pursued to route a specific message type directly to the unutilized queue, further reducing system pressure. Platform services are currently stable after scaling adjustments, and we are actively monitoring key service metrics.
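The retry-reduction change mentioned above amounts to capping attempts so failed messages stop recirculating through the Retry Queue. Here is a hedged sketch of such a policy; the function names, parameters, and defaults are assumptions, not the actual code change.

```python
import time

def send_with_retries(send, message, max_attempts=2, base_delay=0.0):
    """Attempt delivery, retrying at most `max_attempts - 1` times.

    Lowering `max_attempts` reduces reprocessing pressure: a message that
    keeps failing is surfaced as an error instead of looping through the
    retry queue indefinitely.
    """
    for attempt in range(max_attempts):
        try:
            return send(message)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: fail fast rather than re-queue
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

With `base_delay` greater than zero, the backoff also spaces retries out, which further smooths load on the downstream messaging service.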
We'll provide an update in 120 minutes or sooner if additional information becomes available.
We have coordinated with the third-party messaging service provider to address Application Programming Interface (API) throughput limits that were contributing to the issue. We are continuing to closely monitor system performance and work to process the remaining messages.
As part of the remediation steps, we have developed a configuration change to improve the stability and reliability of the Common Gateway service. We are currently validating the change in a test environment; once validation completes successfully, the change will be deployed to production.
We'll provide an update in 120 minutes or sooner if additional information becomes available.
We continue to address the database communication issue impacting third-party messaging services. We have increased capacity, optimised session limits, scaled messaging gateway pods, and increased downstream service capacity. The message backlog is stabilising, with steady success rates and no new spikes in failure rates. We have replaced the high-CPU service component and continue to monitor platform stability.
We'll provide an update in 120 minutes or sooner if additional information becomes available.
We continue to actively manage recovery following the capacity expansion. Core messaging services remain stable under load, and backlog processing is ongoing.
To protect overall platform health while maintaining forward progress, we have completed a controlled configuration adjustment and are continuing targeted service scaling and optimisation to increase processing throughput.
Due to previously queued traffic, some customers may continue to experience intermittent delays in message delivery while the backlog is progressively cleared. We remain engaged, closely monitoring performance and implementing additional improvements as needed.
We will provide another update within 120 minutes, or sooner as additional information is available.
We have increased messaging capacity and scaled core services, and platform stability remains strong under the higher load.
Backlog processing is actively underway. Due to previously queued traffic, some customers may continue to experience delayed or intermittent message delivery. In parallel, we are implementing additional targeted capacity optimisations to further accelerate recovery ahead of upcoming regional business hours, with continuous monitoring and safeguards in place to protect the platform's health.
We will provide another update within 120 minutes, or sooner if significant progress is achieved.
We completed the increase in service capacity limits and continue to see stable performance across core systems. Messaging services have improved, and recovery is underway. Due to the previously queued traffic, some customers may continue to experience delays or intermittent message delivery while the backlog is progressively cleared. Our teams remain actively engaged, monitoring performance and continuing optimisation to support sustained recovery.
We will provide another update within 90 minutes or sooner as progress continues.
We have now completed the increase to service capacity limits and additional scale-up actions. Core messaging services are stable under the higher load, and the platform is processing traffic at significantly increased throughput.
Customers may continue to experience delays or intermittent failures while the system processes a large backlog of queued messages and active conversations.
We continue to closely monitor recovery progress and will provide another update in 90 minutes or sooner as sustained improvement becomes visible.
We are in the final stages of executing the recovery actions previously outlined. Key production updates are currently being deployed, after which we will complete the next phase of remediation by increasing service capacity limits to better support the current high volume of messages.
Customers may continue to experience intermittent delays in message delivery while these changes are being finalised.
We will provide another update in 90 minutes or sooner if additional information becomes available.
We are actively executing our previously shared two-phase recovery approach. We have completed a significant expansion of service capacity, and additional updates and stabilisation actions remain in progress as changes move safely toward production. In parallel, we are preparing the second phase of remediation, which includes carefully adjusting service limits to better accommodate high-volume scenarios.
Customers may continue to experience intermittent delays while these improvements are being finalised.
We will provide another update in 90 minutes or sooner if additional information becomes available.
We are actively implementing a two-phase recovery approach: first, we are scaling our messaging infrastructure to increase capacity, and second, we are adjusting service limits to better accommodate high-volume scenarios.
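The two-phase approach can be sketched as ordered configuration rollouts: capacity is expanded first, then service limits are raised so the new limits land on already-expanded infrastructure. Every value and name below is invented for illustration only.

```python
# Phase 1: scale out replica counts (hypothetical knobs and values).
SCALING_PHASE = {"messaging_gateway_replicas": 12, "worker_replicas": 24}

# Phase 2: raise per-service limits (hypothetical knobs and values).
LIMITS_PHASE = {"max_sessions_per_node": 5000, "queue_depth_limit": 200_000}

def apply_recovery(apply_config):
    """Apply the two phases in order.

    Scaling runs first so that the higher limits in phase 2 are only
    enforced once the extra capacity to absorb them exists.
    """
    apply_config(SCALING_PHASE)
    apply_config(LIMITS_PHASE)
```

Ordering is the point here: raising limits before scaling would invite the same overload the recovery is trying to relieve.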
We'll provide another update in 60 minutes or sooner if additional information becomes available.
We are continuing our work to scale service capacity and restore normal message delivery.
We'll provide another update in 60 minutes or sooner if additional information becomes available.
We initiated a comprehensive scaling operation to increase service capacity. We’re systematically expanding infrastructure resources and evaluating capacity adjustments to restore normal message delivery performance.
We'll provide another update in 60 minutes or sooner if additional information becomes available.
We are actively scaling our messaging infrastructure to restore full service capacity. Our teams have implemented service scaling measures and continue to closely monitor system performance.
We'll provide another update in 60 minutes or sooner if additional information becomes available.
We are continuing to increase capacity for the affected services following the recent service restarts performed to remediate the issue.
We'll provide another update in 60 minutes or sooner if additional information becomes available.
We have identified a service that was timing out when processing inbound messages, and completed a service restart. However, we have observed that a small percentage of third-party messages are still failing to receive responses from our systems. We are continuing to investigate the cause while exploring possible mitigation strategies.
We'll provide another update in 60 minutes or sooner if additional information becomes available.
At 03:57 UTC on December 31, 2025, we became aware of performance degradation impacting the WhatsApp messaging feature on multiple instances. During this time, a subset of customers may experience issues sending outbound messages or initiating new messaging sessions.
We are actively investigating this issue and will provide an update in 60 minutes or sooner if additional information becomes available.