OneSignal experienced email delivery delays and failures for 3.1 hours due to an unhealthy notifications API instance that caused 27 delivery jobs to become stuck in a terminating state. The incident began with increased lag in email deliveries around 9am and escalated to completely stuck delivery jobs that were not processing. The team identified the root cause, worked to retry the failed deliveries, and restored normal email delivery functionality.
We have identified a small number of notifications whose delivery jobs failed. The team is working on retrying those deliveries.
The incident was caused by one of our notifications API instances becoming unhealthy, which left 27 jobs stuck in a terminating state so that they could not be retried.
The incident status is being updated to monitoring, and any new deliveries should now succeed.
The investigation is still ongoing. So far we know of 27 stuck delivery jobs, but we are still determining the full scope of the delays.
We are still investigating the issue. Around 9am this morning, we noticed an increase in lag on email deliveries. That has now transitioned into stuck delivery jobs that are not processing.
The team is looking for a way to get the delivery jobs moving again. Until then, expect delays, and some emails may need to be resent.
We will provide another update in 30 minutes.