Outage in One Signal

We are experiencing delays with activating and processing Journeys

Resolved Major
November 26, 2024 - Started 26 days ago - Lasted 1 day


Outage Details

We are investigating increased lag while processing the state of Journeys. This is impacting the ability to activate new Journeys, and existing Journeys will experience delays.
Components affected
One Signal Offline Job Processing
Latest Updates (sorted most recent first)
RESOLVED 24 days ago - at 11/27/2024 10:18PM

This incident has been resolved. We will continue to monitor our systems closely.

MONITORING 24 days ago - at 11/27/2024 08:51PM

On November 26th and 27th, customers may have experienced issues creating journeys or delays in messages being sent from journeys.

At this time, all Journeys functionality is restored and your messages are being sent in real-time once again.

My team and I sincerely apologize for any disruption this may have caused. We understand the critical role this service plays in your business, especially during this busy holiday season, and we are committed to providing reliable and uninterrupted service.

We're conducting a thorough post-mortem analysis of the incident. Here are some current insights into the issue and the steps we’re taking to prevent it from occurring again:

**Root Cause:**
On Tuesday, Nov 26th, one of our primary Journey data stores encountered an issue during a planned scaling operation in preparation for Black Friday. As a result, some Journeys failed to launch, and processing was delayed.
On Wednesday, Nov 27th, we experienced a separate incident related to fanning out updates to a large number of subscriptions under a single user record. To mitigate this issue, we blocked the problematic user records. Subsequently, the system began to recover, and services gradually returned to normal operations.

Our engineering team has now successfully scaled all services to accelerate recovery and has restored the full functionality of Journeys.

Measures our team has taken to prevent further disruption during this holiday period:

Proactive User Record Management: Implemented measures to proactively prevent user records from exceeding a very large number of subscriptions.

Enhanced Monitoring and Alerting: Increased the sensitivity of our monitoring and alerting systems for critical Journey services, lowering the paging threshold to expedite response times for potential issues.

Scaled Infrastructure: Maintained a scaled and over-provisioned infrastructure throughout the US Thanksgiving holiday week to accommodate increased traffic and ensure optimal performance.

Increased On-Call Support: Assigned additional engineers to on-call duty during the week to provide immediate support and address any potential issues that may arise.

Infrastructure Update Moratorium: Temporarily restricted any non-critical infrastructure updates during the week to minimize the risk of unintended disruptions to the Journey service.
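As an illustration of the first measure above, the following is a minimal sketch of a per-user subscription cap combined with batched fan-out. It is not taken from OneSignal's systems: the limit of 1000, the batch size of 100, and the names `add_subscription`, `fan_out_batches`, and `SubscriptionLimitExceeded` are all hypothetical and chosen only for this example.

```python
from typing import Iterator, List, Sequence

# Hypothetical cap; the real limit is not stated in the incident report.
MAX_SUBSCRIPTIONS_PER_USER = 1000


class SubscriptionLimitExceeded(Exception):
    """Raised when a user record would grow past the configured cap."""


def add_subscription(
    existing: List[str],
    new_id: str,
    limit: int = MAX_SUBSCRIPTIONS_PER_USER,
) -> List[str]:
    """Return the subscription list with new_id added, enforcing the cap."""
    if new_id in existing:
        return existing  # already attached, nothing to do
    if len(existing) >= limit:
        raise SubscriptionLimitExceeded(
            f"user record already has {len(existing)} subscriptions"
        )
    return existing + [new_id]


def fan_out_batches(
    subscription_ids: Sequence[str],
    batch_size: int = 100,
) -> Iterator[Sequence[str]]:
    """Yield subscriptions in fixed-size batches so a single large user
    record cannot turn into one unbounded update."""
    for start in range(0, len(subscription_ids), batch_size):
        yield subscription_ids[start:start + batch_size]
```

Rejecting the write at the point where a subscription is attached keeps oversized records from forming in the first place, while batching bounds the cost of any single update for records that are already large.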

Thank you for your understanding and patience.

MONITORING 24 days ago - at 11/27/2024 07:41PM

Fewer than 1% of users in any given Journey will experience a processing delay of 5+ minutes. All other operations have resumed as normal.

We are still monitoring current progress and still consider this issue to be open. If progress continues as it currently is, this incident will be fully resolved within the hour.

Customers are good to activate their Journeys and resume business as usual.

MONITORING 24 days ago - at 11/27/2024 07:29PM

Lag across a majority of partitions in Journeys is near realtime, with a single partition still experiencing lag greater than 5m.

This means that a fraction of the users in a Journey will experience processing delays, but the majority of customers should be able to use Journeys as normal now. Specifically, only ~10% of users will currently experience 5+ minutes of latency in Journeys.
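To illustrate how per-partition lag can translate into a share of affected users, here is a rough sketch. It assumes users are spread evenly across partitions, which the update does not state, and the partition count and lag values are invented for the example. The function name `fraction_delayed` is hypothetical.

```python
from typing import Mapping


def fraction_delayed(
    lag_seconds_by_partition: Mapping[int, float],
    threshold_seconds: float = 300.0,
) -> float:
    """Estimate the share of users on partitions lagging past the threshold.

    Assumes users are distributed uniformly across partitions, so the
    share of lagging partitions approximates the share of delayed users.
    """
    if not lag_seconds_by_partition:
        return 0.0
    lagging = sum(
        1 for lag in lag_seconds_by_partition.values()
        if lag > threshold_seconds
    )
    return lagging / len(lag_seconds_by_partition)


# Hypothetical snapshot: ten partitions, one of them more than 5 minutes behind.
example_lag = {partition: 2.0 for partition in range(10)}
example_lag[7] = 420.0
estimate = fraction_delayed(example_lag)
```

Under these made-up numbers, one lagging partition out of ten gives an estimate of 0.1, the same order as the ~10% figure quoted above. The actual partition layout is not described in this update.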

MONITORING 25 days ago - at 11/27/2024 06:51PM

Job processing is back up and we are working through the backlog. The team is monitoring closely. Some Journeys can be activated at this point, but processing for Journeys is still delayed.

MONITORING 25 days ago - at 11/27/2024 05:12PM

We have identified other issues and have applied more fixes. We are currently monitoring progress closely and adjusting as we go.

MONITORING 25 days ago - at 11/27/2024 09:27AM

We are continuing to monitor for any further issues.

MONITORING 25 days ago - at 11/27/2024 01:01AM

We have applied a fix and are currently monitoring the results.

IDENTIFIED 25 days ago - at 11/27/2024 12:05AM

We are continuing to work on a fix for this issue.

IDENTIFIED 25 days ago - at 11/26/2024 08:34PM

We are continuing to work on a fix for this issue.

IDENTIFIED 25 days ago - at 11/26/2024 07:35PM

We are continuing to work on a fix for this issue.

IDENTIFIED 26 days ago - at 11/26/2024 03:36PM

The issue has been identified and a fix is being implemented.

INVESTIGATING 26 days ago - at 11/26/2024 01:53PM

We are continuing to investigate this issue.

INVESTIGATING 26 days ago - at 11/26/2024 12:47PM

We are continuing to investigate this issue.

INVESTIGATING 26 days ago - at 11/26/2024 10:48AM

We are investigating increased lag while processing the state of Journeys. This is impacting the ability to activate new Journeys, and existing Journeys will experience delays.

