This incident has been resolved. We will continue to monitor our systems closely.
On November 26th and 27th, customers may have experienced issues creating journeys or delays in messages being sent from journeys.
At this time, all Journeys functionality is restored and your messages are being sent in real-time once again.
My team and I sincerely apologize for any disruption this may have caused. We understand the critical role this service plays in your business, especially during this busy holiday season, and we are committed to providing reliable and uninterrupted service.
We're conducting a thorough post-mortem analysis of the incident. Here are some current insights into the issue and the steps we’re taking to prevent it from occurring again:
**Root Cause:**
On Tuesday, Nov 26th, one of our primary Journey data stores encountered an issue during a planned scaling operation in preparation for Black Friday. As a result, some Journeys failed to launch, and processing was delayed.
On Wednesday, Nov 27th, we experienced a separate incident related to fanning out updates to a large number of subscriptions under a single user record. To mitigate this issue, we blocked the problematic user records. Subsequently, the system began to recover, and services gradually returned to normal operations.
Our engineering team has now successfully scaled all services to accelerate recovery and has restored the full functionality of Journeys.
Measures our team has taken to prevent further disruption during this holiday period:
Proactive User Record Management: Implemented measures to proactively prevent user records from exceeding a very large number of subscriptions.
Enhanced Monitoring and Alerting: Increased the sensitivity of our monitoring and alerting systems for critical Journey services, lowering the paging threshold to expedite response times for potential issues.
Scaled Infrastructure: Maintained a scaled and over-provisioned infrastructure throughout the US Thanksgiving holiday week to accommodate increased traffic and ensure optimal performance.
Increased On-Call Support: Assigned additional engineers to on-call duty during the week to provide immediate support and address any potential issues that may arise.
Infrastructure Update Moratorium: Temporarily restricted any non-critical infrastructure updates during the week to minimize the risk of unintended disruptions to the Journey service.
Thank you for your understanding and patience.
Fewer than 1% of users in any given Journey will experience a processing delay of 5+ minutes; all other operations have resumed as normal.
We are still monitoring progress and consider this issue open. If progress continues at the current rate, this incident will be fully resolved within the hour.
Customers can now activate their Journeys and resume business as usual.
Lag across the majority of partitions in Journeys is near real-time, with a single partition still experiencing lag greater than 5 minutes.
This means that a fraction of the users in a Journey will experience processing delays, but the majority of customers should be able to use Journeys as normal now. Currently, only ~10% of users will experience 5+ minutes of latency in Journeys.
Backlog processing for jobs is up and we are working through the backlog. The team is monitoring closely. Some Journeys can be activated at this point, but processing for Journeys is still delayed.
We have identified additional issues and applied further fixes. We are monitoring progress closely and adjusting as we go.
We are continuing to monitor for any further issues.
We have applied a fix and are currently monitoring the results.
We are continuing to work on a fix for this issue.
The issue has been identified and a fix is being implemented.
We are continuing to investigate this issue.
We are investigating increased lag while processing the state of Journeys. This is impacting the ability to activate new Journeys, and existing Journeys will experience delays.