The overwhelming majority of customers across US01 and US03 have had their backlogs processed and are back to real-time data processing and message sending. All services are functioning as expected. We are considering this incident resolved.
We apologize for this incident and will provide a detailed Root Cause Analysis (RCA) report soon.
US01 Data Processing, Outbound Messages, and SDK Data Collection are fully operational.
US03 Data Processing and SDK Data Collection are fully operational.
We are still actively processing a backlog of Outbound Messages for a small subset of customers in US03.
US01 Data Processing and SDK Data Collection are fully operational.
We are still actively processing a backlog of Outbound Messages for a small subset of customers in US01.
US03 SDK Data Collection is fully operational.
We are still actively processing a backlog of Outbound Messages for a small subset of customers in US03.
We are still actively processing a backlog of Data Processing jobs in US03.
US08 has been marked as operational. The messaging and data processing backlogs on that cluster have been fully processed, and all other services are operational. We can consider that cluster in a "monitoring" status.
Providing a number of meaningful updates for US01 and US03:
Dashboards and REST API processing are fully operational in both US01 and US03.
SDK Data Collection is fully operational in US03, and we are scaling up in US01.
Data Processing and Message Sending are still experiencing sporadic latency as we work through the backlogs, but all health measures are improving rapidly.
US06 has been marked as operational. The messaging and data processing backlogs on that cluster have been fully processed, and all other services are operational. We can consider that cluster in a "monitoring" status.
We are continuing to work on a fix for this issue.
US04 and US05 have been marked as operational. The messaging and data processing backlogs on those clusters have been fully processed, and all other services are operational. We can consider those clusters in a "monitoring" status.
We are actively processing backlogs of both messaging and data across all clusters. Our Database, SRE, and Networking teams are continuing to increase overall throughput as the recovery continues and individual clusters catch back up to real-time.
Currents is operational across all clusters, and has been processing all events as they are cleared from the backlogs.
At this point we have completed both backlogs in US02 and US07. We have also completed the full message sending backlog in US04, and are more than 75% through backlogs in US05 and US06. US01 and US03 are continuing to ramp their pace of recovery. The next update will provide continued status updates on backlog processing and recovery.
At this point, Dashboard access is available for all clusters.
We are processing through the backlog of messages to send and data to process across all clusters.
We'll continue to provide hourly updates.
US02 and US07 have been marked as operational. The messaging and data processing backlogs on those clusters have been fully processed.
On our larger clusters, this will take longer, and we don't yet have a cluster-by-cluster ETA, but we are tracking toward resolution.
We continue to see service restoration across several clusters:
Data Processing and Messaging have resumed in US05 and US07.
We continue to see service restoration across several clusters:
Dashboard services have resumed on US04, US05, US06, and US07.
Data Processing and Messaging have resumed in US04.
We are seeing Dashboard access, Data Processing, and Messaging resuming in US02. There is a backlog of work to process, and once it is fully caught up, we will update the status to operational.
We are working through the rest of the US clusters and will provide updates in real-time as we have them.
We continue working to resolve a network issue in our US data centers.
We continue to work through checkout, and our remediation steps are showing success across various services.
Our next update will be in 30 minutes or once we have more detailed information about the resolution.
We continue working to resolve a network issue in our US data centers.
Senior leaders in our Engineering organization have implemented code designed to ensure that Quiet Hours are respected where required, to the extent this feature was properly configured by customers in Campaigns and Canvases before this incident.
We have completed the restoration of services to a pilot customer successfully, and are now working through restoration across all US Clusters.
Our next update will be in 30 minutes or less.
We continue working to resolve a network issue in our US data centers.
We have no material update since our last post. We continue to work through restoring connectivity to those databases.
Our next update will be in 30 minutes or less.
We are continuing to work to resolve a network issue in our US data centers. As mentioned, the rolling restart of our database containers with Rackspace, our database hosting provider, was completed. We are now working through restoring connectivity to those databases. Senior leaders in our engineering organization are working to ensure that Quiet Hours will be respected in the countries where they are required and as configured in campaigns.
We will provide a full RCA and postmortem once this is resolved.
Our next update will be in 30 minutes or less.
We are continuing to work to resolve a network issue in our US data centers. The rolling restart of our database containers with Rackspace, our database hosting provider, is complete. Services are gradually coming back online, and we are currently processing the backlog of data and messages accumulated during the incident.
We will provide a full RCA and postmortem once this is resolved.
Our next update will be in 30 minutes or less.
We are continuing to resolve a network issue in our US data centers. The rolling restart of database containers with Rackspace, our database hosting provider, is progressing, and we are approximately 75% complete. Once these restarts are complete, we will begin returning services and processing data and messaging backlogs. Our next update will be in 30 minutes or less.
We have identified the root cause and are working to resolve a network issue in our US data centers. We are actively performing a rolling restart of database containers with Rackspace, our database hosting provider. We do not expect data loss, and further expect that all messages will be sent once the services are up and running. Our next update will be in 30 minutes or less.
We are continuing to work on a fix for this issue.
Work is ongoing by Engineers and our database provider to restore service.
Engineers are continuing to work alongside our Database provider to restore service.
Engineers are actively working with our Database provider to restore service.
We have identified a third-party networking issue.
Engineers are investigating an issue impacting multiple services on all US clusters.