Outage in Alloy

Increased API Errors - Cloud Provider Issues

Resolved Major
October 20, 2025 - Started 5 days ago - Lasted about 21 hours

Incident Report

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.

Latest Updates (sorted most recent first)
RESOLVED 4 days ago - at 10/21/2025 03:52AM

As of 11:44 PM ET, the webhook backlog has been fully processed, and real-time webhook processing has been restored.

All major Alloy services are now fully operational, with performance metrics and latency levels remaining stable.

Our teams continue to work closely with AWS and the third-party vendor to complete a comprehensive post-mortem. We will share a Root Cause Analysis (RCA) once that review is finalized. If you’d like to receive the RCA once it's available, please email support@alloy.com.
We appreciate your patience and understanding throughout this incident.

MONITORING 4 days ago - at 10/21/2025 01:39AM

We’re pleased to share that normal operations have been restored for all major components. System latencies and performance metrics have returned to expected levels. All internal tests are passing, and we are no longer seeing errors related to the third-party component.

Our teams are working closely with AWS and the third-party vendor to conduct a comprehensive post-mortem of the incident. We will share a Root Cause Analysis report once this review is complete. Please email support@alloy.com if you would like the RCA once it’s available.

Please note that real-time processing of webhooks is still delayed. Webhooks will continue to arrive late until our queue clears. The queue is about 60% cleared.

We appreciate your patience and understanding throughout this event.

MONITORING 4 days ago - at 10/21/2025 12:46AM

We’ve completed a release to production after confirming that all errors were resolved in staging. The production environment is now showing signs of recovery. We are actively monitoring progress.

Additionally, we are still waiting for the webhook backlog to clear. Real-time processing of webhooks remains delayed until the queue is fully processed.

INVESTIGATING 4 days ago - at 10/21/2025 12:16AM

Following a best-practice recommendation from our third-party vendor, we are deploying a change to our staging environment for validation and testing. We’ll proceed to production once we confirm the fix is effective. Next update to come after we complete the staging release and confirm results - we’ll aim for an update in about 30 minutes.

Additionally, the webhook backlog continues to clear. Real-time processing of webhooks remains delayed until the queue is fully processed.

INVESTIGATING 4 days ago - at 10/20/2025 10:42PM

We believe we have identified a third-party component that is throttling requests as part of recovery efforts to mitigate the ongoing impact from the AWS outage. This throttling is contributing to elevated network latency across Alloy services.

We are optimistic that we can mitigate some of this impact with changes on our side and are actively working on those adjustments now. We will post another update at 8 PM ET or once substantial updates become available.

INVESTIGATING 4 days ago - at 10/20/2025 09:57PM

AWS continues to experience residual latency. All components are currently seeing elevated latency.

Our engineering team is recycling service pods, and we are beginning to see signs of recovery. We are also investigating additional areas to further mitigate the impact.

We will continue to closely monitor performance and share updates as they become available.

INVESTIGATING 4 days ago - at 10/20/2025 09:01PM

At this point, we again unfortunately have no substantial update to share.

We continue to investigate this with AWS at the highest priority.

INVESTIGATING 4 days ago - at 10/20/2025 08:32PM

We are continuing to investigate connectivity issues within our cloud services. Customers may also experience Dashboard slowness while the issue persists.
We’ll provide further updates as more information becomes available.

INVESTIGATING 4 days ago - at 10/20/2025 08:21PM

We are continuing to investigate connectivity issues within our cloud services. Customers may experience API timeouts while the issue persists.
The dashboard remains fully operational. Webhooks are not impacted by the connectivity issues, but are still experiencing real-time processing delays as we continue clearing the backlog.
We’ll provide further updates as more information becomes available.

INVESTIGATING 4 days ago - at 10/20/2025 07:51PM

Our monitoring has detected an increase in automated test failures, and our engineering team is currently investigating.

MONITORING 5 days ago - at 10/20/2025 07:05PM

We are continuing to monitor for any further issues.

MONITORING 5 days ago - at 10/20/2025 07:02PM

Network connectivity issues have been resolved. The Alloy Engineering team rebuilt internal queues at 2:40 PM ET, and internal tests are now passing.

- Webhooks: Backlog is still clearing. Realtime webhooks will continue to experience significant delays until the backlogged queue clears.
- Dashboard: Has been restored to full functionality.
- Journeys: Recovery is in progress. Journey Applications are starting to progress.
- API: API latencies have resolved.

We are continuing to monitor system performance as services come back online. We will be doing a root cause analysis into the full scope of impact.

Further updates will be shared as recovery progresses.

INVESTIGATING 5 days ago - at 10/20/2025 06:32PM

AWS has reported progress in EC2 recovery, but Alloy’s full recovery remains dependent on AWS systems stabilizing. At this time, we are observing the following:

- Webhooks: Backlog processing has begun, but a large queue remains. Real-time webhook delivery continues to experience delays.
- Journeys: There continue to be intermittent Journeys failures. Some Journey Applications are failing to write to S3 - those will need to be retried once the incident is resolved.
- Customer Dashboard: Intermittent latency may occur when loading or navigating the dashboard.
- API: Intermittent latency persists for some requests.

Our team continues to closely monitor AWS recovery and system performance. We’ll provide further updates as additional progress is made, as close to 30-minute intervals as possible.

INVESTIGATING 5 days ago - at 10/20/2025 05:37PM

At this point, we unfortunately have no substantial update to share.

We continue to investigate this with AWS at the highest priority.

Thank you for the ongoing patience. We will keep you updated every 30 minutes or as more substantial information becomes available.

INVESTIGATING 5 days ago - at 10/20/2025 05:07PM

AWS continues to investigate ongoing EC2 and networking issues. Alloy services remain impacted.

Our team remains in contact with AWS and will share updates every 30 minutes or as more substantial information becomes available.

INVESTIGATING 5 days ago - at 10/20/2025 04:31PM

We are still investigating the issue.

AWS has implemented additional mitigation steps and is observing some recovery, but operations remain degraded and have not been fully restored.

Alloy systems have not yet stabilized, and we are working closely with AWS support to restore full functionality.

INVESTIGATING 5 days ago - at 10/20/2025 04:02PM

We are still addressing the issues following the recent AWS outage - you can follow their status at https://health.aws.amazon.com/health/status.

Customers may experience issues in the following areas:

- API: We are investigating intermittent unavailability in the API. Requests may return 5XX errors.
- Journeys and Webhooks: Ongoing latency as the Alloy Engineering team continues to work on resolution. If there were any Journey Applications in a pending state at 11:29 AM ET they will not complete successfully, and any associated asynchronous tasks have been lost. If a Journey Application was run during this time and is not in a terminal state (e.g., Approved or Denied) - such as those waiting on step-ups or webhook actions - it will remain in an incomplete state. API calls for these applications will need to be re-run.
- Logging into Alloy and Dashboard: Intermittent latency may occur while logging in, navigating the dashboard, or submitting reviews. Some users may intermittently see gateway timeout errors or blank screens.
- Third-party integrations: Services that depend on AWS may also experience degraded performance, causing applications to fail and resulting in partial results.

Our engineering team is actively working to restore all services to full capacity as quickly as possible. We’ll continue to provide updates as more information becomes available.
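For clients working out which Journey Applications fall under the re-run guidance above, the check can be sketched roughly as follows. This is only an illustration: the `JourneyApplication` record, its `token` and `status` fields, and the `needs_rerun` helper are hypothetical names, not part of Alloy's API, and the set of terminal states contains only the two examples named in the update (Approved and Denied).

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumption: the update names only "Approved" and "Denied" as examples of
# terminal states; a real integration may need to list others.
TERMINAL_STATES = {"Approved", "Denied"}


@dataclass
class JourneyApplication:
    """Hypothetical local record of an application submitted during the incident."""
    token: str   # hypothetical identifier kept on the client side
    status: str  # hypothetical last-known status


def needs_rerun(apps: Iterable[JourneyApplication]) -> List[JourneyApplication]:
    """Return the applications that never reached a terminal state."""
    return [app for app in apps if app.status not in TERMINAL_STATES]


if __name__ == "__main__":
    submitted = [
        JourneyApplication("app-1", "Approved"),
        JourneyApplication("app-2", "Pending Step Up"),
        JourneyApplication("app-3", "Denied"),
    ]
    for app in needs_rerun(submitted):
        print(f"{app.token} is incomplete ({app.status}); re-run its API call")
```

How the list of affected applications is obtained (local records, an export, or an API query) is left open here because the update does not specify it.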

INVESTIGATING 5 days ago - at 10/20/2025 03:26PM

We are still addressing the issues following the recent AWS outage. Customers may experience issues in the following areas:

- API: We are investigating unavailability in the API. Requests may return 5XX errors.
- Journeys and Webhooks: Ongoing latency as the Alloy Engineering team continues to work on resolution.
- Logging into Alloy and Dashboard: Latency may occur while logging in, navigating the dashboard, or submitting reviews. Some users may intermittently see gateway timeout errors.
- Data Vendor: Services that depend on AWS may also experience degraded performance, causing applications to fail and resulting in partial results.

We appreciate your patience as our team continues to work toward full restoration of service.
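Since requests may return 5XX errors while the API is degraded, a client can retry with exponential backoff rather than failing on the first error. The sketch below is a generic pattern, not Alloy guidance: `call_with_backoff`, its parameters, and the `send` callable (assumed to perform one request and return the HTTP status code) are all hypothetical. Retrying is only safe for requests that can be repeated without side effects.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-5XX status or attempts run out.

    send is a hypothetical callable that performs one request and returns
    its HTTP status code. The last status seen is returned either way.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status = send()
        if not 500 <= status < 600:
            return status  # success or a non-retryable client error
        if attempt < max_attempts - 1:
            # Double the delay on each attempt, cap it, and add jitter so
            # many clients do not retry in lockstep during recovery.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return status


if __name__ == "__main__":
    # Simulated responses: two server errors followed by a success.
    responses = iter([503, 502, 200])
    print(call_with_backoff(lambda: next(responses), base_delay=0.01))
```

The jitter matters most during an incident like this one, where a backlog is already clearing and synchronized retries would add load to a recovering service.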

INVESTIGATING 5 days ago - at 10/20/2025 02:27PM

We are still addressing the issues after the AWS outage. The Alloy Engineering team is working on resolving the ongoing latency issues with Journeys and Webhooks.
Any vendors that depend on AWS may also be experiencing degraded performance at the moment.

INVESTIGATING 5 days ago - at 10/20/2025 01:02PM

Clients on /evaluations are not impacted. Clients implemented on Journeys may experience delayed journey API responses and delayed webhooks that will self-resolve. There is no data loss and no need for retries at this time.

INVESTIGATING 5 days ago - at 10/20/2025 12:51PM

We are still working on resolving issues related to Webhooks processing. Customers may experience issues with Journey Application processing during this time.

INVESTIGATING 5 days ago - at 10/20/2025 11:09AM

We are investigating ripple effects of the cloud outage on our outgoing webhooks and journeys processing.

MONITORING 5 days ago - at 10/20/2025 09:48AM

Our cloud provider confirmed there were service outages and has since released a fix. We are monitoring as systems recover.

IDENTIFIED 5 days ago - at 10/20/2025 07:43AM

We have identified an issue with our cloud provider and are waiting for additional information.

INVESTIGATING 5 days ago - at 10/20/2025 07:15AM

Our API integration tests have encountered an increase in errors. We are currently investigating. Stay tuned for updates.
