Outage in AWS

Increased Error Rates and Latencies - N. Virginia

Resolved · Minor
October 28, 2025 - Started about 11 hours ago - Lasted about 7 hours

Incident Report

Earlier today, EC2 instance launches in the use1-az2 Availability Zone (AZ) experienced increased latencies. We communicated with affected customers via the AWS Personal Health Dashboard shortly after the issue began. This issue has been resolved and EC2 instance launches are operating normally; however, some request throttles remain in place for use1-az2 and are gradually being removed. While these throttles are in place, customers may receive “request limit exceeded” errors in this AZ; retrying the request should resolve the issue.

We are currently investigating ECS task launch failures, for tasks on both EC2 and Fargate, affecting a subset of customers in the US-EAST-1 Region. Customers may also see their container instances disconnect from ECS, which can cause tasks to stop in some circumstances. ECS operates multiple cells in the Region, and a small number of these cells are experiencing elevated error rates when launching new tasks; existing tasks in these cells may stop unexpectedly. When an ECS cluster is created, it is assigned to a specific cell. Customers with a cluster in an impacted cell are seeing impact across all Availability Zones in the Region. At this time, we recommend that customers who are able to do so create new clusters to ensure that they are assigned to a healthy cell. Existing clusters in the remaining healthy cells are not affected. We have identified actions to restore the impacted cells to full service but do not yet have an estimated time of recovery. Customers who use EMR Serverless are also affected by this issue. We will provide an update by 4:15 PM PDT or as soon as more information becomes available.
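The advice that retries should clear “request limit exceeded” errors is usually implemented as retries with capped exponential backoff and random jitter, so that many clients do not hammer a throttled API in lockstep. A minimal Python sketch, assuming boto3-style exceptions that carry the service error code in `exc.response["Error"]["Code"]` (as botocore's `ClientError` does). The helper names `call_with_backoff` and `is_request_limit_exceeded` and the delay defaults are illustrative assumptions, not AWS recommendations:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_request_limit_exceeded(exc: Exception) -> bool:
    """Return True if exc looks like an EC2 throttling error.

    Assumes a botocore-style exception exposing the service error code at
    exc.response["Error"]["Code"]; anything else is treated as non-throttling.
    """
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "RequestLimitExceeded"


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool] = is_request_limit_exceeded,
    max_attempts: int = 8,
    base_delay: float = 0.5,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying throttled calls with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise non-throttling errors, or give up once the attempt budget is spent.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

With a boto3 EC2 client this could wrap a call such as `call_with_backoff(lambda: ec2.run_instances(**params))`. boto3 users can also lean on botocore's built-in retry handling, for example `botocore.config.Config(retries={"mode": "adaptive", "max_attempts": 10})`, which may be simpler than a hand-rolled loop.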

Latest Updates (most recent first)
Update timestamps appear to be in UTC; the times quoted inside each update are PDT.
UPDATE about 4 hours ago - at 10/29/2025 04:52AM

We are observing significant recovery for services affected by the ECS issue. For ECS itself, we have recovered two of the three impacted cells and continue to work toward recovering the remaining cell. We have lifted throttles for the two recovered cells, but throttling remains in effect for the third. The vast majority of customer applications should now be recovered. We will continue to provide updates as additional information becomes available, or by 11:00 PM.

UPDATE about 5 hours ago - at 10/29/2025 03:54AM

We are seeing significant signs of recovery and continue to work toward full resolution.

UPDATE about 6 hours ago - at 10/29/2025 03:08AM

We have made additional progress. For EMR Serverless, we have finished refreshing the warm pool with healthy clusters, and we recommend that customers restart their existing applications. We are also observing that ECS task launches are beginning to succeed. While we work toward full resolution of the underlying issue, some requests will be throttled. Our current best estimate is that full recovery is an additional 1-2 hours away. Success rates for affected operations continue to improve as we make progress. We will continue to provide updates as additional information becomes available, or by 9:05 PM.

UPDATE about 7 hours ago - at 10/29/2025 01:50AM

We continue to make progress on our mitigation efforts. While we have not fully recovered, we are seeing positive signs of improvement for ECS task launches on clusters in the impacted cells in the US-EAST-1 Region. Customers who need immediate recovery should recreate impacted ECS clusters using a different clusterName. We continue to work toward full recovery. For EKS, impact is limited to Fargate launches.
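Recreating a cluster under a new `clusterName` can be scripted. A minimal Python sketch, assuming a boto3 ECS client (for example `boto3.client("ecs", region_name="us-east-1")`), whose `create_cluster(clusterName=...)` call returns the new cluster description under the `"cluster"` key. The helper names and the random-suffix naming scheme are hypothetical. The sketch only creates an empty cluster; capacity providers, services, and container instances still have to be moved to it separately.

```python
import uuid
from typing import Any, Optional, Protocol


class EcsClient(Protocol):
    """The subset of a boto3 ECS client that this sketch relies on."""

    def create_cluster(self, **kwargs: Any) -> dict: ...


# ECS cluster names may contain up to 255 letters, numbers, underscores, and hyphens.
MAX_CLUSTER_NAME_LEN = 255


def replacement_cluster_name(old_name: str, suffix: Optional[str] = None) -> str:
    """Derive a new, different clusterName by appending a suffix (random by default)."""
    suffix = suffix or uuid.uuid4().hex[:8]
    # Truncate the base so that base + "-" + suffix stays within the length limit.
    base = old_name[: MAX_CLUSTER_NAME_LEN - len(suffix) - 1]
    return f"{base}-{suffix}"


def create_replacement_cluster(
    ecs: EcsClient, old_name: str, suffix: Optional[str] = None
) -> dict:
    """Create an empty cluster under a new name and return the API's cluster description."""
    new_name = replacement_cluster_name(old_name, suffix)
    response = ecs.create_cluster(clusterName=new_name)
    return response["cluster"]
```

Passing the client in, rather than constructing it inside the helper, keeps the function testable with a stub and lets the caller choose the Region and credentials.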

For MWAA environments that are stuck in an unhealthy state or are otherwise impaired, we recommend that customers perform an update to the environment without changing its current configuration.

For EMR Serverless, we have made substantial progress in refreshing the warm pool with healthy clusters and continue to work toward full recovery. Once these warm pools are fully refreshed, we will provide additional guidance on the actions required to mitigate impact.

Our current best estimate is that full recovery is an additional 2-4 hours away. As we make additional progress, success rates for affected operations will improve. We will continue to provide updates as additional information becomes available, or by 7:45 PM.

UPDATE about 9 hours ago - at 10/29/2025 12:31AM

We want to provide an update on EMR Serverless. EMR Serverless maintains a warm pool of ECS clusters to serve customer requests, and some of these clusters are running in the impacted ECS cells. To reduce EMR Serverless error rates, we are actively refreshing these warm pools with healthy clusters. For ECS, we continue to make progress on recovering the impacted cells, but this progress is not yet visible externally. ECS has stopped new launches and tasks on the affected clusters. Some services (such as Glue) are seeing error rates recover but may still experience increased latency. Our current best estimate is that recovery is 2-3 hours away. As we make additional progress, success rates for affected operations will improve. We will continue to provide updates as additional information becomes available, or by 6:30 PM.
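The warm-pool refresh described in this update can be pictured as replacing every pooled cluster that sits in an impacted cell while keeping the pool at the same size. The sketch below is purely illustrative: the actual EMR Serverless implementation is not public, and `PooledCluster`, `refresh_warm_pool`, and the `provision` callback are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class PooledCluster:
    """Hypothetical warm-pool entry: a pre-created cluster and the cell it lives in."""

    cluster_id: str
    cell: str


def refresh_warm_pool(
    pool: List[PooledCluster],
    impacted_cells: Set[str],
    provision: Callable[[], PooledCluster],
) -> List[PooledCluster]:
    """Return a same-size pool in which every member in an impacted cell is replaced."""
    refreshed: List[PooledCluster] = []
    for member in pool:
        if member.cell not in impacted_cells:
            # Healthy members are kept as-is.
            refreshed.append(member)
            continue
        replacement = provision()
        # Guard against the provisioner handing back another cluster in a bad cell.
        if replacement.cell in impacted_cells:
            raise RuntimeError(
                f"provisioned {replacement.cluster_id} in impacted cell {replacement.cell}"
            )
        refreshed.append(replacement)
    return refreshed
```

Replacing members one at a time, rather than draining the whole pool first, keeps healthy capacity available to incoming requests throughout the refresh.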

UPDATE about 10 hours ago - at 10/28/2025 11:31PM

For EMR Serverless, some jobs continue to experience increased execution delays or failures. For EC2, we continue to throttle some requests (new instance launches and other networking-related mutating API calls) in a single Availability Zone (use1-az2) in the US-EAST-1 Region. These throttles will remain until we have fully mitigated all issues and have a high degree of confidence that the issue will not recur. Existing instances are unaffected by this issue. For ECS, we continue to make progress toward recovering the impacted cells, but customers experiencing task launch errors and latencies in those cells are not yet seeing improvement. While we do not have a firm ETA, we expect full recovery to be 2-3 hours away. As we make additional progress, success rates for affected operations will improve. We will continue to provide updates as additional information becomes available, or by 5:30 PM.

UPDATE about 11 hours ago - at 10/28/2025 10:36PM

Initial report; its text is reproduced in full in the Incident Report above.
