Kinesis Data Streams and CloudWatch Logs error rates have fully recovered and are operating normally within the US-EAST-1 Region. Other services, including ECS Fargate, API Gateway, and Lambda, have also recovered. While we expect the vast majority of customer applications to have recovered, we're continuing to work towards full recovery.
We are seeing significant recovery for most AWS Services at this stage. While we are not yet fully recovered, most AWS Services are observing recovery. We are seeing full recovery for Fargate launches at this time. As we recover we expect to see new CloudWatch logs showing as they become available. We continue to work toward full recovery for remaining AWS Services. We continue to expect full recovery to be within the next 2 hours.
We continue to work toward recovery, though progress is slower than originally anticipated. We are seeing some improvements internally, though they may not be visible externally. Some services (like CloudWatch Logs) may not observe recovery until we have fully resolved the underlying issue within the Kinesis subsystem. In parallel to our mitigation efforts, we are actively working to speed up the recovery process. At this time, we still expect full recovery to be 1-2 hours away. We will share another update as soon as we have additional information, or within the next 60 minutes.
We continue to work on resolving the increased error rates and latencies for Kinesis APIs in the US-EAST-1 Region. We wanted to provide you with more details on what is causing the issue. Starting at 2:45 PM PDT, a subsystem within Kinesis began to experience increased contention when processing incoming data. While this had limited impact for most customer workloads, it did cause some internal AWS services, including CloudWatch, ECS Fargate, and API Gateway, to experience downstream impact. Engineers have identified the root cause of the issue affecting Kinesis and are working to address the contention. While we are making progress, we expect it to take 2-3 hours to fully resolve.
As a result of this issue, CloudWatch Logs is experiencing increased error rates and latencies when processing incoming logs. Any customer using the CloudWatch Logs APIs may experience elevated errors. CloudWatch metrics extraction from these logs may be delayed, and alarms set on delayed metrics may transition into the "INSUFFICIENT_DATA" state.
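To illustrate why delayed metrics can push an alarm into "INSUFFICIENT_DATA", here is a toy model in Python. It is a simplified sketch, not CloudWatch's actual evaluation logic: the function `alarm_state` and its parameters are hypothetical, and only the four missing-data treatments (`missing`, `notBreaching`, `breaching`, `ignore`) mirror the options CloudWatch alarms expose.

```python
from typing import Optional, Sequence


def alarm_state(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    treat_missing: str = "missing",
    previous: str = "OK",
) -> str:
    """Toy alarm evaluation; None marks a datapoint that has not arrived."""
    present = [d for d in datapoints if d is not None]
    if not present:
        # Every datapoint in the window is missing (for example, delayed logs).
        return {
            "missing": "INSUFFICIENT_DATA",  # default treatment
            "notBreaching": "OK",
            "breaching": "ALARM",
            "ignore": previous,              # keep the prior state
        }[treat_missing]
    # Simplification: alarm only if every datapoint that arrived breaches.
    return "ALARM" if all(d > threshold for d in present) else "OK"


print(alarm_state([None, None, None], threshold=5.0))
print(alarm_state([None, None, None], threshold=5.0, treat_missing="notBreaching"))
```

Under this model, an alarm on a log-derived metric would leave "INSUFFICIENT_DATA" only once datapoints arrive again, or if it is configured with a different missing-data treatment.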
ECS Fargate is experiencing failures when attempting to launch new tasks, also because of a dependency on CloudWatch Logs. We are currently working on a change to remove this dependency and have also taken steps to reduce the likelihood of task retirement.
API Gateway continues to process requests correctly but is seeing errors when sending logs to CloudWatch. Some customers may also experience errors when using Lambda with API Gateway, but we believe these are related to failures within the Lambda function code itself, such as attempts to invoke CloudWatch Logs APIs.
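One way function code can avoid failing on a logging dependency is to treat direct log calls as best-effort. The following is a minimal Python sketch under stated assumptions: `ship_log`, `best_effort`, and `handler` are hypothetical names, and `ship_log` stands in for any direct call to a remote logging API rather than a real AWS SDK call.

```python
import logging

logger = logging.getLogger("handler")


def ship_log(message: str) -> None:
    """Hypothetical stand-in for a direct remote logging call that is failing."""
    raise ConnectionError("logging backend unavailable")


def best_effort(log_fn, message: str) -> bool:
    """Call log_fn, swallowing any error; return True if it succeeded."""
    try:
        log_fn(message)
        return True
    except Exception:
        # Record the failure locally and carry on with the request.
        logger.warning("log delivery failed; continuing", exc_info=True)
        return False


def handler(event: dict, log_fn=ship_log) -> dict:
    """Toy request handler whose result does not depend on log delivery."""
    result = {"statusCode": 200, "body": event.get("name", "world")}
    best_effort(log_fn, f"handled request for {result['body']}")
    return result


print(handler({"name": "example"}))
```

The design choice here is that the response is computed before logging is attempted and the logging error is caught, so an outage in the logging path degrades observability without failing the request.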
AWS Lambda continues to process invocations correctly but is unable to send logs to CloudWatch Logs. As a result, customers may not be able to see the logs of their asynchronous Lambda invocations.
We have also seen periods of elevated failures with IAM Identity Center and Organizations as a result of this issue.
We will continue to provide updates every 30-60 minutes, or sooner if we have additional information to share.
We continue to work on resolving the increased error rates and latencies for Kinesis APIs in the US-EAST-1 Region. We have identified the root cause and are actively working on multiple parallel paths to mitigate the issue. As a result of this issue, CloudWatch Logs continues to see delayed log delivery, but metrics continue to operate normally. Some customers may also be experiencing elevated failures with IAM Identity Center and Organizations as a result of this issue. We will continue to provide updates as we make progress.
We can confirm increased error rates and latencies for Kinesis APIs within the US-EAST-1 Region. We have identified the root cause and are actively working to resolve the issue. As a result of this issue, other services, such as CloudWatch, are also experiencing increased error rates and delayed CloudWatch Logs delivery. We will continue to keep you updated as we make progress in resolving the issue.
We are seeing increased error rates and latencies for some service APIs within the US-EAST-1 region.