Outage in AWS

Increased API Error Rates

Resolved Minor
November 25, 2020 - Lasted about 18 hours (5:15 AM to 11:10 PM PST)

Incident Report

6:36 AM PST We are investigating increased error rates for Kinesis Data Streams APIs in the US-EAST-1 Region.

7:50 AM PST We are continuing to investigate increased Kinesis Data Streams API errors, and are working on identifying root cause.

8:12 AM PST Kinesis Data Streams customers are still experiencing increased API errors. This is also impacting other services, including ACM, Amplify Console, API Gateway, AppStream2, AppSync, Athena, CloudFormation, CloudTrail, CloudWatch, Cognito, Connect, DynamoDB, EventBridge, IoT Services, Lambda, LEX, Managed Blockchain, Resource Groups, SageMaker, Support Console, and Workspaces. We are continuing to work on identifying root cause.

8:52 AM PST The Kinesis Data Streams API is severely impaired. This is also impacting other services, including ACM, Amplify Console, API Gateway, AppStream2, AppSync, Athena, CloudFormation, CloudTrail, CloudWatch, Cognito, Connect, DynamoDB, EventBridge, IoT Services, Lambda, LEX, Managed Blockchain, Resource Groups, SageMaker, Support Console, and Workspaces. We are actively working towards resolution.

9:32 AM PST The Kinesis Data Streams API is currently impaired in the US-EAST-1 Region. As a result customers are not able to write or read data published to Kinesis streams. CloudWatch metrics and events are also affected, with elevated PutMetricData API error rates and some delayed metrics. While EC2 instances and connectivity remain healthy, some instances are experiencing delayed instance health metrics, but remain in a healthy state. AutoScaling is also experiencing delays in scaling times due to CloudWatch metric delays. The issue is also affecting other services, including ACM, Amplify Console, API Gateway, AppMesh, AppStream2, AppSync, Athena, Batch, CloudFormation, CloudTrail, Cognito, Connect, DynamoDB, EventBridge, Glue, IoT Services, Lambda, LEX, Managed Blockchain, Marketplace, Personalize, RDS, Resource Groups, SageMaker, Support Console, Well Architected, and Workspaces. For further details on each of these services, please see the Personal Health Dashboard. Other services, like S3, remain unaffected by this event. This issue has also affected our ability to post updates to the Service Health Dashboard. We are continuing to work towards resolution.

11:23 AM PST We continue to work towards recovery of the issue affecting the Kinesis Data Streams API in the US-EAST-1 Region. For Kinesis Data Streams, the issue is affecting the subsystem that is responsible for handling incoming requests. The team has identified the root cause and continues to make progress in addressing it. We are seeing some improvement in error rates, but continue to work towards full resolution. The issue also affects other services, or parts of these services, that utilize Kinesis Data Streams within their workflows. While features of multiple services are impacted, some services have seen broader impact, and service-specific impact details are included within Recent Events on the Service Health Dashboard.

1:59 PM PST Kinesis Data Streams API requests are still significantly impaired. We have identified a mitigation for this issue, and are actively working towards resolution.

2:49 PM PST Kinesis Data Streams API requests are still impaired but are starting to see recovery. We continue to actively work towards resolution.

4:42 PM PST Kinesis Data Streams API operations are seeing gradual recovery but customers may continue to experience increased latencies and failure rates. We continue to actively work towards resolution.

6:32 PM PST We have now fully mitigated the impact to the subsystem within Kinesis that is responsible for the processing of incoming requests and are no longer seeing increased error rates or latencies. However, we are not yet taking the full traffic load and are working to relax request throttles on the service. Over the next few hours we expect to relax these throttles to previous levels. We expect customers to begin seeing recovery as these throttles are relaxed over this timeframe.

8:53 PM PST We are continuing to relax the request throttles for Kinesis Data Streams and are gradually increasing the traffic into the service. We have not yet enabled requests to Kinesis Data Streams from VPC Endpoints. The Kinesis Data Streams subsystem continues to operate normally, and we expect incremental recovery over the next few hours.

9:26 PM PST We have now enabled a subset of requests to Kinesis Data Streams using VPC Endpoints.

10:06 PM PST We have now enabled all requests to Kinesis Data Streams through Internet-facing endpoints. We are continuing to work to re-enable all requests to Kinesis Data Streams using VPC Endpoints.

11:00 PM PST We have now enabled all requests to Kinesis Data Streams through both Internet-facing endpoints and VPC Endpoints.

Nov 26, 12:03 AM PST Between 5:15 AM and 11:10 PM PST customers experienced a significant impairment to their Amazon Kinesis Data Streams API operations. We have identified the root cause and have completed immediate actions to prevent recurrence. The issue has been resolved and the service is operating normally.
