Outage in AWS

Increased Error Rates and Latencies

Resolved - Minor
June 13, 2023 - Lasted about 4 hours

Outage Details

We are investigating increased error rates and latencies in the US-EAST-1 Region.
Components affected
AWS Organizations
Amazon SageMaker (us-east-1)
Amazon Augmented AI (us-east-1)
Amazon SQS (us-east-1)
Amazon Comprehend (us-east-1)
AWS Directory Service (us-east-1)
AWS Outposts (us-east-1)
Amazon Pinpoint (us-east-1)
Amazon FreeRTOS (us-east-1)
Amazon EKS (us-east-1)
Amazon OpenSearch Service (us-east-1)
Amazon WorkMail (us-east-1)
Amazon Managed Service for Prometheus (us-east-1)
Amazon Connect (us-east-1)
Amazon GameLift (us-east-1)
Amazon Location Service (us-east-1)
AWS CodePipeline (us-east-1)
AWS Cloud9 (us-east-1)
AWS Glue (us-east-1)
AWS IAM
AWS Elemental (us-east-1)
AWS Lake Formation (us-east-1)
Amazon SES (us-east-1)
AWS Certificate Manager (us-east-1)
AWS Management Console
AWS Config (us-east-1)
Amazon DocumentDB (us-east-1)
Amazon Managed Streaming for Apache Kafka (us-east-1)
Amazon RDS (us-east-1)
AWS Service Catalog (us-east-1)
AWS VPCE PrivateLink (us-east-1)
Amazon WorkSpaces (us-east-1)
Amazon ELB (us-east-1)
Amazon Athena (us-east-1)
Amazon ECR (us-east-1)
AWS CodeStar (us-east-1)
Amazon Inspector (us-east-1)
Amazon Kinesis Video Streams (us-east-1)
EC2 Image Builder (us-east-1)
Amazon Lightsail (us-east-1)
Amazon Managed Grafana (us-east-1)
Amazon MQ (us-east-1)
AWS Lambda (us-east-1)
AWS Support Center
Amazon AppStream 2.0 (us-east-1)
Amazon Braket (us-east-1)
Amazon Chime
Amazon CloudFront
Amazon CloudWatch (us-east-1)
Amazon Cognito (us-east-1)
Amazon EMR (us-east-1)
Amazon Kendra (us-east-1)
AWS AppSync (us-east-1)
Amazon EventBridge (us-east-1)
Amazon Managed Workflows for Apache Airflow (us-east-1)
AWS CloudFormation (us-east-1)
AWS Fault Injection Simulator (us-east-1)
Amazon Kinesis Firehose (us-east-1)
AWS Batch (us-east-1)
Amazon API Gateway (us-east-1)
Amazon Quantum Ledger Database (us-east-1)
Amazon Redshift (us-east-1)
AWS DataSync (us-east-1)
AWS IoT SiteWise (us-east-1)
Multiple services (us-east-1)
AWS License Manager (us-east-1)
AWS CodeCommit (us-east-1)
AWS Migration Hub Strategy Recommendations (us-east-1)
Amazon Interactive Video Service (us-east-1)
Amazon FSx (us-east-1)
Amazon Elastic File System (us-east-1)
AWS Global Accelerator
Amazon GuardDuty (us-east-1)
AWS Single Sign-On (us-east-1)
AWS Data Exchange (us-east-1)
AWS Ground Station (us-east-1)
Amazon Transcribe (us-east-1)
AWS Resource Groups
AWS Account Management
Amazon MemoryDB for Redis (us-east-1)
AWS QuickSight (us-east-1)
AWS Control Tower (us-east-1)
Amazon ElastiCache (us-east-1)
AWS Amplify Admin (us-east-1)
AWS Amplify (us-east-1)
Amazon Route 53
Amazon DevOps Guru (us-east-1)
AWS Marketplace
Amazon CodeGuru Reviewer (us-east-1)
Amazon CodeGuru Profiler (us-east-1)
AWS Secrets Manager (us-east-1)
Amazon ECS (us-east-1)
Amazon Translate (us-east-1)
Latest Updates (most recent first)
UPDATE - 06/13/2023 09:49PM

We are working to accelerate the rate at which Lambda asynchronous invocations are processed, and now estimate that the queue will be fully processed over the next hour. We expect that all queued invocations will be executed.

UPDATE - 06/13/2023 09:29PM

Lambda synchronous invocation APIs have recovered. We are still working on processing the backlog of asynchronous Lambda invocations that accumulated during the event, including invocations from other AWS services (such as SQS and EventBridge). Lambda is working to process these messages over the next few hours; during this time, we expect continued delays in the execution of asynchronous invocations.

UPDATE - 06/13/2023 09:00PM

Many AWS services are now fully recovered and marked Resolved on this event. We are continuing to work to fully recover all services.

UPDATE - 06/13/2023 08:48PM

Beginning at 11:49 AM PDT, customers experienced errors and latencies with multiple AWS services in the US-EAST-1 Region. Our engineering teams were immediately engaged and began investigating. We quickly narrowed the root cause to an issue with a subsystem responsible for capacity management for AWS Lambda, which caused errors directly for customers (including through API Gateway) and indirectly through its use by other AWS services. We have associated other services impacted by this issue with this post on the Health Dashboard.

Additionally, customers may experience authentication or sign-in errors when using the AWS Management Console, or authenticating through Cognito or IAM STS. Customers may also experience intermittent issues when attempting to call or initiate a chat to AWS Support.

We are now observing sustained recovery of the Lambda invoke error rates, and recovery of other affected AWS services. We are continuing to monitor closely as we work towards full recovery across all services.
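The start time in this update is given in PDT, while the update timestamps on this page carry no time zone. As a rough aid for lining the two up, the sketch below converts the stated start time to UTC. It assumes the incident date of June 13, 2023 and uses a fixed UTC-7 offset for PDT. That the update timestamps are in UTC is an assumption, not something the page states.

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7; a fixed offset avoids depending on a system time zone database.
PDT = timezone(timedelta(hours=-7), "PDT")

# Start time as stated in the update: 11:49 AM PDT on June 13, 2023.
start_pdt = datetime(2023, 6, 13, 11, 49, tzinfo=PDT)
start_utc = start_pdt.astimezone(timezone.utc)

# Format the result the same way the update headers are written.
print(start_utc.strftime("%m/%d/%Y %I:%M%p UTC"))  # 06/13/2023 06:49PM UTC
```

If the update timestamps are indeed UTC, the first update at 07:08PM would have been posted 19 minutes after the stated start of impact.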

UPDATE - 06/13/2023 08:38PM

We are beginning to see an improvement in the Lambda function error rates. We are continuing to work towards full recovery.

UPDATE - 06/13/2023 08:14PM

We are continuing to work to resolve the error rates invoking Lambda functions. We're also observing elevated errors obtaining temporary credentials from the AWS Security Token Service, and are working in parallel to resolve these errors.

UPDATE - 06/13/2023 07:36PM

We are continuing to experience increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. We have identified the root cause as an issue with AWS Lambda, and are actively working toward resolution. For customers attempting to access the AWS Management Console, we recommend using a region-specific endpoint (such as https://us-west-2.console.aws.amazon.com). We are actively working on full mitigation and will continue to provide regular updates.
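The region-specific console endpoint recommended above follows a simple pattern. The sketch below builds such a URL from a region code. The helper name `regional_console_url` is hypothetical, and the assumption that other regions follow the same pattern as the us-west-2 example is inferred from that single example.

```python
def regional_console_url(region: str) -> str:
    """Build a region-specific AWS Management Console URL.

    Hypothetical helper: the URL pattern is taken from the us-west-2
    example in the update and assumed to hold for other region codes.
    """
    return f"https://{region}.console.aws.amazon.com"


# Reproduces the endpoint given in the update.
print(regional_console_url("us-west-2"))  # https://us-west-2.console.aws.amazon.com
```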

UPDATE - 06/13/2023 07:26PM

We have identified the root cause of the elevated errors invoking AWS Lambda functions, and are actively working to resolve this issue.

UPDATE - 06/13/2023 07:19PM

AWS Lambda function invocation is experiencing elevated error rates. We are working to identify the root cause of this issue.

UPDATE - 06/13/2023 07:08PM

We are investigating increased error rates and latencies in the US-EAST-1 Region.

