Outage in AWS
Increased API Error Rates and Latency
Resolved
Minor
June 29, 2021 - Started at 11:45 AM PDT
- Lasted about 8 hours (until 7:42 PM PDT)
Outage Details
12:12 PM PDT We are investigating increased API latency and error rates in the US-EAST-1 Region.
12:36 PM PDT We can confirm increased API latency and increased API error rates for the ACM APIs in the US-EAST-1 Region. During this time, you may be unable to request new certificates, and may also observe errors when attempting to list and/or modify existing certificates. This issue impacts both the AWS Management Console and the ACM APIs. Additionally, you may also receive API errors when attempting to associate new resources. Existing associated resources are unaffected, and continue to operate as normal. We have identified the root cause of the issue and are working toward mitigation and resolution. We will provide further updates as we have more information to share.
1:14 PM PDT We continue to work toward mitigating the affected subsystem responsible for the increase in API errors and latencies for the ACM APIs. Other AWS services (such as ClientVPN) that attempt to create or associate new certificates may also be impacted by this issue. Existing resources remain unaffected by this issue and continue to operate normally.
2:13 PM PDT We are continuing to drive to root cause and work toward mitigating the affected subsystem responsible for the increase in API errors and latencies for the ACM APIs. Other AWS services (such as ClientVPN) that attempt to create or associate new certificates may also be impacted by this issue. Existing resources remain unaffected by this issue and continue to operate normally.
2:56 PM PDT We have identified some workloads on the affected subsystem of the ACM API that may be causing the increase in API errors and latency, and we are reviewing and testing procedures to mitigate their impact. We do not have an ETA at this time. This issue does affect services like CloudFront and ElasticSearch that rely on ACM for their certificate needs. It would also impact CloudFormation workflows that either directly or indirectly need to manipulate ACM certificates.
All workflows that depend on ACM certificates that are already created are not impacted by this event, and continue to operate normally.
3:33 PM PDT We continue to work toward mitigating the increased latencies and error rates affecting the ACM APIs. Until this point, some requests and retries have been succeeding. At this time, we are temporarily not accepting additional API requests, in order to help accelerate mitigation and recovery. Once we begin accepting new API requests, requests will be throttled. We will continue to provide updates as we progress.
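For callers on the receiving end of throttling like this, a common client-side response is to retry with capped exponential backoff and jitter. The following is a minimal sketch of that general technique, not something taken from the AWS update: `ThrottledError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the delay values are illustrative assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling or transient API error."""


def backoff_delays(max_attempts, base_delay=1.0, max_delay=60.0):
    """Return the upper bound of the wait before each retry (capped exponential)."""
    return [min(max_delay, base_delay * 2 ** i) for i in range(max_attempts - 1)]


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call request() until it succeeds or max_attempts is reached.

    Each wait is drawn uniformly from [0, cap] ("full jitter") so that many
    clients retrying at once do not hit the API in lockstep.
    """
    caps = backoff_delays(max_attempts, base_delay, max_delay)
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(random.uniform(0, caps[attempt]))
```

In practice `request` would wrap whatever call is being throttled, and the exception type would be whatever the client library raises for throttling. The `sleep` parameter exists only so the waiting can be stubbed out in tests.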
4:49 PM PDT We are starting to see some ACM API calls succeed for CloudFront and ELB and we are starting to propagate changes for CloudFront distributions to our edge locations. Customer facing APIs are still throttled. ACM is continuing to make progress towards recovery.
5:47 PM PDT We are seeing recovery for customers and throttling has been removed from most APIs. We are working through the final changes to unblock the following APIs: RequestCertificate, ListCertificates, and ImportCertificate, and expect to have those final changes in place shortly. We will update as we make progress towards full recovery.
7:05 PM PDT We are seeing recovery for customers and throttling has been removed from most APIs. We have unblocked RequestCertificate for most use cases and are working to have the final changes in place shortly. We will update as we make progress towards full recovery.
7:46 PM PDT Between 11:45 AM and 7:42 PM PDT, customers experienced increased ACM API errors and latency in the US-EAST-1 Region that impacted the ability to issue new certificates, import certificates and retrieve information about certificates from ACM. Existing certificates that were already vended to services such as CloudFront and ELB continued to operate and were unaffected. This issue also impacted provisioning and scaling workflows for services that depend on ACM for certificate management needs, such as CloudFront and ELB, as well as CloudFormation operations that involve mutating ACM certificates. This issue was caused by a previously unknown limit in an ACM storage subsystem. We have identified the limit issue and have mitigated it. The issue has been fully resolved and all ACM API requests are being answered normally. During this time, all existing resources that had a configured ACM certificate (such as ELB load balancers and CloudFront distributions) continued to operate normally, and were not impaired by this issue.
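The impact window stated in the final update can be checked with a short calculation. This sketch assumes both times fall on June 29, 2021 and are in the same time zone (PDT), so naive datetimes are sufficient; `duration_hm` is a hypothetical helper name.

```python
from datetime import datetime


def duration_hm(start, end):
    """Return the elapsed time between two datetimes as (hours, minutes)."""
    total_seconds = int((end - start).total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    return hours, remainder // 60


# Impact window from the final update: 11:45 AM to 7:42 PM PDT.
impact_start = datetime(2021, 6, 29, 11, 45)
impact_end = datetime(2021, 6, 29, 19, 42)

hours, minutes = duration_hm(impact_start, impact_end)
print(f"Impact lasted {hours} h {minutes} min")  # Impact lasted 7 h 57 min
```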