Outage in AWS
Impaired EC2 Instances
Resolved
Minor
July 10, 2022 - Lasted about 3 hours
Outage Details
10:56 AM PDT We are investigating instance impairments in a single Availability Zone (EUW2-AZ1) in the EU-WEST-2 Region. Other Availability Zones are not affected by the event and we are working to resolve the issue.
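Note that AWS identifies the zone by its AZ ID (EUW2-AZ1) rather than by name, because zone names like eu-west-2a are shuffled per account. Below is a minimal boto3 sketch, assuming credentials with the relevant ec2:Describe* permissions, for resolving that zone ID to your account's zone name and listing the instances placed there:

```python
# Sketch: map the reported AZ ID (euw2-az1) to this account's AZ name and
# list the EC2 instances running there. Zone names are shuffled per account,
# so the zone ID is the reliable identifier.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

# Resolve the zone ID from the status update to this account's zone name.
zones = ec2.describe_availability_zones(
    Filters=[{"Name": "zone-id", "Values": ["euw2-az1"]}]
)
zone_name = zones["AvailabilityZones"][0]["ZoneName"]
print(f"euw2-az1 maps to {zone_name} in this account")

# List instances placed in the affected zone.
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate(
    Filters=[{"Name": "availability-zone", "Values": [zone_name]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            print(instance["InstanceId"], instance["State"]["Name"])
```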
11:13 AM PDT Some instances in a single Availability Zone are currently impaired or have lost power in the EU-WEST-2 Region. The root cause is a thermal event within a data center in the affected Availability Zone that we are working to resolve. Some instances may also be experiencing network connectivity issues in the affected Availability Zone. Elastic Load Balancing has shifted traffic away from the Availability Zone. All multi-AZ databases have failed away from the affected Availability Zone, however single-AZ databases will remain affected until we see full recovery. We do not yet have an ETA for full recovery but expect it to be more than an hour. We will provide further guidance as soon as we have it. For customers who are able to fail away from the affected Availability Zone, we recommend doing so.
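What "failing away" looks like depends on the workload. For capacity managed by an EC2 Auto Scaling group, one option is to restrict the group to subnets in the unaffected Availability Zones so replacement instances launch there. A hedged sketch follows; the group name and subnet IDs are placeholders, not values from the report:

```python
# Sketch: "fail away" from the affected AZ by dropping its subnet from an
# Auto Scaling group's VPCZoneIdentifier, so replacement capacity launches
# only in the healthy AZs. Names and IDs below are hypothetical.
import boto3

autoscaling = boto3.client("autoscaling", region_name="eu-west-2")

GROUP_NAME = "web-asg"                                    # hypothetical ASG name
HEALTHY_SUBNETS = ["subnet-aaaa1111", "subnet-bbbb2222"]  # subnets outside euw2-az1

autoscaling.update_auto_scaling_group(
    AutoScalingGroupName=GROUP_NAME,
    VPCZoneIdentifier=",".join(HEALTHY_SUBNETS),
)
```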
11:55 AM PDT We continue to investigate instance impairments in a single Availability Zone in the EU-WEST-2 Region. We have experienced an increase in temperatures within a single data center in the affected Availability Zone, which in some cases has caused impairments for instances in the Availability Zone. We have engineers within the affected data center and are working to resolve the issue. ELB has shifted traffic away from the affected Availability Zone, and API Gateway has mitigated the majority of impact and continues to work on the remaining endpoints. Elastic File System (EFS) is experiencing errors within the affected Availability Zone. Redshift, OpenSearch, and ElastiCache are experiencing elevated error rates for clusters within the affected Availability Zone. RDS has successfully mitigated impact for all multi-AZ databases, however single-AZ databases will remain affected until we see recovery. Lambda is largely unaffected by the event, however a very small number of functions may be experiencing invocation errors within the affected Availability Zone. The EC2 APIs are experiencing increased error rates within the affected Availability Zone, but instance launches continue to work in other Availability Zones. Some services, like EMR and Fargate, are seeing delays in provisioning new instances in the affected Availability Zone due to the EC2 API impact in that Availability Zone. Our engineering teams continue to work towards identifying the root cause of the thermal event and resolving it. At this stage, we do not have an ETA but still expect it to be more than an hour. We continue to recommend using other Availability Zones in the EU-WEST-2 Region, which remain unaffected by this event.
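Since the EC2 APIs were degraded only in the affected Availability Zone and launches kept working elsewhere, replacement capacity could be pinned to a healthy zone by targeting a subnet outside euw2-az1. A minimal sketch; the AMI ID, subnet ID, and instance type are illustrative placeholders:

```python
# Sketch: launch replacement capacity into a subnet in an unaffected AZ.
# All IDs below are placeholders for illustration.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-bbbb2222",        # subnet in an unaffected AZ
)
```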
12:25 PM PDT We have resolved the root cause of the thermal event and are starting to see recovery for impaired EC2 instances within the EU-WEST-2 Region. At this stage, the vast majority of EC2 instances have recovered and we continue to work on the instances that are still affected. ELB and API Gateway have shifted traffic away from the affected Availability Zone. EFS is only experiencing error rates for One Zone file systems within the affected Availability Zone; Standard file systems, which use multiple Availability Zones, are not affected. Customers should now be seeing recovery for instance impairments, and we expect to see recovery for the vast majority of instances within the next hour. We will continue to provide updates as recovery continues.
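Because only EFS One Zone file systems in the affected zone saw errors while Standard (regional) file systems did not, it helps to know which category each of your file systems falls into. A small boto3 sketch, with pagination omitted for brevity:

```python
# Sketch: One Zone file systems report an AvailabilityZoneId in
# DescribeFileSystems; Standard file systems do not.
import boto3

efs = boto3.client("efs", region_name="eu-west-2")

for fs in efs.describe_file_systems()["FileSystems"]:
    az_id = fs.get("AvailabilityZoneId")
    if az_id == "euw2-az1":
        print(fs["FileSystemId"], "One Zone file system in the affected AZ")
    elif az_id:
        print(fs["FileSystemId"], f"One Zone file system in {az_id}")
    else:
        print(fs["FileSystemId"], "Standard (multi-AZ) file system")
```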
1:01 PM PDT We continue to make progress in resolving the EC2 impaired instances in a single Availability Zone in the EU-WEST-2 Region. At this stage, the vast majority of affected instances have recovered and we continue to work towards full recovery. ELB and API Gateway remain weighted away from the affected Availability Zone for now. EFS has recovered the availability of all One Zone file systems in the affected Availability Zone. RDS multi-AZ databases remain available and we are starting to see recovery for single-AZ databases within the affected Availability Zone. EKS pods within the affected Availability Zone are starting to see recovery, however EKS did experience cluster creation failures during the event. Redshift, OpenSearch, and ElastiCache are starting to see recovery within the affected Availability Zone. EC2 APIs have fully recovered and new instances can once again be launched in the affected Availability Zone. Customers should be seeing most of their affected instances in recovery, although we continue to work on a small number of affected EC2 instances and EBS volumes that are still impaired. We will continue to provide updates as recovery progresses.
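Multi-AZ RDS databases failed over automatically, but Single-AZ databases placed in the affected zone stayed impaired until recovery. A sketch for listing those, assuming euw2-az1 has already been resolved to your account's zone name (the value below is a placeholder):

```python
# Sketch: list RDS databases that are Single-AZ and placed in the affected
# zone name, since those remain impacted until the zone recovers.
import boto3

rds = boto3.client("rds", region_name="eu-west-2")
ZONE_NAME = "eu-west-2a"   # placeholder: this account's name for euw2-az1

paginator = rds.get_paginator("describe_db_instances")
for page in paginator.paginate():
    for db in page["DBInstances"]:
        if not db["MultiAZ"] and db.get("AvailabilityZone") == ZONE_NAME:
            print(db["DBInstanceIdentifier"], db["DBInstanceStatus"])
```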
1:53 PM PDT We have resolved the impairments for the vast majority of EC2 instances in the affected Availability Zone (euw2-az1) in the EU-WEST-2 Region. There are a small number of EC2 instances and EBS volumes that were hosted on hardware that was affected by the loss of power during the thermal event. For these EC2 instances and EBS volumes, we will be opening Personal Health Dashboard notices to track recovery. We have seen full recovery for a number of AWS services, including AWS Transit Gateway, Amazon Connect, Amazon Relational Database Service, Amazon ElastiCache, Amazon Elastic Container Service, Amazon Elastic File System, Amazon Elastic Kubernetes Service, Amazon Elastic MapReduce, Amazon OpenSearch Service, and Amazon Redshift. The remaining AWS services, including Amazon API Gateway, Amazon CloudFront, and Elastic Load Balancing, are very close to full recovery at this stage. Elastic Load Balancing and API Gateway will be shifting traffic back into the affected Availability Zone shortly. We’re working through the remaining EC2 instances and EBS volumes and expect to see full recovery in the next 30 minutes.
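The Personal Health Dashboard notices mentioned in the final update can also be read programmatically through the AWS Health API, which requires a Business or Enterprise support plan and is served from us-east-1. A hedged sketch for listing affected EC2 instances and EBS volumes:

```python
# Sketch: track Personal Health Dashboard notices via the AWS Health API.
# Requires a Business or Enterprise support plan; the endpoint lives in us-east-1.
import boto3

health = boto3.client("health", region_name="us-east-1")

events = health.describe_events(
    filter={
        "regions": ["eu-west-2"],
        "services": ["EC2", "EBS"],
        "eventTypeCategories": ["issue", "accountNotification"],
    }
)
for event in events["events"]:
    entities = health.describe_affected_entities(
        filter={"eventArns": [event["arn"]]}
    )
    for entity in entities["entities"]:
        print(event["eventTypeCode"], entity["entityValue"], entity.get("statusCode"))
```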