Outage in AWS

Redshift cluster reboot and degraded performance

Resolved Minor
November 09, 2021 - Started over 3 years ago - Lasted 5 months

Outage Details

2:23 PM PST We are investigating cluster reboots and degraded performance for Redshift Clusters in the US-EAST-1 Region.

3:47 PM PST We continue to investigate cluster reboots and degraded performance for Redshift RA3 Clusters in the US-EAST-1 Region. The cluster reboots are being triggered by writes that are impacted by elevated S3 PUT latencies. Redshift retries these writes automatically, but if they remain unsuccessful after extended attempts, the cluster may restart. If you are able to pause write workloads while we work towards resolving the S3 PUT latency issue, your clusters will no longer restart and can serve read queries normally.

4:53 PM PST We have identified and continue to work on mitigating the root cause of the Redshift RA3 cluster reboots and degraded performance in the US-EAST-1 Region. The cluster reboots are triggered by Redshift writes to S3 that have been impacted by the elevated S3 PUT API latencies. If you are impacted and able to pause Redshift write workloads on your RA3 clusters while we work towards recovery, your clusters will no longer restart and can serve read queries normally. As S3 API error rates and latencies continue to improve, we expect Redshift RA3 cluster restarts to decline as well.
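
The updates above advise pausing write workloads while continuing to serve reads. One way a client application might do that is sketched below. This is an illustration under stated assumptions, not AWS guidance or a Redshift feature: WriteGate, is_write, WRITE_PREFIXES and the execute callable are hypothetical names, and classifying statements by their leading keyword is a simplification (for example, it does not detect a write that begins with a WITH clause).

```python
from collections import deque

# Statement prefixes treated as writes. This list is an assumption made for
# illustration; adjust it to match your own workload.
WRITE_PREFIXES = (
    "INSERT", "UPDATE", "DELETE", "COPY", "MERGE",
    "CREATE", "ALTER", "DROP", "TRUNCATE",
)


def is_write(sql: str) -> bool:
    """Return True if the statement starts with one of the write keywords."""
    return sql.lstrip().upper().startswith(WRITE_PREFIXES)


class WriteGate:
    """Defers write statements while paused; passes reads straight through."""

    def __init__(self, execute):
        # `execute` is a hypothetical callable that runs one SQL statement
        # against the cluster (for example, a wrapper around a DB cursor).
        self._execute = execute
        self._paused = False
        self._deferred = deque()

    def pause_writes(self) -> None:
        """Start holding back write statements."""
        self._paused = True

    def resume_writes(self) -> None:
        """Stop holding writes and replay deferred ones in original order."""
        self._paused = False
        while self._deferred:
            self._execute(self._deferred.popleft())

    def submit(self, sql: str) -> bool:
        """Run the statement now or defer it. Returns True if it ran."""
        if self._paused and is_write(sql):
            self._deferred.append(sql)
            return False
        self._execute(sql)
        return True
```

In this sketch an operator would call pause_writes() when the incident is announced and resume_writes() once the vendor reports recovery. Deferred statements are held in memory only, so a real deployment would likely need durable queuing.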

7:19 PM PST While the S3 API error rates and latencies continue to hold steady, we continue to see a low rate of Redshift RA3 Cluster restarts. While we continue to take steps to mitigate and reduce the risk of a restart for affected Redshift clusters, we expect to see full recovery when the S3 error rates and latencies have fully recovered. Please refer to the S3 Service Health Dashboard updates for progress towards recovery.

7:47 PM PST As S3 has completed its mitigation and is operating normally, Redshift RA3 clusters are seeing writes succeed and clusters are no longer rebooting. For the vast majority of customers the issue has been resolved and the service is operating normally. We continue to work with a small number of impacted customers individually to recover their clusters and will reach out via the AWS Personal Health Dashboard and AWS Support.

Components affected
Amazon Redshift (us-east-1)
