Service has been fully restored. All impacted jobs have been requeued and are currently processing normally. We will be publishing a public post-mortem with additional details about this incident.
Reverting the change has helped, and most metrics are back to pre-incident levels. We are requeuing failed jobs and monitoring to make sure the issue does not recur.
We identified an internal networking configuration change that may have caused the incident. We have since reverted that change, and services appear to be recovering.
We are still investigating the root cause of this incident. The us-east-2 region is not receiving any network traffic at this point. We are also seeing some API request errors in other US regions, though at lower rates than in us-east-2.
We continue to see elevated levels of 500 errors across the US-West and US-East regions. Our engineering team is investigating the issue.
We have identified the issue as a problem in US-West, with some impact in US-East. The impact appears to be primarily on reads rather than writes.
We have identified an increase in 500 errors in some US regions and are actively investigating the cause.