Outage in AWS

Increased Invoke Error Rates

Resolved Minor
September 28, 2022 - 2 months ago - Lasted about 4 hours
Components affected
Amazon API Gateway (Oregon)

Details

10:13 AM PDT We are investigating increased error rates for invokes in the US-WEST-2 Region.

10:33 AM PDT We are investigating increased error rates for invokes in the US-WEST-2 Region. We do not yet have a root cause, but are investigating multiple potential root causes in parallel. In addition, we are implementing filters on inbound traffic from a set of sources with recent significant traffic shifts, which may help mitigate the impact. We do not yet have a solid ETA, but will continue to provide updates as we progress.

10:59 AM PDT We continue to see elevated error rates and latencies for invokes on API Gateway endpoints in the US-WEST-2 Region. While engineers continue to work towards root cause, we have deployed traffic filters from sources with significant increases in traffic prior to the event. As a result of these traffic filters, we are seeing a reduction in error rates and latencies, but continue to work towards full recovery. Although error rates are improving, we do not yet have an ETA for full recovery. The issue is also affecting API requests to some AWS services, including those listed below. Amazon Connect is experiencing increased failures in handling new calls, chats, and tasks as well as issues with user login in the US-WEST-2 Region. We will continue to provide updates as we progress.

11:33 AM PDT We continue to work on resolving the elevated error rates and latencies for invokes on API Gateway endpoints in the US-WEST-2 Region. We continue to see a significant improvement in error rates, starting at 10:40 AM PDT, but are not seeing full recovery yet. The issue is caused by contention within the subsystem that is responsible for request processing within the API Gateway service. Engineers are engaged and have applied traffic filters as a precautionary measure, while they work to identify the root cause and resolve the issue. Engineers continue to work to reduce contention within the affected subsystem, which we believe will resolve the elevated error rates and latencies. Customers with applications that use API Gateway, or customers invoking Lambda functions via API Gateway, will be experiencing elevated error rates and latencies as a result of this issue. The AWS services listed below are also experiencing elevated error rates as a result of this issue. While we have seen improvements in error rates since 10:40 AM PDT, recovery has stalled and we do not have a clear ETA on full recovery. For customers that have dependencies on API Gateway and are experiencing error rates, we do not have any mitigations to recommend to address the issue on the customer side. We do expect error rates to continue to improve as contention within the affected subsystem subsides, and will provide further updates as recovery progresses.

12:26 PM PDT We continue to see an improvement in error rates and latencies for invokes on API Gateway endpoints in the US-WEST-2 Region, but have not fully resolved the issue. While our mitigations have improved error rates and latencies, we have also identified the root cause of the event. The subsystem responsible for request processing experienced increased load, which ultimately led to contention of a component within the affected subsystem. Engineers have been working to resolve the contention of the affected component, which has led to a reduction of error rates and latencies. The path to full recovery involves addressing the contention across the subsystem, which we are currently doing. As that progresses over the next two hours, we expect recovery to continue to improve. Customers with applications that use API Gateway will be experiencing elevated error rates and latencies as a result of this issue. Lambda is not affected by this event, but customers using API Gateway as an HTTP endpoint for Lambda will experience increased error rates and latencies. Other AWS services listed below are also experiencing elevated error rates as a result of this issue. For customers that have dependencies on API Gateway and are experiencing error rates, we do not have any mitigations to recommend to address the issue on the customer side. We do expect error rates to continue to improve as contention within the affected subsystem subsides, and will provide further updates as recovery progresses.

1:03 PM PDT Error rates and latencies for invokes on API Gateway endpoints in the US-WEST-2 Region continue to hold steady. Engineers continue to work on resolving the contention affecting the subsystem responsible for request processing. We recently completed a mitigation that should help to reduce error rates and latencies to normal levels and will report on the result of that change in the next update. Although Lambda function invocations are not affected by this issue, the Lambda console is experiencing some errors, which we are investigating. Other AWS services affected by this issue remain in much the same state, waiting on the recovery of API Gateway.

1:22 PM PDT Starting at 1:12 PM PDT, we saw a further reduction in error rates and latencies for invokes on API Gateway endpoints in the US-WEST-2 Region. This was a result of the latest mitigation which addressed contention within the component in the subsystem responsible for request processing within API Gateway. Error rates are now at levels where some customers may begin to see recovery, and retries will begin to work more consistently. We will be applying the mitigation to the remaining hosts affected by the contention issue and expect to see further recovery from them in the next 30 minutes.
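
The 1:22 PM update notes that retries begin to work more consistently once error rates drop. As an illustration only, here is a minimal Python sketch of client-side retries with capped exponential backoff and jitter. It is not taken from the AWS updates, which state that there was no customer-side mitigation to recommend while the contention persisted. The names `TransientError` and `call_with_backoff` and all default values are assumptions made for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (for example an HTTP 5xx)."""


def call_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep, rng=random.random):
    """Call func(), retrying on TransientError with capped exponential backoff.

    Each wait is a random fraction of the capped exponential delay
    ("full jitter"), so many clients do not retry in lockstep.
    The defaults are illustrative assumptions, not recommended values.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
```

The caller wraps whatever request it makes in a function that raises `TransientError` on a retryable failure. Keeping `max_attempts` small matters during an incident like this one, since aggressive retries add load to a subsystem that is already contended.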

1:42 PM PDT As of 1:31 PM PDT, error rates and latencies for invokes on API Gateway endpoints in the US-WEST-2 Region are close to pre-event levels, and we continue to work on the remaining hosts that are affected by the contention issue. Several AWS services, including Amazon Connect and Lambda, are seeing signs of strong recovery. We expect all services to recover as API Gateway error rates and latencies return to normal levels. Customers should be seeing recovery at these error levels as well. We will continue to provide updates until the error rates and latencies have returned to normal levels.

Monitor AWS and all your third-party services in one dashboard

Have you ever missed an important outage from a third-party service? We've built IsDown, so you never miss another outage again. It's the easiest way to monitor all your SaaS and cloud providers and get alerted when an outage impacts your business.

No credit card required · Cancel anytime · 2024 services available

Integrations with Slack, Microsoft Teams, Google Chat, Datadog, PagerDuty, Zapier, Discord, and Webhook

Are you able to monitor your cloud services in a real-time and consistent way?

Before
  • Subscribe to status pages one-by-one
  • Limited notification options
  • Can't monitor only the parts that matter
  • No bird's eye view over all your services
  • Losing time looking for problems elsewhere
  • No access to historical issues and stats
After
  • Easily subscribe to all status pages
  • Normalized notifications sent to your tools
  • Monitor what matters
  • Easy access to the status of all your services
  • Outages information where it's needed
  • Historical data of outages for all your providers

IsDown is the missing layer in your monitoring stack

Quickly identify external outages that impact your business. We are monitoring more than 2000 services in real time.

Bird's-eye view of all your services' statuses

Check the aggregated status of all your services in one place. No more going to each of the status pages and managing them individually.

Outage monitoring in real time

We monitor 24 hours a day, 7 days a week and will notify you if there is an incident. No more wasting time trying to figure out why something isn't working.

Alerts in your favorite channels

Get instant notifications in your email, Slack, Teams, or Discord when we detect a service outage. Outage monitoring where you are already doing your work.

Easily integrate with your current tools and workflows

Using Zapier or Webhooks, you can easily integrate notifications into your processes. PagerDuty integration is also available.

Avoid notification clutter

Configure which notifications you want to receive from each service. Filter notifications by service components. You can opt to receive notifications only when a specific component is affected. You can also choose to receive notifications with a certain severity.
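
To make the component and severity filtering described above concrete, here is a minimal Python sketch. It does not reflect IsDown's actual implementation or API. The `Incident` and `Rule` types, the `should_notify` function, and the severity scale are all hypothetical names invented for this example. The sample values reuse the component and severity from the incident at the top of this page.

```python
from dataclasses import dataclass

# Hypothetical severity scale; the names and their ordering are assumptions.
SEVERITY_RANK = {"minor": 1, "major": 2}


@dataclass(frozen=True)
class Incident:
    service: str
    component: str
    severity: str


@dataclass(frozen=True)
class Rule:
    service: str
    components: frozenset = frozenset()  # empty set means "any component"
    min_severity: str = "minor"


def should_notify(incident, rules):
    """Return True if any rule matches the incident's service, component, and severity."""
    for rule in rules:
        if rule.service != incident.service:
            continue
        if rule.components and incident.component not in rule.components:
            continue
        if SEVERITY_RANK[incident.severity] >= SEVERITY_RANK[rule.min_severity]:
            return True
    return False


rules = [Rule("AWS", frozenset({"Amazon API Gateway (Oregon)"}), "minor")]
incident = Incident("AWS", "Amazon API Gateway (Oregon)", "minor")
print(should_notify(incident, rules))
```

With this rule, an incident on any other AWS component would not trigger a notification, whatever its severity.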

Multiple Dashboards

Have multiple dashboards. Easily shareable with the world.

Create one dashboard for each of your teams/clients/projects. Monitor only the services that each uses. Dedicated dashboard with custom notification settings. Easily make your dashboard public and share it with the world.

Prepare for scheduled maintenances

Never again be caught off guard by unexpected maintenance from your services. A feed of the next scheduled maintenances is available.

Weekly Digest of the services' outages

Every Monday, you'll receive a weekly summary of what happened the previous week as well as the maintenance schedule for the following week.

Integrate with tools you already use and love

The data and notifications you need, in the tools you already use.

For every team in your company

DevOps & On-Call Teams

You already monitor your internal systems. What about the external services? Monitor the services your business depends on. Don't waste time looking elsewhere when external outages are the cause of issues.

IT Support Teams

Detect external outages before your clients tell you. Anticipate possible issues and make the necessary arrangements. Proactive communication builds trust with clients and prevents a flood of support tickets.

5-minute setup, instant value for your team

  1. Step 1 Create an account

    Start with a trial account that will allow you to try and monitor up to 40 services for 14 days.

  2. Step 2 Select your cloud services

    There are 2024 services to choose from, and we're adding more every week.

  3. Step 3 Set up notifications

    You can get notifications by email, Slack, and Discord. You can also use Zapier or Webhooks to build your workflows.

  4. Step 4 Done!

    You'll start getting alerts when we detect outages in your external dependencies! No more wasting time looking in the wrong place!

Frequently Asked Questions

Is AWS down right now? What is AWS current status?
AWS seems to be up and running. We've updated the status less than a minute ago.
Was AWS down today?
AWS is up and running now. In the last 24 hours there were 0 outages.
I'm having issues with AWS, but the status is OK. What's going on?
There are a few things you can try:
  • Check the official status page for more information.
  • Check the Twitter account for more information.
  • Check on the top of the page if there are any reported problems by other users.
AWS outage? How can I monitor AWS?
Why use IsDown instead of AWS status page?
IsDown is a status page aggregator, which means that we aggregate the status of multiple cloud services. Monitor all the services that impact your business. Get a dashboard with the health of all services and status updates. Set up notifications via email, Slack, or Discord when a service you monitor has issues or when maintenances are scheduled.
What happens when I create an IsDown account?
You'll have access to a 14-day trial in our Pro plan. You can cancel or delete your account anytime. After 14 days, you'll need to subscribe to continue to use the service and get notifications.
How can I pay for a subscription?
You can go to the Billing section in your account and choose one of the plans. We have monthly and yearly options. We accept all major credit cards, Apple Pay, and Google Pay. We use Stripe for payments.
Can I get a refund?
We'll refund your subscription if you cancel within ten days of the subscription starting. No questions asked.
Can't find a service/integration?
Just contact us, and we'll add it ASAP.

Setup in 5 minutes or less

Try it out! How much time will you save your team by keeping outage information close at hand?

  • 14-day free trial
  • No credit card required to start
  • Cancel anytime
  • 2000+ services available