Outage in Fly.io

Degraded API Performance

Resolved Minor
November 25, 2024 - Started 27 days ago - Lasted about 12 hours


Outage Details

We are investigating degraded API performance.
Components affected
Fly.io Dashboard, Fly.io API
Latest Updates (most recent first)
RESOLVED 26 days ago - at 11/26/2024 08:15AM

This incident has been resolved.

MONITORING 26 days ago - at 11/26/2024 05:43AM

We are scaling up our systems to handle the increased traffic.

MONITORING 26 days ago - at 11/26/2024 03:42AM

All hosts have completed the restoration process and we are seeing our overall Corrosion cluster health and performance return to normal.

Machines API and GraphQL API error rates are improving, but some users may still see elevated rates of request timeouts and/or 504 errors when using the Machines API or flyctl commands. We are continuing to monitor these services as they recover.
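
For clients that hit these timeouts or 504 errors, one common mitigation is to retry with exponential backoff and jitter. The following is a minimal sketch under stated assumptions, not guidance from Fly.io: `call_with_backoff`, `request_fn`, and all parameter names and defaults are hypothetical, and `request_fn` stands in for whatever function issues the actual API request and returns a `(status, body)` pair.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: only 504 responses and timeouts are treated as retryable here,
# matching the symptoms described in the update above.
RETRYABLE_STATUSES = {504}


def call_with_backoff(
    request_fn: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request_fn, retrying on 504 or TimeoutError with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(max_attempts):
        try:
            status, body = request_fn()
        except TimeoutError as exc:
            last_error = exc
        else:
            if status not in RETRYABLE_STATUSES:
                return status, body
            last_error = RuntimeError(f"retryable HTTP status {status}")

        # Sleep before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # "full jitter" backoff

    raise last_error
```

Retrying is only safe for idempotent requests. Blindly retrying a request that creates a resource could create it more than once if the first attempt succeeded but its response was lost.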

MONITORING 26 days ago - at 11/26/2024 02:31AM

The restore process has completed on the majority of hosts in our fleet and we are seeing overall Corrosion cluster health and performance return to normal.

A small number of hosts are still being worked on; we aim to have them restored shortly.

IDENTIFIED 27 days ago - at 11/26/2024 02:06AM

We are running a restoration and reseed process to bring the Corrosion cluster back to a healthy, current state.
During this restoration process, you may see elevated error rates on machines or apps that have been recently updated.

IDENTIFIED 27 days ago - at 11/25/2024 11:58PM

The updates have been applied; however, we are still not seeing recovery on all Corrosion nodes. We are continuing to work on a fix.

Machines API and proxy performance remain degraded, especially for newly created and updated machines.

IDENTIFIED 27 days ago - at 11/25/2024 10:15PM

The Machines API issues stem from a propagation delay in our global state store, Corrosion.

We have completed deploying a configuration change to our Corrosion cluster and will be applying these changes to each node shortly. We expect improvement once the changes are applied.

In the meantime, users may still see degraded Machines API and proxy performance, especially with newly created machines.

IDENTIFIED 27 days ago - at 11/25/2024 08:20PM

The issue has been identified and a fix is being implemented.

INVESTIGATING 27 days ago - at 11/25/2024 08:10PM

We are investigating degraded API performance.

