Outage in Fly.io

Increased API failures

Resolved - Minor
October 22, 2024 - Started 17 days ago - Lasted about 8 hours

Outage Details

We have identified the cause of an increase in API errors across the platform and are working on a fix.
Components affected
Fly.io Deployments, Fly.io API
Latest Updates (most recent first)
RESOLVED 16 days ago - at 10/23/2024 02:22AM

This incident has been resolved.

MONITORING 16 days ago - at 10/23/2024 01:30AM

Our internal state is fully re-synchronized, and our metrics are returning to normal. We are continuing to monitor for potential ongoing issues.

IDENTIFIED 16 days ago - at 10/23/2024 12:07AM

Restoration of our state propagation system is complete. The system is now processing updates to re-synchronize back to the latest state. Services and APIs should start to recover once this process is completed.

IDENTIFIED 17 days ago - at 10/22/2024 11:08PM

Our state propagation system is significantly delayed. To speed up recovery, we will restore the system from the snapshot to clear the backlog. Your machines may be missing from the output of fly m list and from some other APIs, but all of your started machines will still be running. The state will re-synchronize to the latest once restoration is complete.

IDENTIFIED 17 days ago - at 10/22/2024 10:27PM

We are continuing to work on a fix for this issue.

IDENTIFIED 17 days ago - at 10/22/2024 09:15PM

Parts of our APIs should have resumed normal function. We are still applying a fix to the rest of the APIs.

IDENTIFIED 17 days ago - at 10/22/2024 08:28PM

We are continuing to apply the fix to all hosts in the fleet. Some hosts continue to see elevated API errors at this time.

IDENTIFIED 17 days ago - at 10/22/2024 07:25PM

We are currently in the process of rolling out a fix across our fleet.

IDENTIFIED 17 days ago - at 10/22/2024 06:19PM

We are continuing to work on a fix for this issue. Apps with autostart/autostop configured might also see an increased number of request errors.

IDENTIFIED 17 days ago - at 10/22/2024 06:06PM

We have identified the cause of an increase in API errors across the platform and are working on a fix.
