Outage in Coveralls

Elevated 504 Timeout Errors

Resolved Minor
September 03, 2025 - Started about 1 month ago - Lasted 21 days

Outage Details

We’re currently seeing elevated reports of 504 Timeout errors affecting some customers on a subset of Coveralls pages, including:

- Source File pages
- Repo pages
- Add Repos pages

All systems and pages are generally operational; a subset of customers are experiencing these errors, sometimes intermittently.

There is a public tracking issue for the Source File timeout errors here:
https://github.com/lemurheavy/coveralls-public/issues/1757

Fix in progress:
We’re implementing a short-term fix over the next 24–48 hours, which should eliminate the timeouts.

A longer-term fix is also planned and will roll out over several weeks. Early phases of that implementation should also reduce the request times that were originally triggering the 504 timeouts.

What you can do:
If you're currently affected, we recommend following updates here and subscribing to the public issue: https://github.com/lemurheavy/coveralls-public/issues/1757

If your issue pattern differs from the above, or you suspect a different root cause, reach out to support@coveralls.io, and we'll verify for you.

Components affected
- Coveralls.io Web
- Coveralls.io API

Latest Updates (most recent first)
RESOLVED 13 days ago - at 09/25/2025 12:53AM

500 Internal Server Errors on Uploads

The recent 500 error surfacing during some coverage uploads as:
> ⚠️ Internal server error. Please contact Coveralls team.

has been resolved. A full postmortem will be published here soon. In the meantime, you can find more detail in the main tracking issue:
https://github.com/coverallsapp/coverage-reporter/issues/180

Summary
The root cause was ultimately infrastructure-related, not a regression in recent coverage-reporter releases. The previous workaround of pinning your coverage-reporter version is therefore not required.

We have decided to close this incident, which we intentionally kept open for over a week to track a series of 504 and 5xx issues with overlapping root causes. In hindsight, the broadened scope made updates less clear than we'd hoped. With today’s resolution and the mitigations applied throughout the week, the occurrence of 504 errors during uploads (POSTs) has been significantly reduced. Going forward, any new 504 errors should be considered unexpected, isolated events.

At the same time, we continue work on several instances of intermittent GET-related 504 errors affecting:

- Source File pages
- Repo pages
- Add Repos pages

Progress on those issues will be reported separately here:
https://github.com/lemurheavy/coveralls-public/issues/1757

MONITORING 15 days ago - at 09/23/2025 07:14PM

Fix for unrelated 500 errors:

If you receive a `500` error with this error message format:
> ⚠️ Internal server error. Please contact Coveralls team.

Please know it is unrelated to the `504` errors being monitored in this open incident.

Those intermittent `500` errors are caused by a regression in one of the latest coverage-reporter releases: `v0.6.16` or `v0.6.17`.

Workaround:
Pin your coverage-reporter-version to `v0.6.15` in your integration config.

For thorough instructions, see this public issue:
https://github.com/coverallsapp/coverage-reporter/issues/180

We’re investigating the root cause and will post updates once a fix is released.
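
The distinction this update draws between the two error types can be sketched in code. The following is a minimal illustration only, assuming a hypothetical `UploadResponse` container and category labels that are not part of any Coveralls tool; the only detail taken from this page is the quoted error message.

```python
from dataclasses import dataclass

# Message text quoted in the update above.
REPORTER_500_MESSAGE = "Internal server error. Please contact Coveralls team."


@dataclass(frozen=True)
class UploadResponse:
    """Hypothetical container for an upload response (not a Coveralls type)."""
    status: int
    body: str


def classify(response: UploadResponse) -> str:
    """Map a response to the issue category described in this incident.

    The category labels are illustrative names, not official ones.
    """
    if response.status == 504:
        # Timeout errors tracked by this incident.
        return "504-timeout"
    if response.status == 500 and REPORTER_500_MESSAGE in response.body:
        # The separate 500 error described in this update.
        return "500-internal-server-error"
    if 200 <= response.status < 300:
        return "ok"
    return "other"
```

A CI wrapper could use a check like this to decide which public issue to follow: the `504` tracking issue, or the coverage-reporter issue linked above.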

MONITORING 16 days ago - at 09/22/2025 04:43PM

Mitigated – Monitoring

All systems operational.

Recent mitigations, including fleet expansion and autoscaling, have reduced 504 timeout reports significantly. The remaining reports are infrequent and occur mostly during overnight and weekend hours (PDT).

We are continuing to monitor closely and are working on a multi-part solution to eliminate all known causes. Until then, we are keeping this incident open in Monitoring. We will close it once 504 errors have returned to being unexpected, isolated events.

MONITORING 23 days ago - at 09/15/2025 06:57PM

Mitigation in place.

All systems operational.

This morning we deployed additional capacity and autoscaling measures to reduce 504 errors on coverage report uploads:

- Doubled our web server fleet (on top of the prior doubling when this issue began).
- Enabled autoscaling at the web layer, allowing the fleet to double again automatically when NGINX response times exceed thresholds.
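
As a rough illustration of the threshold-based doubling described in the second item, here is a minimal sketch. The function name, the use of a median over recent samples, and every number in the usage note are assumptions made for the example; the actual autoscaling policy and thresholds are not given in this update.

```python
from statistics import median
from typing import Sequence


def target_fleet_size(
    base_size: int,
    response_times_ms: Sequence[float],
    threshold_ms: float,
) -> int:
    """Return the desired number of web servers.

    Doubles the base fleet when the median of the recent response-time
    samples exceeds the threshold; otherwise keeps the base size.
    Using a median is an assumption made for this sketch.
    """
    if not response_times_ms:
        # No samples yet: stay at the base size.
        return base_size
    if median(response_times_ms) > threshold_ms:
        return base_size * 2
    return base_size
```

For example, with a hypothetical base fleet of 8 servers and a 500 ms threshold, samples of 900, 1200 and 700 ms would yield a target of 16, while samples around 100 ms would leave the fleet at 8.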

The underlying trigger remains rare surges of upload requests from outlier repositories (750–1250 uploads per build). While we have paused processing for these repos, our HTTP servers must still handle the incoming requests until they stop.

Timezone coverage:
As a small team based in Los Angeles (PDT), our ability to respond in real time is most limited overnight (10 PM–6 AM PDT). Unfortunately, the primary outlier repos are in APAC, making this the window of highest risk. With these changes, we hope to reduce the occurrence of upload 504s during this window.

We will monitor results closely and continue tuning autoscaling thresholds. Please let us know if you continue to see 504 errors on uploads.

MONITORING 26 days ago - at 09/12/2025 03:44PM

All systems operational.

Earlier today (6:45–7:45 AM PDT), we received elevated reports of 504 timeout errors. We have not been able to reproduce the issue since, but if you are still experiencing errors, please contact us at support@coveralls.io.

The affected areas may include:

- Coverage Report Uploads (/api/v1/jobs)
- Add Repos Page
- Repo Page
- Source File Page

Fixes for the Add Repos, Repo, and Source File pages are scheduled to be deployed by end of day (PDT).

MONITORING 29 days ago - at 09/09/2025 03:24PM

All systems operational.

We have released the first of two parts of a near-term solution into production, which resolves a small subset of the 504 errors. We are still working on releasing part two.

Subscribe for updates at this status page, or follow this public tracking issue for updates:
https://github.com/lemurheavy/coveralls-public/issues/1757

MONITORING 30 days ago - at 09/08/2025 06:57PM

All systems operational.

Continuing to keep this open until we have released our short-term fix into production.

Subscribe for updates at this status page, or follow this public tracking issue for updates:
https://github.com/lemurheavy/coveralls-public/issues/1757

MONITORING about 1 month ago - at 09/05/2025 04:05PM

We are still working on a near-term fix. We will post on this status page and on the public tracking issue when it is complete:
https://github.com/lemurheavy/coveralls-public/issues/1757

MONITORING about 1 month ago - at 09/03/2025 06:00PM

We’re currently seeing elevated reports of 504 Timeout errors affecting some customers on a subset of Coveralls pages, including:

- Source File pages
- Repo pages
- Add Repos pages

All systems and pages are generally operational; a subset of customers are experiencing these errors, sometimes intermittently.

There is a public tracking issue for the Source File timeout errors here:
https://github.com/lemurheavy/coveralls-public/issues/1757

Fix in progress:
We’re implementing a short-term fix over the next 24–48 hours, which should eliminate the timeouts.

A longer-term fix is also planned and will roll out over several weeks. Early phases of that implementation should also reduce the request times that were originally triggering the 504 timeouts.

What you can do:
If you're currently affected, we recommend following updates here, and subscribing to the public issue: https://github.com/lemurheavy/coveralls-public/issues/1757

If your issue pattern differs from above, or you suspect a different root cause, reach out to support@coveralls.io, and we'll verify for you.
