500 Internal Server Errors on Uploads
The recent 500 errors that surfaced during some coverage uploads with the message:
> ⚠️ Internal server error. Please contact Coveralls team.
have been resolved. A full postmortem will be published here soon. In the meantime, you can find more detail in the main tracking issue:
https://github.com/coverallsapp/coverage-reporter/issues/180
Summary
The root cause was ultimately infrastructure-related, not a regression in recent coverage-reporter releases. The previous workaround of pinning your coverage-reporter version is therefore no longer required.
We have decided to close this incident, which we intentionally kept open for over a week to track a series of 504 and other 5xx issues with overlapping root causes. In hindsight, the broadened scope made our updates less clear than we'd hoped. With today’s resolution and the mitigations applied throughout the week, 504 errors during uploads (POSTs) have been significantly reduced. Going forward, any new 504 errors should be considered unexpected, isolated events.
At the same time, we are continuing work on intermittent 504 errors on GET requests affecting:
- Source File pages
- Repo pages
- Add Repos pages
Progress on those issues will be reported separately here:
https://github.com/lemurheavy/coveralls-public/issues/1757
Fix for unrelated 500 errors:
If you receive a `500` error with this message:
> ⚠️ Internal server error. Please contact Coveralls team.
Note that it is unrelated to the `504` errors being monitored in this open incident.
These intermittent `500` errors are caused by a regression in one of the latest coverage-reporter releases: `v0.6.16` or `v0.6.17`.
Workaround:
Pin your `coverage-reporter-version` to `v0.6.15` in your integration config.
For detailed instructions, see this public issue:
https://github.com/coverallsapp/coverage-reporter/issues/180
We’re investigating the root cause and will post updates once a fix is released.
Mitigated – Monitoring
All systems operational.
Recent mitigations, including fleet expansion and autoscaling, have reduced 504 timeout reports significantly. The remaining reports are infrequent and occur mostly during overnight and weekend hours (PDT).
We are continuing to monitor closely and are working on a multi-part solution to eliminate all known causes. Until then, we are keeping this incident open in Monitoring. We will close it once 504 errors have returned to being unexpected, isolated events.
Mitigation in place.
All systems operational.
This morning we deployed additional capacity and autoscaling measures to reduce 504 errors on coverage report uploads:
- Doubled our web server fleet (on top of the prior doubling when this issue began).
- Enabled autoscaling at the web layer, allowing the fleet to double again automatically when NGINX response times exceed thresholds.
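The autoscaling rule described above can be pictured as a simple threshold policy. The sketch below is a hypothetical illustration only, not Coveralls' actual configuration: the class name, the use of p95 response time as the metric, the separate scale-in threshold, and all numbers are assumptions. It doubles the fleet (capped at twice the base size) when response times exceed a threshold, and returns to the base fleet only once latency falls well below it, so the fleet does not oscillate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalePolicy:
    """Hypothetical threshold-based autoscaling policy for a web fleet.

    All field names and values are illustrative assumptions.
    """
    base_fleet: int            # servers kept running at all times
    scale_out_ms: float        # scale out when p95 response time exceeds this
    scale_in_ms: float         # scale back in only when p95 drops below this
    max_multiplier: int = 2    # autoscaling may at most double the base fleet

    def __post_init__(self) -> None:
        if self.base_fleet < 1:
            raise ValueError("base_fleet must be at least 1")
        if self.scale_in_ms >= self.scale_out_ms:
            # A gap between the two thresholds (hysteresis) prevents flapping.
            raise ValueError("scale_in_ms must be below scale_out_ms")

    def desired_fleet(self, current: int, p95_ms: float) -> int:
        """Return the fleet size the policy would target next."""
        ceiling = self.base_fleet * self.max_multiplier
        if p95_ms > self.scale_out_ms:
            # Responses are slow: double the fleet, capped at the ceiling.
            return min(ceiling, current * 2)
        if p95_ms < self.scale_in_ms:
            # Responses are comfortably fast: return to the base fleet.
            return self.base_fleet
        # In between: hold the current size, clamped to [base, ceiling].
        return max(self.base_fleet, min(current, ceiling))


policy = AutoscalePolicy(base_fleet=8, scale_out_ms=1000, scale_in_ms=300)
print(policy.desired_fleet(8, 1500))   # 16: slow responses, fleet doubles
print(policy.desired_fleet(16, 500))   # 16: between thresholds, hold
print(policy.desired_fleet(16, 100))   # 8: fast again, back to base
```

The gap between the two thresholds is a common design choice in autoscalers: a single threshold would add and remove servers repeatedly as latency hovers around it during a sustained upload surge.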
The underlying trigger remains rare surges of upload requests from a few outlier repositories (750–1,250 uploads per build). Although we have paused processing for these repos, our HTTP servers must still handle their incoming requests until those uploads stop.
Timezone coverage:
As a small team based in Los Angeles (PDT), we are least able to respond in real time overnight (10 PM–6 AM PDT). Unfortunately, the primary outlier repos are in APAC, making this the highest-risk window. With these changes, we hope to reduce upload 504s during this window.
We will monitor results closely and continue tuning autoscaling thresholds. Please let us know if you continue to see 504 errors on uploads.
All systems operational.
Earlier today (6:45–7:45 AM PDT), we received elevated reports of 504 timeout errors. We have not been able to reproduce the issue since, but if you are still experiencing errors, please contact us at support@coveralls.io.
The affected areas may include:
- Coverage Report Uploads (/api/v1/jobs)
- Add Repos Page
- Repo Page
- Source File Page
Fixes for the Add Repos, Repo, and Source File pages are scheduled to be deployed by end of day (PDT).
All systems operational.
We have released part one of a two-part near-term solution to production, which resolves a minority of the 504 errors. We are still working on releasing part two.
Subscribe for updates at this status page, or follow this public tracking issue for updates:
https://github.com/lemurheavy/coveralls-public/issues/1757
All systems operational.
We are keeping this incident open until our short-term fix is released to production.
Subscribe for updates at this status page, or follow this public tracking issue for updates:
https://github.com/lemurheavy/coveralls-public/issues/1757
We are still working on a near-term fix. When it is complete, we will post an update here and in the public tracking issue:
https://github.com/lemurheavy/coveralls-public/issues/1757
We’re currently seeing elevated reports of 504 Timeout errors affecting some customers on a subset of Coveralls pages, including:
- Source File pages
- Repo pages
- Add Repos pages
All systems and pages are generally operational; a subset of customers is experiencing these errors, sometimes intermittently.
There is a public tracking issue for the Source File timeout errors here:
https://github.com/lemurheavy/coveralls-public/issues/1757
Fix in progress:
We’re implementing a short-term fix over the next 24–48 hours, which should eliminate the timeouts.
A longer-term fix is also planned and will roll out over several weeks; its early phases should also reduce the request times that originally triggered the 504 timeouts.
What you can do:
If you're currently affected, we recommend following updates here and subscribing to the public issue: https://github.com/lemurheavy/coveralls-public/issues/1757
If your issue pattern differs from the above, or you suspect a different root cause, reach out to support@coveralls.io and we'll verify it for you.