We determined that, after we recently added thousands(!) of tools, one of our production components needed a longer-than-expected startup, so its health check endpoint was not yet available when the timeout expired. As a result, the orchestrator was constantly restarting nodes, leading to poor availability of this critical component. Going forward, we plan to move these tasks out of component startup so this doesn't happen again and we can continue to support even larger numbers of tools.
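To illustrate the general idea of moving slow work out of startup, here is a minimal Python sketch. It is not the actual component or the actual fix. The `Component` class, the `slow_register` function, and the split between a `health()` check and a `ready()` check are all hypothetical names and assumptions made for this example.

```python
import threading
import time


class Component:
    """Hypothetical service that registers its tools in the background."""

    def __init__(self, tool_names, register_tool):
        self._tool_names = list(tool_names)
        self._register_tool = register_tool
        self._ready = threading.Event()
        self.registered = 0

    def start(self):
        # Return immediately; registration runs off the startup path.
        worker = threading.Thread(target=self._load_tools, daemon=True)
        worker.start()
        return worker

    def _load_tools(self):
        for name in self._tool_names:
            self._register_tool(name)
            self.registered += 1
        self._ready.set()

    def health(self):
        # Liveness: answers as soon as the process is up,
        # regardless of how many tools are still loading.
        return {"status": "ok"}

    def ready(self):
        # Readiness: true only after every tool is registered.
        return self._ready.is_set()

    def wait_until_ready(self, timeout=None):
        return self._ready.wait(timeout)


def slow_register(name):
    # Stand-in for the (assumed) per-tool setup cost.
    time.sleep(0.001)
```

With this shape, the health check can respond well inside an orchestrator's timeout even when the tool list grows, while callers that need the full tool set consult `ready()` instead.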
Our prod services are now recovering. We are continuing to monitor and have identified the root cause of the issue.
We are continuing to investigate this issue.
We are currently investigating an issue with our production deployments. We are attempting to roll back to a stable version. Investigation of the issue continues.