During this week, we started experiencing intermittent API failures within our Hosted API platform. We started to witness a failure rate of about 5% of requests responding with a 500 status code, which typically means that our servers are crashing and restarting. Our Deployed customers were not affected by this outage.
We identified several issues that caused these outages. All of them were caused by different form configurations that implemented custom javascript logic which made an assumption of only being executed within the browser. While all of our executions are evaluated within a virtual machine sandbox on our servers, there were cases where submitting forms with malformed javascript would cause our virtual machine sandboxes to crash, which in turn, would cause an uncaught exception and cause the server to reboot.
Our hosted deployment contains many different instances behind a load balancer, so the exposure for the reboots was minimal (we logged a 5% api failure rate), but this was still way higher than the point it should have been causing us to implement emergency patches to mitigate these failures. We deployed a series of patches for these failures on August 23, 2023 13:52:15 which has dramatically reduced the failure rate.
We are still experiencing some failures (around 0.1%) and the fixes for these failures were patched on August 30, 2023 around 10:20am CST.
We will continue to monitor our API performance and quickly respond as we see new failures.
With IsDown, you can monitor all your critical services' official status pages from one centralized dashboard and receive instant alerts the moment an outage is detected. Say goodbye to constantly checking multiple sites for updates and stay ahead of outages with IsDown.
Start free trialNo credit card required · Cancel anytime · 5850 services available
Integrations with