The compute nodes have not drained again overnight. We have identified some user jobs that failed to cancel cleanly as the original cause. This does not appear to be related to the current long waiters issue.
The compute nodes are stable for now. We are continuing to monitor and analyse.
The compute nodes have been reviewed and some errant Slurm processes have been killed. The nodes have been resumed in Slurm and are now available.
Root cause analysis continues.
We are investigating what has caused the compute nodes to drain and will reinstate them as soon as possible.