Continued monitoring shows the workaround is still working, so this incident is resolved.
SchedMD identified the bug that was impacting Farm. The workaround they provided seems to have resolved the outage. Admins will continue to monitor.
The issue is that slurmctld is not properly starting and completing jobs when under load. Admins have opened a Severity 1 ticket with SchedMD support and are waiting to hear back.
Admins are investigating an issue where most jobs on Farm are getting stuck in the completing (CG) state.