
Outage in Farm HPC cluster

Jobs getting stuck in completing

Resolved Major
Started May 01, 2026 - Lasted 4 days

Incident Report

Admins are investigating an issue where most jobs on Farm are getting stuck in the completing stage.
Components affected
Farm HPC cluster Slurm

Latest Updates (most recent first)
RESOLVED - 05/05/2026 10:54PM

Continued monitoring shows the workaround is still working, so this incident is resolved.

MONITORING - 05/05/2026 12:35AM

SchedMD identified the bug that was impacting Farm. The workaround they provided seems to have resolved the outage. Admins will continue to monitor.

INVESTIGATING - 05/02/2026 05:34PM

The issue is that slurmctld is not properly starting and completing jobs when under load. Admins have opened a Severity 1 ticket with SchedMD support and are waiting to hear back.

INVESTIGATING - 05/01/2026 10:54PM

Admins are investigating an issue where most jobs on Farm are getting stuck in the completing stage.
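
The symptom reported in these updates, jobs piling up in the completing stage, can be spotted from the scheduler's job listing. The following is a minimal sketch, not the procedure the Farm admins used: it assumes the listing was captured with `squeue --noheader --format="%i %T"` (job ID and long-form state), and the function names and sample data are hypothetical.

```python
from collections import Counter


def count_job_states(squeue_output: str) -> Counter:
    """Count jobs per state from 'jobid STATE' lines.

    Assumes the text came from: squeue --noheader --format="%i %T"
    Lines that do not have exactly two fields are skipped.
    """
    counts = Counter()
    for line in squeue_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        _job_id, state = parts
        counts[state] += 1
    return counts


def completing_fraction(squeue_output: str) -> float:
    """Return the share of listed jobs that are in the COMPLETING state."""
    counts = count_job_states(squeue_output)
    total = sum(counts.values())
    return counts["COMPLETING"] / total if total else 0.0


# Hypothetical sample listing, for illustration only (not data from the incident).
sample = """\
101 RUNNING
102 COMPLETING
103 COMPLETING
104 PENDING
"""

fraction = completing_fraction(sample)
```

A persistently high fraction over several samples would suggest jobs are not leaving the completing stage. Any alert threshold is a local choice and is not stated in the incident report.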
