This incident has been resolved.
Farm's slurmdbd is having intermittent issues. If you see an error like the one below, the problem has recurred, and we will restart slurmdbd to bring it back into service.
"""sacctmgr: error: _open_persist_conn: failed to open persistent connection to host:monitoring-ib:6819: Connection timed out
sacctmgr: error: Sending PersistInit msg: Connection timed out"""
We have a support case open with SchedMD and will update this issue as we learn more.
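For users who wrap Slurm commands in scripts, the error quoted above can be recognized programmatically. The following is a minimal sketch, not an official tool: the function name `looks_like_slurmdbd_timeout` is hypothetical, and the pattern is built only from the error text shown in this notice, so it assumes later occurrences use the same wording.

```python
import re

# Pattern assembled from the error text quoted in this notice.
# The host and port are matched loosely, on the assumption that they may differ.
_PERSIST_CONN_ERROR = re.compile(
    r"_open_persist_conn: failed to open persistent connection to "
    r"host:\S+: Connection timed out"
)


def looks_like_slurmdbd_timeout(output: str) -> bool:
    """Return True if the text contains the persistent-connection timeout error."""
    return bool(_PERSIST_CONN_ERROR.search(output))


# Sample taken verbatim from the first line of the error above.
sample = (
    "sacctmgr: error: _open_persist_conn: failed to open persistent "
    "connection to host:monitoring-ib:6819: Connection timed out"
)
print(looks_like_slurmdbd_timeout(sample))
```

A wrapper script could pass the captured error output of a failed command to this function and, on a match, retry later rather than treating the failure as a problem with the job itself.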