This incident has been resolved.
A fix has been implemented and we are monitoring the results.
GPFS has been restarted on login-0 and the associated waiters have been cleared. However, new waiters appeared overnight on login-1 and have not cleared. A new status page will be raised for this.
We have an unkillable defunct process on login-0, and associated long GPFS waiters. A restart of GPFS on login-0 will be required to clear this situation. This is scheduled for 1200hrs tomorrow, Thursday 12th June. All processes on login-0 using the GPFS filesystems will likely die. Slurm jobs will remain unaffected.
Should the impact increase today, we will bring the restart forward. Should the restart not clear the waiters, a full reboot of login-0 will be required.