
Outage in AgResearch eRI

compute-[1-4] nodes affected by a GPFS long waiter

Resolved · Minor
January 12, 2026 · Lasted 6 days
Official incident page

Incident Report

Compute-[1-4] are all affected by a GPFS long waiter on the storage cluster. However, Slurm jobs continue to run on those nodes, so we are attempting to resolve the issue without killing them. We need to restart GPFS on the affected nodes, so as a first step we are draining compute-1 and compute-4. If the situation deteriorates further, we may be forced to kill all jobs on those nodes so that we can restart GPFS on all four.
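For context, a minimal sketch of how an operator might inspect and contain this kind of issue on a GPFS/Slurm cluster. These are standard GPFS and Slurm administrative commands, but the exact sequence is an illustration, not the team's actual runbook; node names are taken from this incident:

    # Inspect outstanding GPFS waiters on an affected node (run as root)
    mmdiag --waiters

    # Drain a node in Slurm: running jobs keep going, new jobs stay away
    scontrol update NodeName=compute-1 State=DRAIN Reason="GPFS long waiter"

    # See which jobs are still running on the draining node
    squeue --nodelist=compute-1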

Latest Updates (most recent first)
RESOLVED · 01/18/2026 08:05 PM

The compute-1 GPFS restart has now been completed and the associated waiter has been cleared. All nodes are now available to Slurm.
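One way to verify that state, as an illustrative check using standard Slurm and GPFS commands (not taken from the incident itself):

    # Confirm the nodes are back in service from Slurm's point of view
    sinfo --nodes=compute-[1-4]

    # Confirm no long waiters remain on the GPFS side (run on each node)
    mmdiag --waiters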

MONITORING · 01/14/2026 07:53 PM

Compute-4 has now been restarted and the storage-side deadlock has been cleared. Compute-1 has a different waiter problem, so it is still draining until we can restart GPFS there; we will continue to manage and communicate that status via this status page. All other compute nodes are now available.

IDENTIFIED · 01/13/2026 01:27 AM

The deadlock on compute-3 has now been cleared; the node is available in Slurm.

IDENTIFIED · 01/13/2026 01:18 AM

Compute-3 is now stuck in a completing state, so we are going to attempt a restart of GPFS there. Any jobs still running there will unfortunately be killed.
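For readers unfamiliar with the procedure, a restart like this would typically look something like the following. This is a minimal sketch using standard GPFS and Slurm commands; the exact steps the team ran are not documented here:

    # Stop the GPFS daemon on the stuck node
    mmshutdown -N compute-3

    # Start GPFS again and confirm the daemon reaches the active state
    mmstartup -N compute-3
    mmgetstate -N compute-3

    # Return the node to service in Slurm once the filesystem is healthy
    scontrol update NodeName=compute-3 State=RESUME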

IDENTIFIED · 01/12/2026 10:01 PM

Compute-[1-4] are all affected by a GPFS long waiter on the storage cluster. However, Slurm jobs continue to run on those nodes, so we are attempting to resolve the issue without killing them. We need to restart GPFS on the affected nodes, so as a first step we are draining compute-1 and compute-4. If the situation deteriorates further, we may be forced to kill all jobs on those nodes so that we can restart GPFS on all four.
