This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The workaround is in place, with most nodes back online and processing Slurm jobs again. We are downgrading the impact level as we bring the remaining nodes back online.
We've identified the underlying problem and are now implementing a workaround until it can be fully resolved.
More compute nodes have now dropped off the network, so we are upgrading this to a major outage for the Slurm cluster. We're narrowing down the cause, but may not be able to restore service until overseas L3 support engineers come online this evening. Apologies for the disruption!
Three compute nodes and both huge-memory nodes are now down, exhibiting the same network issue. We are still working to determine the cause.
It appears that around 6:45am this morning a network event occurred and disconnected a handful of compute nodes from the cluster. This does not seem to be linked to the overnight border networking maintenance. We are focusing on restoring connectivity first and will conduct root cause analysis (RCA) afterwards.
We are currently investigating this issue.