This incident has been resolved.
A further fix to the routing configuration within the eRI's high-performance network was implemented by our L2 network support at approx 1440hrs. Connectivity between the eRI compute and filesystem (GPFS) has since been stable and performant.
All tests have completed successfully, and Nix is now running as expected.
We will continue to monitor for any further degradation.
We continue to suffer packet loss on the network, causing slow responses and disruption to Nix. Our L3 support partner has been engaged again and we hope for further progress tonight.
Nix clients have just been restarted on all compute nodes and login-0, and Nix is working for the moment.
Network packet loss occurred again at 0930, causing further disruption and slow responses. We are investigating.
A fix to the routing configuration within the eRI's high-performance network was implemented by our technology partner at approx 1800hrs. Connectivity between the eRI compute and filesystem (GPFS) has since been stable and performant.
We recognise the impact to users from this issue was major and have upgraded the incident here as a result (rest assured we were treating it as such regardless). If you observe any further issues following on from the maintenance work this week, please reach out to support.
We are still experiencing network issues but we have our L3 support engaged and expect some progress overnight.
Apologies for the ongoing frustrations.
We have now identified an underlying network issue causing packet loss and retransmissions between the compute and storage clusters. This will result in periodic slow responses, and at worst, a login or compute node being expelled from the cluster, which results in a longer period of recovery (15 - 30 mins). We are working hard to identify the exact fault, and are engaging our third-party network experts.
We have identified a GPFS issue that occurred at the time the slowness was reported. This has recovered and, for now, Nix test times are back to normal. Investigation continues.
We are currently investigating this issue.