Outage in Georgia Tech IT

Phoenix project storage outage, impacting login

Resolved Major
July 20, 2025 - Started 17 days ago - Lasted 1 day


Outage Details



Summary: An outage of the metadata servers on Phoenix project storage (Lustre) is preventing access to that storage and may also prevent login by ssh, access to Phoenix OnDemand, and some Globus access on Phoenix. The PACE team is working to repair the system.

Details: During the afternoon of Saturday, July 19, one of the metadata servers for Phoenix Lustre project storage stopped responding. The failover to the other metadata server was not successful. The PACE team has not yet been able to restore access and has engaged our storage vendor.

Impact: Files on the Phoenix Lustre project storage system are not accessible, and researchers may not be able to log in to Phoenix by ssh or via the OnDemand web interface. Globus on Phoenix may time out, but researchers can type another path into the Path box to bypass the home directory and enter a subdirectory directly (e.g., typing ~/scratch will allow access to the scratch storage). VAST project, scratch, and CEDAR storage may still be reachable this way. Research groups that have already migrated to VAST project storage may not be impacted.

Thank you for your patience as we work to restore access to Phoenix project storage. Please contact us at pace-support@oit.gatech.edu with any questions.
Components affected
Georgia Tech IT Academic Services
Latest Updates (sorted oldest to newest)
17 days ago - at 07/20/2025 02:12AM



Initial report posted (same text as under Outage Details above).

17 days ago - at 07/20/2025 02:17AM

Investigation continues

17 days ago - at 07/20/2025 03:52AM

The issue is with one of the metadata volumes, reporting file structure errors. We continue working with the vendor to find a solution.

17 days ago - at 07/20/2025 12:45PM

Offline file system check of the affected metadata volume is in progress.

17 days ago - at 07/20/2025 01:58PM

Working with vendor to verify status of the complete file system. fsck on metadata volumes continues.

17 days ago - at 07/20/2025 07:52PM

File system check completed. All Lustre servers restarted and high-availability restored. Metadata targets are now balanced. Running checks across the cluster to confirm file system is accessible from all compute nodes.

16 days ago - at 07/20/2025 10:26PM

Lustre file system (/storage/coda1) is up and available. Head nodes, Globus-Phoenix and Ondemand-Phoenix are able to access the directories. The scheduler remains paused until all components are checked tomorrow morning.
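
For readers who want to confirm from their own session that the storage is answering again, the following is a minimal sketch of an access check with a timeout. It is not the PACE team's procedure, which the updates above do not describe. The function names `path_responds` and `check_paths` and the 5-second default are hypothetical choices for illustration. The probe runs in a child process so that a mount that hangs, rather than failing outright, cannot block the caller.

```python
import subprocess
import sys

# Tiny probe run in a child process: exits 0 if os.stat() succeeds on the path.
_PROBE = "import os, sys; os.stat(sys.argv[1])"


def path_responds(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if the path can be stat'ed within timeout_s seconds."""
    proc = subprocess.Popen(
        [sys.executable, "-c", _PROBE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child blocked inside the kernel on a hung
        # mount may not exit until the file system recovers.
        proc.kill()
        return False


def check_paths(paths, timeout_s: float = 5.0) -> dict:
    """Map each path to True (responds) or False (missing, denied, or hung)."""
    return {p: path_responds(p, timeout_s) for p in paths}
```

Calling `check_paths(["/storage/coda1"])` on a Phoenix login node would report whether the path named in the update above responds. A `False` result does not distinguish between a hung mount, a missing path, and a permissions problem.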
