
Outage in Georgia Tech IT

Phoenix project storage outage, impacting login

Resolved Major
July 19, 2025 - Started 9 months ago - Lasted 1 day

Incident Report



Summary: An outage of the metadata servers on Phoenix project storage (Lustre) is preventing access to that storage and may also prevent login by ssh, access to Phoenix OnDemand, and some Globus access on Phoenix. The PACE team is working to repair the system.

Details: During the afternoon of Saturday, July 19, one of the metadata servers for Phoenix Lustre project storage stopped responding. The failover to the other metadata server was not successful. The PACE team has not yet been able to restore access and has engaged our storage vendor.

Impact: Files on the Phoenix Lustre project storage system are not accessible, and researchers may not be able to log in to Phoenix by ssh or via the OnDemand web interface. Globus on Phoenix may time out, but researchers can type another path into the Path box to bypass the home directory and enter a subdirectory directly (e.g., typing ~/scratch will allow access to the scratch storage). VAST project, scratch, and CEDAR storage may still be reachable this way. Research groups that have already migrated to VAST project storage may not be impacted.

Thank you for your patience as we work to restore access to Phoenix project storage. Please contact us at pace-support@oit.gatech.edu with any questions.
Components affected
Georgia Tech IT Academic Services

Latest Updates (sorted oldest to newest)
9 months ago - at 07/20/2025 02:12AM



Initial incident report posted; the text is identical to the Incident Report above.

9 months ago - at 07/20/2025 02:17AM

Investigation continues.

9 months ago - at 07/20/2025 03:52AM

The issue is with one of the metadata volumes, which is reporting file structure errors. We continue working with the vendor to find a solution.

9 months ago - at 07/20/2025 12:45PM

Offline file system check of the affected metadata volume is in progress.

9 months ago - at 07/20/2025 01:58PM

Working with the vendor to verify the status of the complete file system. fsck on the metadata volumes continues.

9 months ago - at 07/20/2025 07:52PM

File system check completed. All Lustre servers have been restarted and high availability is restored. Metadata targets are now balanced. Running checks across the cluster to confirm the file system is accessible from all compute nodes.

9 months ago - at 07/20/2025 10:26PM

Lustre file system (/storage/coda1) is up and available. Head nodes, Globus-Phoenix and Ondemand-Phoenix are able to access the directories. The scheduler remains paused until all components are checked tomorrow morning.
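
For readers who want to confirm from their own session whether a storage path answers, the following is a minimal sketch only, not the PACE team's procedure. It assumes a Python 3 interpreter is available; the helper name path_responds and the 5-second timeout are hypothetical choices made for illustration. The two paths are the ones mentioned in this report. The probe runs in a background thread so that an unresponsive mount does not block the caller, although a truly hung file system can still delay the program's exit.

```python
import os
import threading


def path_responds(path, timeout_s=5.0):
    """Return True if os.stat(path) succeeds within timeout_s seconds.

    The stat call runs in a daemon thread so a hung mount does not block
    the caller. A path that raises an error, or that has not answered by
    the time the timeout expires, is reported as False.
    """
    result = {"ok": False}

    def probe():
        try:
            os.stat(os.path.expanduser(path))
            result["ok"] = True
        except OSError:
            # Missing path, permission problem, or I/O error from the mount.
            result["ok"] = False

    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)  # Stop waiting after timeout_s seconds.
    return result["ok"]


if __name__ == "__main__":
    # Paths taken from the report; adjust them for your own allocation.
    for candidate in ("/storage/coda1", "~/scratch"):
        status = "reachable" if path_responds(candidate) else "not reachable"
        print(f"{candidate}: {status}")
```

A "not reachable" result only means the path did not answer in time or returned an error; it does not identify the cause.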
