Need to monitor Georgia Tech IT outages?
Stay on top of outages with IsDown. Monitor the official status pages of all your vendors, SaaS, and tools, including Georgia Tech IT, and never miss an outage again.
Start Free Trial
Summary: An outage of the metadata servers on Phoenix project storage (Lustre) is preventing access to that storage and may also prevent login by ssh, access to Phoenix OnDemand, and some Globus access on Phoenix. The PACE team is working to repair the system.
Details: During the afternoon of Saturday, July 19, one of the metadata servers for Phoenix Lustre project storage stopped responding. The failover to the other metadata server was not successful. The PACE team has not yet been able restore access and has engaged our storage vendor.
Impact: Files on the Phoenix Lustre project storage system are not accessible, and researchers may not be able to log in to Phoenix by ssh nor via the OnDemand web interface. Globus on Phoenix may time out, but researchers can type another path into the Path box to bypass the home directory and enter a subdirectory directly (e.g., typing ~/scratch will allow access to the scratch storage). Research groups that have already migrated to VAST project storage may not be impacted. VAST project, scratch, and CEDAR storage may still be reachable this way.
Thank you for your patience as we work to restore access to Phoenix project storage. Please contact us at pace-support@oit.gatech.edu with any questions.
Investigation continues
The issue is with one of the metadata volumes, reporting file structure errors. We continue working with the vendor to find a solution.
Offline file system check of the affected metadata volume is in progress.
Working with vendor to verify status of the complete file system. fsck on metadata volumes continue.
File system check completed. All Lustre servers restarted and high-availability restored. Metadata targets are now balanced. Running checks across the cluster to confirm file system is accessible from all compute nodes.
Lustre file system (/storage/coda1) is up and available. Head nodes, Globus-Phoenix and Ondemand-Phoenix are able to access the directories. The scheduler remains paused until all components are checked tomorrow morning.
With IsDown, you can monitor all your critical services' official status pages from one centralized dashboard and receive instant alerts the moment an outage is detected. Say goodbye to constantly checking multiple sites for updates and stay ahead of outages with IsDown.
Start free trialNo credit card required · Cancel anytime · 4400 services available
Integrations with