The Data Center Operations and DataBank teams are investigating a sudden failure of the cooling systems in the Research Hall that has left the hall unable to be cooled. The teams are investigating the issue at this time; more information will be provided as it becomes available.
WHAT’S HAPPENING?
Due to an emergency with a cooling system at the Research Hall, all PACE clusters had to be shut down on the morning of Sunday, September 8, 2024.
Access to login nodes and filesystems (via Globus, Open OnDemand, or direct connection to login nodes) is still available.
WHEN IS IT HAPPENING?
Sunday, September 8, 2024, starting at 7:30 AM EDT.
WHY IS IT HAPPENING?
PACE has been notified by the IOC that temperatures in the CODA building Research Hall are rising due to the failure of a water pump in the cooling system. An emergency shutdown had to be executed to protect equipment. The physical infrastructure provider for our data center is evaluating the situation.
WHO IS AFFECTED?
All PACE users. All running jobs on ALL PACE clusters (Phoenix, Hive, Firebird, ICE, and Buzzard) had to be stopped at 7:30 AM. By default, we will refund interrupted jobs on paid Phoenix and Firebird accounts only. Please let us know if this causes a significant loss of funds that leaves you unable to continue work on your free-tier Phoenix allocation!
WHAT DO YOU NEED TO DO?
Wait patiently; we will communicate as soon as the clusters are ready to resume work.
WHO SHOULD YOU CONTACT FOR QUESTIONS?
For any questions, please contact PACE at pace-support@oit.gatech.edu.
The DataBank team has identified the problem and is estimating a time for repairs.
Due to a failure of the data center cooling system for the Research Hall, all PACE clusters had to be shut down on the morning of Sunday, September 8, 2024. The DataBank team has identified the problem and is working on the repairs. More updates will be provided once we have an estimated time for repairs.
Due to an emergency with a cooling system at the Research Hall, all PACE clusters have been shut down since the morning of Sunday, September 8, 2024. While a time frame for resolution is currently unknown, we are actively working with the vendor, DataBank, to resolve the issue and restore service to the data center as soon as possible. We will provide updates as they become available.
Due to an emergency with a cooling system at the Research Hall, all PACE clusters have been shut down since the morning of Sunday, September 8, 2024. The data center provider, DataBank, has identified an alternate replacement part, which has been brought on site and is being deployed and tested. At this time, we estimate that DataBank will have restored cooling to the Research Hall by the close of business on Tuesday, September 10, 2024, at which point PACE will begin powering up, testing infrastructure, and bringing services back online. We plan to provide additional updates on the restoration of services by the evening of Wednesday, September 11, 2024.
While restoring cooling, our data center hosting provider, DataBank, identified additional critical parts that were damaged and had to be replaced. Cooling was restored at 8:43 PM on Tuesday, September 10, 2024, and monitored throughout the night. DataBank gave PACE the all-clear at 6:00 AM on Wednesday, September 11, 2024, to bring systems back online. PACE has started powering up, testing infrastructure, and bringing clusters back online. Updates will be provided throughout the day as services are progressively restored.
ICE has returned to production, and compute jobs can run. Work continues on the research clusters.
Both nodes with AMD MI210 GPUs remain under repair after failing last week. All other ICE node architectures are available.
Hive and Firebird have returned to production, and compute jobs have resumed. Work continues on Phoenix and Buzzard.
Buzzard has returned to production, and compute jobs have resumed. Work continues on Phoenix.