Outage in Georgia Tech IT

Data center cooling issues

Resolved Minor

April 02, 2025 - Started about 1 month ago - Lasted 2 days

Outage Details

A cooling controller failed at the data center. Shutting down PACE clusters.

Components affected

Georgia Tech IT Academic Services

Latest Updates (sorted oldest to newest)

about 1 month ago - at 04/02/2025 09:08PM

A cooling controller failed at the data center. Shutting down PACE clusters.

about 1 month ago - at 04/02/2025 09:12PM

All Hive nodes are powered off. All jobs failed.
All Buzzard nodes are powered off. All jobs failed (though presumably requeued).
All new jobs on Phoenix are held.
All idle nodes on Phoenix are being turned off.
All Firebird nodes are powered off. All jobs failed.

about 1 month ago - at 04/02/2025 09:25PM

The controller for the system providing cooling to nodes in the Coda Research Hall has failed. To avoid damage, PACE has urgently shut down many compute nodes to reduce heat.

about 1 month ago - at 04/02/2025 09:51PM

Due to continued high temperatures, all Phoenix compute nodes have been turned off, and all running jobs were cancelled. Impacted jobs will be refunded at the end of April.

about 1 month ago - at 04/02/2025 10:25PM

A water pump controller failed, affecting the cooling of the research hall. The support vendor has been engaged and is assessing the situation.

about 1 month ago - at 04/03/2025 01:47AM

It has been determined that our water pump controller will need to be replaced, and we are currently coordinating with the support vendor on this replacement process.

about 1 month ago - at 04/03/2025 01:56PM

Our vendors are working to restore cooling capabilities to the datacenter by fully replacing the cooling system controller and expect to have the work completed by 7:00pm ET.

We hope to return all systems to service by tomorrow (Friday) evening, provided that all repairs to the cooling system are complete and the systems pass stability testing following the shutdown. Clusters will be released as testing is completed for each system.

about 1 month ago - at 04/03/2025 03:07PM

Some compute nodes on ICE were accidentally powered off last night, which may have impacted some running jobs. We have restored a partial selection of those nodes to service so that all hardware types are available.
There was a brief pause in the scheduler this morning from 9:17am to 9:41am, which may have prevented jobs from starting during that time. Most ICE compute nodes are currently available for course usage.

about 1 month ago - at 04/04/2025 02:26AM

The controller for the system providing cooling to nodes in the Coda Research Hall has been restored and we have returned to the HTCP lineup and are in normal operation.

about 1 month ago - at 04/04/2025 12:08PM

The clusters are being powered up and tested. They will be returned to service as soon as they are ready. Updating the status back to "service disruption".

about 1 month ago - at 04/04/2025 01:12PM

ICE cluster released for user workloads.

about 1 month ago - at 04/04/2025 02:33PM

Hive cluster released for user workloads.

about 1 month ago - at 04/04/2025 02:59PM

Firebird cluster released for user workloads.

about 1 month ago - at 04/04/2025 03:36PM

Phoenix and Buzzard clusters released for user workloads.

Latest Georgia Tech IT outages

GlobalProtect Clientless VPN - Degraded Performance - 2 days ago

Campus Network Connectivity Issues - 9 days ago

GRS and Job Family issue - 14 days ago

GlobalProtect VPN - User/Group-based Policy Issues - about 1 month ago

Degraded Performance on Phoenix Project storage - about 1 month ago
