Outage in Georgia Tech IT

Databank Power/Cooling Issue

Resolved Minor
October 01, 2024 - Started 3 months ago - Lasted about 5 hours

Outage Details

At around 0830 on Tuesday, ATL-1 had a power blip, which caused the GT High Temp Chiller to fail. The chiller has failed to restart, and the DCI team is trying to determine why. This is affecting the research hall and PACE operations.
Components affected
Georgia Tech IT Academic Services
Latest Updates (sorted oldest to newest)
3 months ago - at 10/01/2024 01:00PM

At around 0830 on Tuesday, ATL-1 had a power blip, which caused the GT High Temp Chiller to fail. The chiller has failed to restart, and the DCI team is trying to determine why. This is affecting the research hall and PACE operations.

3 months ago - at 10/01/2024 01:02PM

A cooling failure has impacted the Coda datacenter. An investigation into the cause is underway.
To minimize impact and begin mitigating rising temperatures, PACE has initiated a partial shutdown. The Phoenix and Hive schedulers have been paused, and all idle compute nodes on Phoenix and Hive have been powered off. Running jobs are not currently impacted.
We will continue monitoring the situation and determine if additional measures are needed. ICE, Firebird, and Buzzard remain in production at this time.

3 months ago - at 10/01/2024 03:35PM

Cooling in the datacenter has been restored. The PACE team is now working to resume access to start new jobs on nodes that were not powered off, then to power on the remaining nodes and verify their functionality before restoring Phoenix and Hive to full capacity.

3 months ago - at 10/01/2024 04:06PM

The compute nodes that were not powered down have been released for new jobs, so queued jobs are now starting on these nodes (approximately 50% of Phoenix and 80% of Hive). PACE is now working to verify functionality of the nodes that were turned off this morning before returning them to service.
