Due to a security emergency, PACE is required to shut down ALL servers in Phoenix, Firebird, and ICE to apply mitigations.
Starting at 3:00pm ET, users will be unable to log in and all running jobs will be interrupted. ALL CANCELLED JOBS WILL BE REFUNDED on Phoenix and Firebird.
We realize this will impact upcoming conference submissions and course deadlines, but we are required to carry out this work immediately to ensure the security of Georgia Tech systems.
We will work rapidly to update servers and incrementally release the clusters as soon as possible. We are aiming to restore access to all systems (Phoenix, ICE, Firebird) tomorrow, but updates will be shared as they are available.
We appreciate your patience as we prioritize the security of Georgia Tech computing.
The PACE Team
Remediation in progress.
Head node and compute node rebuild in progress for all clusters.
We have completed mitigations on all Globus nodes, which are now open for access to data. Instructions for using Globus are available here.
JOBS WILL NOT RUN at this time, but you now have access to any data on PACE systems.
We are working to apply fixes across our nearly 2,000 compute nodes and will update as compute capability on each cluster is restored and verified.
Best,
The PACE Team
Service to all ICE resources has been restored. Open OnDemand, all CPUs/GPUs, and other data services are available for your jobs. Phoenix and Firebird are still under maintenance.
Interrupted jobs will need to be resubmitted. Jobs that had not started running prior to the downtime should still be in Slurm’s queue.
We recognize the challenge this may have posed for your end-of-semester coursework. All instructors and TAs were notified yesterday of this emergency downtime and its impact on running jobs.
Thank you for your understanding as we worked to mitigate this security concern.
The Phoenix cluster has been released for service – all node classes are back online and paused jobs have resumed. Open OnDemand, all CPUs/GPUs, and other data services are available for your jobs. The ICE cluster is also available. The Firebird cluster is still under maintenance.
We recognize the challenge this may have posed for your research deadlines and appreciate your understanding as we worked to mitigate this security incident.