Linode experienced a critical service issue affecting NVIDIA RTX 4000 Ada GPU nodes in multiple regions (Osaka, Seattle, and Chicago). Affected nodes reported an unrecoverable error state, which caused failures in Vulkan initialization and GPU-accelerated workloads. Some LKE clusters in the Osaka region also had Control Plane connectivity issues, resulting in timed-out API requests and errors. The issue was resolved after 16.1 hours. The investigation focused on a potential regression in the underlying host hypervisor or GPU firmware.
We haven’t observed any additional issues with the service, and will now consider this incident resolved. If you continue to experience problems, please open a Support ticket for assistance.
At this time we have been able to correct the issues affecting the service. We will be monitoring this to ensure that it remains stable. If you continue to experience problems, please open a Support ticket for assistance.
Our team has identified the issue affecting the service. We are working quickly to implement a fix, and we will provide an update as soon as the solution is in place.
We are continuing to investigate and will provide the next update as progress is made.
We are aware of a recurrence of this issue across multiple regions. We are continuing to investigate and will provide the next update as progress is made.
Our team has identified the issue affecting the service and implemented a fix. We will be monitoring this to ensure that it remains stable. If you continue to experience problems, please open a Support ticket for assistance.
We are continuing to investigate the issue. We will provide the next update as progress is made.
Our subject matter experts are actively investigating the issue. We will provide the next update as progress is made.
We are investigating a critical service issue affecting NVIDIA RTX 4000 Ada GPU nodes across multiple regions, including Osaka (osa1), Seattle (sea1), and Chicago (ord1).
Affected GPU nodes may report an unrecoverable error state leading to failures in Vulkan initialization and GPU-accelerated workloads. Additionally, some LKE clusters in the Osaka region are currently experiencing Control Plane connectivity issues, resulting in timed-out API requests and errors.
Our engineering teams are currently investigating the root cause, focusing on a potential regression in the underlying host hypervisor or GPU firmware. We will provide more information as it becomes available.