IONOS Cloud experienced performance degradation in its FRA data center that affected a subset of Virtual Machines and Kubernetes Clusters for 13.1 hours. The issue was caused by increased CPU steal time and a problematic CPU core affinity setting on affected hosts. The incident was resolved through multiple configuration update rollouts that improved CPU performance across the affected fleet.
We have successfully completed the rollout to all remaining hosts and are closing this incident. A Root Cause Analysis is currently being conducted by the Compute Team and will be shared here upon completion.
Our Compute Team has confirmed that the fix has been rolled out to the majority of affected hosts. We are currently finishing the rollout and will provide an update once the remaining hosts on affected clusters are covered.
The second configuration update rollout is currently in progress, and we have confirmed initial improvements related to CPU performance. Due to the size of the fleet, we expect the rollout to take some time to complete. Throughout the process, customers will see performance gains as soon as the specific hosts supporting their workloads have been updated. We will provide a final update once the rollout is finished.
The adjustment has been rolled out, and our Compute Team is seeing CPU steal time drop. We are monitoring the situation. Our tech teams are preparing another rollout that should improve performance further.
Our Compute Team has successfully tested the proposed fix for the CPU core affinity setting and is preparing a rollout. We will monitor the results.
Our Compute Team has found another factor negatively impacting CPU performance for affected VMs. We are currently testing a potential transparent resolution for the problematic CPU affinity setting.
We have identified an increase in CPU steal time on affected hosts. Our Compute Team has identified a likely culprit and is currently testing a potential mitigation to confirm its effectiveness before a rollout.
We are currently investigating performance degradation affecting compute components in our FRA DC. This issue is impacting a subset of Virtual Machines (VMs) and Kubernetes Clusters. We will provide further updates as our investigation progresses.
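The updates above name CPU steal time as the main symptom. Steal time is the share of time a guest's virtual CPU was ready to run but the hypervisor was servicing something else. As a rough sketch of how a customer could check for it from inside a Linux VM, the following reads the aggregate `cpu` line of `/proc/stat` twice and reports the steal share over the interval. This is not a tool from IONOS; the function names and the one-second default interval are illustrative assumptions, and it only works on Linux guests.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of CPU time reported as steal between two samples.

    Only the first eight counters are used (user, nice, system, idle,
    iowait, irq, softirq, steal). The guest counters that follow are
    already included in user/nice, so adding them would double-count.
    """
    if len(before) < 8 or len(after) < 8:
        return 0.0  # very old kernels do not report steal
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_steal(interval=1.0, path="/proc/stat"):
    """Sample /proc/stat twice, `interval` seconds apart (hypothetical helper)."""
    with open(path) as handle:
        before = parse_cpu_line(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.read())
    return steal_percent(before, after)
```

Calling `sample_steal()` on a Linux VM returns a percentage; a value that stays well above zero under load suggests contention on the host, which is consistent with the symptom described in these updates but does not by itself identify the cause.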