Global: Calico-enabled GKE clusters’ Pods may get stuck Terminating or Pending after upgrading to 1.22+
Resolved
Minor
Started: September 15, 2022
Lasted: 7 days
Outage Details
Summary: Global: Calico-enabled GKE clusters’ Pods may get stuck terminating after upgrading to 1.22+
Description: GKE clusters that run version 1.22 or later and use Calico Network Policy might experience issues terminating Pods under some conditions.
Our engineering team continues to investigate the issue and is qualifying a potential mitigation for release to the Rapid channel (1.24). Once qualification is complete, we will expedite the backport of the fix to 1.22.
We will provide an update by Friday, 2022-09-16 15:00 US/Pacific with current details.
We apologize to all who are affected by the disruption.
Diagnosis: The Calico CNI plugin reports the following error when terminating Pods:
Warning FailedKillPod 36m (x389 over 121m) kubelet error killing pod: failed to "KillPodSandbox" for "af9ab8f9-d6d6-4828-9b8c-a58441dd1f86" with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod "myclient-pod-6474c76996" network: error getting ClusterInformation: connection is unauthorized: Unauthorized"
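To check whether a cluster is hitting this error, event messages can be scanned for its distinctive parts. The following is a minimal Python sketch, assuming the event messages have already been collected as plain strings. The function name and the choice of marker substrings are illustrative assumptions, not an official signature from Google or Calico.

```python
# Substrings copied from the error shown above. Requiring all of them is an
# assumption about what identifies this incident, not an official signature.
SIGNATURE = (
    "KillPodSandbox",
    "error getting ClusterInformation",
    "connection is unauthorized",
)


def matches_calico_teardown_error(message: str) -> bool:
    """Return True if an event message contains every marker of the error above."""
    return all(marker in message for marker in SIGNATURE)
```

Requiring every marker keeps unrelated sandbox teardown failures from being counted as this issue.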
Workaround: Affected customers may try the following:
1. Restart the kubelet and calico-node; this can help get the Pods unstuck.
2. Disable the Calico network policy. Workaround 1 is recommended, because this option is viable only if the customer does not have a strong need for Calico.
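Before applying either workaround, it can help to list which Pods are actually stuck. The sketch below is one way to do that in Python, assuming a Pod list has been exported as JSON (for example with `kubectl get pods --all-namespaces -o json`). The function name and the age threshold are hypothetical. It covers only Pods stuck terminating, not Pending ones, and it does not perform any restart.

```python
import json
from datetime import datetime, timedelta, timezone


def stuck_terminating_pods(pod_list_json: str, now: datetime, min_age: timedelta):
    """Return (namespace, name) for Pods whose deletionTimestamp is at least min_age in the past."""
    stuck = []
    for pod in json.loads(pod_list_json).get("items", []):
        meta = pod.get("metadata", {})
        stamp = meta.get("deletionTimestamp")
        if stamp is None:
            continue  # Deletion has not been requested for this Pod.
        # Kubernetes serializes timestamps as RFC 3339 in UTC, e.g. "2022-09-15T10:00:00Z".
        deadline = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if now - deadline >= min_age:
            stuck.append((meta.get("namespace", ""), meta.get("name", "")))
    return stuck
```

A Pod that appears in the result and also shows the teardown error in its events is a likely candidate for workaround 1 on the node where it runs.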