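I'm Takahiro Kujirai (@opensourcetech), a LinuC Evangelist.
Introduction
This is a note on one reason a Kubernetes Pod can stay in "Pending" and never start.
The symptom
Apply a manifest that contains a Deployment (and therefore Pods), as shown below.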
kubeuser@kubemaster1:~$ kubectl apply -f nginx.yaml
deployment.apps/nginx created
kubeuser@kubemaster1:~$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-6c67f5ff6f-nrdvt   0/1     Pending   0          7s
nginx-6c67f5ff6f-t4sg8   0/1     Pending   0          7s
kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-6c67f5ff6f-nrdvt   0/1     Pending   0          13s   <none>   <none>   <none>           <none>
nginx-6c67f5ff6f-t4sg8   0/1     Pending   0          13s   <none>   <none>   <none>           <none>
.
.
.
kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-6c67f5ff6f-nrdvt   0/1     Pending   0          77s   <none>   <none>   <none>           <none>
nginx-6c67f5ff6f-t4sg8   0/1     Pending   0          77s   <none>   <none>   <none>           <none>
The STATUS stays at "Pending", and the Pods show no sign of starting.
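When there are many Pods, it can help to pick out only the ones that are Pending and have no node assigned, as in the output above. The following is a minimal sketch rather than a definitive tool: the function name pending_unscheduled is made up for illustration, and it assumes the column order printed by kubectl get pods -o wide --no-headers (STATUS in column 3, NODE in column 7).

```shell
# Print the names of Pods that are Pending and have no node assigned.
# Reads `kubectl get pods -o wide --no-headers` output on stdin.
# Assumption: column 3 is STATUS and column 7 is NODE.
pending_unscheduled() {
  awk '$3 == "Pending" && $7 == "<none>" { print $1 }'
}

# Usage (not executed here):
#   kubectl get pods -o wide --no-headers | pending_unscheduled
```

A Pod that is Pending with NODE set to <none> has not been scheduled at all, which points to the nodes or the scheduler rather than to the container image or the application.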
Why the Pods do not start
The cause lies with the Worker Nodes.
kubeuser@kubemaster1:~$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
kubemaster1   Ready    master   6d1h   v1.18.0
kubemaster2   Ready    master   6d1h   v1.18.0
kubemaster3   Ready    master   6d     v1.18.0
kubeworker    Ready    <none>   6d1h   v1.18.0
kubeworker2   Ready    <none>   6d     v1.18.0
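The STATUS is "Ready", so at first glance nothing looks wrong,
but in fact the kubelet on the Worker Nodes is stopped. The control plane only marks a node as not ready after its heartbeats have been missing for a grace period, so kubectl get nodes can still report "Ready" right after the kubelet stops.
In a Kubernetes cluster, the API server on the Master Nodes and the kubelet on each Worker Node communicate with each other, as illustrated at the link below. When the kubelet is stopped, this communication breaks down and Pods can no longer be placed (scheduled) on that node, which is the cause here.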
https://kubernetes.io/ja/docs/concepts/overview/components/
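Because the kubelet reports node status to the API server, a node whose Ready condition is anything other than True is a good first suspect. The sketch below filters a list of "name, Ready status" pairs; the function name not_ready_nodes is hypothetical, and the jsonpath expression in the usage comment is one common way to produce that list, not the only one.

```shell
# Print nodes whose Ready condition is not "True".
# Reads lines of "<node-name> <Ready-status>" on stdin.
not_ready_nodes() {
  awk '$2 != "True" { print $1 " (Ready=" $2 ")" }'
}

# Usage (not executed here):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
#     | not_ready_nodes
```

A node with a stopped kubelet eventually shows up here as Ready=Unknown, matching the "Kubelet stopped posting node status." message in kubectl describe nodes.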
Worker Node (first).
kubeuser@kubeworker:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: inactive (dead) since Tue 2021-12-07 13:12:25 UTC; 3s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 19532 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET>
   Main PID: 19532 (code=exited, status=0/SUCCESS)

Dec 07 10:52:30 kubeworker kubelet[19532]: E1207 10:52:30.866416 19532 remote>
Dec 07 11:36:48 kubeworker kubelet[19532]: E1207 11:36:48.594091 19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.696722 19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.875900 19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.876512 19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.876845 19532 contro>
Dec 07 11:36:56 kubeworker kubelet[19532]: I1207 11:36:55.877071 19532 contro>
Dec 07 13:12:25 kubeworker systemd[1]: Stopping kubelet: The Kubernetes Node Ag>
Dec 07 13:12:25 kubeworker systemd[1]: kubelet.service: Succeeded.
Dec 07 13:12:25 kubeworker systemd[1]: Stopped kubelet: The Kubernetes Node Age>
Worker Node (second).
kubeuser@kubeworker2:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: inactive (dead) since Tue 2021-12-07 13:12:56 UTC; 1s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 661 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_C>
   Main PID: 661 (code=exited, status=0/SUCCESS)

Dec 07 13:03:15 kubeworker2 kubelet[3396]: 2021-12-07 13:03:15.293 [INFO][3410]>
Dec 07 13:03:15 kubeworker2 kubelet[3396]: time="2021-12-07T13:03:15Z" level=in>
Dec 07 13:03:15 kubeworker2 kubelet[3396]: 2021-12-07 13:03:15.318 [INFO][3396]>
Dec 07 13:03:38 kubeworker2 kubelet[661]: E1207 13:03:38.903002 661 control>
Dec 07 13:03:38 kubeworker2 kubelet[661]: E1207 13:03:38.952191 661 kubelet>
Dec 07 13:03:42 kubeworker2 kubelet[661]: E1207 13:03:42.313131 661 control>
Dec 07 13:09:25 kubeworker2 kubelet[661]: E1207 13:09:25.525803 661 kubelet>
Dec 07 13:12:56 kubeworker2 systemd[1]: Stopping kubelet: The Kubernetes Node A>
Dec 07 13:12:56 kubeworker2 systemd[1]: kubelet.service: Succeeded.
Dec 07 13:12:56 kubeworker2 systemd[1]: Stopped kubelet: The Kubernetes Node Ag>
kubectl describe nodes also reports "Kubelet stopped posting node status."
kubeuser@kubemaster1:~$ kubectl describe nodes kubeworker2
Name:               kubeworker2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubeworker2
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.1.254/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.0.225.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 01 Dec 2021 12:23:46 +0000
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  kubeworker2
  AcquireTime:     <unset>
  RenewTime:       Tue, 07 Dec 2021 13:43:03 +0000
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Tue, 07 Dec 2021 13:03:14 +0000   Tue, 07 Dec 2021 13:03:14 +0000   CalicoIsUp          Calico is running on this node
  MemoryPressure       Unknown   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:47 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:47 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:47 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:47 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.1.254
  Hostname:    kubeworker2
Capacity:
  cpu:                2
  ephemeral-storage:  20511312Ki
  hugepages-2Mi:      0
  memory:             2035140Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  18903225108
  hugepages-2Mi:      0
  memory:             1932740Ki
  pods:               110
System Info:
  Machine ID:                 7c474b3b662c452a98ea24d02d1871e9
  System UUID:                7c474b3b-662c-452a-98ea-24d02d1871e9
  Boot ID:                    d487a166-68fa-4da9-b89d-45cefb6bddc1
  Kernel Version:             5.4.0-91-generic
  OS Image:                   Ubuntu 20.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.7
  Kubelet Version:            v1.18.0
  Kube-Proxy Version:         v1.18.0
PodCIDR:                      10.0.1.0/24
PodCIDRs:                     10.0.1.0/24
Non-terminated Pods:          (2 in total)
  Namespace     Name                CPU Requests   CPU Limits   Memory Requests   Memory Limits   AGE
  ---------     ----                ------------   ----------   ---------------   -------------   ---
  kube-system   calico-node-lk26d   250m (12%)     0 (0%)       0 (0%)            0 (0%)          6d1h
  kube-system   kube-proxy-xv78r    0 (0%)         0 (0%)       0 (0%)            0 (0%)          6d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource            Requests     Limits
  --------            --------     ------
  cpu                 250m (12%)   0 (0%)
  memory              0 (0%)       0 (0%)
  ephemeral-storage   0 (0%)       0 (0%)
  hugepages-2Mi       0 (0%)       0 (0%)
Events:
  Type      Reason                    Age                  From                      Message
  ----      ------                    ----                 ----                      -------
  Normal    Starting                  44m                  kubelet, kubeworker2      Starting kubelet.
  Warning   ImageGCFailed             44m                  kubelet, kubeworker2      failed to get imageFs info: unable to find data in memory cache
  Normal    NodeAllocatableEnforced   44m                  kubelet, kubeworker2      Updated Node Allocatable limit across pods
  Normal    NodeHasSufficientMemory   44m (x2 over 44m)    kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasSufficientMemory
  Normal    NodeHasNoDiskPressure     44m (x2 over 44m)    kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasNoDiskPressure
  Normal    NodeHasSufficientPID      44m (x2 over 44m)    kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasSufficientPID
  Warning   Rebooted                  44m                  kubelet, kubeworker2      Node kubeworker2 has been rebooted, boot id: d487a166-68fa-4da9-b89d-45cefb6bddc1
  Normal    NodeReady                 44m                  kubelet, kubeworker2      Node kubeworker2 status is now: NodeReady
  Normal    Starting                  43m                  kube-proxy, kubeworker2   Starting kube-proxy.
  Normal    Starting                  115s                 kubelet, kubeworker2      Starting kubelet.
  Normal    NodeHasSufficientMemory   115s                 kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasSufficientMemory
  Normal    NodeHasSufficientPID      115s                 kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasSufficientPID
  Normal    NodeAllocatableEnforced   115s                 kubelet, kubeworker2      Updated Node Allocatable limit across pods
  Normal    NodeHasNoDiskPressure     99s (x2 over 115s)   kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasNoDiskPressure
For reference, here is the output when the node is healthy.
kubeuser@kubemaster1:~$ kubectl describe nodes kubeworker2
Name:               kubeworker2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubeworker2
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.1.254/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.0.225.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 01 Dec 2021 12:23:46 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  kubeworker2
  AcquireTime:     <unset>
  RenewTime:       Tue, 07 Dec 2021 13:43:03 +0000
Conditions:
  Type                 Status   LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------   -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False    Tue, 07 Dec 2021 13:03:14 +0000   Tue, 07 Dec 2021 13:03:14 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False    Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:04 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False    Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:04 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False    Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:04 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True     Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:04 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.1.254
  Hostname:    kubeworker2
Capacity:
  cpu:                2
  ephemeral-storage:  20511312Ki
  hugepages-2Mi:      0
  memory:             2035140Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  18903225108
  hugepages-2Mi:      0
  memory:             1932740Ki
  pods:               110
System Info:
  Machine ID:                 7c474b3b662c452a98ea24d02d1871e9
  System UUID:                7c474b3b-662c-452a-98ea-24d02d1871e9
  Boot ID:                    d487a166-68fa-4da9-b89d-45cefb6bddc1
  Kernel Version:             5.4.0-91-generic
  OS Image:                   Ubuntu 20.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.7
  Kubelet Version:            v1.18.0
  Kube-Proxy Version:         v1.18.0
PodCIDR:                      10.0.1.0/24
PodCIDRs:                     10.0.1.0/24
Non-terminated Pods:          (2 in total)
  Namespace     Name                CPU Requests   CPU Limits   Memory Requests   Memory Limits   AGE
  ---------     ----                ------------   ----------   ---------------   -------------   ---
  kube-system   calico-node-lk26d   250m (12%)     0 (0%)       0 (0%)            0 (0%)          6d1h
  kube-system   kube-proxy-xv78r    0 (0%)         0 (0%)       0 (0%)            0 (0%)          6d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource            Requests     Limits
  --------            --------     ------
  cpu                 250m (12%)   0 (0%)
  memory              0 (0%)       0 (0%)
  ephemeral-storage   0 (0%)       0 (0%)
  hugepages-2Mi       0 (0%)       0 (0%)
Events:
  Type      Reason                    Age                 From                      Message
  ----      ------                    ----                ----                      -------
  Normal    Starting                  43m                 kubelet, kubeworker2      Starting kubelet.
  Warning   ImageGCFailed             43m                 kubelet, kubeworker2      failed to get imageFs info: unable to find data in memory cache
  Normal    NodeAllocatableEnforced   43m                 kubelet, kubeworker2      Updated Node Allocatable limit across pods
  Normal    NodeHasSufficientMemory   42m (x2 over 43m)   kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasSufficientMemory
  Normal    NodeHasNoDiskPressure     42m (x2 over 43m)   kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasNoDiskPressure
  Normal    NodeHasSufficientPID      42m (x2 over 43m)   kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasSufficientPID
  Warning   Rebooted                  42m                 kubelet, kubeworker2      Node kubeworker2 has been rebooted, boot id: d487a166-68fa-4da9-b89d-45cefb6bddc1
  Normal    NodeReady                 42m                 kubelet, kubeworker2      Node kubeworker2 status is now: NodeReady
  Normal    Starting                  42m                 kube-proxy, kubeworker2   Starting kube-proxy.
  Normal    Starting                  44s                 kubelet, kubeworker2      Starting kubelet.
  Normal    NodeHasSufficientMemory   44s                 kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasSufficientMemory
  Normal    NodeHasSufficientPID      44s                 kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasSufficientPID
  Normal    NodeAllocatableEnforced   44s                 kubelet, kubeworker2      Updated Node Allocatable limit across pods
  Normal    NodeHasNoDiskPressure     28s (x2 over 44s)   kubelet, kubeworker2      Node kubeworker2 status is now: NodeHasNoDiskPressure
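Comparing the two outputs, the abnormal node carries node.kubernetes.io/unreachable taints while the healthy one has none. As a rough sketch, the hypothetical helper below (unreachable_nodes is an illustrative name) lists nodes that have that taint, given lines of "node name followed by taint keys"; the jsonpath in the usage comment is an assumption about how one might produce such lines.

```shell
# Print nodes that carry the node.kubernetes.io/unreachable taint.
# Reads lines of "<node-name> <taint-key> <taint-key> ..." on stdin.
unreachable_nodes() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i == "node.kubernetes.io/unreachable") { print $1; break }
    }
  }'
}

# Usage (not executed here):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}' \
#     | unreachable_nodes
```

The NoSchedule variant of this taint is what keeps new Pods from being placed on the node, which is consistent with the Pods staying Pending.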
Recovery
Start the kubelet on the Worker Node.
kubeuser@kubeworker:~$ sudo systemctl start kubelet
kubeuser@kubeworker:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Tue 2021-12-07 13:14:32 UTC; 2s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 2055516 (kubelet)
      Tasks: 1 (limit: 2278)
     Memory: 3.8M
     CGroup: /system.slice/kubelet.service
             └─2055516 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/>

Dec 07 13:14:32 kubeworker systemd[1]: Started kubelet: The Kubernetes Node Age>
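Right after systemctl start, it can be useful to confirm that the service actually stays up before retrying the Deployment. The following is a small polling sketch under stated assumptions: wait_until is a made-up helper name, and the one-second interval is arbitrary.

```shell
# Retry a command once per second until it succeeds or the timeout expires.
# $1 = timeout in seconds, remaining arguments = command to run.
# Returns 0 if the command succeeded in time, 1 otherwise.
wait_until() {
  wu_timeout=$1
  shift
  wu_elapsed=0
  while ! "$@"; do
    if [ "$wu_elapsed" -ge "$wu_timeout" ]; then
      return 1
    fi
    sleep 1
    wu_elapsed=$((wu_elapsed + 1))
  done
  return 0
}

# Usage (not executed here): wait up to 30 seconds for the kubelet to be active.
#   wait_until 30 systemctl is-active --quiet kubelet && echo "kubelet is running"
```

If the wait times out, journalctl -u kubelet on the Worker Node is the natural next place to look.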
Now apply the Deployment again.
kubeuser@kubemaster1:~$ kubectl apply -f nginx.yaml
deployment.apps/nginx created
kubeuser@kubemaster1:~$ kubectl get all
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6c67f5ff6f-cc6x8   0/1     Pending   0          2s
pod/nginx-6c67f5ff6f-rkwt2   0/1     Pending   0          3s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   6d1h

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   0/2     2            0           9s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-6c67f5ff6f   2         2         0       9s
kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE         NOMINATED NODE   READINESS GATES
nginx-6c67f5ff6f-cc6x8   1/1     Running   0          23s   10.0.42.15   kubeworker   <none>           <none>
nginx-6c67f5ff6f-rkwt2   1/1     Running   0          24s   10.0.42.14   kubeworker   <none>           <none>
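This time, the Pods started successfully!
Conclusion
・The kubelet was not set to start automatically when the Worker Node was built (systemctl enable kubelet was never run)
・The kubelet went down for some reason
Causes like these are probably behind this problem, so it may not happen often,
but whether or not you know it can happen is likely to make a real difference when troubleshooting.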
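The two causes listed above map onto two systemd checks: whether the unit is enabled at boot and whether it is currently active. As an illustrative sketch (kubelet_advice is a hypothetical helper, and the messages are only suggestions), the function below turns the outputs of systemctl is-enabled and systemctl is-active into a short hint.

```shell
# Print a hint based on the kubelet unit's enabled and active states.
# $1 = output of: systemctl is-enabled kubelet
# $2 = output of: systemctl is-active kubelet
kubelet_advice() {
  ka_enabled=$1
  ka_active=$2
  if [ "$ka_enabled" != "enabled" ]; then
    echo "kubelet is not enabled at boot: run 'sudo systemctl enable kubelet'"
  fi
  if [ "$ka_active" != "active" ]; then
    echo "kubelet is not running: run 'sudo systemctl start kubelet'"
  fi
  if [ "$ka_enabled" = "enabled" ] && [ "$ka_active" = "active" ]; then
    echo "kubelet is enabled and running"
  fi
}

# Usage on a Worker Node (not executed here):
#   kubelet_advice "$(systemctl is-enabled kubelet)" "$(systemctl is-active kubelet)"
```

In the case described in this article the unit was enabled but inactive, so only the second hint would be printed.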