Opensourcetech Blog

A blog by Opensourcetech about open-source technologies such as NGINX, Kubernetes, Zabbix, Neo4j, and Linux.

Why a Pod Stays "Pending" and Never Starts (Kubernetes)


I'm Takahiro Kujirai (@opensourcetech), LinuC Evangelist.


Introduction
This is a note on why a Kubernetes Pod can remain in "Pending" and never start.



Reproducing the issue
Apply a manifest that contains a Deployment (and therefore its Pods), as follows:
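The nginx.yaml itself is not shown here; a minimal Deployment manifest consistent with the two replicas seen in the output (the labels and image tag are assumptions) might look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2          # matches the two Pods created below
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21   # hypothetical tag; any nginx image behaves the same here
        ports:
        - containerPort: 80
```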

kubeuser@kubemaster1:~$ kubectl apply -f nginx.yaml
deployment.apps/nginx created

kubeuser@kubemaster1:~$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
nginx-6c67f5ff6f-nrdvt   0/1     Pending   0          7s
nginx-6c67f5ff6f-t4sg8   0/1     Pending   0          7s

kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-6c67f5ff6f-nrdvt   0/1     Pending   0          13s   <none>   <none>   <none>           <none>
nginx-6c67f5ff6f-t4sg8   0/1     Pending   0          13s   <none>   <none>   <none>           <none>

.
.
.
kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE   READINESS GATES
nginx-6c67f5ff6f-nrdvt   0/1     Pending   0          77s   <none>   <none>   <none>           <none>
nginx-6c67f5ff6f-t4sg8   0/1     Pending   0          77s   <none>   <none>   <none>           <none>


The STATUS stays "Pending", with no sign that the Pods will ever start.


Why the Pods won't start
The cause lies on the Worker Nodes.

kubeuser@kubemaster1:~$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
kubemaster1   Ready    master   6d1h   v1.18.0
kubemaster2   Ready    master   6d1h   v1.18.0
kubemaster3   Ready    master   6d     v1.18.0
kubeworker    Ready    <none>   6d1h   v1.18.0
kubeworker2   Ready    <none>   6d     v1.18.0


The STATUS column shows "Ready", so at first glance nothing seems wrong. In reality, however, the kubelet has been stopped on the Worker Nodes; the nodes still report "Ready" only because the control plane has not yet timed out waiting for their heartbeats (by default this takes on the order of tens of seconds, after which the node is marked Unknown and tainted node.kubernetes.io/unreachable, as the describe output further down shows).
In a Kubernetes cluster, the API server on the Master Nodes communicates with the kubelet on each Worker Node, as illustrated below. With the kubelet down, that communication fails, and Pods can no longer be placed (scheduled) onto the node.
(Figure: Kubernetes cluster components; see the link below.)
https://kubernetes.io/ja/docs/concepts/overview/components/
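The heartbeat can also be observed directly. Since Kubernetes v1.14, each kubelet periodically renews a Lease object in the kube-node-lease namespace, so a stale renew time is a quick sign that a node's kubelet has stopped reporting. A sketch (assumes access to this running cluster; the node name matches this example):

```shell
# Each kubelet renews its Lease in kube-node-lease roughly every 10 seconds.
# A renew time far in the past means that node's kubelet is no longer reporting.
kubectl get lease -n kube-node-lease
kubectl describe lease kubeworker2 -n kube-node-lease
```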

Worker Node (1st node):

kubeuser@kubeworker:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
    Drop-In: /etc/systemd/system/kubelet.service.d
             mq10-kubeadm.conf
     Active: inactive (dead) since Tue 2021-12-07 13:12:25 UTC; 3s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 19532 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET>
   Main PID: 19532 (code=exited, status=0/SUCCESS)

Dec 07 10:52:30 kubeworker kubelet[19532]: E1207 10:52:30.866416   19532 remote>
Dec 07 11:36:48 kubeworker kubelet[19532]: E1207 11:36:48.594091   19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.696722   19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.875900   19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.876512   19532 contro>
Dec 07 11:36:55 kubeworker kubelet[19532]: E1207 11:36:55.876845   19532 contro>
Dec 07 11:36:56 kubeworker kubelet[19532]: I1207 11:36:55.877071   19532 contro>
Dec 07 13:12:25 kubeworker systemd[1]: Stopping kubelet: The Kubernetes Node Ag>
Dec 07 13:12:25 kubeworker systemd[1]: kubelet.service: Succeeded.
Dec 07 13:12:25 kubeworker systemd[1]: Stopped kubelet: The Kubernetes Node Age>


Worker Node (2nd node):

kubeuser@kubeworker2:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
    Drop-In: /etc/systemd/system/kubelet.service.d
             mq10-kubeadm.conf
     Active: inactive (dead) since Tue 2021-12-07 13:12:56 UTC; 1s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 661 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_C>
   Main PID: 661 (code=exited, status=0/SUCCESS)

Dec 07 13:03:15 kubeworker2 kubelet[3396]: 2021-12-07 13:03:15.293 [INFO][3410]>
Dec 07 13:03:15 kubeworker2 kubelet[3396]: time="2021-12-07T13:03:15Z" level=in>
Dec 07 13:03:15 kubeworker2 kubelet[3396]: 2021-12-07 13:03:15.318 [INFO][3396]>
Dec 07 13:03:38 kubeworker2 kubelet[661]: E1207 13:03:38.903002     661 control>
Dec 07 13:03:38 kubeworker2 kubelet[661]: E1207 13:03:38.952191     661 kubelet>
Dec 07 13:03:42 kubeworker2 kubelet[661]: E1207 13:03:42.313131     661 control>
Dec 07 13:09:25 kubeworker2 kubelet[661]: E1207 13:09:25.525803     661 kubelet>
Dec 07 13:12:56 kubeworker2 systemd[1]: Stopping kubelet: The Kubernetes Node A>
Dec 07 13:12:56 kubeworker2 systemd[1]: kubelet.service: Succeeded.
Dec 07 13:12:56 kubeworker2 systemd[1]: Stopped kubelet: The Kubernetes Node Ag>


Running kubectl describe nodes also shows "Kubelet stopped posting node status."

kubeuser@kubemaster1:~$ kubectl describe nodes kubeworker2
Name:               kubeworker2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubeworker2
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.1.254/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.0.225.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 01 Dec 2021 12:23:46 +0000
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  kubeworker2
  AcquireTime:     <unset>
  RenewTime:       Tue, 07 Dec 2021 13:43:03 +0000
Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Tue, 07 Dec 2021 13:03:14 +0000   Tue, 07 Dec 2021 13:03:14 +0000   CalicoIsUp          Calico is running on this node
  MemoryPressure       Unknown   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:47 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:47 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:47 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:47 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.1.254
  Hostname:    kubeworker2
Capacity:
  cpu:                2
  ephemeral-storage:  20511312Ki
  hugepages-2Mi:      0
  memory:             2035140Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  18903225108
  hugepages-2Mi:      0
  memory:             1932740Ki
  pods:               110
System Info:
  Machine ID:                 7c474b3b662c452a98ea24d02d1871e9
  System UUID:                7c474b3b-662c-452a-98ea-24d02d1871e9
  Boot ID:                    d487a166-68fa-4da9-b89d-45cefb6bddc1
  Kernel Version:             5.4.0-91-generic
  OS Image:                   Ubuntu 20.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.7
  Kubelet Version:            v1.18.0
  Kube-Proxy Version:         v1.18.0
PodCIDR:                      10.0.1.0/24
PodCIDRs:                     10.0.1.0/24
Non-terminated Pods:          (2 in total)
  Namespace                   Name                 CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                 ------------  ----------  ---------------  -------------  ---
  kube-system                 calico-node-lk26d    250m (12%)    0 (0%)      0 (0%)           0 (0%)         6d1h
  kube-system                 kube-proxy-xv78r     0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                250m (12%)  0 (0%)
  memory             0 (0%)      0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                 From                     Message
  ----     ------                   ----                ----                     -------
  Normal   Starting                 44m                 kubelet, kubeworker2     Starting kubelet.
  Warning  ImageGCFailed            44m                 kubelet, kubeworker2     failed to get imageFs info: unable to find data in memory cache
  Normal   NodeAllocatableEnforced  44m                 kubelet, kubeworker2     Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  44m (x2 over 44m)   kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    44m (x2 over 44m)   kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     44m (x2 over 44m)   kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasSufficientPID
  Warning  Rebooted                 44m                 kubelet, kubeworker2     Node kubeworker2 has been rebooted, boot id: d487a166-68fa-4da9-b89d-45cefb6bddc1
  Normal   NodeReady                44m                 kubelet, kubeworker2     Node kubeworker2 status is now: NodeReady
  Normal   Starting                 43m                 kube-proxy, kubeworker2  Starting kube-proxy.
  Normal   Starting                 115s                kubelet, kubeworker2     Starting kubelet.
  Normal   NodeHasSufficientMemory  115s                kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID     115s                kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  115s                kubelet, kubeworker2     Updated Node Allocatable limit across pods
  Normal   NodeHasNoDiskPressure    99s (x2 over 115s)  kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasNoDiskPressure

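The telltale pattern in the Conditions table above (Status "Unknown" with reason NodeStatusUnknown) can also be picked out mechanically rather than by eye. A small sketch, where a heredoc stands in for the real `kubectl describe node kubeworker2` output:

```shell
# Print condition types whose Status column reads "Unknown".
# The heredoc below is sample data standing in for live cluster output, e.g.:
#   kubectl describe node kubeworker2 | awk '$2 == "Unknown" {print $1}'
describe_sample() {
cat <<'EOF'
  NetworkUnavailable   False     CalicoIsUp
  MemoryPressure       Unknown   NodeStatusUnknown
  DiskPressure         Unknown   NodeStatusUnknown
  PIDPressure          Unknown   NodeStatusUnknown
  Ready                Unknown   NodeStatusUnknown
EOF
}
describe_sample | awk '$2 == "Unknown" {print $1}'
# Prints MemoryPressure, DiskPressure, PIDPressure, Ready (one per line)
```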

For reference, here is the same node in a healthy state:

kubeuser@kubemaster1:~$ kubectl describe nodes kubeworker2
Name:               kubeworker2
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=kubeworker2
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.1.254/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.0.225.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Wed, 01 Dec 2021 12:23:46 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  kubeworker2
  AcquireTime:     <unset>
  RenewTime:       Tue, 07 Dec 2021 13:43:03 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 07 Dec 2021 13:03:14 +0000   Tue, 07 Dec 2021 13:03:14 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:04 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:04 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:04 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Tue, 07 Dec 2021 13:43:04 +0000   Tue, 07 Dec 2021 13:43:04 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  192.168.1.254
  Hostname:    kubeworker2
Capacity:
  cpu:                2
  ephemeral-storage:  20511312Ki
  hugepages-2Mi:      0
  memory:             2035140Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  18903225108
  hugepages-2Mi:      0
  memory:             1932740Ki
  pods:               110
System Info:
  Machine ID:                 7c474b3b662c452a98ea24d02d1871e9
  System UUID:                7c474b3b-662c-452a-98ea-24d02d1871e9
  Boot ID:                    d487a166-68fa-4da9-b89d-45cefb6bddc1
  Kernel Version:             5.4.0-91-generic
  OS Image:                   Ubuntu 20.04.3 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.7
  Kubelet Version:            v1.18.0
  Kube-Proxy Version:         v1.18.0
PodCIDR:                      10.0.1.0/24
PodCIDRs:                     10.0.1.0/24
Non-terminated Pods:          (2 in total)
  Namespace                   Name                 CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                 ------------  ----------  ---------------  -------------  ---
  kube-system                 calico-node-lk26d    250m (12%)    0 (0%)      0 (0%)           0 (0%)         6d1h
  kube-system                 kube-proxy-xv78r     0 (0%)        0 (0%)      0 (0%)           0 (0%)         6d1h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                250m (12%)  0 (0%)
  memory             0 (0%)      0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                From                     Message
  ----     ------                   ----               ----                     -------
  Normal   Starting                 43m                kubelet, kubeworker2     Starting kubelet.
  Warning  ImageGCFailed            43m                kubelet, kubeworker2     failed to get imageFs info: unable to find data in memory cache
  Normal   NodeAllocatableEnforced  43m                kubelet, kubeworker2     Updated Node Allocatable limit across pods
  Normal   NodeHasSufficientMemory  42m (x2 over 43m)  kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    42m (x2 over 43m)  kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     42m (x2 over 43m)  kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasSufficientPID
  Warning  Rebooted                 42m                kubelet, kubeworker2     Node kubeworker2 has been rebooted, boot id: d487a166-68fa-4da9-b89d-45cefb6bddc1
  Normal   NodeReady                42m                kubelet, kubeworker2     Node kubeworker2 status is now: NodeReady
  Normal   Starting                 42m                kube-proxy, kubeworker2  Starting kube-proxy.
  Normal   Starting                 44s                kubelet, kubeworker2     Starting kubelet.
  Normal   NodeHasSufficientMemory  44s                kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID     44s                kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  44s                kubelet, kubeworker2     Updated Node Allocatable limit across pods
  Normal   NodeHasNoDiskPressure    28s (x2 over 44s)  kubelet, kubeworker2     Node kubeworker2 status is now: NodeHasNoDiskPressure




Recovery
Start the kubelet on the Worker Nodes:

kubeuser@kubeworker:~$ sudo systemctl start kubelet
kubeuser@kubeworker:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor prese>
    Drop-In: /etc/systemd/system/kubelet.service.d
             mq10-kubeadm.conf
     Active: active (running) since Tue 2021-12-07 13:14:32 UTC; 2s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 2055516 (kubelet)
      Tasks: 1 (limit: 2278)
     Memory: 3.8M
     CGroup: /system.slice/kubelet.service
             mq2055516 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/>

Dec 07 13:14:32 kubeworker systemd[1]: Started kubelet: The Kubernetes Node Age>


Now, apply the Deployment again:

kubeuser@kubemaster1:~$ kubectl apply -f nginx.yaml
deployment.apps/nginx created

kubeuser@kubemaster1:~$ kubectl get all
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6c67f5ff6f-cc6x8   0/1     Pending   0          2s
pod/nginx-6c67f5ff6f-rkwt2   0/1     Pending   0          3s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   6d1h

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   0/2     2            0           9s

NAME                               DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-6c67f5ff6f   2         2         0       9s

kubeuser@kubemaster1:~$ kubectl get pods -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP           NODE         NOMINATED NODE   READINESS GATES
nginx-6c67f5ff6f-cc6x8   1/1     Running   0          23s   10.0.42.15   kubeworker   <none>           <none>
nginx-6c67f5ff6f-rkwt2   1/1     Running   0          24s   10.0.42.14   kubeworker   <none>           <none>


This time, the Pods started successfully!



Closing notes
This situation can arise when, for example:
・the kubelet was never set to start automatically when the Worker Node was built (systemctl enable kubelet was not run), or
・the kubelet went down for some other reason.
So it may be rare in practice, but knowing to expect it (or not) can make a real difference in how quickly you troubleshoot it.
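As a preventive step, the kubelet can be started and registered for autostart in one command, so it also comes back after a reboot (run on each Worker Node; assumes a systemd version that supports `enable --now`):

```shell
# Start the kubelet immediately AND enable it to start on boot.
sudo systemctl enable --now kubelet

# Verify both states; each should succeed.
systemctl is-enabled kubelet   # expected: enabled
systemctl is-active kubelet    # expected: active
```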

Opensourcetech by Takahiro Kujirai