Opensourcetech Blog

A blog by Opensourcetech about open source technologies such as NGINX, Kubernetes, Zabbix, Neo4j, and Linux.

Resource Shortage on Kubernetes Nodes


I'm Takahiro Kujirai (@opensourcetech), LinuC Evangelist and volunteer leader at Open Source Summit Japan 2022.


Introduction
This post covers what happens when a Kubernetes node runs short of resources (CPU/memory).


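When Resources Run Short
Deploying Pods and other workloads fails with an error.
For example, when CPU is insufficient, the scheduler reports a warning like the following: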

Warning  FailedScheduling  11s (x3 over 5m18s)  default-scheduler  0/2 nodes are available: 1 Insufficient cpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..


As expected, the Pod cannot start and stays in the Pending state.
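To find Pods stuck in this state and the reason the scheduler gave, the two standard kubectl queries below can be wrapped in small shell functions. This is a minimal sketch: the function names list_pending_pods and show_failed_scheduling are illustrative and not from the original post, and it assumes kubectl is already configured for the cluster.

```shell
# List every Pod that is still Pending, across all namespaces.
list_pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# Show FailedScheduling events in one namespace.
# The namespace argument is optional and falls back to "default".
show_failed_scheduling() {
  kubectl get events --namespace "${1:-default}" --field-selector=reason=FailedScheduling
}
```

Calling show_failed_scheduling with no argument checks the default namespace. Passing a namespace such as accounting checks that one instead.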


Checking Node Resources
To check a node's resources, run kubectl describe node <node-name>.

ubuntu@ip-10-30-0-15:~$ kubectl describe nodes ip-10-30-0-16

Name:               ip-10-30-0-16
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-30-0-16
                    kubernetes.io/os=linux
                    system=secondOne
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.30.0.16/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 192.168.207.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 18 Aug 2023 08:33:15 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-30-0-16
  AcquireTime:     <unset>
  RenewTime:       Sun, 03 Sep 2023 06:08:33 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Sun, 03 Sep 2023 06:05:15 +0000   Sun, 03 Sep 2023 06:05:15 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Sun, 03 Sep 2023 06:04:59 +0000   Sat, 02 Sep 2023 15:29:52 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sun, 03 Sep 2023 06:04:59 +0000   Sat, 02 Sep 2023 15:29:52 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sun, 03 Sep 2023 06:04:59 +0000   Sat, 02 Sep 2023 15:29:52 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sun, 03 Sep 2023 06:04:59 +0000   Sun, 03 Sep 2023 06:04:59 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.30.0.16
  Hostname:    ip-10-30-0-16
Capacity:
  cpu:                2
  ephemeral-storage:  50620216Ki
  hugepages-2Mi:      0
  memory:             8125340Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  46651590989
  hugepages-2Mi:      0
  memory:             8022940Ki
  pods:               110
System Info:
  Machine ID:                 b4d4e43090b34c4ca5ee3bb8f9e7782f
  System UUID:                ec2c1614-c01f-a088-4d39-bf5996711999
  Boot ID:                    a69e1dce-2abc-47b4-a628-2c4427ad183e
  Kernel Version:             5.15.0-1043-aws
  OS Image:                   Ubuntu 20.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.22
  Kubelet Version:            v1.26.1
  Kube-Proxy Version:         v1.26.1
PodCIDR:                      192.168.1.0/24
PodCIDRs:                     192.168.1.0/24
Non-terminated Pods:          (19 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  accounting                  nginx-one-bcdf9b6f5-gnmz6                   100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      13h
  accounting                  nginx-one-bcdf9b6f5-mdsn8                   100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      13h
  default                     myingress-ingress-nginx-controller-xq2cw    100m (5%)     100m (5%)   90Mi (1%)        20Mi (0%)      13h
  default                     nginx-748c667d99-dnqwg                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d23h
  default                     web-one-6c8d79dc77-qs742                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         13h
  default                     web-one-6c8d79dc77-xdfnd                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         13h
  default                     web-two-7b6658654c-fgmqn                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         13h
  default                     web-two-7b6658654c-td9x7                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         13h
  kube-system                 calico-node-qmf25                           250m (12%)    0 (0%)      0 (0%)           0 (0%)         15d
  kube-system                 kube-proxy-4kqgq                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  linkerd-viz                 metrics-api-697654f965-k66fg                100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd-viz                 prometheus-74cfd84488-4cxvx                 100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd-viz                 tap-54d4b7db5-z5pn6                         100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd-viz                 tap-injector-5d8669d944-mb9dm               100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd-viz                 web-56bd56c47f-j7ddt                        100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd                     linkerd-destination-68b67b7895-stqcv        100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd                     linkerd-identity-6f6f98567-8jffb            100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd                     linkerd-proxy-injector-66c4fbd96d-dz8v2     100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  low-usage-limit             limited-hog-5b6647ff5c-j67h4                500m (25%)    1 (50%)     100Mi (1%)       500Mi (6%)     3d3h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1850m (92%)  2100m (105%)
  memory             390Mi (4%)   720Mi (9%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:
  Type     Reason                   Age    From             Message
  ----     ------                   ----   ----             -------
  Normal   Starting                 3m34s  kube-proxy       
  Normal   Starting                 3m39s  kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity      3m39s  kubelet          invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  3m39s  kubelet          Updated Node Allocatable limit across pods
  Warning  Rebooted                 3m39s  kubelet          Node ip-10-30-0-16 has been rebooted, boot id: a69e1dce-2abc-47b4-a628-2c4427ad183e
  Normal   NodeNotReady             3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeNotReady
  Normal   NodeReady                3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeReady
  Normal   RegisteredNode           3m8s   node-controller  Node ip-10-30-0-16 event: Registered Node ip-10-30-0-16 in Controller
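
The full output is long, and the part that matters for scheduling is the "Allocated resources" section. As a rough sketch, a small filter can print only that section from the describe output. The function name allocated_resources is hypothetical, and the filter assumes the section ends where the "Events:" line begins, as it does in the output above.

```shell
# Read `kubectl describe node` output on stdin and print only the
# "Allocated resources" section (everything up to, but excluding, "Events:").
allocated_resources() {
  awk '/^Allocated resources:/ {show=1} /^Events:/ {show=0} show'
}

# Example (requires access to the cluster):
#   kubectl describe node ip-10-30-0-16 | allocated_resources
```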


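CPU requests total 1850m (92%) of the node's 2 allocatable CPUs, and CPU limits total 2100m (105%), so the node is nearly full. The scheduler places Pods based on requests, which leaves only 150m of CPU for new Pods on this node.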
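
The remaining headroom can be worked out from the Allocatable and Allocated resources values shown above. The helper names to_millicores and cpu_headroom are illustrative only, and this sketch handles just whole-core values such as "2" and millicore values such as "1850m", not decimals like "0.5".

```shell
# Convert a Kubernetes CPU quantity to millicores.
# "1850m" -> 1850, "2" -> 2000. Decimal values are not handled.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# CPU still available for new requests:
# allocatable CPU minus the CPU already requested.
cpu_headroom() {
  echo $(( $(to_millicores "$1") - $(to_millicores "$2") ))
}
```

With the values from this node, cpu_headroom 2 1850m prints 150. A new Pod requesting more than 150m of CPU therefore cannot be scheduled here.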

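In this situation it is hard to start any more Pods on this node,
so you need to either add worker nodes (scale out) or scale up the existing ones (for example, by moving a cloud VM to a higher-spec size).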

Opensourcetech by Takahiro Kujirai