This is Takahiro Kujirai (@opensourcetech), LinuC Evangelist and volunteer leader at Open Source Summit Japan 2022.
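Introduction
This post looks at what happens when a Kubernetes node runs short of resources (CPU or memory).
When resources run short
Deploying Pods and other workloads fails with an error.
For example, when CPU is insufficient, the Pod's events show a message like the following.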
Warning FailedScheduling 11s (x3 over 5m18s) default-scheduler 0/2 nodes are available: 1 Insufficient cpu. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod..
As you would expect, the Pod cannot start and stays in the Pending state indefinitely.
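When many Pods are Pending, it can help to pull the key facts out of such a message mechanically. The following is a minimal sketch, assuming the message keeps the shape shown above; the function name `summarize_scheduling_failure` is hypothetical, and the regular expression only handles simple resource names such as cpu or memory.

```python
import re

# Event message copied from the FailedScheduling warning above.
message = (
    "0/2 nodes are available: 1 Insufficient cpu. "
    "preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.."
)


def summarize_scheduling_failure(text: str) -> dict:
    """Extract node counts and 'Insufficient <resource>' reasons (hypothetical helper)."""
    head = re.match(r"(\d+)/(\d+) nodes are available", text)
    if head is None:
        raise ValueError("unexpected message format")
    # Map each short resource name to the number of nodes reporting the shortage.
    shortages = {
        resource: int(count)
        for count, resource in re.findall(r"(\d+) Insufficient ([\w-]+)", text)
    }
    return {
        "available": int(head.group(1)),
        "total": int(head.group(2)),
        "shortages": shortages,
    }


summary = summarize_scheduling_failure(message)
print(summary)  # {'available': 0, 'total': 2, 'shortages': {'cpu': 1}}
```

Here the summary says that none of the 2 nodes can take the Pod and that one of them was rejected for lack of CPU.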
Checking node resources
To check a node's resources, run kubectl describe node <node-name>.
ubuntu@ip-10-30-0-15:~$ kubectl describe nodes ip-10-30-0-16
Name:               ip-10-30-0-16
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-30-0-16
                    kubernetes.io/os=linux
                    system=secondOne
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 10.30.0.16/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 192.168.207.64
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 18 Aug 2023 08:33:15 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-30-0-16
  AcquireTime:     <unset>
  RenewTime:       Sun, 03 Sep 2023 06:08:33 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Sun, 03 Sep 2023 06:05:15 +0000   Sun, 03 Sep 2023 06:05:15 +0000   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Sun, 03 Sep 2023 06:04:59 +0000   Sat, 02 Sep 2023 15:29:52 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Sun, 03 Sep 2023 06:04:59 +0000   Sat, 02 Sep 2023 15:29:52 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Sun, 03 Sep 2023 06:04:59 +0000   Sat, 02 Sep 2023 15:29:52 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Sun, 03 Sep 2023 06:04:59 +0000   Sun, 03 Sep 2023 06:04:59 +0000   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:  10.30.0.16
  Hostname:    ip-10-30-0-16
Capacity:
  cpu:                2
  ephemeral-storage:  50620216Ki
  hugepages-2Mi:      0
  memory:             8125340Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  46651590989
  hugepages-2Mi:      0
  memory:             8022940Ki
  pods:               110
System Info:
  Machine ID:                 b4d4e43090b34c4ca5ee3bb8f9e7782f
  System UUID:                ec2c1614-c01f-a088-4d39-bf5996711999
  Boot ID:                    a69e1dce-2abc-47b4-a628-2c4427ad183e
  Kernel Version:             5.15.0-1043-aws
  OS Image:                   Ubuntu 20.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.22
  Kubelet Version:            v1.26.1
  Kube-Proxy Version:         v1.26.1
PodCIDR:                      192.168.1.0/24
PodCIDRs:                     192.168.1.0/24
Non-terminated Pods:          (19 in total)
  Namespace        Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------        ----                                      ------------  ----------  ---------------  -------------  ---
  accounting       nginx-one-bcdf9b6f5-gnmz6                 100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      13h
  accounting       nginx-one-bcdf9b6f5-mdsn8                 100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      13h
  default          myingress-ingress-nginx-controller-xq2cw  100m (5%)     100m (5%)   90Mi (1%)        20Mi (0%)      13h
  default          nginx-748c667d99-dnqwg                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         2d23h
  default          web-one-6c8d79dc77-qs742                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         13h
  default          web-one-6c8d79dc77-xdfnd                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         13h
  default          web-two-7b6658654c-fgmqn                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         13h
  default          web-two-7b6658654c-td9x7                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         13h
  kube-system      calico-node-qmf25                         250m (12%)    0 (0%)      0 (0%)           0 (0%)         15d
  kube-system      kube-proxy-4kqgq                          0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  linkerd-viz      metrics-api-697654f965-k66fg              100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd-viz      prometheus-74cfd84488-4cxvx               100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd-viz      tap-54d4b7db5-z5pn6                       100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd-viz      tap-injector-5d8669d944-mb9dm             100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd-viz      web-56bd56c47f-j7ddt                      100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd          linkerd-destination-68b67b7895-stqcv      100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd          linkerd-identity-6f6f98567-8jffb          100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  linkerd          linkerd-proxy-injector-66c4fbd96d-dz8v2   100m (5%)     100m (5%)   20Mi (0%)        20Mi (0%)      14h
  low-usage-limit  limited-hog-5b6647ff5c-j67h4              500m (25%)    1 (50%)     100Mi (1%)       500Mi (6%)     3d3h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests     Limits
  --------           --------     ------
  cpu                1850m (92%)  2100m (105%)
  memory             390Mi (4%)   720Mi (9%)
  ephemeral-storage  0 (0%)       0 (0%)
  hugepages-2Mi      0 (0%)       0 (0%)
Events:
  Type     Reason                   Age    From             Message
  ----     ------                   ----   ----             -------
  Normal   Starting                 3m34s  kube-proxy
  Normal   Starting                 3m39s  kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity      3m39s  kubelet          invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  3m39s  kubelet          Updated Node Allocatable limit across pods
  Warning  Rebooted                 3m39s  kubelet          Node ip-10-30-0-16 has been rebooted, boot id: a69e1dce-2abc-47b4-a628-2c4427ad183e
  Normal   NodeNotReady             3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeNotReady
  Normal   NodeReady                3m39s  kubelet          Node ip-10-30-0-16 status is now: NodeReady
  Normal   RegisteredNode           3m8s   node-controller  Node ip-10-30-0-16 event: Registered Node ip-10-30-0-16 in Controller
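CPU Requests stand at 92% and Limits at 105% of the node's allocatable CPU, so the node is essentially full. The scheduler places Pods based on their requests, which leaves only 150m of CPU (2000m allocatable minus 1850m requested) for new Pods.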
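The CPU totals in the Allocated resources section can be re-derived from the per-Pod rows. The sketch below is only an illustration of that arithmetic: the helper `cpu_to_millicores` is a hypothetical name, the Pod figures are copied from the Non-terminated Pods table, and truncating integer division is used because it reproduces the percentages shown in this output.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m", "1" or "0" to millicores (hypothetical helper)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


# (CPU request, CPU limit) for the 19 Pods listed on the node.
pods = (
    [("100m", "100m")] * 11  # nginx-one x2, ingress controller, linkerd-viz x5, linkerd x3
    + [("0", "0")] * 6       # nginx, web-one x2, web-two x2, kube-proxy
    + [("250m", "0")]        # calico-node
    + [("500m", "1")]        # limited-hog
)

allocatable = cpu_to_millicores("2")  # "Allocatable: cpu: 2"
requests = sum(cpu_to_millicores(req) for req, _ in pods)
limits = sum(cpu_to_millicores(lim) for _, lim in pods)

# Truncating division matches the percentages printed by kubectl describe here.
requests_pct = requests * 100 // allocatable
limits_pct = limits * 100 // allocatable

print(f"cpu requests: {requests}m ({requests_pct}%)")  # cpu requests: 1850m (92%)
print(f"cpu limits:   {limits}m ({limits_pct}%)")      # cpu limits:   2100m (105%)
```

Limits above 100% mean the node is overcommitted on limits, which Kubernetes allows; it is the requests figure that determines whether another Pod can be scheduled.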
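In this situation it is difficult to start any more Pods, so you need to either add Worker nodes (scale out) or scale up the existing node (for example, by moving a cloud VM to a higher-spec instance type).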
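To see why more capacity is needed, the CPU part of the scheduling decision can be modeled very roughly. This is a simplified sketch, not the scheduler's actual algorithm: it looks at CPU requests on a single node only and ignores memory, taints, affinity and every other constraint. The function names, the incoming Pod requests (100m and 500m) and the 4-CPU node size are hypothetical values chosen for illustration; the 2-CPU allocatable and 1850m requested figures come from the node output.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "2" to millicores (hypothetical helper)."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def fits(allocatable_m: int, requested_m: int, pod_request_m: int) -> bool:
    """Return True if the Pod's CPU request fits in the node's unrequested CPU."""
    return requested_m + pod_request_m <= allocatable_m


allocatable_m = cpu_to_millicores("2")    # Allocatable cpu on ip-10-30-0-16
requested_m = cpu_to_millicores("1850m")  # Allocated resources: cpu requests
headroom_m = allocatable_m - requested_m
print(f"headroom: {headroom_m}m")  # headroom: 150m

# Hypothetical incoming Pods: a small one fits, a larger one does not.
small_fits = fits(allocatable_m, requested_m, cpu_to_millicores("100m"))
large_fits = fits(allocatable_m, requested_m, cpu_to_millicores("500m"))
print(small_fits, large_fits)  # True False

# Scaling up to a hypothetical 4-CPU node would make room for the larger Pod.
large_fits_after_scale_up = fits(cpu_to_millicores("4"), requested_m, cpu_to_millicores("500m"))
print(large_fits_after_scale_up)  # True
```

Scaling out has the same effect from the scheduler's point of view: a new, empty Worker node offers its full allocatable CPU to the Pending Pod.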