This is Takahiro Kujirai (鯨井貴博) @opensourcetech, LinuC Evangelist.
Introduction
In the previous article (Checking load distribution across Pods in Kubernetes (via Service/Deployment)), we saw that kube-proxy handles the distribution of traffic to Pods.
This time we dig a little deeper: we switch kube-proxy's mode from the default (iptables) to ipvs and observe how traffic distribution behaves.
The Kubernetes cluster used here is the one built in this earlier article.
Checking the kube-proxy mode
First, let's check the mode of kube-proxy on a cluster that was installed without specifying one.
The kube-proxy (Pod) log shows that it is running in iptables mode:
I0314 13:48:24.900100 1 server_others.go:535] "Using iptables proxy"
I0314 13:48:24.931674 1 server_others.go:176] "Using iptables Proxier"
kubeuser@master01:~$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-57b57c56f-p6xds 1/1 Running 0 12d
calico-node-phmkb 1/1 Running 0 12d
calico-node-wjdqx 1/1 Running 0 12d
calico-node-xdkfv 1/1 Running 0 12d
coredns-787d4945fb-6n79l 1/1 Running 0 12d
coredns-787d4945fb-dfplr 1/1 Running 0 12d
etcd-master01 1/1 Running 3 12d
kube-apiserver-master01 1/1 Running 2 12d
kube-controller-manager-master01 1/1 Running 0 12d
kube-proxy-2n7b2 1/1 Running 0 12d
kube-proxy-7k425 1/1 Running 0 12d
kube-proxy-c5pkt 1/1 Running 0 12d
kube-scheduler-master01 1/1 Running 3 12d
metrics-server-6b6f9ccc7-qmtb9 1/1 Running 0 130m
kubeuser@master01:~$ kubectl logs kube-proxy-2n7b2 -n kube-system
I0314 13:48:24.898834 1 node.go:163] Successfully retrieved node IP: 192.168.1.45
I0314 13:48:24.899245 1 server_others.go:109] "Detected node IP" address="192.168.1.45"
I0314 13:48:24.900100 1 server_others.go:535] "Using iptables proxy"
I0314 13:48:24.931674 1 server_others.go:176] "Using iptables Proxier"
I0314 13:48:24.931939 1 server_others.go:183] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0314 13:48:24.932054 1 server_others.go:184] "Creating dualStackProxier for iptables"
I0314 13:48:24.932195 1 proxier.go:242] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
I0314 13:48:25.005296 1 server.go:655] "Version info" version="v1.26.0"
I0314 13:48:25.005692 1 server.go:657] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0314 13:48:25.009442 1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0314 13:48:25.012672 1 config.go:317] "Starting service config controller"
I0314 13:48:25.012821 1 shared_informer.go:273] Waiting for caches to sync for service config
I0314 13:48:25.012983 1 config.go:226] "Starting endpoint slice config controller"
I0314 13:48:25.013049 1 shared_informer.go:273] Waiting for caches to sync for endpoint slice config
I0314 13:48:25.014118 1 config.go:444] "Starting node config controller"
I0314 13:48:25.014369 1 shared_informer.go:273] Waiting for caches to sync for node config
I0314 13:48:25.198054 1 shared_informer.go:280] Caches are synced for service config
I0314 13:48:25.198100 1 shared_informer.go:280] Caches are synced for endpoint slice config
I0314 13:48:25.199072 1 shared_informer.go:280] Caches are synced for node config
E0316 23:45:26.858242 1 service_health.go:141] "Failed to start healthcheck" err="listen tcp :31402: bind: address already in use" node="worker01" service="ingress-nginx/ingress-nginx-controller" port=31402
E0316 23:45:26.922288 1 service_health.go:141] "Failed to start healthcheck" err="listen tcp :31402: bind: address already in use" node="worker01" service="ingress-nginx/ingress-nginx-controller" port=31402
E0316 23:45:27.838639 1 event_broadcaster.go:253] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ingress-nginx-controller.174d0afedb1b6a4c", GenerateName:"", Namespace:"ingress-nginx", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, EventTime:time.Date(2023, time.March, 16, 23, 45, 25, 793025688, time.Local), Series:(*v1.EventSeries)(0xc00014d7c0), ReportingController:"kube-proxy", ReportingInstance:"kube-proxy-worker01", Action:"Listen", Reason:"FailedToStartServiceHealthcheck", Regarding:v1.ObjectReference{Kind:"Service", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"ingress-nginx/ingress-nginx-controller", APIVersion:"", ResourceVersion:"", FieldPath:""}, Related:(*v1.ObjectReference)(nil), Note:"node worker01 failed to start healthcheck \"ingress-
.
.
.
.
The Pod's configuration is read from /var/lib/kube-proxy/config.conf; the line mode: "" there confirms that the default, iptables mode, is in effect.
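If you only want the mode and not the whole file, you can also pull it straight from the kube-proxy ConfigMap in kube-system; a quick sketch (note the escaped dot in the jsonpath key name):

```shell
# Print only the mode line from the kube-proxy ConfigMap
kubectl -n kube-system get cm kube-proxy \
  -o jsonpath='{.data.config\.conf}' | grep 'mode:'
```

An empty value (mode: "") means the default, i.e. iptables on Linux.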
kubeuser@master01:~$ kubectl describe pods kube-proxy-2n7b2 -n kube-system
Name:                 kube-proxy-2n7b2
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      kube-proxy
Node:                 worker01/192.168.1.45
Start Time:           Tue, 14 Mar 2023 13:48:20 +0000
Labels:               controller-revision-hash=78545cdb7d
                      k8s-app=kube-proxy
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   192.168.1.45
IPs:
  IP:           192.168.1.45
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  containerd://ed9b81bf010560e2f3175dfd4f1e0415f8bc32ab57b4d2beb15822a66c31c6f0
    Image:         registry.k8s.io/kube-proxy:v1.26.0
    Image ID:      registry.k8s.io/kube-proxy@sha256:1e9bbe429e4e2b2ad32681c91deb98a334f1bf4135137df5f84f9d03689060fe
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/kube-proxy
      --config=/var/lib/kube-proxy/config.conf
      --hostname-override=$(NODE_NAME)
    State:          Running
      Started:      Tue, 14 Mar 2023 13:48:22 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-44qpn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  kube-api-access-44qpn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>
kubeuser@master01:~$ kubectl exec kube-proxy-2n7b2 -n kube-system cat /var/lib/kube-proxy/config.conf
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
bindAddressHardFail: false
clientConnection:
  acceptContentTypes: ""
  burst: 0
  contentType: ""
  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
  qps: 0
clusterCIDR: 10.0.0.0/16,fd12:b5e0:383e::/64
configSyncPeriod: 0s
conntrack:
  maxPerCore: null
  min: null
  tcpCloseWaitTimeout: null
  tcpEstablishedTimeout: null
detectLocal:
  bridgeInterface: ""
  interfaceNamePrefix: ""
detectLocalMode: ""
enableProfiling: false
healthzBindAddress: ""
hostnameOverride: ""
iptables:
  localhostNodePorts: null
  masqueradeAll: false
  masqueradeBit: null
  minSyncPeriod: 0s
  syncPeriod: 0s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: ""
  strictARP: false
  syncPeriod: 0s
  tcpFinTimeout: 0s
  tcpTimeout: 0s
  udpTimeout: 0s
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: ""
nodePortAddresses: null
oomScoreAdj: null
portRange: ""
showHiddenMetricsForVersion: ""
winkernel:
  enableDSR: false
  forwardHealthCheckVip: false
  networkName: ""
  rootHnsEndpointName: ""
  sourceVip: ""
The Kubernetes documentation also states that iptables is the default mode on Linux.

iptables mode
In iptables mode, kube-proxy selects a backend Pod at random.


※https://kubernetes.io/ja/docs/concepts/services-networking/service/#proxy-mode-iptables
Random as the selection is, the previous measurement showed the result comes out roughly even in practice:
Checking load distribution across Pods in Kubernetes (via Service/Deployment)
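That random pick is implemented with iptables' statistic match: each Service (KUBE-SVC-*) chain jumps to its endpoint (KUBE-SEP-*) chains with a matching probability. You can observe this on a node; a sketch (chain names are cluster-specific):

```shell
# List the NAT rules kube-proxy generated for Services; the
# "-m statistic --mode random --probability ..." rules do the random pick
sudo iptables -t nat -S | grep -E 'KUBE-SVC|statistic'
```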
ipvs mode
ipvs mode redirects traffic with lower latency and supports a variety of load-balancing algorithms, among other features.



※https://kubernetes.io/ja/docs/concepts/services-networking/service/#proxy-mode-ipvs
Switching to ipvs mode (rr: round robin)
Now let's switch from iptables mode to ipvs mode with the rr (round robin) scheduler.
First, confirm that the IPVS kernel modules are available.
※If they are not, install and load the modules (modprobe).
See this article for how to work with kernel modules.
kubeuser@master01:~$ lsmod | grep ip_vs
ip_vs_sed              16384  0
ip_vs_nq               16384  0
ip_vs_dh               16384  0
ip_vs_lc               16384  0
ip_vs_sh               16384  25
ip_vs_wrr              16384  0
ip_vs_rr               16384  0
ip_vs                 176128  39 ip_vs_rr,ip_vs_dh,ip_vs_sh,ip_vs_nq,ip_vs_wrr,ip_vs_lc,ip_vs_sed
nf_conntrack          172032  6 xt_conntrack,nf_nat,xt_nat,nf_conntrack_netlink,xt_MASQUERADE,ip_vs
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs
libcrc32c              16384  6 nf_conntrack,nf_nat,btrfs,nf_tables,raid456,ip_vs
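If lsmod prints nothing, the modules can be loaded manually and made persistent; a sketch (module names as in the lsmod output above; the /etc/modules-load.d path assumes a systemd-based distribution):

```shell
# Load the IPVS modules (plus conntrack) immediately
sudo modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack
# Load them again automatically at boot
printf '%s\n' ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack \
  | sudo tee /etc/modules-load.d/ipvs.conf
```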
kube-proxy's settings are defined in a ConfigMap, so we edit the relevant fields there (the ipvs scheduler and the mode).
kubeuser@master01:~$ kubectl get cm -n kube-system
NAME DATA AGE
calico-config 4 12d
coredns 1 12d
extension-apiserver-authentication 6 12d
kube-proxy 2 12d
kube-root-ca.crt 1 12d
kubeadm-config 1 12d
kubelet-config 1 12d
kubeuser@master01:~$ kubectl get cm -n kube-system kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    bindAddressHardFail: false
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.0.0.0/16,fd12:b5e0:383e::/64
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    detectLocal:
      bridgeInterface: ""
      interfaceNamePrefix: ""
    detectLocalMode: ""
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      localhostNodePorts: null
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
      tcpFinTimeout: 0s
      tcpTimeout: 0s
      udpTimeout: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    showHiddenMetricsForVersion: ""
    winkernel:
      enableDSR: false
      forwardHealthCheckVip: false
      networkName: ""
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    bindAddressHardFail: false
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.0.0.0/16,fd12:b5e0:383e::/64
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      rootHnsEndpointName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://master01:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  annotations:
    kubeadm.kubernetes.io/component-config.hash: sha256:e1193cefc1046d8fe6dbffed62d04e546bd4142781cc28f632adfe72f499e4be
  creationTimestamp: "2023-03-14T13:45:56Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "271"
  uid: 687e9c38-9699-46dc-9fbb-5de6bf270781
kubeuser@master01:~$ kubectl edit cm -n kube-system kube-proxy
configmap/kube-proxy edited
kubeuser@master01:~$ kubectl get cm kube-proxy -n kube-system -o yaml
.
.
.
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: "rr"
      strictARP: false
      syncPeriod: 0s
      tcpFinTimeout: 0s
      tcpTimeout: 0s
      udpTimeout: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: "ipvs"
.
.
.
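If you prefer scripting over the interactive kubectl edit, the same two fields can be rewritten non-interactively; a sketch (the sed patterns assume the empty defaults mode: "" and scheduler: ""):

```shell
# Rewrite the kube-proxy ConfigMap in a pipeline instead of an editor
kubectl -n kube-system get cm kube-proxy -o yaml \
  | sed -e 's/mode: ""/mode: "ipvs"/' -e 's/scheduler: ""/scheduler: "rr"/' \
  | kubectl apply -f -
```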
To apply the updated ConfigMap, delete the kube-proxy Pods.
※The Pods are recreated automatically afterwards.
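Since kube-proxy runs as a DaemonSet, an alternative to deleting the Pods one by one is a rolling restart; a sketch:

```shell
# Restart all kube-proxy Pods and wait for the rollout to finish
kubectl -n kube-system rollout restart daemonset kube-proxy
kubectl -n kube-system rollout status daemonset kube-proxy
```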
kubeuser@master01:~$ kubectl delete pods kube-proxy-2n7b2 -n kube-system
pod "kube-proxy-2n7b2" deleted
kubeuser@master01:~$ kubectl delete pods kube-proxy-7k425 -n kube-system
pod "kube-proxy-7k425" deleted
kubeuser@master01:~$ kubectl delete pods kube-proxy-c5pkt -n kube-system
pod "kube-proxy-c5pkt" deleted
The restarted kube-proxy Pods now log
I0326 14:20:14.047849 1 server_others.go:248] "Using ipvs Proxier"
I0326 14:20:14.048263 1 server_others.go:250] "Creating dualStackProxier for ipvs"
so ipvs mode is in effect.
kubeuser@master01:~$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-57b57c56f-p6xds 1/1 Running 0 12d
calico-node-phmkb 1/1 Running 0 12d
calico-node-wjdqx 1/1 Running 0 12d
calico-node-xdkfv 1/1 Running 0 12d
coredns-787d4945fb-6n79l 1/1 Running 0 12d
coredns-787d4945fb-dfplr 1/1 Running 0 12d
etcd-master01 1/1 Running 3 12d
kube-apiserver-master01 1/1 Running 2 12d
kube-controller-manager-master01 1/1 Running 0 12d
kube-proxy-2sdm6 1/1 Running 0 20h
kube-proxy-dvrph 1/1 Running 0 20h
kube-proxy-gpp8x 1/1 Running 0 20h
kube-scheduler-master01 1/1 Running 3 12d
metrics-server-6b6f9ccc7-qmtb9 1/1 Running 0 23h
kubeuser@master01:~$ kubectl logs kube-proxy-2sdm6 -n kube-system
I0326 14:20:10.451704 1 node.go:163] Successfully retrieved node IP: 192.168.1.45
I0326 14:20:10.452153 1 server_others.go:109] "Detected node IP" address="192.168.1.45"
I0326 14:20:14.047849 1 server_others.go:248] "Using ipvs Proxier"
I0326 14:20:14.048263 1 server_others.go:250] "Creating dualStackProxier for ipvs"
I0326 14:20:14.162812 1 proxier.go:462] "IPVS scheduler not specified, use rr by default"
I0326 14:20:14.163387 1 proxier.go:462] "IPVS scheduler not specified, use rr by default"
I0326 14:20:14.163447 1 ipset.go:116] "Ipset name truncated" ipSetName="KUBE-6-LOAD-BALANCER-SOURCE-CIDR" truncatedName="KUBE-6-LOAD-BALANCER-SOURCE-CID"
I0326 14:20:14.163502 1 ipset.go:116] "Ipset name truncated" ipSetName="KUBE-6-NODE-PORT-LOCAL-SCTP-HASH" truncatedName="KUBE-6-NODE-PORT-LOCAL-SCTP-HAS"
I0326 14:20:14.164023 1 server.go:655] "Version info" version="v1.26.0"
I0326 14:20:14.164057 1 server.go:657] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0326 14:20:14.489671 1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0326 14:20:14.541448 1 config.go:444] "Starting node config controller"
I0326 14:20:14.541714 1 shared_informer.go:273] Waiting for caches to sync for node config
I0326 14:20:14.542439 1 config.go:317] "Starting service config controller"
I0326 14:20:14.542471 1 shared_informer.go:273] Waiting for caches to sync for service config
I0326 14:20:14.542730 1 config.go:226] "Starting endpoint slice config controller"
I0326 14:20:14.542800 1 shared_informer.go:273] Waiting for caches to sync for endpoint slice config
I0326 14:20:14.922644 1 shared_informer.go:280] Caches are synced for endpoint slice config
I0326 14:20:15.241990 1 shared_informer.go:280] Caches are synced for node config
I0326 14:20:15.343632 1 shared_informer.go:280] Caches are synced for service config
E0326 14:20:17.079817 1 service_health.go:141] "Failed to start healthcheck" err="listen tcp :31402: bind: address already in use" node="worker01" service="ingress-nginx/ingress-nginx-controller" port=31402
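You can also inspect the resulting IPVS table directly on a node with ipvsadm (from the ipvsadm package); it lists each virtual server (e.g. a ClusterIP:port), the scheduler in use (rr), and the backend Pod IPs:

```shell
# Show the kernel's IPVS virtual servers and their real servers
sudo ipvsadm -Ln
```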
Verifying traffic distribution
Let's check how ipvs mode (rr: round robin) behaves.
As in the previous article, we fire 1,000 requests with ApacheBench (ab) from XAMPP.
c:\xampp\apache\bin>ab.exe -n 1000 -c 1000 http://192.168.1.51/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.1.51 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software: nginx/1.23.0
Server Hostname: 192.168.1.51
Server Port: 80
Document Path: /
Document Length: 256 bytes
Concurrency Level: 1000
Time taken for tests: 3.660 seconds
Complete requests: 1000
Failed requests: 333
(Connect: 0, Receive: 0, Length: 333, Exceptions: 0)
Total transferred: 489333 bytes
HTML transferred: 256333 bytes
Requests per second: 273.25 [#/sec] (mean)
Time per request: 3659.664 [ms] (mean)
Time per request: 3.660 [ms] (mean, across all concurrent requests)
Transfer rate: 130.58 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 3 2.2 3 21
Processing: 159 1884 953.1 1917 3509
Waiting: 5 1807 990.5 1842 3491
Total: 161 1887 952.9 1919 3512
Percentage of the requests served within a certain time (ms)
50% 1919
66% 2412
75% 2678
80% 2841
90% 3208
95% 3366
98% 3456
99% 3481
100% 3512 (longest request)
Looking at the capture in Wireshark, the traffic is spread evenly across the Pods.
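Besides a packet capture, the distribution can also be counted from the responses, assuming, as in the previous article, that each Pod returns an identifying body such as its hostname (a sketch; same URL as the ab run above):

```shell
# Send 100 sequential requests and count how many each backend answered
for i in $(seq 1 100); do
  curl -s http://192.168.1.51/
done | sort | uniq -c
```

With rr, the counts should come out almost exactly equal.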

Switching to ipvs mode (sh: source hashing)
I also tried ipvs mode with the sh (source-hashing) scheduler.
Since this test environment has only a single client IP, all traffic lands on a single Pod (container), exactly as expected.
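The scheduler swap was done the same way as before: change the scheduler in the ConfigMap and recreate the kube-proxy Pods. A scripted sketch:

```shell
# Switch the IPVS scheduler from rr to sh in the kube-proxy ConfigMap
kubectl -n kube-system get cm kube-proxy -o yaml \
  | sed 's/scheduler: "rr"/scheduler: "sh"/' \
  | kubectl apply -f -
# Recreate the kube-proxy Pods so the change takes effect
kubectl -n kube-system delete pods -l k8s-app=kube-proxy
```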

Conclusion
Trying out the ipvs mode I learned about last time turned out to be quite interesting.
Depending on the traffic you need to handle, lc (least connection) strikes me as a good choice.
Also, if you have ever worked with a dedicated load balancer appliance, the distribution algorithms will feel familiar, which makes them easier to understand.
Supplement
If you want kube-proxy to use ipvs mode from the moment the Kubernetes cluster is built, the following guide covers it:
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/README.md
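In short, you pass kubeadm init a config file whose KubeProxyConfiguration sets mode: ipvs; a minimal sketch along those lines (file name hypothetical; see the README above for the full format):

```shell
# Sketch: combine with your actual kubeadm ClusterConfiguration as needed
cat <<'EOF' > kubeadm-ipvs.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  scheduler: rr
EOF
sudo kubeadm init --config kubeadm-ipvs.yaml
```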
