Opensourcetech Blog

A blog by Opensourcetech about open source technologies such as NGINX, Kubernetes, Zabbix, Neo4j, and Linux.

Changing the kube-proxy mode from iptables to ipvs


This is Takahiro Kujirai (@opensourcetech), LinuC Evangelist.


Introduction
In the previous article (Checking how traffic to Pods is load-balanced in Kubernetes, via Service/Deployment), we saw that kube-proxy handles the distribution of traffic.
So this time, digging a bit deeper, we switch the kube-proxy mode from the default (iptables) to ipvs and look at how traffic distribution behaves.

The Kubernetes cluster used here is the one built in this earlier article.


Checking the current kube-proxy mode
First, let's check the kube-proxy mode on a cluster that was installed without specifying any mode.

The kube-proxy (Pod) logs show that it is running in iptables mode:
I0314 13:48:24.900100 1 server_others.go:535] "Using iptables proxy"
I0314 13:48:24.931674 1 server_others.go:176] "Using iptables Proxier"

kubeuser@master01:~$ kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-57b57c56f-p6xds   1/1     Running   0          12d
calico-node-phmkb                         1/1     Running   0          12d
calico-node-wjdqx                         1/1     Running   0          12d
calico-node-xdkfv                         1/1     Running   0          12d
coredns-787d4945fb-6n79l                  1/1     Running   0          12d
coredns-787d4945fb-dfplr                  1/1     Running   0          12d
etcd-master01                             1/1     Running   3          12d
kube-apiserver-master01                   1/1     Running   2          12d
kube-controller-manager-master01          1/1     Running   0          12d
kube-proxy-2n7b2                          1/1     Running   0          12d
kube-proxy-7k425                          1/1     Running   0          12d
kube-proxy-c5pkt                          1/1     Running   0          12d
kube-scheduler-master01                   1/1     Running   3          12d
metrics-server-6b6f9ccc7-qmtb9            1/1     Running   0          130m

kubeuser@master01:~$ kubectl logs kube-proxy-2n7b2 -n kube-system
I0314 13:48:24.898834       1 node.go:163] Successfully retrieved node IP: 192.168.1.45
I0314 13:48:24.899245       1 server_others.go:109] "Detected node IP" address="192.168.1.45"
I0314 13:48:24.900100       1 server_others.go:535] "Using iptables proxy"
I0314 13:48:24.931674       1 server_others.go:176] "Using iptables Proxier"
I0314 13:48:24.931939       1 server_others.go:183] "kube-proxy running in dual-stack mode" ipFamily=IPv4
I0314 13:48:24.932054       1 server_others.go:184] "Creating dualStackProxier for iptables"
I0314 13:48:24.932195       1 proxier.go:242] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses"
I0314 13:48:25.005296       1 server.go:655] "Version info" version="v1.26.0"
I0314 13:48:25.005692       1 server.go:657] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0314 13:48:25.009442       1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0314 13:48:25.012672       1 config.go:317] "Starting service config controller"
I0314 13:48:25.012821       1 shared_informer.go:273] Waiting for caches to sync for service config
I0314 13:48:25.012983       1 config.go:226] "Starting endpoint slice config controller"
I0314 13:48:25.013049       1 shared_informer.go:273] Waiting for caches to sync for endpoint slice config
I0314 13:48:25.014118       1 config.go:444] "Starting node config controller"
I0314 13:48:25.014369       1 shared_informer.go:273] Waiting for caches to sync for node config
I0314 13:48:25.198054       1 shared_informer.go:280] Caches are synced for service config
I0314 13:48:25.198100       1 shared_informer.go:280] Caches are synced for endpoint slice config
I0314 13:48:25.199072       1 shared_informer.go:280] Caches are synced for node config
E0316 23:45:26.858242       1 service_health.go:141] "Failed to start healthcheck" err="listen tcp :31402: bind: address already in use" node="worker01" service="ingress-nginx/ingress-nginx-controller" port=31402
E0316 23:45:26.922288       1 service_health.go:141] "Failed to start healthcheck" err="listen tcp :31402: bind: address already in use" node="worker01" service="ingress-nginx/ingress-nginx-controller" port=31402
E0316 23:45:27.838639       1 event_broadcaster.go:253] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ingress-nginx-controller.174d0afedb1b6a4c", GenerateName:"", Namespace:"ingress-nginx", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, EventTime:time.Date(2023, time.March, 16, 23, 45, 25, 793025688, time.Local), Series:(*v1.EventSeries)(0xc00014d7c0), ReportingController:"kube-proxy", ReportingInstance:"kube-proxy-worker01", Action:"Listen", Reason:"FailedToStartServiceHealthcheck", Regarding:v1.ObjectReference{Kind:"Service", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"ingress-nginx/ingress-nginx-controller", APIVersion:"", ResourceVersion:"", FieldPath:""}, Related:(*v1.ObjectReference)(nil), Note:"node worker01 failed to start healthcheck \"ingress-
.
.
.
.


The Pod's configuration also confirms this: in /var/lib/kube-proxy/config.conf, the setting mode: "" means the default is in effect, i.e. iptables mode.

kubeuser@master01:~$ kubectl describe pods kube-proxy-2n7b2 -n kube-system
Name:                 kube-proxy-2n7b2
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      kube-proxy
Node:                 worker01/192.168.1.45
Start Time:           Tue, 14 Mar 2023 13:48:20 +0000
Labels:               controller-revision-hash=78545cdb7d
                      k8s-app=kube-proxy
                      pod-template-generation=1
Annotations:          <none>
Status:               Running
IP:                   192.168.1.45
IPs:
  IP:           192.168.1.45
Controlled By:  DaemonSet/kube-proxy
Containers:
  kube-proxy:
    Container ID:  containerd://ed9b81bf010560e2f3175dfd4f1e0415f8bc32ab57b4d2beb15822a66c31c6f0
    Image:         registry.k8s.io/kube-proxy:v1.26.0
    Image ID:      registry.k8s.io/kube-proxy@sha256:1e9bbe429e4e2b2ad32681c91deb98a334f1bf4135137df5f84f9d03689060fe
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/local/bin/kube-proxy
      --config=/var/lib/kube-proxy/config.conf
      --hostname-override=$(NODE_NAME)
    State:          Running
      Started:      Tue, 14 Mar 2023 13:48:22 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /var/lib/kube-proxy from kube-proxy (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-44qpn (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-proxy:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-proxy
    Optional:  false
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  
  kube-api-access-44qpn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>



kubeuser@master01:~$ kubectl exec kube-proxy-2n7b2 -n kube-system cat /var/lib/kube-proxy/config.conf
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
bindAddressHardFail: false
clientConnection:
  acceptContentTypes: ""
  burst: 0
  contentType: ""
  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
  qps: 0
clusterCIDR: 10.0.0.0/16,fd12:b5e0:383e::/64
configSyncPeriod: 0s
conntrack:
  maxPerCore: null
  min: null
  tcpCloseWaitTimeout: null
  tcpEstablishedTimeout: null
detectLocal:
  bridgeInterface: ""
  interfaceNamePrefix: ""
detectLocalMode: ""
enableProfiling: false
healthzBindAddress: ""
hostnameOverride: ""
iptables:
  localhostNodePorts: null
  masqueradeAll: false
  masqueradeBit: null
  minSyncPeriod: 0s
  syncPeriod: 0s
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: ""
  strictARP: false
  syncPeriod: 0s
  tcpFinTimeout: 0s
  tcpTimeout: 0s
  udpTimeout: 0s
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: ""
nodePortAddresses: null
oomScoreAdj: null
portRange: ""
showHiddenMetricsForVersion: ""
winkernel:
  enableDSR: false
  forwardHealthCheckVip: false
  networkName: ""
  rootHnsEndpointName: ""
  sourceVip: ""
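
Incidentally, kube-proxy also reports its active mode on its metrics port (10249 by default, bound to localhost), which is a quick way to check from a node. A minimal sketch, assuming the default metricsBindAddress:

# Run on a node; /proxyMode returns the mode kube-proxy is actually using.
kubeuser@worker01:~$ curl http://localhost:10249/proxyMode
iptables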


The Kubernetes documentation also states that iptables mode is the default on Linux.



iptables mode
In iptables mode, kube-proxy picks a backend Pod at random.


https://kubernetes.io/ja/docs/concepts/services-networking/service/#proxy-mode-iptables
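
The "random" selection can be seen directly in the NAT rules kube-proxy generates: each backend is chosen with the iptables statistic match and a fixed probability. A sketch of what a Service chain looks like on a node (the chain and endpoint names here are illustrative, not taken from this cluster):

kubeuser@worker01:~$ sudo iptables -t nat -nL KUBE-SVC-EXAMPLE
Chain KUBE-SVC-EXAMPLE (1 references)
target           prot opt source       destination
KUBE-SEP-POD-A   all  --  0.0.0.0/0    0.0.0.0/0    statistic mode random probability 0.33333333349
KUBE-SEP-POD-B   all  --  0.0.0.0/0    0.0.0.0/0    statistic mode random probability 0.50000000000
KUBE-SEP-POD-C   all  --  0.0.0.0/0    0.0.0.0/0

With three endpoints, the first rule matches about 1/3 of the time, the second matches 1/2 of the remainder, and the last rule catches the rest, which is why the overall split comes out roughly even.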

Even though the selection is random, in practice the result was roughly an even split:
Checking how traffic to Pods is load-balanced in Kubernetes (via Service/Deployment)


ipvsモード
ipvsモードは、低いレイテンシーでトラフィックをリダイレクトする、
様々な分散方式などの特徴があります。



https://kubernetes.io/ja/docs/concepts/services-networking/service/#proxy-mode-iptables


Switching to ipvs mode (rr: round robin)
Now let's switch from iptables mode to ipvs mode (rr: round robin).
First, check that the ipvs kernel modules are loaded.
* If they are not, install and load the modules (modprobe); see this article for working with kernel modules, and the sketch after the lsmod output below.

kubeuser@master01:~$ lsmod | grep ip_vs
ip_vs_sed              16384  0
ip_vs_nq               16384  0
ip_vs_dh               16384  0
ip_vs_lc               16384  0
ip_vs_sh               16384  25
ip_vs_wrr              16384  0
ip_vs_rr               16384  0
ip_vs                 176128  39 ip_vs_rr,ip_vs_dh,ip_vs_sh,ip_vs_nq,ip_vs_wrr,ip_vs_lc,ip_vs_sed
nf_conntrack          172032  6 xt_conntrack,nf_nat,xt_nat,nf_conntrack_netlink,xt_MASQUERADE,ip_vs
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs
libcrc32c              16384  6 nf_conntrack,nf_nat,btrfs,nf_tables,raid456,ip_vs
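
If the ip_vs modules are not listed, something along these lines should load them (a sketch; package and module names can vary by distribution and kernel, and this needs to be done on every node):

# Install the IPVS userspace tools and load the modules now and on boot.
kubeuser@master01:~$ sudo apt-get install -y ipset ipvsadm
kubeuser@master01:~$ sudo modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack
kubeuser@master01:~$ echo -e "ip_vs\nip_vs_rr\nip_vs_wrr\nip_vs_sh\nnf_conntrack" | sudo tee /etc/modules-load.d/ipvs.conf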


The kube-proxy settings are defined in a ConfigMap.
So we edit the relevant fields there (mode and the ipvs scheduler).

kubeuser@master01:~$ kubectl get cm -n kube-system
NAME                                 DATA   AGE
calico-config                        4      12d
coredns                              1      12d
extension-apiserver-authentication   6      12d
kube-proxy                           2      12d
kube-root-ca.crt                     1      12d
kubeadm-config                       1      12d
kubelet-config                       1      12d

kubeuser@master01:~$ kubectl get cm -n kube-system kube-proxy -o yaml
apiVersion: v1
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    bindAddress: 0.0.0.0
    bindAddressHardFail: false
    clientConnection:
      acceptContentTypes: ""
      burst: 0
      contentType: ""
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 0
    clusterCIDR: 10.0.0.0/16,fd12:b5e0:383e::/64
    configSyncPeriod: 0s
    conntrack:
      maxPerCore: null
      min: null
      tcpCloseWaitTimeout: null
      tcpEstablishedTimeout: null
    detectLocal:
      bridgeInterface: ""
      interfaceNamePrefix: ""
    detectLocalMode: ""
    enableProfiling: false
    healthzBindAddress: ""
    hostnameOverride: ""
    iptables:
      localhostNodePorts: null
      masqueradeAll: false
      masqueradeBit: null
      minSyncPeriod: 0s
      syncPeriod: 0s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      strictARP: false
      syncPeriod: 0s
      tcpFinTimeout: 0s
      tcpTimeout: 0s
      udpTimeout: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: null
    portRange: ""
    showHiddenMetricsForVersion: ""
    winkernel:
      enableDSR: false
      forwardHealthCheckVip: false
      networkName: ""
      rootHnsEndpointName: ""
      sourceVip: ""
  kubeconfig.conf: |-
    apiVersion: v1
    kind: Config
    clusters:
    - cluster:
        certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        server: https://master01:6443
      name: default
    contexts:
    - context:
        cluster: default
        namespace: default
        user: default
      name: default
    current-context: default
    users:
    - name: default
      user:
        tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
  annotations:
    kubeadm.kubernetes.io/component-config.hash: sha256:e1193cefc1046d8fe6dbffed62d04e546bd4142781cc28f632adfe72f499e4be
  creationTimestamp: "2023-03-14T13:45:56Z"
  labels:
    app: kube-proxy
  name: kube-proxy
  namespace: kube-system
  resourceVersion: "271"
  uid: 687e9c38-9699-46dc-9fbb-5de6bf270781


kubeuser@master01:~$ kubectl edit cm -n kube-system kube-proxy
configmap/kube-proxy edited
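
If you prefer a non-interactive change, the same edit can be scripted by rewriting the ConfigMap and re-applying it (a sketch; the sed patterns assume the fields are still at their empty defaults):

kubeuser@master01:~$ kubectl -n kube-system get cm kube-proxy -o yaml \
    | sed -e 's/mode: ""/mode: "ipvs"/' -e 's/scheduler: ""/scheduler: "rr"/' \
    | kubectl apply -f -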

kubeuser@master01:~$ kubectl get cm kube-proxy -n kube-system -o yaml
.
.
.
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: "rr"
      strictARP: false
      syncPeriod: 0s
      tcpFinTimeout: 0s
      tcpTimeout: 0s
      udpTimeout: 0s
    kind: KubeProxyConfiguration
    metricsBindAddress: ""
    mode: "ipvs"
.
.
.


To apply the updated ConfigMap, delete the kube-proxy Pods.
* The DaemonSet automatically recreates them.

kubeuser@master01:~$ kubectl delete pods kube-proxy-2n7b2 -n kube-system
pod "kube-proxy-2n7b2" deleted

kubeuser@master01:~$ kubectl delete pods kube-proxy-7k425 -n kube-system
pod "kube-proxy-7k425" deleted

kubeuser@master01:~$ kubectl delete pods kube-proxy-c5pkt -n kube-system
pod "kube-proxy-c5pkt" deleted


The restarted kube-proxy Pods now log the following, so they are running in ipvs mode:
I0326 14:20:14.047849 1 server_others.go:248] "Using ipvs Proxier"
I0326 14:20:14.048263 1 server_others.go:250] "Creating dualStackProxier for ipvs"

kubeuser@master01:~$ kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-57b57c56f-p6xds   1/1     Running   0          12d
calico-node-phmkb                         1/1     Running   0          12d
calico-node-wjdqx                         1/1     Running   0          12d
calico-node-xdkfv                         1/1     Running   0          12d
coredns-787d4945fb-6n79l                  1/1     Running   0          12d
coredns-787d4945fb-dfplr                  1/1     Running   0          12d
etcd-master01                             1/1     Running   3          12d
kube-apiserver-master01                   1/1     Running   2          12d
kube-controller-manager-master01          1/1     Running   0          12d
kube-proxy-2sdm6                          1/1     Running   0          20h
kube-proxy-dvrph                          1/1     Running   0          20h
kube-proxy-gpp8x                          1/1     Running   0          20h
kube-scheduler-master01                   1/1     Running   3          12d
metrics-server-6b6f9ccc7-qmtb9            1/1     Running   0          23h

kubeuser@master01:~$ kubectl logs kube-proxy-2sdm6 -n kube-system
I0326 14:20:10.451704       1 node.go:163] Successfully retrieved node IP: 192.168.1.45
I0326 14:20:10.452153       1 server_others.go:109] "Detected node IP" address="192.168.1.45"
I0326 14:20:14.047849       1 server_others.go:248] "Using ipvs Proxier"
I0326 14:20:14.048263       1 server_others.go:250] "Creating dualStackProxier for ipvs"
I0326 14:20:14.162812       1 proxier.go:462] "IPVS scheduler not specified, use rr by default"
I0326 14:20:14.163387       1 proxier.go:462] "IPVS scheduler not specified, use rr by default"
I0326 14:20:14.163447       1 ipset.go:116] "Ipset name truncated" ipSetName="KUBE-6-LOAD-BALANCER-SOURCE-CIDR" truncatedName="KUBE-6-LOAD-BALANCER-SOURCE-CID"
I0326 14:20:14.163502       1 ipset.go:116] "Ipset name truncated" ipSetName="KUBE-6-NODE-PORT-LOCAL-SCTP-HASH" truncatedName="KUBE-6-NODE-PORT-LOCAL-SCTP-HAS"
I0326 14:20:14.164023       1 server.go:655] "Version info" version="v1.26.0"
I0326 14:20:14.164057       1 server.go:657] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0326 14:20:14.489671       1 conntrack.go:52] "Setting nf_conntrack_max" nf_conntrack_max=131072
I0326 14:20:14.541448       1 config.go:444] "Starting node config controller"
I0326 14:20:14.541714       1 shared_informer.go:273] Waiting for caches to sync for node config
I0326 14:20:14.542439       1 config.go:317] "Starting service config controller"
I0326 14:20:14.542471       1 shared_informer.go:273] Waiting for caches to sync for service config
I0326 14:20:14.542730       1 config.go:226] "Starting endpoint slice config controller"
I0326 14:20:14.542800       1 shared_informer.go:273] Waiting for caches to sync for endpoint slice config
I0326 14:20:14.922644       1 shared_informer.go:280] Caches are synced for endpoint slice config
I0326 14:20:15.241990       1 shared_informer.go:280] Caches are synced for node config
I0326 14:20:15.343632       1 shared_informer.go:280] Caches are synced for service config
E0326 14:20:17.079817       1 service_health.go:141] "Failed to start healthcheck" err="listen tcp :31402: bind: address already in use" node="worker01" service="ingress-nginx/ingress-nginx-controller" port=31402
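
Once ipvs mode is active, the virtual server table and the rr scheduler can also be inspected with ipvsadm on a node. A sketch of what the output looks like (the addresses shown here are illustrative):

kubeuser@worker01:~$ sudo ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.1.41:6443            Masq    1      0          0
TCP  10.96.0.10:53 rr
  -> 10.0.5.2:53                  Masq    1      0          0
  -> 10.0.5.3:53                  Masq    1      0          0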



Checking traffic distribution
Let's look at how ipvs mode (rr: round robin) behaves.
As in the previous article, we send 1,000 requests with XAMPP (apache bench).

c:\xampp\apache\bin>ab.exe -n 1000 -c 1000 http://192.168.1.51/
This is ApacheBench, Version 2.3 <$Revision: 1901567 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.1.51 (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        nginx/1.23.0
Server Hostname:        192.168.1.51
Server Port:            80

Document Path:          /
Document Length:        256 bytes

Concurrency Level:      1000
Time taken for tests:   3.660 seconds
Complete requests:      1000
Failed requests:        333
   (Connect: 0, Receive: 0, Length: 333, Exceptions: 0)
Total transferred:      489333 bytes
HTML transferred:       256333 bytes
Requests per second:    273.25 [#/sec] (mean)
Time per request:       3659.664 [ms] (mean)
Time per request:       3.660 [ms] (mean, across all concurrent requests)
Transfer rate:          130.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1    3   2.2      3      21
Processing:   159 1884 953.1   1917    3509
Waiting:        5 1807 990.5   1842    3491
Total:        161 1887 952.9   1919    3512

Percentage of the requests served within a certain time (ms)
  50%   1919
  66%   2412
  75%   2678
  80%   2841
  90%   3208
  95%   3366
  98%   3456
  99%   3481
 100%   3512 (longest request)


Checking the capture in Wireshark, the requests are distributed evenly across the Pods.
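
As a rough cross-check without Wireshark, you can also count how many requests reached each backend Pod in its access log. A sketch, assuming the nginx Deployment from the previous article is labeled app=nginx and logs requests as "GET / HTTP/1.0" (apache bench uses HTTP/1.0 by default):

kubeuser@master01:~$ for p in $(kubectl get pods -l app=nginx -o name); do echo -n "$p: "; kubectl logs $p | grep -c '"GET / HTTP/1.0"'; done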



Switching to ipvs mode (sh: source IP hash)
I also tried ipvs mode with sh (source hashing).
Since this test environment has only a single client IP, the traffic is, as expected, concentrated on a single Pod (container).
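
Switching the scheduler from rr to sh follows the same procedure as before: change the ipvs scheduler in the ConfigMap and restart the kube-proxy Pods. A sketch:

kubeuser@master01:~$ kubectl -n kube-system get cm kube-proxy -o yaml \
    | sed -e 's/scheduler: "rr"/scheduler: "sh"/' \
    | kubectl apply -f -
kubeuser@master01:~$ kubectl -n kube-system rollout restart daemonset kube-proxy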



Closing thoughts
I tried out the ipvs mode I learned about last time, and it is quite interesting.
Depending on the traffic you need to handle, lc (least connection) seems like a good choice to me.
Also, if you have used a load balancer (a dedicated network appliance) before, the traffic distribution algorithms should feel familiar and easy to understand.


Appendix
If you want to set the kube-proxy mode to ipvs when building the Kubernetes cluster in the first place, the following is a good reference:
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/README.md
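
For example, with kubeadm you can pass a KubeProxyConfiguration alongside the ClusterConfiguration at init time. A sketch, assuming kubeadm and Kubernetes v1.26:

# kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.26.0
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
ipvs:
  scheduler: rr

kubeuser@master01:~$ sudo kubeadm init --config kubeadm-config.yaml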

Opensourcetech by Takahiro Kujirai