This is 鯨井貴博@opensourcetech, LinuC Evangelist.
Introduction
This article covers backing up etcd, the database used by Kubernetes.
Because etcd stores the cluster's configuration state, it is normally run in a redundant configuration;
taking periodic backups on top of that raises fault tolerance further.
Where the data lives
etcd's data directory is recorded in the YAML manifest that was used to deploy etcd.
Specifically, it is line 19 of the file: " - --data-dir=/var/lib/etcd".
kubeuser@kubemaster1:~$ sudo cat -n /etc/kubernetes/manifests/etcd.yaml
     1  apiVersion: v1
     2  kind: Pod
     3  metadata:
     4    annotations:
     5      kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.1.251:2379
     6    creationTimestamp: null
     7    labels:
     8      component: etcd
     9      tier: control-plane
    10    name: etcd
    11    namespace: kube-system
    12  spec:
    13    containers:
    14    - command:
    15      - etcd
    16      - --advertise-client-urls=https://192.168.1.251:2379
    17      - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    18      - --client-cert-auth=true
    19      - --data-dir=/var/lib/etcd
    20      - --initial-advertise-peer-urls=https://192.168.1.251:2380
    21      - --initial-cluster=kubemaster1=https://192.168.1.251:2380
    22      - --key-file=/etc/kubernetes/pki/etcd/server.key
    23      - --listen-client-urls=https://127.0.0.1:2379,https://192.168.1.251:2379
    24      - --listen-metrics-urls=http://127.0.0.1:2381
    25      - --listen-peer-urls=https://192.168.1.251:2380
    26      - --name=kubemaster1
    27      - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    28      - --peer-client-cert-auth=true
    29      - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    30      - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    31      - --snapshot-count=10000
    32      - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    33      image: k8s.gcr.io/etcd:3.4.3-0
    34      imagePullPolicy: IfNotPresent
    35      livenessProbe:
    36        failureThreshold: 8
    37        httpGet:
    38          host: 127.0.0.1
    39          path: /health
    40          port: 2381
    41          scheme: HTTP
    42        initialDelaySeconds: 15
    43        timeoutSeconds: 15
    44      name: etcd
    45      resources: {}
    46      volumeMounts:
    47      - mountPath: /var/lib/etcd
    48        name: etcd-data
    49      - mountPath: /etc/kubernetes/pki/etcd
    50        name: etcd-certs
    51    hostNetwork: true
    52    priorityClassName: system-cluster-critical
    53    volumes:
    54    - hostPath:
    55        path: /etc/kubernetes/pki/etcd
    56        type: DirectoryOrCreate
    57      name: etcd-certs
    58    - hostPath:
    59        path: /var/lib/etcd
    60        type: DirectoryOrCreate
    61      name: etcd-data
    62  status: {}
kubeuser@kubemaster1:~$ sudo grep data-dir /etc/kubernetes/manifests/etcd.yaml
    - --data-dir=/var/lib/etcd
Logging in to the etcd container
In the Kubernetes environment used here, etcd runs as a Pod (container).
kubeuser@kubemaster1:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-77ff9c69dd-m5jbz   1/1     Running   1          5d
calico-node-7t474                          1/1     Running   0          5d
calico-node-bdw22                          1/1     Running   0          5d
calico-node-ftjg8                          1/1     Running   0          5d
calico-node-lk26d                          1/1     Running   0          4d23h
calico-node-qphzt                          1/1     Running   0          4d23h
coredns-66bff467f8-cpd25                   1/1     Running   0          5d
coredns-66bff467f8-wtww9                   1/1     Running   0          5d
etcd-kubemaster1                           1/1     Running   1          5d      <-- here
etcd-kubemaster2                           1/1     Running   2          5d      <-- here
etcd-kubemaster3                           1/1     Running   1          4d23h   <-- here
kube-apiserver-kubemaster1                 1/1     Running   3          5d
kube-apiserver-kubemaster2                 1/1     Running   1          5d
kube-apiserver-kubemaster3                 1/1     Running   1          4d23h
kube-controller-manager-kubemaster1        1/1     Running   6          5d
kube-controller-manager-kubemaster2        1/1     Running   6          5d
kube-controller-manager-kubemaster3        1/1     Running   5          4d23h
kube-proxy-47qm5                           1/1     Running   0          4d23h
kube-proxy-6gq55                           1/1     Running   0          5d
kube-proxy-flcg2                           1/1     Running   0          5d
kube-proxy-xqvdx                           1/1     Running   0          5d
kube-proxy-xv78r                           1/1     Running   0          4d23h
kube-scheduler-kubemaster1                 1/1     Running   7          5d
kube-scheduler-kubemaster2                 1/1     Running   5          5d
kube-scheduler-kubemaster3                 1/1     Running   4          4d23h
Now log in to that container.
After logging in, run etcdctl -h to review the options of etcdctl, the command we will be using.
kubeuser@kubemaster1:~$ kubectl -n kube-system exec -it etcd-kubemaster1 -- sh
# etcdctl -h
NAME:
        etcdctl - A simple command line client for etcd3.

USAGE:
        etcdctl [flags]

VERSION:
        3.4.3

API VERSION:
        3.4

COMMANDS:
        alarm disarm            Disarms all alarms
        alarm list              Lists all alarms
        auth disable            Disables authentication
        auth enable             Enables authentication
        check datascale         Check the memory usage of holding data for different workloads on a given server endpoint.
        check perf              Check the performance of the etcd cluster
        compaction              Compacts the event history in etcd
        defrag                  Defragments the storage of the etcd members with given endpoints
        del                     Removes the specified key or range of keys [key, range_end)
        elect                   Observes and participates in leader election
        endpoint hashkv         Prints the KV history hash for each endpoint in --endpoints
        endpoint health         Checks the healthiness of endpoints specified in `--endpoints` flag
        endpoint status         Prints out the status of endpoints specified in `--endpoints` flag
        get                     Gets the key or a range of keys
        help                    Help about any command
        lease grant             Creates leases
        lease keep-alive        Keeps leases alive (renew)
        lease list              List all active leases
        lease revoke            Revokes leases
        lease timetolive        Get lease information
        lock                    Acquires a named lock
        make-mirror             Makes a mirror at the destination etcd cluster
        member add              Adds a member into the cluster
        member list             Lists all members in the cluster
        member promote          Promotes a non-voting member in the cluster
        member remove           Removes a member from the cluster
        member update           Updates a member in the cluster
        migrate                 Migrates keys in a v2 store to a mvcc store
        move-leader             Transfers leadership to another etcd cluster member.
        put                     Puts the given key into the store
        role add                Adds a new role
        role delete             Deletes a role
        role get                Gets detailed information of a role
        role grant-permission   Grants a key to a role
        role list               Lists all roles
        role revoke-permission  Revokes a key from a role
        snapshot restore        Restores an etcd member snapshot to an etcd directory
        snapshot save           Stores an etcd node backend snapshot to a given file
        snapshot status         Gets backend snapshot status of a given file
        txn                     Txn processes all the requests in one transaction
        user add                Adds a new user
        user delete             Deletes a user
        user get                Gets detailed information of a user
        user grant-role         Grants a role to a user
        user list               Lists all users
        user passwd             Changes password of user
        user revoke-role        Revokes a role from a user
        version                 Prints the version of etcdctl
        watch                   Watches events stream on keys or prefixes

OPTIONS:
      --cacert=""                               verify certificates of TLS-enabled secure servers using this CA bundle
      --cert=""                                 identify secure client using this TLS certificate file
      --command-timeout=5s                      timeout for short running command (excluding dial timeout)
      --debug[=false]                           enable client-side debug logging
      --dial-timeout=2s                         dial timeout for client connections
  -d, --discovery-srv=""                        domain name to query for SRV records describing cluster endpoints
      --discovery-srv-name=""                   service name to query when using DNS discovery
      --endpoints=[127.0.0.1:2379]              gRPC endpoints
  -h, --help[=false]                            help for etcdctl
      --hex[=false]                             print byte strings as hex encoded strings
      --insecure-discovery[=true]               accept insecure SRV records describing cluster endpoints
      --insecure-skip-tls-verify[=false]        skip server certificate verification
      --insecure-transport[=true]               disable transport security for client connections
      --keepalive-time=2s                       keepalive time for client connections
      --keepalive-timeout=6s                    keepalive timeout for client connections
      --key=""                                  identify secure client using this TLS key file
      --password=""                             password for authentication (if this option is used, --user option shouldn't include password)
      --user=""                                 username[:password] for authentication (prompt if password is not supplied)
  -w, --write-out="simple"                      set the output format (fields, json, protobuf, simple, table)
Also confirm the PKI files (certificates, private keys, and so on) that will be passed to etcdctl.
# cd /etc/kubernetes/pki/etcd
# pwd
/etc/kubernetes/pki/etcd
# ls
ca.crt  healthcheck-client.crt  peer.crt  server.crt
ca.key  healthcheck-client.key  peer.key  server.key
# exit
A recent article tried out etcdctl; it may be useful as a reference.
https://www.opensourcetech.tokyo/entry/20211021/1634747642
Checking the database health
The command uses the PKI files confirmed above.
kubeuser@kubemaster1:~$ kubectl -n kube-system exec -it etcd-kubemaster1 -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl endpoint health"
127.0.0.1:2379 is healthy: successfully committed proposal: took = 249.182775ms
Checking the etcd cluster members
The cluster consists of three etcd instances (one per Master Node), so verify them just in case.
kubeuser@kubemaster1:~$ kubectl -n kube-system exec -it etcd-kubemaster1 -- sh -c "ETCDCTL_API=3 etcdctl --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt --endpoints=https://127.0.0.1:2379 member list"
1b1c43fd2a12bac1, started, kubemaster1, https://192.168.1.251:2380, https://192.168.1.251:2379, false
2e9d81b870c2839b, started, kubemaster2, https://192.168.1.252:2380, https://192.168.1.252:2379, false
83ceb25956e47fcd, started, kubemaster3, https://192.168.1.249:2380, https://192.168.1.249:2379, false
Output in table format is also available.
kubeuser@kubemaster1:~$ kubectl -n kube-system exec -it etcd-kubemaster1 -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 -w table endpoint status --cluster"
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.1.251:2379 | 1b1c43fd2a12bac1 |   3.4.3 |  5.8 MB |     false |      false |        97 |    1445079 |            1445079 |        |
| https://192.168.1.252:2379 | 2e9d81b870c2839b |   3.4.3 |  5.7 MB |     false |      false |        97 |    1445079 |            1445079 |        |
| https://192.168.1.249:2379 | 83ceb25956e47fcd |   3.4.3 |  5.7 MB |      true |      false |        97 |    1445079 |            1445079 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Taking a database backup
"etcdctl snapshot save"を使ってバックアップを取得します。
kubeuser@kubemaster1:~$ kubectl -n kube-system exec -it etcd-kubemaster1 -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 snapshot save /var/lib/etcd/snapshot.db"
{"level":"info","ts":1638795365.0601962,"caller":"snapshot/v3_snapshot.go:110","msg":"created temporary db file","path":"/var/lib/etcd/snapshot.db.part"}
{"level":"warn","ts":"2021-12-06T12:56:05.069Z","caller":"clientv3/retry_interceptor.go:116","msg":"retry stream intercept"}
{"level":"info","ts":1638795365.0704958,"caller":"snapshot/v3_snapshot.go:121","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":1638795367.2656121,"caller":"snapshot/v3_snapshot.go:134","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","took":2.205339955}
{"level":"info","ts":1638795367.2657769,"caller":"snapshot/v3_snapshot.go:143","msg":"saved","path":"/var/lib/etcd/snapshot.db"}
Snapshot saved at /var/lib/etcd/snapshot.db
kubeuser@kubemaster1:~$ date
Mon Dec  6 12:56:35 UTC 2021
kubeuser@kubemaster1:~$ sudo ls -l /var/lib/etcd
[sudo] password for kubeuser:
total 5708
drwx------ 4 root root    4096 Dec  1 12:01 member
-rw------- 1 root root 5836832 Dec  6 12:56 snapshot.db
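The saved file can also be sanity-checked with "etcdctl snapshot status", one of the subcommands listed in the help output earlier. A sketch, reusing the same Pod and file path as above (snapshot status only reads the local file, so no TLS flags are needed):

```shell
# Inspect the snapshot file taken above; prints hash, revision, keys, and size.
kubectl -n kube-system exec etcd-kubemaster1 -- sh -c \
  "ETCDCTL_API=3 etcdctl -w table snapshot status /var/lib/etcd/snapshot.db"
```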
Backing up other data
Also back up the other data needed to rebuild the Master Node where etcd runs.
kubeuser@kubemaster1:~$ mkdir $HOME/backup
kubeuser@kubemaster1:~$ sudo cp /var/lib/etcd/snapshot.db $HOME/backup/snapshot.db-$(date +%m-%d-%y)
kubeuser@kubemaster1:~$ sudo cp /root/kubeadm-config.yaml $HOME/backup
cp: cannot stat '/root/kubeadm-config.yaml': No such file or directory
kubeuser@kubemaster1:~$ sudo cp /etc/kubernetes/kubeadm-config.yaml $HOME/backup
cp: cannot stat '/etc/kubernetes/kubeadm-config.yaml': No such file or directory
kubeuser@kubemaster1:~$ sudo cp /etc/kubernetes/
admin.conf  controller-manager.conf  kubelet.conf  manifests/  pki/  scheduler.conf
kubeuser@kubemaster1:~$ sudo cp /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
kubeuser@kubemaster1:~$ sudo find / -name kubeadm-config.yaml
/home/kubeuser/kubeadm-config.yaml
^C
kubeuser@kubemaster1:~$ sudo cp /home/kubeuser/kubeadm-config.yaml $HOME/backup
kubeuser@kubemaster1:~$ sudo cp -r /etc/kubernetes/pki/etcd/ $HOME/backup/
kubeuser@kubemaster1:~$ ls -l $HOME/backup
total 5712
drwxr-xr-x 2 root root    4096 Dec  6 13:05 etcd
-rw-r--r-- 1 root root     165 Dec  6 13:04 kubeadm-config.yaml
-rw------- 1 root root 5836832 Dec  6 13:01 snapshot.db-12-06-21
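The ad-hoc commands above (after some trial and error locating kubeadm-config.yaml) can be gathered into one small helper. This is only a sketch: the function name backup_k8s_etcd and its parameterized arguments are mine, but it copies the same three pieces as the session, namely the snapshot, the kubeadm config, and the etcd PKI directory.

```shell
#!/bin/sh
# Sketch: collect the three backup steps above into one function.
# Usage: backup_k8s_etcd <dest dir> <snapshot file> <kubeadm config> <etcd PKI dir>
backup_k8s_etcd() {
    dest=$1
    snapshot=$2
    kubeadm_conf=$3
    etcd_pki=$4
    mkdir -p "$dest"
    cp "$snapshot" "$dest/snapshot.db-$(date +%m-%d-%y)"  # dated etcd snapshot
    cp "$kubeadm_conf" "$dest/"                           # kubeadm configuration
    cp -r "$etcd_pki" "$dest/"                            # etcd certificates/keys
}

# With the paths from the session above (run as root on the control-plane node):
# backup_k8s_etcd "$HOME/backup" /var/lib/etcd/snapshot.db \
#     /home/kubeuser/kubeadm-config.yaml /etc/kubernetes/pki/etcd
```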
Supplement (restoring the data)
Restore the data by following this procedure:
https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#restoring-an-etcd-cluster
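As a rough illustration of what that procedure involves for a single member, a sketch only: the restored directory /var/lib/etcd-restored is a name chosen for illustration, and the member name and peer URL mirror kubemaster1 from this article.

```shell
# Sketch of a single-member restore with "etcdctl snapshot restore".
# /var/lib/etcd-restored is a hypothetical destination; values mirror kubemaster1.
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd/snapshot.db \
    --data-dir=/var/lib/etcd-restored \
    --name=kubemaster1 \
    --initial-cluster=kubemaster1=https://192.168.1.251:2380 \
    --initial-advertise-peer-urls=https://192.168.1.251:2380
# Afterwards, point --data-dir in /etc/kubernetes/manifests/etcd.yaml at the
# restored directory; the kubelet restarts the static etcd Pod automatically.
```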
Closing
This was a one-off backup. To always have an up-to-date backup on hand,
the job needs to run periodically, for example via cron or a Kubernetes CronJob.
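For instance, an /etc/crontab entry along these lines could take a daily snapshot. This is a hypothetical fragment: the schedule, log path, and the assumption that root on the node has a working kubeconfig are placeholders, while the etcdctl invocation matches the one used above.

```shell
# Hypothetical /etc/crontab entry: daily etcd snapshot at 01:00.
# Assumes root's kubeconfig can reach the cluster; log path is a placeholder.
0 1 * * * root kubectl -n kube-system exec etcd-kubemaster1 -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl snapshot save /var/lib/etcd/snapshot.db" >>/var/log/etcd-backup.log 2>&1
```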