Working through a production scheduling failure caused by a kube-controller-manager bug
Our production Kubernetes clusters generally run fairly old versions. Today a colleague happened to log on to a new node and noticed that its root partition was full. After cleaning it up and restarting the Kubernetes processes that had died because of the full filesystem, we found that one pod had never been created, and the kubelet log kept reporting errors along the lines of "cannot remove /var/lib/kubelet/xxxxconfig/key: resource busy". Running describe on the corresponding rc showed no Events at all:
$ kubectl describe rc rabbit3rc
Name:           rabbit3rc
Namespace:      default
Selector:       app=rabbitmq-cluster,node=rabbit3
Labels:         app=rabbitmq-cluster
                node=rabbit3
Annotations:    <none>
Replicas:       1 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       app=rabbitmq-cluster
                node=rabbit3
  Containers:
   rabbit3:
    Image:      cloud-base/rabbitmq-3.6.5:E3103-PUB-20181015-RC1
    Ports:      4369/TCP, 5672/TCP, 15672/TCP, 25672/TCP
    Limits:
      cpu:      16
      memory:   8Gi
    Requests:
      cpu:      400m
      memory:   500Mi
    Liveness:   exec [health_check.sh] delay=600s timeout=10s period=15s #success=1 #failure=3
    Environment:
      RABBITMQ_DEFAULT_USER:    xxx
      RABBITMQ_DEFAULT_PASS:    xxx
      RABBITMQ_ERLANG_COOKIE:   xxx
    Mounts:
      /etc/localtime from time (rw)
      /var/lib/rabbitmq from rabbitmqvar (rw)
  Volumes:
   time:
    Type:       HostPath (bare host directory volume)
    Path:       /etc/localtime
   rabbitmqvar:
    Type:       HostPath (bare host directory volume)
    Path:       /opt/cloud/rabbitmq
Events:         <none>
On the node in question there was no rabbitmq container at all, only a pause container, and it was in the Dead state; there were many Dead containers like it and they could not be removed. After restarting docker the Dead containers were gone, but once everything came back up, neither the rabbitmq container nor its pause container appeared. I started to suspect a scheduling problem.
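For reference, this is a quick way to see how many containers are stuck in the Dead state; it is a generic docker invocation rather than the exact commands used during the original investigation, and the status=dead filter is standard docker ps syntax:

# List containers in the Dead state and count them.
docker ps -a --filter status=dead --format '{{.ID}} {{.Names}} {{.Status}}'
docker ps -aq --filter status=dead | wc -l

# Removal can be attempted, though in our case it only worked after docker was restarted.
docker ps -aq --filter status=dead | xargs -r docker rm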
The HA scheme of this old-version cluster has always been somewhat opaque. After working out which machine was the leader by checking which kube-controller-manager process was alive, I looked at the kube-controller-manager on that machine with systemctl and found the following:
Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
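As a side note, the same messages can be followed live from the journal. This assumes the unit really is named kube-controller-manager.service, which may differ between deployments:

# Tail the controller-manager unit and keep only the leader-election related lines.
journalctl -u kube-controller-manager -f | grep -iE 'leaderelection|Failed to update lock'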
The controller-managers on the other nodes regarded this node as the leader, yet it kept failing to update the election lock. I then copied the systemd start parameters, launched kube-controller-manager manually with the log level raised to 8, and saw the following:
I0322 20:26:36.966931   34218 round_trippers.go:395] PUT https://100.68.24.2:6443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager
I0322 20:26:36.966938   34218 round_trippers.go:402] Request Headers:
I0322 20:26:36.966944   34218 round_trippers.go:405]     Accept: application/vnd.kubernetes.protobuf, */*
I0322 20:26:36.966951   34218 round_trippers.go:405]     Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.966956   34218 round_trippers.go:405]     User-Agent: kube-controller-manager/v1.6.7+095136c3078cc (linux/amd64) kubernetes/095136c/leader-election
I0322 20:26:36.967726   34218 round_trippers.go:420] Response Status: 409 Conflict in 0 milliseconds
I0322 20:26:36.967738   34218 round_trippers.go:423] Response Headers:
I0322 20:26:36.967744   34218 round_trippers.go:426]     Content-Type: application/vnd.kubernetes.protobuf
I0322 20:26:36.967749   34218 round_trippers.go:426]     Content-Length: 259
I0322 20:26:36.967754   34218 round_trippers.go:426]     Date: Fri, 22 Mar 2019 12:26:36 GMT
I0322 20:26:36.967888   34218 request.go:988] Response Body:
00000000  6b 38 73 00 0a 0c 0a 02  76 31 12 06 53 74 61 74  |k8s.....v1..Stat|
00000010  75 73 12 ea 01 0a 04 0a  00 12 00 12 07 46 61 69  |us...........Fai|
00000020  6c 75 72 65 1a a1 01 4f  70 65 72 61 74 69 6f 6e  |lure...Operation|
00000030  20 63 61 6e 6e 6f 74 20  62 65 20 66 75 6c 66 69  | cannot be fulfi|
00000040  6c 6c 65 64 20 6f 6e 20  65 6e 64 70 6f 69 6e 74  |lled on endpoint|
00000050  73 20 22 6b 75 62 65 2d  63 6f 6e 74 72 6f 6c 6c  |s "kube-controll|
00000060  65 72 2d 6d 61 6e 61 67  65 72 22 3a 20 74 68 65  |er-manager": the|
00000070  20 6f 62 6a 65 63 74 20  68 61 73 20 62 65 65 6e  | object has been|
00000080  20 6d 6f 64 69 66 69 65  64 3b 20 70 6c 65 61 73  | modified; pleas|
00000090  65 20 61 70 70 6c 79 20  79 6f 75 72 20 63 68 61  |e apply your cha|
000000a0  6e 67 65 73 20 74 6f 20  74 68 65 20 6c 61 74 65  |nges to the late|
000000b0  73 74 20 76 65 72 73 69  6f 6e 20 61 6e 64 20 74  |st version and t|
000000c0  72 79 20 61 67 61 69 6e  22 08 43 6f 6e 66 6c 69  |ry again".Confli|
000000d0  63 74 2a 28 0a 17 6b 75  62 65 2d 63 6f 6e 74 72  |ct*(..kube-contr|
000000e0  6f 6c 6c 65 72 2d 6d 61  6e 61 67 65 72 12 00 1a  |oller-manager...|
000000f0  09 65 6e 64 70 6f 69 6e  74 73 28 00 30 99 03 1a  |.endpoints(.0...|
00000100  00 22 00                                          |.".|
E0322 20:26:36.967960   34218 leaderelection.go:263] Failed to update lock: Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified; please apply your changes to the latest version and try again
I0322 20:26:36.967971   34218 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
^C
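In hindsight, the 409 Conflict above is simply the apiserver's optimistic-concurrency check: an update whose resourceVersion is older than the object currently stored is rejected with exactly this "the object has been modified" error, which suggests the controller-manager was writing back a copy of the lock older than what the apiserver already held. As a rough illustration (a sketch for a throwaway test cluster, not part of the original debugging session), the same conflict can be reproduced by writing the lock object back twice from a single stale copy:

# Save the current lock object, including its resourceVersion.
kubectl -n kube-system get ep kube-controller-manager -o json > /tmp/lock.json

# The first replace succeeds and bumps resourceVersion on the server
# (on a cluster with an active leader it may already conflict, since the lease is renewed every couple of seconds).
kubectl replace -f /tmp/lock.json

# The second replace still carries the old resourceVersion and is rejected with
# 'Operation cannot be fulfilled on endpoints "kube-controller-manager": the object has been modified ...'
kubectl replace -f /tmp/lock.json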
I went to the official code repository intending to search the source and see how the election logic acquires the lock, but I couldn't make sense of it at all, so I looked for someone who had already studied the election logic. A keyword search turned up a foreign blog post, http://gogosatellite.blogspot.com/2017/07/how-to-setup-high-availability.html, and one line of log output in it pointed me in the right direction:
I0607 11:04:32.485502   17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:32.485506   17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
I0607 11:04:36.263032   17291 round_trippers.go:417] GET http://172.16.155.165:8080/api/v1/namespaces/kube-system/endpoints/kube-controller-manager 200 OK in 1 milliseconds
I0607 11:04:36.263122   17291 leaderelection.go:248] lock is held by kuberm and has not yet expired
I0607 11:04:36.263125   17291 leaderelection.go:185] failed to acquire lease kube-system/kube-controller-manager
My guess was that every kube-controller-manager talks to the apiserver and competes for this Endpoints object (ep) to acquire the lock. Fetching the ep with kubectl returned the following, and it never changed; the holderIdentity field inside the annotation names the node of the current leader:
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"xxxxx{nodename}xxxxxx\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "87e9ff0a-388b-11e9-949b-0cda411d3f00",
    "resourceVersion": "36217274",
    "creationTimestamp": "2019-02-24T23:25:54Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-02-24T23:25:54Z\",\"renewTime\":\"2019-03-17T11:20:08Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
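The leader annotation is itself a JSON string, so it is easier to read after a second pass through jq. A small helper along these lines (assuming jq with fromjson is available) pulls out just the leader record:

# Show who currently holds the election lock and when the lease was last renewed.
kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager \
  | jq '.metadata.annotations["control-plane.alpha.kubernetes.io/leader"] | fromjson
        | {holderIdentity, acquireTime, renewTime, leaderTransitions}'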
On a new cluster I had built myself, the resourceVersion changed constantly, whereas on the production cluster above it never changed at all:
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52752",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52772",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
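To make the comparison less of an eyeball exercise, a simple polling loop (again just a sketch, assuming bash and jq are available) shows the resourceVersion climbing every few seconds on a healthy cluster and staying frozen on the broken one:

# Print the lock's resourceVersion every few seconds; on a healthy cluster it keeps increasing.
while true; do
  kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager \
    | jq -r .metadata.resourceVersion
  sleep 5
done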
To confirm that the holderIdentity field really identifies the leader, I tried stopping kube-controller-manager to see whether the field would change. It did change, and a "new" ep showed up again (note the creationTimestamp, though: the object was not actually re-created):
[root@k8s-m1 Kubernetes-ansible]# systemctl stop kube-controller-manager.service
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
[root@k8s-m1 Kubernetes-ansible]# kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "0915773e-4c4d-11e9-a0b8-fa163e4edb6a",
    "resourceVersion": "52819",
    "creationTimestamp": "2019-03-22T02:48:56Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"k8s-m1_00dbe494-4c4d-11e9-a89f-fa163ed10d54\",\"leaseDurationSeconds\":15,\"aransitions\":1}"
    }
  }
}
This more or less confirmed that, for some reason (possibly the full/slow filesystem, possibly something else), the leader record in this ep on the production cluster could no longer be updated. So I deleted the ep to kick out the stale leader information. The same node still acquired leadership afterwards, but the resourceVersion started updating again and the controller logs stopped reporting errors:
$ kubectl -n kube-system delete ep kube-controller-manager
endpoints "kube-controller-manager" deleted
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542637",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:53Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
$ kubectl get --raw /api/v1/namespaces/kube-system/endpoints/kube-controller-manager | jq .
{
  "kind": "Endpoints",
  "apiVersion": "v1",
  "metadata": {
    "name": "kube-controller-manager",
    "namespace": "kube-system",
    "selfLink": "/api/v1/namespaces/kube-system/endpoints/kube-controller-manager",
    "uid": "dec669dd-4c9f-11e9-949b-0cda411d3f00",
    "resourceVersion": "37542785",
    "creationTimestamp": "2019-03-22T12:41:53Z",
    "annotations": {
      "control-plane.alpha.kubernetes.io/leader": "{\"holderIdentity\":\"Xxxxxxx-S02\",\"leaseDurationSeconds\":15,\"acquireTime\":\"2019-03-22T12:41:53Z\",\"renewTime\":\"2019-03-22T12:41:59Z\",\"leaderTransitions\":0}"
    }
  },
  "subsets": []
}
Checking the rc description again, the corresponding pod had come up:
kubectl describe rc rabbit3rc
Name:           rabbit3rc
Namespace:      default
Selector:       app=rabbitmq-cluster,node=rabbit3
Labels:         app=rabbitmq-cluster
                node=rabbit3
Annotations:    <none>
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
...