
How to troubleshoot a Kubernetes pod stuck in the ContainerCreating state

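After a lot of effort I finally got a local k8s environment running, and while debugging yesterday I once again ran into a pod stuck in the ContainerCreating state.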

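This problem has several possible causes. In my case the image pull failed, with a message like “image pull failed for gcr.io/google_containers/pause:2.0”. By default k8s pulls system images such as pause from gcr.io/google_containers, which is not reachable from networks in mainland China. I had simply forgotten to connect to the VPN.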
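When several pods fail at once, it can help to collect the image names from messages like the one quoted above. The following is only a minimal sketch: `failed_images` is a hypothetical helper name, and the pattern assumes the exact wording “image pull failed for <image>,” seen with this Kubernetes version. Other versions may phrase the error differently.

```shell
# Read text on stdin and print each image named in an
# "image pull failed for <image>" message, once per image.
# Assumption: the message wording matches the one quoted in this post.
failed_images() {
  sed -n 's/.*image pull failed for \([^ ,]*\).*/\1/p' | sort -u
}
```

A typical use would be `kubectl describe pod PodName | failed_images`, where PodName is a placeholder for the real pod name.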

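The cause is rather trivial; what I really want to share is how to track it down. The main tool is the “kubectl describe pod PodName” command, which shows the events recorded for the pod. The error message can be found in that event list.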

vagrant@vagrant-ubuntu-trusty-64:~/work/k8s-foo$ kubectl run foo --image=hello-world
deployment "foo" created
vagrant@vagrant-ubuntu-trusty-64:~/work/k8s-foo$ kubectl get pods
NAME                  READY     STATUS              RESTARTS   AGE
foo-928603113-igh2x   0/1       ContainerCreating   0          4m
vagrant@vagrant-ubuntu-trusty-64:~/work/k8s-foo$ kubectl describe pod foo
Name:           foo-928603113-igh2x
Namespace:      default
Node:           127.0.0.1/127.0.0.1
Start Time:     Mon, 11 Apr 2016 15:11:49 +0000
Labels:         pod-template-hash=928603113,run=foo
Status:         Pending
IP:
Controllers:    ReplicaSet/foo-928603113
Containers:
  foo:
    Container ID:
    Image:              hello-world
    Image ID:
    Port:
    QoS Tier:
      memory:           BestEffort
      cpu:              BestEffort
    State:              Waiting
      Reason:           ContainerCreating
    Ready:              False
    Restart Count:      0
    Environment Variables:
Conditions:
  Type          Status
  Ready         False
Volumes:
  default-token-fbasq:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-fbasq
Events:
  FirstSeen  LastSeen  Count  From                  SubobjectPath  Type     Reason      Message
  ---------  --------  -----  ----                  -------------  -------- ------      -------
  7m         7m        1      {default-scheduler }                 Normal   Scheduled   Successfully assigned foo-928603113-igh2x to 127.0.0.1
  4m         4m        1      {kubelet 127.0.0.1}                  Warning  FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for gcr.io/google_containers/pause:2.0, this may be because there are no credentials on this request. details: (API error (500): unable to ping registry endpoint https://gcr.io/v0/\nv2 ping attempt failed with error: Get https://gcr.io/v2/: dial tcp 74.125.203.82:443: i/o timeout\n v1 ping attempt failed with error: Get https://gcr.io/v1/_ping: dial tcp 74.125.203.82:443: i/o timeout\n)"
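
The describe output is long, and the useful part is usually the Warning rows of the event list. As a rough sketch, those rows could be filtered out with a small shell function. `warning_events` is a hypothetical helper name, and the filter assumes that event rows contain the literal type “Warning” as in the output above.

```shell
# Read `kubectl describe pod` output on stdin and print only the
# lines that contain the word "Warning" as a separate field.
# Assumption: warning events are labelled with the literal type "Warning".
warning_events() {
  # `|| true` keeps the exit status at 0 when no warning is present.
  grep -E '(^|[[:space:]])Warning([[:space:]]|$)' || true
}
```

For example, `kubectl describe pod foo | warning_events` would reduce the output above to the single FailedSync row.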

That evening I hit a similar problem when trying to start kube-dns. The kube-dns Service itself looked fine:

vagrant@vagrant-ubuntu-trusty-64:~/work/k8s-foo$ kubectl get services kube-dns --namespace=kube-system
NAME       CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   10.0.0.10    <none>        53/UDP,53/TCP   56m

However, after starting a Service, resolving its name through DNS failed. Running “kubectl get pods --namespace=kube-system” showed that the kube-dns pods had failed to start.
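
With many pods in a namespace, the failing ones can be picked out of the `kubectl get pods` listing automatically. The snippet below is a sketch under assumptions: `not_running_pods` is a hypothetical helper name, and it relies on the column order NAME, READY, STATUS, RESTARTS, AGE shown earlier in this post, so it would need adjusting for output with extra columns (for example with `--all-namespaces`).

```shell
# Read `kubectl get pods` output on stdin and print the names of pods
# whose STATUS column (third field) is anything other than "Running".
# Assumption: the first line is the header and STATUS is the third column.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}
```

It could be used as `kubectl get pods --namespace=kube-system | not_running_pods`, and each name it prints can then be passed to `kubectl describe pod`. Note that it also lists pods in states such as Completed, since it only checks for the exact status “Running”.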

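Checking the events of those pods with “kubectl describe” showed that kube-dns also needs to download new images at startup. I promptly turned on the VPN and restarted the cluster, and the problem was gone.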