Lost etcd subnet config causes Docker to fail to start

Docker fails to start

Running systemctl status docker to find the cause shows the following errors:

Oct 14 16:39:10 *.*.* systemd[1]: Dependency failed for Docker Application Container Engine.
Oct 14 16:39:10 *.*.* systemd[1]: Job docker.service/start failed with result 'dependency'.
Oct 24 09:02:34 *.*.* systemd[1]: Dependency failed for Docker Application Container Engine.
Oct 24 09:02:34 *.*.* systemd[1]: Job docker.service/start failed with result 'dependency'.
Oct 24 09:45:01 *.*.* systemd[1]: Dependency failed for Docker Application Container Engine.
Oct 24 09:45:01 *.*.* systemd[1]: Job docker.service/start failed with result 'dependency'.
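The result 'dependency' means systemd cancelled the Docker start job because a unit that docker.service requires failed to start. Below is a minimal sketch for finding that unit. It assumes the start-up chain on this host is etcd, then flanneld, then docker. The function name `first_inactive_unit` is hypothetical; only `systemctl is-active --quiet` is a real systemd command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the first unit in a start-up chain that is not
# active. The caller passes the units in dependency order.
first_inactive_unit() {
  local unit
  for unit in "$@"; do
    # `systemctl is-active --quiet UNIT` exits 0 only if UNIT is active.
    if ! systemctl is-active --quiet "$unit"; then
      echo "$unit"
      return 1
    fi
  done
  return 0
}
```

Running `first_inactive_unit etcd flanneld docker` prints the first unit that is not active. In this case you would expect it to print flanneld. Running `systemctl status` or `journalctl -u` on that unit then shows the underlying error.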

Because flanneld is installed, docker.service depends on flanneld. Running systemctl status flanneld to find the cause shows the following errors:

Oct 24 10:15:51 *.*.* flanneld-start[1187]: E1024 10:15:51.327561    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110550]
Oct 24 10:15:52 *.*.* flanneld-start[1187]: E1024 10:15:52.328849    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110551]
Oct 24 10:15:53 *.*.* flanneld-start[1187]: E1024 10:15:53.329930    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110552]
Oct 24 10:15:54 *.*.* flanneld-start[1187]: E1024 10:15:54.331301    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110553]
Oct 24 10:15:55 *.*.* flanneld-start[1187]: E1024 10:15:55.332503    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110554]
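The two flanneld failures covered in this article leave different messages in the log:

- `Key not found` (etcd error 100) means etcd answered, but the network config key is absent.
- `etcd cluster is unavailable or misconfigured` means flanneld could not reach etcd at all.

Below is a hedged sketch that maps log lines to those two causes. The function name and the labels it prints are hypothetical. Lines matching neither pattern are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read flanneld log lines on stdin and print one label per
# distinct known cause found (patterns taken from the logs in this article).
classify_flanneld_error() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"Key not found"*)
        echo "missing-network-config" ;;
      *"etcd cluster is unavailable or misconfigured"*)
        echo "etcd-unreachable" ;;
    esac
  done | sort -u
}
```

Example: `journalctl -u flanneld --no-pager | classify_flanneld_error`.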

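The cause is that flanneld cannot find the subnet config key /kube-centos/network/config in etcd. I suspected that the subnet has to be configured again every time etcd starts, so I created the following etcd startup script: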

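#!/bin/bash
# Start etcd, then write the flannel network config under the prefix flanneld
# watches (-etcd-prefix=/kube-centos/network); etcd v2 etcdctl syntax.
systemctl start etcd
etcdctl mkdir /kube-centos/network
etcdctl mk /kube-centos/network/config '{ "Network": "172.30.0.0/16", "SubnetLen": 24, "Backend": { "Type": "vxlan" } }'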
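`etcdctl mk` refuses to overwrite an existing key, so the script above reports an error if it is run while the config is still present. Below is a sketch of an idempotent variant. It assumes the etcd v2 API that `mkdir` and `mk` belong to; with v2, `etcdctl get` exits non-zero when the key is missing. `FLANNEL_KEY`, `FLANNEL_CONFIG` and `ensure_flannel_config` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical names; key and JSON value copied from the script above.
FLANNEL_KEY=/kube-centos/network/config
FLANNEL_CONFIG='{ "Network": "172.30.0.0/16", "SubnetLen": 24, "Backend": { "Type": "vxlan" } }'

# Write the flannel config only if the key is absent (etcd v2 etcdctl assumed:
# `get` fails for a missing key, `mk` creates a key that does not exist yet).
ensure_flannel_config() {
  if etcdctl get "$FLANNEL_KEY" >/dev/null 2>&1; then
    echo "present: $FLANNEL_KEY"
  else
    etcdctl mk "$FLANNEL_KEY" "$FLANNEL_CONFIG" >/dev/null && echo "created: $FLANNEL_KEY"
  fi
}
```

Running `systemctl start etcd` and then `ensure_flannel_config` reaches the same end state on every run. etcd v2 creates missing parent directories when a key is set, so the separate `mkdir` step is not needed here.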

Run this script, then run systemctl start flanneld and check its status:
# systemctl status flanneld
● flanneld.service - Flanneld overlay address etcd agent
   Loaded: loaded (/usr/lib/systemd/system/flanneld.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-10-24 10:15:56 CST; 44s ago
  Process: 1334 ExecStartPost=/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker (code=exited, status=0/SUCCESS)
 Main PID: 1187 (flanneld)
   Memory: 6.9M
   CGroup: /system.slice/flanneld.service
           └─1187 /usr/bin/flanneld -etcd-endpoints=http://q.emulian.com:2379 -etcd-prefix=/kube-centos/network

Oct 24 10:15:51 *.*.* flanneld-start[1187]: E1024 10:15:51.327561    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110550]
Oct 24 10:15:52 *.*.* flanneld-start[1187]: E1024 10:15:52.328849    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110551]
Oct 24 10:15:53 *.*.* flanneld-start[1187]: E1024 10:15:53.329930    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110552]
Oct 24 10:15:54 *.*.* flanneld-start[1187]: E1024 10:15:54.331301    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110553]
Oct 24 10:15:55 *.*.* flanneld-start[1187]: E1024 10:15:55.332503    1187 network.go:102] failed to retrieve network config: 100: Key not found (/kube-centos/network/config) [110554]
Oct 24 10:15:56 *.*.* flanneld-start[1187]: I1024 10:15:56.337850    1187 local_manager.go:179] Picking subnet in range 172.30.1.0 ... 172.30.255.0
Oct 24 10:15:56 *.*.* flanneld-start[1187]: I1024 10:15:56.339319    1187 manager.go:250] Lease acquired: 172.30.56.0/24
Oct 24 10:15:56 *.*.* flanneld-start[1187]: I1024 10:15:56.340333    1187 network.go:58] Watching for L3 misses
Oct 24 10:15:56 *.*.* flanneld-start[1187]: I1024 10:15:56.340369    1187 network.go:66] Watching for new subnet leases
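Before starting Docker, you can confirm that flanneld actually holds a lease. Below is a minimal sketch. It assumes flanneld records its lease in `/run/flannel/subnet.env` (flannel's default `-subnet-file` path) as a `FLANNEL_SUBNET=<address>/<prefix>` line. The function name `flannel_subnet` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the subnet flanneld leased to this host.
# The default path is an assumption; pass another path as $1 if needed.
flannel_subnet() {
  local file=${1:-/run/flannel/subnet.env} subnet
  [ -r "$file" ] || { echo "no subnet file: $file" >&2; return 1; }
  # Keep only the value of the FLANNEL_SUBNET=... line.
  subnet=$(sed -n 's/^FLANNEL_SUBNET=//p' "$file")
  [ -n "$subnet" ] || return 1
  printf '%s\n' "$subnet"
}
```

With the log above, `flannel_subnet` should print an address inside the leased 172.30.56.0/24 range. If it fails, Docker has no flannel subnet to use yet.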

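flanneld is now running. Once the key was written, it stopped logging Key not found, picked a subnet, and acquired the lease 172.30.56.0/24. Next, run systemctl start docker: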

# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─flannel.conf
   Active: active (running) since Tue 2017-10-24 10:17:02 CST; 10min ago
     Docs: http://docs.docker.com
 Main PID: 1684 (dockerd-current)
   Memory: 48.6M
   CGroup: /system.slice/docker.service
           ├─1684 /usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/dock...
           └─1720 /usr/bin/docker-containerd-current -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --shim docker-containerd-shim --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/lib...

Oct 24 10:17:01 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:01.969192305+08:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Oct 24 10:17:01 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:01.970172523+08:00" level=warning msg="mountpoint for pids not found"
Oct 24 10:17:01 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:01.970874674+08:00" level=info msg="Loading containers: start."
Oct 24 10:17:01 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:01.986058242+08:00" level=info msg="Firewalld running: false"
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.078754482+08:00" level=info msg="Loading containers: done."
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.079203957+08:00" level=info msg="Daemon has completed initialization"
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.079223135+08:00" level=info msg="Docker daemon" commit="88a4867/1.12.6" graphdriver=devicemapper version=1.12.6
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.089322355+08:00" level=info msg="API listen on [::]:2375"
Oct 24 10:17:02 *.*.* dockerd-current[1684]: time="2017-10-24T10:17:02.089334645+08:00" level=info msg="API listen on /var/run/docker.sock"
Oct 24 10:17:02 *.*.* systemd[1]: Started Docker Application Container Engine

Docker started successfully.

Addendum: flanneld can also fail to start because of a firewall port problem. The error output is as follows:

# systemctl status flanneld.service
● flanneld.service - Flanneld overlay address etcd agent
   Loaded: loaded (/usr/lib/systemd/system/flanneld.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Thu 2017-11-16 13:32:16 CST; 4min 51s ago
  Process: 25604 ExecStart=/usr/bin/flanneld-start $FLANNEL_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 25604 (code=exited, status=0/SUCCESS)

Nov 16 13:32:09 *.*.* flanneld-start[25604]: I1116 13:32:09.616564   25604 main.go:132] Installing signal handlers
Nov 16 13:32:09 *.*.* flanneld-start[25604]: I1116 13:32:09.616932   25604 manager.go:136] Determining IP address of default interface
Nov 16 13:32:09 *.*.* flanneld-start[25604]: I1116 13:32:09.617847   25604 manager.go:149] Using interface with name enp4s0f1 and address 192.168.1.101
Nov 16 13:32:09 *.*.* flanneld-start[25604]: I1116 13:32:09.617894   25604 manager.go:166] Defaulting external address to interface address (192.168.1.101)
Nov 16 13:32:10 *.*.* flanneld-start[25604]: E1116 13:32:10.618518   25604 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp *.*.*:2379: i/o timeout
Nov 16 13:32:12 *.*.* flanneld-start[25604]: E1116 13:32:12.619119   25604 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp *.*.*:2379: i/o timeout
Nov 16 13:32:14 *.*.* flanneld-start[25604]: E1116 13:32:14.619756   25604 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp *.*.*:2379: i/o timeout
Nov 16 13:32:15 *.*.* flanneld-start[25604]: E1116 13:32:15.622421   25604 network.go:102] failed to retrieve network config: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp *.*.*:2379: getsockopt: no route to host
Nov 16 13:32:16 *.*.* flanneld-start[25604]: I1116 13:32:16.446052   25604 main.go:172] Exiting...
Nov 16 13:32:16 *.*.* systemd[1]: Stopped Flanneld overlay address etcd agent.
The key error message is:
failed to retrieve network config: client: etcd cluster is unavailable or misconfigured;
Investigation showed that firewalld had not opened port 2379, so flanneld could not reach the etcd service and failed to start.
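Before digging into etcd itself, you can check from the flanneld host whether port 2379 is reachable at all. Below is a minimal sketch using bash's `/dev/tcp` redirection and coreutils `timeout`. `port_reachable` is a hypothetical helper, and the 3-second default timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a TCP connection to HOST:PORT opens
# within SECS seconds (default 3). Uses bash's /dev/tcp pseudo-device, so the
# inner shell must be bash; connection errors are silenced.
port_reachable() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Example: `port_reachable ETCD_HOST 2379 || echo "2379 blocked"`, where `ETCD_HOST` is a placeholder for the etcd server's address. If the check fails, you can open the port on the etcd host with `firewall-cmd --permanent --add-port=2379/tcp` followed by `firewall-cmd --reload`. Add `--zone=...` if the interface is not in the default zone.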

The etcd configuration file is /etc/etcd/etcd.conf.

The flanneld configuration file is /etc/sysconfig/flanneld.conf.
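Both files are shell-style `KEY=value` lists, so you can pull out individual settings when cross-checking the two services. Below is a sketch that does this. The variable names used in the example (`ETCD_LISTEN_CLIENT_URLS`, `FLANNEL_ETCD_PREFIX`) are assumptions about a typical CentOS package layout, so check your own files. `read_sysconfig_var` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of NAME from a KEY=value config file,
# ignoring commented-out lines and stripping one pair of surrounding quotes.
read_sysconfig_var() {
  local file=$1 name=$2 value
  # Take the last uncommented assignment, as sourcing the file would.
  value=$(sed -n "s/^[[:space:]]*${name}=//p" "$file" | tail -n 1)
  value=${value#\"}; value=${value%\"}
  value=${value#\'}; value=${value%\'}
  printf '%s\n' "$value"
}
```

For example, `read_sysconfig_var /etc/etcd/etcd.conf ETCD_LISTEN_CLIENT_URLS` should include an address the flanneld host can reach, not only localhost. The flanneld prefix should match the /kube-centos/network path written with etcdctl.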