Building a Docker Swarm Cluster on the Loongson Platform
http://ask.loongnix.org/?/article/91
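This tutorial walks through the following tasks:
- initializing a Docker swarm cluster;
- adding nodes to the swarm;
- deploying services to the swarm;
- managing the swarm.
Prerequisites:
- Three Linux hosts (physical machines, virtual machines, or Docker containers; this article uses three Loongson 3A3000 machines running Loongnix (Fedora21-20170927)).
- Docker Engine 1.12 or later installed on every host.
- One host designated as the manager node, with a known IP address.
- The following ports open between the hosts:
  - TCP port 2377 for cluster management traffic,
  - TCP and UDP port 7946 for communication among nodes,
  - UDP port 4789 for overlay network traffic.
By default these ports are open. If you are not sure, run the following commands to open them: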
iptables -A INPUT -p tcp --dport 2377 -j ACCEPT
iptables -A INPUT -p tcp --dport 7946 -j ACCEPT
iptables -A INPUT -p udp --dport 7946 -j ACCEPT
iptables -A INPUT -p udp --dport 4789 -j ACCEPT
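The same four rules can be generated from a short list of protocol and port pairs, which makes it harder to forget one of them when preparing several hosts. This is only a sketch: the function name swarm_port_rules is made up for illustration, and it prints the commands rather than running them, so the output still has to be executed as root.

```shell
#!/bin/sh
# Print (not execute) one iptables rule per swarm port listed above.
# swarm_port_rules is a hypothetical helper name, not part of Docker.
swarm_port_rules() {
    for spec in tcp:2377 tcp:7946 udp:7946 udp:4789; do
        proto=${spec%%:*}   # text before the colon
        port=${spec##*:}    # text after the colon
        echo "iptables -A INPUT -p $proto --dport $port -j ACCEPT"
    done
}

swarm_port_rules
```

Piping the output into sh as root would apply the rules; reviewing the printed lines first is the safer habit.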
Initializing a Docker swarm cluster
First confirm that the Docker daemon is running on every host. If the service status is not active (running), run service docker start to start the Docker daemon. With that done, the actual setup can begin.
1. Choose one host as the manager node (manager1); its IP address here is 10.20.42.45. Run docker swarm init in a terminal to initialize the swarm:
# docker swarm init --advertise-addr 10.20.42.45
Swarm initialized: current node (892ozqeoeh6fugx5iao3luduk) is now a manager.

To add a worker to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-5vs5ndm8k5idcxeckprr61kg6a7h90dp3uihdhr3kwl1ejwtwg-58jqj86p1nqfh225t51p5h8lp \
    10.20.42.45:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
--advertise-addr sets the manager node's advertised address to 10.20.42.45; other nodes must be able to reach this address in order to join the cluster. The output shows how other nodes can join the cluster as either manager nodes or worker nodes.
2. Run docker info to view the current swarm state. The relevant part of the output is:
# docker info
Containers: 10
 Running: 0
 Paused: 0
 Stopped: 10
Images: 22
Server Version: 1.12.2
... ...
Swarm: active
 NodeID: 250tj9l3mnrrtprdd0990b2t3
 Is Manager: true
 ClusterID: atrevada8k0amn83zdiig6qkb
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
... ...
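When this check has to be repeated on several hosts, the interesting fields can be pulled out of the docker info text with awk. The sketch below assumes the "Key: value" layout shown above; swarm_field is a hypothetical helper name, and the demo runs on a copied excerpt rather than on a live daemon.

```shell
#!/bin/sh
# swarm_field is a hypothetical helper: it reads `docker info` text on stdin
# and prints the value of one "Key: value" line, ignoring indentation.
swarm_field() {
    awk -F': *' -v key="$1" '
        { name = $1; sub(/^ +/, "", name) }   # strip leading spaces from the key
        name == key { print $2; exit }
    '
}

# Demo on an excerpt of the output shown above.
sample='Server Version: 1.12.2
Swarm: active
 NodeID: 250tj9l3mnrrtprdd0990b2t3
 Is Manager: true
 Managers: 1
 Nodes: 1'

printf '%s\n' "$sample" | swarm_field "Swarm"        # prints: active
printf '%s\n' "$sample" | swarm_field "Is Manager"   # prints: true
```

On a real host the input would come from docker info itself, for example docker info | swarm_field "Swarm".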
3. Run docker node ls to view node information:
# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
250tj9l3mnrrtprdd0990b2t3 *  manager1  Ready   Active        Leader
The * marks the node you are currently connected to.
Adding two nodes to the swarm
Choose a second host as a worker node (worker1); the third host will also be a worker node (worker2).
1. Open a terminal on worker1. The output of docker swarm init above showed how to join the swarm as a worker node:
# docker swarm join \
    --token SWMTKN-1-5vs5ndm8k5idcxeckprr61kg6a7h90dp3uihdhr3kwl1ejwtwg-58jqj86p1nqfh225t51p5h8lp \
    10.20.42.45:2377
This node joined a swarm as a worker.
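When the join has to be scripted for several workers, the token can be fetched on the manager without copying it by hand. The sketch below assumes the -q (quiet) option of docker swarm join-token, which prints only the token; join_command and the DOCKER override variable are hypothetical names introduced here so the function can be tried with a stub instead of a real daemon.

```shell
#!/bin/sh
# join_command is a hypothetical helper. It asks the local manager for the
# worker token and prints the command to run on a new worker.
# DOCKER may be set to a stub command for a dry run (an assumption of this sketch).
join_command() {
    manager_addr=$1    # manager address and port, e.g. 10.20.42.45:2377
    token=$("${DOCKER:-docker}" swarm join-token -q worker) || return 1
    echo "docker swarm join --token $token $manager_addr"
}
```

Running join_command 10.20.42.45:2377 on manager1 would print a single line that can be pasted into a terminal on each worker.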
If you have lost that command, run docker swarm join-token worker on manager1 to print it again.
2. Repeat the worker1 steps on worker2 so that it also joins the swarm as a worker node.
3. Back on the manager node manager1, run docker node ls to view the status of every node in the swarm:
# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
250tj9l3mnrrtprdd0990b2t3 *  manager1  Ready   Active        Leader
a24i9nu2943niy8eq239bbpwv    worker1   Ready   Active
e0lh6c2zb57qg8db7usvg17r6    worker2   Ready   Active
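The * marks the node you are connected to. Leader in the MANAGER STATUS column means the node is a manager node (here the leader); an empty value means it is a worker node.
Deploying a service to the swarm
To make the cluster's service orchestration easier to observe, we start portainer on the manager node. Its swarm visualizer module shows the services on each node at a glance.
Pull portainer:
# docker pull jiangxinshang/portainer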
Start portainer (make sure port 9000 is not in use by another application):
# docker run -t -i -p 9000:9000 -v /var/run/docker.sock:/var/run/docker.sock docker.io/jiangxinshang/portainer
2017/10/13 07:55:28 Starting Portainer 1.14.3 on :9000
Enter 127.0.0.1:9000 in a browser to see the cluster diagram. The figure below shows the state before any service is deployed:
1. Open a terminal on manager1 and run:
# docker service create --replicas 1 --name hello 10.20.42.45:5000/fedora /bin/bash -c "ping loongnix.org"
an85njt7e5dadfpwcfyr21sfs
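- docker service create creates the service.
- --name names the service hello.
- --replicas sets the desired state of the service to one running instance.
- The arguments 10.20.42.45:5000/fedora /bin/bash -c "ping loongnix.org" define the service as a container created from the image 10.20.42.45:5000/fedora (the same pulled image must be present on all three hosts) that runs /bin/bash -c "ping loongnix.org" inside the container.
2. Run the following command to view the current service status:
# docker service ls
ID            NAME   REPLICAS  IMAGE                    COMMAND
an85njt7e5da  hello  1/1       10.20.42.45:5000/fedora  /bin/bash -c ping loongnix.org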
3. View the tasks of the service:
# docker service ps hello
ID                         NAME     IMAGE                    NODE     DESIRED STATE  CURRENT STATE          ERROR
cwk9d8b67n20lmglhbz7jewne  hello.1  10.20.42.45:5000/fedora  worker2  Running        Running 5 minutes ago
Here the service is running on worker2, but it could equally have been placed on worker1 or manager1, because a manager node also acts as a worker node and runs services. The following steps demonstrate this in more detail. The swarm visualizer shows the services currently on the three nodes.
Inspecting the service in detail
1. Log in to manager1 and run docker service inspect --pretty <SERVICE-ID> to print the service details in a human-readable format:
# docker service inspect --pretty hello
ID:             an85njt7e5dadfpwcfyr21sfs
Name:           hello
Mode:           Replicated
 Replicas:      1
Placement:
UpdateConfig:
 Parallelism:   1
 On failure:    pause
ContainerSpec:
 Image:         10.20.42.45:5000/fedora
 Args:          /bin/bash -c ping loongnix.org
Resources:
2. Without --pretty, the same details are printed as JSON:
# docker service inspect hello
[
    {
        "ID": "an85njt7e5dadfpwcfyr21sfs",
        "Version": {
            "Index": 230
        },
        "CreatedAt": "2017-10-13T02:14:33.706598Z",
        "UpdatedAt": "2017-10-13T02:14:33.706598Z",
        "Spec": {
            "Name": "hello",
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "10.20.42.45:5000/fedora",
                    "Args": [
                        "/bin/bash",
                        "-c",
                        "ping",
                        "loongnix.org"
                    ]
                },
                "Resources": {
                    "Limits": {},
                    "Reservations": {}
                },
                "RestartPolicy": {
                    "Condition": "any",
                    "MaxAttempts": 0
                },
                "Placement": {}
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause"
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {}
        },
        "UpdateStatus": {
            "StartedAt": "0001-01-01T00:00:00Z",
            "CompletedAt": "0001-01-01T00:00:00Z"
        }
    }
]
Scaling the service
1. On manager1, run docker service scale <SERVICE-ID>=<NUMBER-OF-TASKS> to change the number of running tasks, for example:
# docker service scale hello=5
hello scaled to 5
2. Check the number of running tasks with docker service ls; REPLICAS is now 5/5.
# docker service ls
ID            NAME   REPLICAS  IMAGE                    COMMAND
an85njt7e5da  hello  5/5       10.20.42.45:5000/fedora  /bin/bash -c ping loongnix.org
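The REPLICAS column reads running/desired, so a script can tell whether scaling has finished by comparing the two numbers. The following is a minimal sketch that works on the text layout shown above; service_replicas and replicas_converged are hypothetical helper names, and the demo uses a copied excerpt instead of a live cluster.

```shell
#!/bin/sh
# Hypothetical helpers for reading the REPLICAS column (running/desired).

# Print the REPLICAS value of one service from `docker service ls` text on stdin.
service_replicas() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $3 }'
}

# Succeed when the running count equals the desired count, e.g. "5/5".
replicas_converged() {
    running=${1%%/*}   # part before the slash
    desired=${1##*/}   # part after the slash
    [ "$running" -eq "$desired" ]
}

sample='ID            NAME   REPLICAS  IMAGE                    COMMAND
an85njt7e5da  hello  5/5       10.20.42.45:5000/fedora  /bin/bash -c ping loongnix.org'

r=$(printf '%s\n' "$sample" | service_replicas hello)
if replicas_converged "$r"; then
    echo "hello has converged ($r)"
else
    echo "hello is still converging ($r)"
fi
```

Against a real swarm the input would be docker service ls, and the check could be repeated in a loop with a short sleep until it succeeds.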
3. View how the tasks of the hello service are distributed across the nodes:
# docker service ps hello
ID                         NAME     IMAGE                    NODE      DESIRED STATE  CURRENT STATE                   ERROR
cwk9d8b67n20lmglhbz7jewne  hello.1  10.20.42.45:5000/fedora  worker2   Running        Running 7 minutes ago
9wyodmvvf77ucbums23ortuj0  hello.2  10.20.42.45:5000/fedora  worker1   Running        Running 41 seconds ago
19ev4tcj0be2t1xbdkf7evp4f  hello.3  10.20.42.45:5000/fedora  manager1  Running        Running 44 seconds ago
1tm1tc2r8xdgdry2hu0pjilmy  hello.4  10.20.42.45:5000/fedora  manager1  Running        Running less than a second ago
cfqh34h9e6jv7l0iiwau8jhsf  hello.5  10.20.42.45:5000/fedora  worker2   Running        Running less than a second ago
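The distribution can also be summarized per node by counting the NODE column of the docker service ps output. This sketch assumes the column layout shown above and skips history rows that start with \_; tasks_per_node is a hypothetical helper name, and the sample is an abridged copy of the listing.

```shell
#!/bin/sh
# tasks_per_node is a hypothetical helper: it reads `docker service ps` text
# on stdin and prints "node count" for tasks whose desired state is Running.
tasks_per_node() {
    awk 'NR > 1 && $1 != "\\_" && $5 == "Running" { n[$4]++ }
         END { for (node in n) print node, n[node] }' | sort
}

tasks_per_node <<'EOF'
ID                         NAME     IMAGE                    NODE      DESIRED STATE  CURRENT STATE
cwk9d8b67n20lmglhbz7jewne  hello.1  10.20.42.45:5000/fedora  worker2   Running        Running 7 minutes ago
9wyodmvvf77ucbums23ortuj0  hello.2  10.20.42.45:5000/fedora  worker1   Running        Running 41 seconds ago
19ev4tcj0be2t1xbdkf7evp4f  hello.3  10.20.42.45:5000/fedora  manager1  Running        Running 44 seconds ago
1tm1tc2r8xdgdry2hu0pjilmy  hello.4  10.20.42.45:5000/fedora  manager1  Running        Running less than a second ago
cfqh34h9e6jv7l0iiwau8jhsf  hello.5  10.20.42.45:5000/fedora  worker2   Running        Running less than a second ago
EOF
# prints:
# manager1 2
# worker1 1
# worker2 2
```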
The swarm visualizer shows the same distribution graphically:
The swarm balances the service's tasks across the nodes.
4. On manager1 the two corresponding containers are also visible:
# docker ps
CONTAINER ID  IMAGE                           COMMAND                 CREATED        STATUS        PORTS  NAMES
2f6be7870988  10.20.42.45:5000/fedora:latest  "/bin/bash -c 'ping l"  5 minutes ago  Up 5 minutes         hello.4.1tm1tc2r8xdgdry2hu0pjilmy
a2da3bb3c4f0  10.20.42.45:5000/fedora:latest  "/bin/bash -c 'ping l"  5 minutes ago  Up 5 minutes         hello.3.19ev4tcj0be2t1xbdkf7evp4f
On worker1 the same command shows one corresponding container:
# docker ps
CONTAINER ID  IMAGE                           COMMAND                 CREATED        STATUS        PORTS  NAMES
954af2601ac2  10.20.42.45:5000/fedora:latest  "/bin/bash -c 'ping l"  6 minutes ago  Up 6 minutes         hello.2.9wyodmvvf77ucbums23ortuj0
The output on worker2 is omitted.
Removing a running service
1. On the manager node manager1, remove the service with docker service rm <SERVICE-ID>:
# docker service rm hello
hello
2. Querying the service on the manager node now returns an error:
# docker service ps hello
Error: No such service: hello
3. On the worker nodes the containers have been stopped as well:
# docker ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
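Rolling updates
In this part we deploy a service whose containers are based on the fedora:21.0 image, then demonstrate a rolling update that upgrades the containers to fedora:21.1.
1. First deploy the service on manager1 and set the swarm update delay to 10 seconds:
# docker service create --replicas 5 --name fedora --update-delay 10s fedora:21.0 ping loongnix.org
des35b3cu097uelo8n8zv5gez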
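The rolling update policy is specified when the service is deployed. --update-delay sets the delay between updates to one task or to one group of tasks. The delay is written as numbers followed by units: s for seconds, m for minutes, h for hours. For example, 10m30s specifies a delay of 10 minutes 30 seconds.
By default the scheduler updates one task at a time. The --update-parallelism flag sets the maximum number of tasks the scheduler updates simultaneously.
By default, when an updated task returns RUNNING, the scheduler moves on to the next task until all tasks are updated; if an updated task returns FAILED, the scheduler pauses the update. The --update-failure-action flag of docker service create or docker service update controls what happens after a failed update.
2. View how the service is laid out across the cluster nodes: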
3. Start updating the fedora image. The swarm manager applies the update according to the configured update policy:
# docker service update --image fedora:21.1 fedora
fedora
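The scheduler performs the rolling update in the following steps:
- Stop the first task.
- Schedule the update for the stopped task.
- Start the container for the updated task.
- If the update to a task returns RUNNING, wait for the specified delay and then start updating the next task; if any stage of a task's update returns FAILED, pause the update.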
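The delay applies between consecutive tasks (or groups of tasks), so the waiting time alone gives a lower bound on how long a rolling update takes. The sketch below works this out for delays written with whole numbers of h, m and s only; duration_to_seconds and min_update_wait are hypothetical helper names, and the estimate ignores the time needed to stop and start each container.

```shell
#!/bin/sh
# Hypothetical helpers; they handle only integer h/m/s parts such as 10s or 10m30s.

# Convert a delay such as 10s, 10m30s or 1h into seconds.
duration_to_seconds() {
    echo "$1" | awk '{
        total = 0
        while (match($0, /^[0-9]+[hms]/)) {
            n = substr($0, 1, RLENGTH - 1) + 0      # numeric part
            u = substr($0, RLENGTH, 1)              # unit letter
            total += n * ((u == "h") ? 3600 : ((u == "m") ? 60 : 1))
            $0 = substr($0, RLENGTH + 1)            # drop the parsed part
        }
        print total
    }'
}

# Lower bound on the waiting time: one delay between consecutive groups.
# Arguments: number of tasks, update parallelism, update delay.
min_update_wait() {
    delay=$(duration_to_seconds "$3")
    groups=$(( ($1 + $2 - 1) / $2 ))    # ceil(tasks / parallelism)
    echo $(( (groups - 1) * delay ))
}

duration_to_seconds 10m30s   # prints: 630
min_update_wait 5 1 10s      # prints: 40
```

For the service above (5 replicas, one task at a time, 10s delay) the delays alone add up to at least 40 seconds.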
4. portainer shows the progress of the update in real time:
5. Run docker service ps <SERVICE-ID> to watch the rolling update:
# docker service ps fedora
ID                         NAME          IMAGE        NODE      DESIRED STATE  CURRENT STATE           ERROR
6t6p5r9ujrsq0e1kf3k0yta6n  fedora.1      fedora:21.1  worker1   Running        Running 8 minutes ago
2vctv5ay741c7z6vlx2kqdwje   \_ fedora.1  fedora:21.0  worker2   Shutdown       Shutdown 8 minutes ago
8vrfziykhn7ar7b7zyhxlsarr  fedora.2      fedora:21.1  manager1  Running        Running 6 minutes ago
cy0zvhqawk1j9g0jb112sq482   \_ fedora.2  fedora:21.0  worker1   Shutdown       Shutdown 6 minutes ago
0yojbjr1omf46qycc2c6at4b7  fedora.3      fedora:21.1  worker2   Running        Running 7 minutes ago
cisqv46430hy7p3kj5xaefgd8   \_ fedora.3  fedora:21.0  manager1  Shutdown       Shutdown 7 minutes ago
17ailb36wdeg4zvkl9mytoxhu  fedora.4      fedora:21.1  manager1  Running        Running 6 minutes ago
cvpl0svmlyj1ozdb6r0eiovts   \_ fedora.4  fedora:21.0  manager1  Shutdown       Shutdown 6 minutes ago
4lt4ax7ezyqid6ndk89ksngc5  fedora.5      fedora:21.1  worker2   Running        Running 7 minutes ago
8sbzju1llogo7d6s9t4h0e1ia   \_ fedora.5  fedora:21.0  worker2   Shutdown       Shutdown 7 minutes ago
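The output shows that all tasks have been updated.
Draining a node
In all the previous steps every node was running with ACTIVE availability. The swarm manager assigns tasks to ACTIVE nodes, so until now every node could receive tasks. Sometimes, for example during planned maintenance, a node's availability needs to be set to DRAIN. A DRAIN node does not receive new tasks from the swarm manager; the manager stops the tasks running on it and reschedules them on other ACTIVE nodes.
1. Before starting, confirm that every node in the cluster is ACTIVE:
# docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
250tj9l3mnrrtprdd0990b2t3 *  manager1  Ready   Active        Leader
a24i9nu2943niy8eq239bbpwv    worker1   Ready   Active
e0lh6c2zb57qg8db7usvg17r6    worker2   Ready   Active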
2. Deploy the earlier service again with three tasks, so that every node is assigned a task:
# docker service create --replicas 3 --name helloagain 10.20.42.45:5000/fedora /bin/bash -c "ping loongnix.org"
0rxfbv9fwrs0fv06hfnp8j5je
# docker service ps helloagain
ID                         NAME          IMAGE                    NODE      DESIRED STATE  CURRENT STATE            ERROR
8xszceewi2dj9xhbce53rkq1a  helloagain.1  10.20.42.45:5000/fedora  worker1   Running        Preparing 6 seconds ago
6mwhdnseq3wkkxbc3dmxhwoey  helloagain.2  10.20.42.45:5000/fedora  worker2   Running        Running 1 seconds ago
395a7pk7e4vby3jd1sihv0gic  helloagain.3  10.20.42.45:5000/fedora  manager1  Running        Running 5 seconds ago
3. Run docker node update --availability drain <NODE-ID> to drain a node that currently has a task:
# docker node update --availability drain worker1
worker1
4. Inspect the drained node; Availability is now Drain:
# docker node inspect --pretty worker1
ID:                     a24i9nu2943niy8eq239bbpwv
Hostname:               worker1
Joined at:              2017-10-13 01:16:01.272489 +0000 utc
Status:
 State:                 Ready
 Availability:          Drain
Platform:
 Operating System:      linux
 Architecture:          mips64
Resources:
 CPUs:                  4
 Memory:                7.598 GiB
Plugins:
  Network:              bridge, host, null, overlay
  Volume:               local
Engine Version:         1.12.2
5. View the current task placement of the service:
# docker service ps helloagain
ID                         NAME              IMAGE                    NODE      DESIRED STATE  CURRENT STATE           ERROR
e6mh01ekb4dcpnnu3yez7qqdx  helloagain.1      10.20.42.45:5000/fedora  worker2   Running        Running 4 minutes ago
8xszceewi2dj9xhbce53rkq1a   \_ helloagain.1  10.20.42.45:5000/fedora  worker1   Shutdown       Shutdown 4 minutes ago
6mwhdnseq3wkkxbc3dmxhwoey  helloagain.2      10.20.42.45:5000/fedora  worker2   Running        Running 10 minutes ago
395a7pk7e4vby3jd1sihv0gic  helloagain.3      10.20.42.45:5000/fedora  manager1  Running        Running 10 minutes ago
The task on worker1 has been shut down and rescheduled on worker2.
6. Set the availability of worker1 from DRAIN back to ACTIVE and look again:
# docker node update --availability active worker1
worker1
# docker node inspect --pretty worker1
ID:                     a24i9nu2943niy8eq239bbpwv
Hostname:               worker1
Joined at:              2017-10-13 01:16:01.272489 +0000 utc
Status:
 State:                 Ready
 Availability:          Active
Platform:
 Operating System:      linux
 Architecture:          mips64
Resources:
 CPUs:                  4
 Memory:                7.598 GiB
Plugins:
  Network:              bridge, host, null, overlay
  Volume:               local
Engine Version:         1.12.2
# docker service ps helloagain
ID                         NAME              IMAGE                    NODE      DESIRED STATE  CURRENT STATE           ERROR
e6mh01ekb4dcpnnu3yez7qqdx  helloagain.1      10.20.42.45:5000/fedora  worker2   Running        Running 8 minutes ago
8xszceewi2dj9xhbce53rkq1a   \_ helloagain.1  10.20.42.45:5000/fedora  worker1   Shutdown       Shutdown 8 minutes ago
6mwhdnseq3wkkxbc3dmxhwoey  helloagain.2      10.20.42.45:5000/fedora  worker2   Running        Running 13 minutes ago
395a7pk7e4vby3jd1sihv0gic  helloagain.3      10.20.42.45:5000/fedora  manager1  Running        Running 13 minutes ago
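The availability of worker1 is back to Active and its state is Ready. Because no task has changed since then, it has not yet been assigned any task. A node with Active availability can receive new tasks in the following situations:
- when a service is scaled,
- during a rolling update,
- when another node is set to Drain,
- when a task fails to start on another Active node.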
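The drain and reactivate steps used in this section can be wrapped in a small function that rejects mistyped availability values before they reach the manager. This is a sketch under stated assumptions: set_availability and the DOCKER override variable are hypothetical names, and the demo sets DOCKER=echo so the commands are printed instead of being executed.

```shell
#!/bin/sh
# set_availability is a hypothetical wrapper around `docker node update`.
# It accepts only the availability values active, pause and drain.
set_availability() {
    node=$1
    state=$2
    case $state in
        active|pause|drain) ;;
        *) echo "unknown availability: $state" >&2; return 1 ;;
    esac
    "${DOCKER:-docker}" node update --availability "$state" "$node"
}

# Dry run: with DOCKER=echo the command is printed instead of executed.
DOCKER=echo
set_availability worker1 drain    # prints: node update --availability drain worker1
set_availability worker1 active   # prints: node update --availability active worker1
```

Leaving DOCKER unset on a manager node makes the same calls run the real docker command.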