An in-depth look at configuring multipath on Linux
First, what is multipathing (multipath)? Consider the background that gave rise to it. Before multipathing existed, the disks on a host were attached directly to a single bus (such as PCI), so the relationship was one-to-one: one path pointed to one disk or storage device. That one-to-one relationship was simple for the operating system to handle, but it lacked reliability. With the arrival of Fibre Channel networks (what is usually called a SAN), and of iSCSI-based IP SAN environments, the host connects to storage through Fibre Channel switches, or through multiple NICs and IP addresses, forming many-to-many I/O channels: between one host and one storage device there are multiple paths. When all of these paths are active at the same time, questions arise: how should I/O traffic be distributed and scheduled, how is I/O load balanced, and how are active and standby paths arranged? Multipathing software was created against this background.
Working together with the storage device, multipathing mainly provides the following functions:
1. Failover and failback
2. I/O load balancing
3. Disk virtualization
On Linux, the 2.6 kernels shipped by Red Hat and SUSE both include a free multipathing software stack, and ESX also ships multipathing for free, whereas on Windows a software license for MPIO must be purchased to use multipathing. The multipathing features on Windows and ESX are configured through fairly simple graphical interfaces, so they are not covered further here; this article describes how to configure multipathing in a Linux environment.
I. The components of the Linux multipath stack:
1. device-mapper-multipath: the user-space package that provides the multipath and multipathd tools, which assemble the multipath devices and monitor the state of the paths.
2. device-mapper: consists of two main parts, a kernel part and a user-space part. The kernel part is made up of the device mapper core (dm.ko) and a number of target drivers (such as dm-multipath.ko). The core performs the device mapping, while each target handles the I/O coming down from the mapped device according to the mapping and its own characteristics. The kernel part also exposes an interface through which user space can communicate with the kernel via ioctl to direct the driver's behavior, for example to create a mapped device or set its attributes. The user-space part of the Linux device mapper consists mainly of the device-mapper package, which contains the dmsetup tool and libraries that help create and configure mapped devices. These libraries essentially abstract and wrap the ioctl communication interface so that creating and configuring mapped devices is convenient; the multipath-tools programs call into these libraries.
3. dm-multipath.ko and dm.ko: dm.ko is the device mapper driver and the foundation on which multipath is implemented; dm-multipath is in fact a target driver of dm.
4. scsi_id: shipped in the udev package; it can be configured in multipath.conf as the program used to obtain the serial number of a SCSI device. From the serial number one can tell that several paths correspond to the same device, which is the key to implementing multipathing. scsi_id queries the identifier of a SCSI device by sending an INQUIRY command for EVPD page 0x80 or 0x83 through the sg driver. Some devices do not support the EVPD INQUIRY command, so they cannot be used to form multipath devices; however, scsi_id can be modified to fabricate an identifier for devices that cannot supply one and print it on standard output. When the multipath program creates a multipath device, it calls scsi_id and reads the device's SCSI ID from its standard output. When modifying scsi_id, its return value must be set to 0, because the multipath program checks that value to determine whether the SCSI ID was obtained successfully.
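The grouping idea behind this is easy to sketch in shell: device nodes that report the same identifier are paths to the same device. In the sketch below, get_wwid is a mock standing in for a real call such as /lib/udev/scsi_id --whitelisted --device=/dev/sdb; the device names and WWID are sample values, not taken from a live system:

```shell
# Sketch: decide which device nodes are paths to the same LUN by comparing
# the identifier scsi_id reports. get_wwid is a mock with fixed sample
# values, so the logic can be shown without real hardware.
get_wwid() {
    case "$1" in
        sdb|sde) echo "360a9800064665072443469563477396c" ;;  # one LUN, two paths
        sda)     echo "local-disk-serial" ;;                  # the local system disk
    esac
}

# Print "wwid device" pairs; sorting groups the paths of one device together.
for dev in sda sdb sde; do
    echo "$(get_wwid "$dev") $dev"
done | sort

# sdb and sde report the same WWID, so they are two paths to one device:
if [ "$(get_wwid sdb)" = "$(get_wwid sde)" ]; then
    echo "sdb and sde are paths to the same device"
fi
```

This is exactly the comparison multipath performs internally for every SCSI device it scans.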
II. Basic multipath configuration on Red Hat Enterprise Linux 6.2:
1. Check that the software is installed with: lsmod | grep dm_multipath. If there is no output, it is not installed; install the packages with yum: yum -y install device-mapper device-mapper-multipath
Then check the multipath status and whether the kernel module loaded successfully with: multipath -ll
[root@host ~]# multipath -ll        # check the multipath status
Mar 10 19:18:28 | /etc/multipath.conf does not exist, blacklisting all devices.
Mar 10 19:18:28 | A sample multipath.conf file is located at
Mar 10 19:18:28 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
Mar 10 19:18:28 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
Mar 10 19:18:28 | DM multipath kernel driver not loaded    ---- the DM module is not loaded
If the module did not load successfully, initialize DM with the following commands, or reboot the system:
--- Use the following commands to initialize and start DM for the first time:
# modprobe dm-multipath
# modprobe dm-round-robin
# service multipathd start
# multipath -v2
After initializing, check with multipath -ll again whether the driver is now loaded:
[root@host ~]# multipath -ll
Mar 10 19:21:14 | /etc/multipath.conf does not exist, blacklisting all devices.
Mar 10 19:21:14 | A sample multipath.conf file is located at
Mar 10 19:21:14 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf
Mar 10 19:21:14 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf
(The "DM multipath kernel driver not loaded" message is gone, which means the DM module loaded successfully.)
The messages above show that the DM module loaded successfully, but there is no multipath.conf configuration file under /etc/ yet. The next step describes how to configure the multipath.conf file.
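The check above can also be scripted. A minimal sketch: dm_loaded is a hypothetical helper of my own that reads lsmod output on stdin and succeeds when the dm_multipath driver is present (the lsmod line below is mocked for illustration):

```shell
# dm_loaded: succeed when `lsmod` output on stdin contains dm_multipath.
dm_loaded() {
    grep -q '^dm_multipath'
}

# On a real host you would run:  lsmod | dm_loaded || modprobe dm-multipath
# Mocked lsmod output for illustration:
if printf 'dm_round_robin 2717 0\ndm_multipath 17724 1\n' | dm_loaded; then
    echo "dm_multipath loaded"
else
    echo "dm_multipath missing: run modprobe dm-multipath"
fi
```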
2. Configure multipath:
Create the configuration file /etc/multipath.conf with vi, and add the minimal configuration multipath needs to work:
vi /etc/multipath.conf
blacklist {
devnode "^sda"
}
defaults {
user_friendly_names yes
path_grouping_policy multibus
failback immediate
no_path_retry fail
}
Save the configuration after editing, then start the service with:
# start the multipathd service
# /etc/init.d/multipathd start
If the service fails to start, that is, no OK is printed, as below:
[root@host mapper]# service multipathd start
Starting multipathd daemon:                    (no OK printed)
Simply stopping and restarting the service resolves this:
[root@host mapper]# /etc/init.d/multipathd stop
Stopping multipathd daemon: [ OK ]
[root@host mapper]# /etc/init.d/multipathd start
Starting multipathd daemon: [ OK ]    ----- OK is printed; the service started normally
Check the result:
[root@host mapper]# multipath -ll
mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN    ---- one multipath device was created for the LUN
size=3.5G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=4 status=active
|- 1:0:0:0 sdb 8:16 active ready running    ---- the two path devices underneath, sdb and sde
`- 2:0:0:0 sde 8:64 active ready running
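The path states in this listing can be checked from a script as well. A sketch, fed a captured copy of the output (since parsing live multipath -ll requires the hardware), that extracts each path device and its state:

```shell
# Pull "device state" pairs out of the path lines of `multipath -ll` output.
# Path lines carry an H:C:T:L tuple, e.g. "|- 1:0:0:0 sdb 8:16 active ready running";
# field 3 is the device node and field 5 the dm state.
parse_paths() {
    awk '/[0-9]+:[0-9]+:[0-9]+:[0-9]+ /{ print $3, $5 }'
}

# Captured sample of the output above (on a live host: multipath -ll | parse_paths):
printf '%s\n' \
  'mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN' \
  '|- 1:0:0:0 sdb 8:16 active ready running' \
  '`- 2:0:0:0 sde 8:64 active ready running' | parse_paths
```

With the sample above this prints `sdb active` and `sde active`, which is enough to alert on a path dropping to faulty.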
Two new entries, mpatha and mpathap1, have appeared under /dev/mapper/:
[root@host mapper]# cd /dev/mapper/
[root@host mapper]# ls
control mpatha mpathap1
Two new device entries also show up in the output of fdisk -l.
Before multipath was configured:
[root@host ~]# fdisk -l
Disk /dev/sda: 146.8 GB, 146815733760 bytes
255 heads, 63 sectors/track, 17849 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a6cdd
Device Boot Start End Blocks Id System
/dev/sda1 * 1 26 204800 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 26 287 2097152 82 Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3 287 17850 141071360 83 Linux
Disk /dev/sdb: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sdb1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/sde: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sde1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
The two SAN interfaces see the same LUN as two device names:
/dev/sde and /dev/sdb.
After configuration, /dev/mapper/mpatha and /dev/mapper/mpathap1 appear in addition:
[root@host mapper]# fdisk -l
Disk /dev/sda: 146.8 GB, 146815733760 bytes
255 heads, 63 sectors/track, 17849 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a6cdd
Device Boot Start End Blocks Id System
/dev/sda1 * 1 26 204800 83 Linux
Partition 1 does not end on cylinder boundary.
/dev/sda2 26 287 2097152 82 Linux swap / Solaris
Partition 2 does not end on cylinder boundary.
/dev/sda3 287 17850 141071360 83 Linux
Disk /dev/sdb: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sdb1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/sde: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/sde1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/mapper/mpatha: 3774 MB, 3774873600 bytes
117 heads, 62 sectors/track, 1016 cylinders
Units = cylinders of 7254 * 512 = 3714048 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Disk identifier: 0xac956c3a
Device Boot Start End Blocks Id System
/dev/mapper/mpathap1 1 1016 3685001 83 Linux
Partition 1 does not start on physical sector boundary.
Disk /dev/mapper/mpathap1: 3773 MB, 3773441024 bytes
255 heads, 63 sectors/track, 458 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 65536 bytes
Alignment offset: 1024 bytes
Disk identifier: 0x00000000
Disk /dev/mapper/mpathap1 doesn't contain a valid partition table
# multipath -F     # flush (delete) the existing multipath maps; the two new devices disappear
# multipath -v2    # re-scan and re-create the maps; the devices appear again
3. Basic operations on multipath disks
To work with the disks created by the multipath software, simply operate on the devices under the /dev/mapper/ directory.
If the multipath device will be used with LVM, run pvcreate on it before partitioning (otherwise this step can be skipped):
# pvcreate /dev/mapper/mpatha
# fdisk /dev/mapper/mpatha    # when partitioning, use the /dev/mapper/mpatha path
When fdisk writes the partition table of a multipath disk it reports an error, because the kernel cannot re-read the table on a device-mapper device; the error can be ignored, but run partprobe or kpartx -a /dev/mapper/mpatha afterwards so the kernel picks up the new partition.
[root@host mnt]# ls -l /dev/mapper/
total 0
crw-rw----. 1 root root 10, 58 Mar 10 19:10 control
lrwxrwxrwx. 1 root root 7 Mar 10 20:28 mpatha -> ../dm-0
lrwxrwxrwx. 1 root root 7 Mar 10 20:33 mpathap1 -> ../dm-1
The new mpathap1 entry is the partition we created on the multipath disk.
# mkfs.ext4 /dev/mapper/mpathap1    # format the mpathap1 partition with an ext4 filesystem
# mount /dev/mapper/mpathap1 /mnt/  # mount the mpathap1 partition
For formatting and mounting, use the /dev/mapper/mpathap1 path.
4. Partitioning the disk:
As mentioned above, partition through the /dev/mapper/mpatha path:
[root@host ~]# fdisk /dev/mapper/mpatha
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xac956c3a.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
WARNING: DOS-compatible mode is deprecated. It's strongly recommended to
switch off the mode (command 'c') and change display units to
sectors (command 'u').
Command (m for help): n    ------------------------ create a new partition
Command action
e extended
p primary partition (1-4)
p    ----------------------------- primary partition
Partition number (1-4): 1
First cylinder (1-1016, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1016, default 1016):
Using default value 1016
Command (m for help): w    --------------------- write the partition table (i.e. save)
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
Note: if two nodes are attached to the same device and see the same LUN, the partition created on one node already exists on disk for the other; on the second node it is enough to open the device with fdisk and write (w) again so that it re-reads the partition table. There is no need to create the partition (n) a second time.
5. Formatting:
[root@host ~]# mkfs.ext4 /dev/mapper/mpathap1
mke2fs 1.41.12 (17-May-2010)
/dev/sdd1 alignment is offset by 1024 bytes.
This may result in very poor performance, (re)-partitioning suggested.
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=1 blocks, Stripe width=16 blocks
230608 inodes, 921250 blocks
46062 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=943718400
29 block groups
32768 blocks per group, 32768 fragments per group
7952 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
6. Mount /dev/mapper/mpathap1 on /mnt:
[root@host ~]# mount /dev/mapper/mpathap1 /mnt
III. Advanced multipath configuration
Everything so far has used multipath's defaults, for example for the mapped device names and the load-balancing method. Can multipath instead be configured the way we define it ourselves? The answer is yes.
1. Configuring the multipath.conf file
The next task is to edit the /etc/multipath.conf configuration file.
multipath.conf consists mainly of three sections: blacklist, multipaths and devices.
The blacklist section:
blacklist {
devnode "^sda"
}
The multipaths section:
multipaths {
multipath {
wwid **************** #this value can be obtained with multipath -v3
alias iscsi-dm0 #alias of the mapped device; any name may be chosen
path_grouping_policy multibus #path grouping policy
path_checker tur #method used to determine the path state
path_selector "round-robin 0" #method for choosing the path used for the next I/O operation
}
}
The devices section:
devices {
device {
vendor "iSCSI-Enterprise" #vendor name
product "Virtual disk" #product model
path_grouping_policy multibus #default path grouping policy
getuid_callout "/sbin/scsi_id -g -u -s /block/%n" #default program used to obtain the unique device identifier
prio_callout "/sbin/acs_prio_alua %d" #default program used to obtain the path priority value
path_checker readsector0 #method used to determine the path state
path_selector "round-robin 0" #method for choosing the path used for the next I/O operation
failback immediate #failback mode
no_path_retry queue #number of times the system tries a failed path before disabling queueing
rr_min_io 100 #number of I/O requests sent within the current path group before switching to another path
}
}
The standard documentation describes these attributes as follows:

wwid: Specifies the WWID of the multipath device to which the multipath attributes apply. This parameter is mandatory for this section of the multipath.conf file.

alias: Specifies the symbolic name for the multipath device to which the multipath attributes apply. If you are using user_friendly_names, do not set this value to mpathn; this may conflict with an automatically assigned user-friendly name and give you incorrect device node names.

path_grouping_policy: Specifies the path grouping policy: failover (one path per priority group), multibus (all paths in one priority group), group_by_serial, group_by_prio, or group_by_node_name.

path_selector: Specifies the algorithm used to determine which path to use for the next I/O operation, for example "round-robin 0".

failback: Manages path group failback: immediate, manual, or a number of seconds to wait before failing back to the highest-priority path group.

prio: Specifies the function used to obtain a priority value for each path, for example const or alua.

no_path_retry: Specifies the number of times the system should attempt to use a failed path before disabling queueing; fail means fail I/O immediately, queue means queue I/O until a path is restored.

rr_min_io: Specifies the number of I/O requests to route to a path before switching to the next path in the current path group. This setting is only for systems running kernels older than 2.6.31. Newer systems should use rr_min_io_rq. The default value is 1000.

rr_min_io_rq: Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath. This setting should be used on systems running current kernels. On systems running kernels older than 2.6.31, use rr_min_io. The default value is 1.

rr_weight: If set to priorities, then instead of sending rr_min_io requests to a path before calling path_selector to choose the next path, the number of requests to send is determined by rr_min_io times the path's priority, as determined by the prio function. If set to uniform, all path weights are equal.

flush_on_last_del: If set to yes, then multipath will disable queueing when the last path to a device has been deleted.
A complete advanced configuration on my own host looks like this:
[root@host ~]# vi /etc/multipath.conf
blacklist {
devnode "^sda"
}
multipaths {
multipath {
wwid 360a98000646650724434697454546156
alias mpathb_fcoe
path_grouping_policy multibus
#path_checker "directio"
prio "random"
path_selector "round-robin 0"
}
}
devices {
device {
vendor "NETAPP"
product "LUN"
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
#path_checker "directio"
#path_selector "round-robin 0"
failback immediate
no_path_retry fail
}
}
The wwid, vendor, product and getuid_callout values can all be obtained from the output of the multipath -v3 command. If an alias is configured for a wwid in /etc/multipath.conf, the alias overrides the default naming for that device.
You can use dd to read and write the device while watching the I/O state with iostat to see which path the traffic takes:
dd command: dd if=/dev/zero of=/mnt/1Gfile bs=8k count=131072 . Since the disk was mounted on /mnt above, simply reading and writing files under /mnt exercises the multipath disk.
To read and write the disk repeatedly, a loop like this can be used:
[root@host ~]# for ((i=1;i<=5;i++));do dd if=/dev/zero of=/mnt/1Gfile bs=8k count=131072 2>&1|grep MB;done;    --- repeats the write five times; the count can be adjusted to your test needs.
In another console, run iostat 2 10 to watch the I/O read/write state:
sdc and sdd are the two path devices of the multipath disk; the traffic is spread evenly over both paths, so load balancing is working well.
If the port of one path is brought down, all traffic switches directly to the other path.
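Whether the two paths really share the load can also be judged from the iostat figures programmatically. A sketch below: balanced() is a helper of my own, and the device names and throughput numbers are made-up samples (on a real host you would read kB_wrtn/s, field 4 of the iostat -k device lines, while dd is running):

```shell
# balanced: succeed when two throughput figures are within 20% of each other.
balanced() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        hi = (a > b) ? a : b
        lo = (a > b) ? b : a
        exit (lo >= 0.8 * hi) ? 0 : 1
    }'
}

# Sample kB_wrtn/s figures for the two path devices (mocked; on a real host
# read them from `iostat -k 2 10` while the dd loop is running):
sdc_wrtn=52100.0
sdd_wrtn=51800.0

if balanced "$sdc_wrtn" "$sdd_wrtn"; then
    echo "paths balanced"
else
    echo "paths unbalanced: check multipath -ll and the path states"
fi
```

The 20% threshold is arbitrary; with round-robin path selection and healthy paths the per-path figures should track each other closely.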
Reposted from https://blog.csdn.net/CrazyTeam/article/details/41483509?utm_source=copy