An In-Depth Look at Configuring multipath on Linux

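What is multipathing (multi-path)? Start with the background. Before multipathing existed, a host's disks were attached directly to a bus (for example PCI), and paths were one-to-one: one path led to one disk or storage device. That one-to-one relationship is simple for the operating system to handle, but it offers no redundancy. With Fibre Channel networks (what is usually called a SAN), or IP SAN environments built on iSCSI, hosts and storage are connected through Fibre Channel switches or through multiple NICs and IP addresses, which creates many-to-many I/O channels. In other words, several paths exist between one host and one storage device. When those paths are active at the same time, questions arise: how should I/O be distributed and scheduled, how should it be load-balanced, and how should active/standby roles be handled? Multipath software was created to answer them.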

Working together with the storage device, multipathing mainly provides:
1. Failover and recovery
2. I/O load balancing
3. Disk virtualization

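On Linux, the 2.6 kernels shipped by Red Hat and SUSE include a free multipath package, and ESX also has free multipathing built in. On Windows, at the time the author wrote this, a license for software called MPIO had to be purchased before multipathing could be used. Multipathing on Windows and ESX is configured through a fairly simple graphical interface, so it is not covered here; this article describes how to configure multipathing on Linux.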

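I. multipath tools and parameters on Linux:

1. device-mapper-multipath:

Also known as multipath-tools. It provides the multipathd and multipath tools and configuration files such as multipath.conf. These tools create and configure multipath devices through the device mapper ioctl interface; the resulting multipath device maps appear under /dev/mapper.

2. device-mapper: consists of two parts, a kernel part and a user-space part. The kernel part is made up of the device mapper core (dm.ko) and a set of target drivers (such as dm-multipath.ko). The core performs the device mapping, while each target handles the I/O coming down from the mapped device according to the mapping and its own characteristics. The kernel part also exposes an ioctl interface through which user space directs the driver's behaviour, for example how to create a mapped device and what properties it has. The user-space part is mainly the device-mapper package, which contains the dmsetup tool and libraries that help create and configure mapped devices. These libraries wrap the ioctl interface so that mapped devices are easier to create and configure; the multipath-tools programs call them.

3. dm-multipath.ko and dm.ko: dm.ko is the device mapper driver and the foundation on which multipath is built. dm-multipath is a target driver for dm.

4. scsi_id: shipped in the udev package. multipath.conf can be configured to call this program to obtain a SCSI device's serial number; from that identifier, multipath can tell that several paths lead to the same device. This is the key to how multipathing works. scsi_id uses the sg driver to send the device an INQUIRY command for EVPD page 0x80 or 0x83 and reads the device identifier from the reply. Some devices do not support EVPD INQUIRY, so they cannot be used to build multipath devices as-is. It is possible, however, to modify scsi_id so that it fabricates an identifier for such devices and prints it to standard output. When multipath creates a multipath device, it calls scsi_id and reads the SCSI ID from its standard output. A modified scsi_id must return exit status 0, because multipath checks that value to decide whether the SCSI ID was obtained successfully.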
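To make the identification step concrete, here is a minimal sketch of the idea that paths reporting the same identifier belong to the same device. The function name group_paths_by_wwid and the "<device> <wwid>" input format are assumptions made for illustration; this is not how multipath itself is implemented.

```shell
# Group "<device> <wwid>" pairs by WWID (illustrative helper, not part of
# multipath-tools). Reads one pair per line on stdin and prints one line
# per WWID, in first-seen order: "<wwid>: <dev> <dev> ...".
group_paths_by_wwid() {
    awk '
        NF >= 2 {
            if (!($2 in devs)) {
                order[++n] = $2          # remember first-seen order
                devs[$2] = $1
            } else {
                devs[$2] = devs[$2] " " $1
            }
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i] ": " devs[order[i]]
        }
    '
}

# Sample input: sdb and sde report the same identifier, so they are two
# paths to one LUN.
printf '%s\n' \
    'sdb 360a9800064665072443469563477396c' \
    'sde 360a9800064665072443469563477396c' | group_paths_by_wwid
```

On a real system the pairs could be collected by calling scsi_id for each sd device (for example /lib/udev/scsi_id --whitelisted --device=/dev/sdb, the form used later in this article) and feeding the results to the function.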

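II. Basic multipath configuration on Red Hat 6.2:

1. Run lsmod | grep dm_multipath to check whether the module is loaded. No output means it is not loaded. If the packages are not installed yet, install them with yum: yum -y install device-mapper device-mapper-multipath

Then run multipath -ll to view the multipath status and confirm whether the module is loaded:

# multipath -ll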

Mar 10 19:18:28 | /etc/multipath.conf does not exist, blacklisting all devices.

Mar 10 19:18:28 | A sample multipath.conf file is located at

Mar 10 19:18:28 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf

Mar 10 19:18:28 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf

Mar 10 19:18:28 | DM multipath kernel driver not loaded    ---- the DM module is not loaded

If the module is not loaded, initialize DM with the following commands, or reboot the system:

---Use the following commands to initialize and start DM for the first time:
# modprobe dm-multipath
# modprobe dm-round-robin
# service multipathd start
# multipath -v2

After initialization, run multipath -ll again to check whether the module is now loaded:

# multipath -ll

Mar 10 19:21:14 | /etc/multipath.conf does not exist, blacklisting all devices.

Mar 10 19:21:14 | A sample multipath.conf file is located at

Mar 10 19:21:14 | /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf

Mar 10 19:21:14 | You can run /sbin/mpathconf to create or modify /etc/multipath.conf

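(The "DM multipath kernel driver not loaded" message no longer appears, which means the DM module is now loaded.)

As the output shows, the DM module is loaded, but there is no multipath.conf under /etc. The next step is to create that file.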
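The module check above can be scripted. The following is a small sketch, assuming lsmod-style text on standard input; the helper name dm_multipath_loaded is made up for this example.

```shell
# Return 0 if lsmod-style input lists the dm_multipath module, 1 otherwise.
# Illustrative helper; it only inspects the first column of each line.
dm_multipath_loaded() {
    awk '$1 == "dm_multipath" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Typical use on a live system (left as a comment so the sketch has no side effects):
#   lsmod | dm_multipath_loaded || modprobe dm-multipath
```

Matching on the exact first column avoids false positives from other modules whose "Used by" column mentions dm_multipath.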

2. Configuring multipath:

Create the multipath configuration file /etc/multipath.conf with vi and add the minimal configuration multipath needs in order to work:

vi /etc/multipath.conf

blacklist {
    devnode "^sda"
}

defaults {
    user_friendly_names yes
    path_grouping_policy multibus
    failback immediate
    no_path_retry fail
}

After editing, save the file and start the multipathd service:

# /etc/init.d/multipathd start

If the service does not start and no OK is printed, as here:

# service multipathd start

Starting multipathd daemon:     (no OK is printed)

stopping and starting the service again resolves it:

# /etc/init.d/multipathd stop

Stopping multipathd daemon:                                [  OK  ]

# /etc/init.d/multipathd start

Starting multipathd daemon:                                [  OK  ]  ----- OK is printed; the service started normally

Check the result with:

# multipath -ll

mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN    ---- one multipath device was created for the LUN

size=3.5G features='0' hwhandler='0' wp=rw

`-+- policy='round-robin 0' prio=4 status=active

|- 1:0:0:0 sdb 8:16 active ready  running   ---- the two path devices behind the multipath device are sdb and sde

`- 2:0:0:0 sde 8:64 active ready  running
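As a rough sketch of how this output can be read by a script, the helper below pulls the path device names out of multipath -ll style text. The name list_path_devices is hypothetical, and the parsing assumes the layout shown above, where each path line carries an H:C:T:L address followed by the sd device name.

```shell
# Print the sd device of every path line in "multipath -ll" style input.
# A path line is recognised by its SCSI address field (host:channel:target:lun);
# the field immediately after it is taken as the device name.
list_path_devices() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^[0-9]+:[0-9]+:[0-9]+:[0-9]+$/) {
                print $(i + 1)
                break
            }
    }'
}

# Typical use on a live system:
#   multipath -ll | list_path_devices
```

For the output shown above, this would list sdb and sde, one per line.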

Two new device nodes, mpatha and mpathap1, now appear under /dev/mapper/:

# cd /dev/mapper/

# ls

control  mpatha  mpathap1

fdisk -l also lists two additional devices.

Before multipath is configured:

# fdisk -l

Disk /dev/sda: 146.8 GB, 146815733760 bytes

255 heads, 63 sectors/track, 17849 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000a6cdd

Device Boot      Start         End      Blocks   Id  System

/dev/sda1   *           1          26      204800   83  Linux

Partition 1 does not end on cylinder boundary.

/dev/sda2              26         287     2097152   82  Linux swap / Solaris

Partition 2 does not end on cylinder boundary.

/dev/sda3             287       17850   141071360   83  Linux

Disk /dev/sdb: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/sdb1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

Disk /dev/sde: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/sde1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

The two CNA adapters each present the same LUN, so it shows up as two disks:

/dev/sde and /dev/sdb.

After configuration, /dev/mapper/mpatha and /dev/mapper/mpathap1 are added:

# fdisk -l

Disk /dev/sda: 146.8 GB, 146815733760 bytes

255 heads, 63 sectors/track, 17849 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x000a6cdd

Device Boot      Start         End      Blocks   Id  System

/dev/sda1   *           1          26      204800   83  Linux

Partition 1 does not end on cylinder boundary.

/dev/sda2              26         287     2097152   82  Linux swap / Solaris

Partition 2 does not end on cylinder boundary.

/dev/sda3             287       17850   141071360   83  Linux

Disk /dev/sdb: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/sdb1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

Disk /dev/sde: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/sde1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

Disk /dev/mapper/mpatha: 3774 MB, 3774873600 bytes

117 heads, 62 sectors/track, 1016 cylinders

Units = cylinders of 7254 * 512 = 3714048 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Disk identifier: 0xac956c3a

Device Boot      Start         End      Blocks   Id  System

/dev/mapper/mpathap1               1        1016     3685001   83  Linux

Partition 1 does not start on physical sector boundary.

Disk /dev/mapper/mpathap1: 3773 MB, 3773441024 bytes

255 heads, 63 sectors/track, 458 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 4096 bytes / 65536 bytes

Alignment offset: 1024 bytes

Disk identifier: 0x00000000

Disk /dev/mapper/mpathap1 doesn't contain a valid partition table

# multipath -F   # flush the existing multipath maps; the two new devices disappear
# multipath -v2  # rebuild the multipath maps; the devices reappear

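3. Basic operations on a multipath disk

To work with a disk created by the multipath software, operate directly on the device under /dev/mapper/.

The author recommends running pvcreate before partitioning a disk created by the multipath software. (pvcreate initializes the device as an LVM physical volume, so the step only matters if the disk will be used with LVM.)

# pvcreate /dev/mapper/mpatha

# fdisk /dev/mapper/mpatha  (partition through the device /dev/mapper/mpatha)

When fdisk writes the partition table to a multipath disk it reports an error; according to the author, this error can be ignored.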

# ls -l /dev/mapper/

total 0

crw-rw----. 1 root root 10, 58 Mar 10 19:10 control

lrwxrwxrwx. 1 root root      7 Mar 10 20:28 mpatha -> ../dm-0

lrwxrwxrwx. 1 root root      7 Mar 10 20:33 mpathap1 -> ../dm-1

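mpathap1 is the partition we created on the multipath disk.

# mkfs.ext4 /dev/mapper/mpathap1   # format the mpathap1 partition with an ext4 file system

# mount /dev/mapper/mpathap1 /mnt/   # mount the mpathap1 partition

Use /dev/mapper/mpathap1 for both formatting and mounting.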

4. Partitioning the disk:

As mentioned above, partition through /dev/mapper/mpatha:

# fdisk /dev/mapper/mpatha

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel

Building a new DOS disklabel with disk identifier 0xac956c3a.

Changes will remain in memory only, until you decide to write them.

After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

WARNING: DOS-compatible mode is deprecated. It's strongly recommended to

switch off the mode (command 'c') and change display units to

sectors (command 'u').

Command (m for help): n   ------ create a new partition

Command action

e   extended

p   primary partition (1-4)

p   ------ primary partition

Partition number (1-4): 1

First cylinder (1-1016, default 1):

Using default value 1

Last cylinder, +cylinders or +size{K,M,G} (1-1016, default 1016):

Using default value 1016

Command (m for help): w   ------ write the partition table (i.e. save)

The partition table has been altered!

Calling ioctl() to re-read partition table.

Syncing disks.

Note: if two nodes of the same system attach the same disk, the second one only needs to write the table again with w; there is no need to repeat n.

5. Formatting:

# mkfs.ext4 /dev/mapper/mpathap1

mke2fs 1.41.12 (17-May-2010)

/dev/sdd1 alignment is offset by 1024 bytes.

This may result in very poor performance, (re)-partitioning suggested.

Filesystem label=

OS type: Linux

Block size=4096 (log=2)

Fragment size=4096 (log=2)

Stride=1 blocks, Stripe width=16 blocks

230608 inodes, 921250 blocks

46062 blocks (5.00%) reserved for the super user

First data block=0

Maximum filesystem blocks=943718400

29 block groups

32768 blocks per group, 32768 fragments per group

7952 inodes per group

Superblock backups stored on blocks:

32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done

Creating journal (16384 blocks): done

Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 33 mounts or

180 days, whichever comes first.  Use tune2fs -c or -i to override.

6. Mount /dev/mapper/mpathap1 on /mnt

# mount  /dev/mapper/mpathap1  /mnt

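III. Advanced multipath configuration

Everything so far relied on multipath's default settings, for example for the name of the mapped device and the load-balancing method. Can multipath be configured the way we want instead? Yes.

1. Configuring multipath.conf

The next step is to edit the configuration file /etc/multipath.conf.

multipath.conf mainly consists of three sections: blacklist, multipaths, and devices.

blacklist section: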

blacklist {
    devnode "^sda"
}

multipaths section:

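multipaths {
    multipath {
        wwid **************** # this value is shown by multipath -v3
        alias iscsi-dm0 # alias for the mapped device; any name can be chosen
        path_grouping_policy multibus # path grouping policy
        path_checker tur # method used to determine the path state
        path_selector "round-robin 0" # method used to choose the path for the next I/O operation
    }
}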

devices section:

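devices {
    device {
        vendor "iSCSI-Enterprise" # vendor name
        product "Virtual disk" # product model
        path_grouping_policy multibus # default path grouping policy
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n" # default program used to obtain the unique device ID
        prio_callout "/sbin/acs_prio_alua %d" # default program used to obtain the priority value
        path_checker readsector0 # method used to determine the path state
        path_selector "round-robin 0" # method used to choose the path for the next I/O operation
        failback immediate # failback mode
        no_path_retry queue # a number sets how many times a failed path is retried before queueing is disabled; queue keeps queueing until a path is restored
        rr_min_io 100 # number of I/O requests sent to a path before switching to the next path in the current path group
    }
}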

The standard documentation describes the relevant parameters as follows:

Attribute

Description

wwid

Specifies the WWID of the multipath device to which the multipath attributes apply. This parameter is mandatory for this section of the multipath.conf file.

alias

Specifies the symbolic name for the multipath device to which the multipath attributes apply. If you are using user_friendly_names, do not set this value to mpathn; this may conflict with an automatically assigned user friendly name and give you incorrect device node names.

path_grouping_policy

Specifies the default path grouping policy to apply to unspecified multipaths. Possible values include:

failover = 1 path per priority group

multibus = all valid paths in 1 priority group

group_by_serial = 1 priority group per detected serial number

group_by_prio = 1 priority group per path priority value

group_by_node_name = 1 priority group per target node name

path_selector

Specifies the default algorithm to use in determining what path to use for the next I/O operation. Possible values include:

round-robin 0: Loop through every path in the path group, sending the same amount of I/O to each.

queue-length 0: Send the next bunch of I/O down the path with the least number of outstanding I/O requests.

service-time 0: Send the next bunch of I/O down the path with the shortest estimated service time, which is determined by dividing the total size of the outstanding I/O to each path by its relative throughput.

failback

Manages path group failback.

A value of immediate specifies immediate failback to the highest priority path group that contains active paths.

A value of manual specifies that there should not be immediate failback but that failback can happen only with operator intervention.

A value of followover specifies that automatic failback should be performed when the first path of a path group becomes active. This keeps a node from automatically failing back when another node requested the failover.

A numeric value greater than zero specifies deferred failback, expressed in seconds.

prio

Specifies the default function to call to obtain a path priority value. For example, the ALUA bits in SPC-3 provide an exploitable prio value. Possible values include:

const: Set a priority of 1 to all paths.

emc: Generate the path priority for EMC arrays.

alua: Generate the path priority based on the SCSI-3 ALUA settings.

tpg_pref: Generate the path priority based on the SCSI-3 ALUA settings, using the preferred port bit.

ontap: Generate the path priority for NetApp arrays.

rdac: Generate the path priority for LSI/Engenio RDAC controller.

hp_sw: Generate the path priority for Compaq/HP controller in active/standby mode.

hds: Generate the path priority for Hitachi HDS Modular storage arrays.

no_path_retry

A numeric value for this attribute specifies the number of times the system should attempt to use a failed path before disabling queueing.

A value of fail indicates immediate failure, without queueing.

A value of queue indicates that queueing should not stop until the path is fixed.

rr_min_io

Specifies the number of I/O requests to route to a path before switching to the next path in the current path group. This setting is only for systems running kernels older than 2.6.31. Newer systems should use rr_min_io_rq. The default value is 1000.

rr_min_io_rq

Specifies the number of I/O requests to route to a path before switching to the next path in the current path group, using request-based device-mapper-multipath. This setting should be used on systems running current kernels. On systems running kernels older than 2.6.31, use rr_min_io. The default value is 1.

rr_weight

If set to priorities, then instead of sending rr_min_io requests to a path before calling path_selector to choose the next path, the number of requests to send is determined by rr_min_io times the path's priority, as determined by the prio function. If set to uniform, all path weights are equal.

flush_on_last_del

If set to yes, then multipath will disable queueing when the last path to a device has been deleted.
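The rr_weight rule described above is simple arithmetic. The sketch below works it through; the function name requests_before_switch is made up for illustration, and the sample values (rr_min_io of 100, path priority 4) are taken from the examples earlier in this article.

```shell
# Number of I/O requests sent to a path before the selector moves on,
# following the rr_weight description above (illustrative only).
#   $1 = rr_min_io, $2 = path priority, $3 = rr_weight (uniform | priorities)
requests_before_switch() {
    rr_min_io=$1
    prio=$2
    rr_weight=${3:-uniform}
    if [ "$rr_weight" = "priorities" ]; then
        echo $((rr_min_io * prio))   # weighted by the path's priority
    else
        echo "$rr_min_io"            # uniform: every path gets the same count
    fi
}

requests_before_switch 100 4 priorities   # prints 400
requests_before_switch 100 4 uniform      # prints 100
```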

A complete advanced configuration from my own environment:

# vi /etc/multipath.conf

blacklist {
    devnode "^sda"
}

multipaths {
    multipath {
        wwid                    360a98000646650724434697454546156
        alias                   mpathb_fcoe
        path_grouping_policy    multibus
        #path_checker            "directio"
        prio                    "random"
        path_selector           "round-robin 0"
    }
}

devices {
    device {
        vendor               "NETAPP"
        product              "LUN"
        getuid_callout       "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        #path_checker         "directio"
        #path_selector        "round-robin 0"
        failback             immediate
        no_path_retry        fail
    }
}

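The wwid, vendor, product, and getuid_callout values can be obtained from the output of multipath -v3. If an alias is set for a WWID in /etc/multipath.conf, the alias takes precedence over the automatically assigned name.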
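As a hedged sketch of how a WWID can be turned into a configuration entry, the two helpers below extract the WWID from a map header line and print a multipaths stanza for it. Both function names are invented for this example, and the header format is assumed to be the one shown in the multipath -ll output earlier (friendly name first, WWID in parentheses).

```shell
# Extract the WWID from a map header line such as
#   mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN
# by printing the text between the first pair of parentheses.
wwid_from_header() {
    sed -n 's/^[^(]*(\([^)]*\)).*$/\1/p'
}

# Print a multipaths stanza for the given WWID ($1) and alias ($2).
multipath_stanza() {
    printf 'multipaths {\n'
    printf '    multipath {\n'
    printf '        wwid  %s\n' "$1"
    printf '        alias %s\n' "$2"
    printf '    }\n'
    printf '}\n'
}

# Example with the header line shown earlier; "data_lun" is a placeholder alias.
wwid=$(echo 'mpatha (360a9800064665072443469563477396c) dm-0 NETAPP,LUN' | wwid_from_header)
multipath_stanza "$wwid" data_lun
```

The generated text is meant to be reviewed and merged into /etc/multipath.conf by hand rather than written there automatically.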

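IV. Load-balancing test:

Use dd to read from and write to the device while watching the I/O status with iostat to see which path the traffic takes:

dd command: dd if=/dev/zero of=/mnt/1Gfile bs=8k count=131072    The disk was mounted on /mnt above, so reading and writing files under /mnt exercises the disk directly.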
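The file name 1Gfile follows from the dd parameters: a block size of 8k (8192 bytes) times 131072 blocks is exactly 1 GiB. A small worked calculation, with a helper name chosen only for this example:

```shell
# Total bytes written by dd for a given block size in bytes ($1) and count ($2).
dd_total_bytes() {
    echo $(( $1 * $2 ))
}

dd_total_bytes 8192 131072                             # prints 1073741824
echo $(( $(dd_total_bytes 8192 131072) / 1048576 ))    # prints 1024 (MiB)
```

Changing count while keeping bs=8k scales the test size linearly, for example count=65536 for a 512 MiB file.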

To repeat the write several times, use a loop such as:

# for ((i=1;i<=5;i++));do dd if=/dev/zero of=/mnt/1Gfile bs=8k count=131072 2>&1|grep MB;done;   --- repeats the write 5 times; adjust the count to suit your test.

In another console, run iostat 2 10 to watch the I/O status:

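In the author's iostat output, sdc and sdd are the two path devices, and the traffic is spread evenly across both paths, so load balancing works.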

V. Path redundancy (failover) test

Take down the port on one of the paths; all traffic switches directly to the other path.
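One way to confirm the failover from the command line is to count the healthy paths before and after the port is taken down. The sketch below assumes the multipath -ll layout shown earlier, where a healthy path line contains "active ready"; the helper name count_active_paths is hypothetical.

```shell
# Count path lines reported as "active ready" in "multipath -ll" style input.
# The path-group line (status=active) is not counted because it lacks "ready".
count_active_paths() {
    awk '/active +ready/ { n++ } END { print n + 0 }'
}

# Typical use on a live system:
#   multipath -ll | count_active_paths
```

With both paths up the count would be 2; after one port is disabled it should drop to 1 while I/O continues on the remaining path.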

Reposted from https://blog.csdn.net/CrazyTeam/article/details/41483509?utm_source=copy