Deploying a heartbeat + DRBD + MySQL high-availability setup on CentOS 7.5

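A two-node hot-standby setup needs Heartbeat plus a shared storage device (if no storage device is available, DRBD can stand in for it, although a real storage device is preferable).

Heartbeat: if the standby server receives no heartbeat message from the primary server within the configured time, it considers the primary down and starts the failover procedure, bringing up the IP address, services, and so on. During failover it takes control of the resources and services that ran on the primary and continues serving in its place, which is how the resources and services stay highly available.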


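DRBD (a substitute for a storage device): Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing storage replication solution that mirrors the contents of block devices between servers. It keeps the data on two servers identical, and only one server can mount the device at a time. DRBD can be thought of as RAID 1 over the network.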



DRBD background reading:
https://www.cnblogs.com/guoting1202/p/3975685.html
https://blog.csdn.net/leshami/article/details/49509919


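I. Environment
OS version: CentOS 7.5 x64
DRBD version: DRBD-8.4.3 (the source build that was attempted; the ELRepo packages used in the end report 8.4.11-1 in /proc/drbd)

node1 (primary node) IP: 192.168.1.54 hostname: drbd1.db.com
node2 (secondary node) IP: 192.168.1.52 hostname: drbd2.db.com
Virtual IP address (VIP): 192.168.1.55


(node1) run on the primary node only
(node2) run on the secondary node only
(node1,node2) run on both nodes


II. Preparation
1. Set the hostname and hosts entries (node1, node2)
node1: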

# cat /etc/hostname 
drbd1.db.com
# cat /etc/hosts
127.0.0.1   localhost drbd1.db.com localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.54    drbd1.db.com
192.168.1.52    drbd2.db.com


node2:

# cat /etc/hostname 
drbd2.db.com
# cat /etc/hosts
127.0.0.1   localhost drbd2.db.com localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.54    drbd1.db.com
192.168.1.52    drbd2.db.com


2. Disable the firewall (firewalld) and SELinux to avoid errors during installation; both can be re-enabled after deployment (node1,node2)

# systemctl stop firewalld
# systemctl disable firewalld
# setenforce 0
# vi /etc/selinux/config
---------------
SELINUX=disabled
---------------


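3. Reboot both servers (node1, node2)

4. In this setup the LVM volume /dev/mapper/centos-home is the same size on both servers and is mounted on /home. So first unmount it, remove the LV, create a new LV, leave it unformatted and unmounted, and create the /store directory (node1,node2)

# umount /home
# vi /etc/fstab    (remove or comment out the /home entry)
# lvremove /dev/mapper/centos-home
# lvcreate -n db -l +247071 centos    (247071 is the extent count on my system; use your own)
xfs signature detected on /dev/centos/db at offset 0. Wipe it? (this prompt appears; I always answered n)
# mkdir /store

Note: whether you use a plain partition or an LV, DRBD needs a clean partition; do not format it.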

5. Time synchronization (node1,node2)

# yum install -y rdate
# rdate -s time-b.nist.gov



III. Installing and configuring DRBD
1. Install the build dependencies: (node1,node2)

# yum install gcc gcc-c++ make glibc flex kernel-devel kernel-headers

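---------------------------- Do not run step 2 below: building from source fails on CentOS 7.5 ----------------------------


2. Build and install DRBD from source. This failed on CentOS 7.5 (it works on 6.x), and I found no fix online: (node1,node2)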

# wget http://www.drbd.org/download/drbd/8.4/archive/drbd-8.4.3.tar.gz
# tar zxvf drbd-8.4.3.tar.gz
# cd drbd-8.4.3
# ./configure --prefix=/usr/local/drbd --with-km
# make KDIR=/usr/src/kernels/3.10.0-862.2.3.el7.x86_64/ (replace with your own kernel version)
# make install
# mkdir -p /usr/local/drbd/var/run/drbd
# cp /usr/local/drbd/etc/rc.d/init.d/drbd /etc/rc.d/init.d
# chkconfig --add drbd
# chkconfig drbd on

---------------------------- Do not run step 2 above: building from source fails on CentOS 7.5 ----------------------------

3. Because the source build failed, install from yum instead (node1,node2)

# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
# yum install -y kmod-drbd84 drbd84-utils
# systemctl enable drbd


4. Configuration files

# /etc/drbd.conf #main configuration file
# /etc/drbd.d/global_common.conf #global configuration file


5. Load the DRBD module and confirm that it is loaded into the kernel: (node1,node2)

# modprobe drbd
# lsmod |grep drbd
drbd                  397041  0 
libcrc32c              12644  2 xfs,drbd

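If loading the DRBD module fails with the error below:
# modprobe drbd
FATAL: Module drbd not found.
Note: a kernel was already pulled in when the dependency packages were installed, so this error is uncommon. If it does appear, reboot first; if the module still fails to load after the reboot, proceed as follows:
Cause: the running kernel does not provide this module, so the kernel has to be updated.
Update the kernel with: yum install kernel (note: do not update if there is no error)
After updating, be sure to reboot the operating system!
After the reboot, check the kernel version, which should now be the new one:
# uname -r
Then try loading the drbd module again: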
# modprobe drbd

6. Resource configuration: (node1,node2)

# vi /etc/drbd.d/db.res   
resource r0{
    protocol C;

    startup { wfc-timeout 0; degr-wfc-timeout 120;}
    disk { on-io-error detach;}
    net{
        timeout 60;
        connect-int 10;
        ping-int 10;
        max-buffers 2048;
        max-epoch-size 2048;
    }
    syncer { rate 200M;}

    on drbd1.db.com{
        device /dev/drbd0;
        disk /dev/centos/db;
        address 192.168.1.54:7788;
        meta-disk internal;
    }
    on drbd2.db.com{
        device /dev/drbd0;
        disk /dev/centos/db;
        address 192.168.1.52:7788;
        meta-disk internal;
    }
}

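Note: change the hostnames, IPs, and disk in the configuration above to match your own environment.
Note: with the earlier source install, I simply deleted the contents of /usr/local/drbd/etc/drbd.conf and put the configuration above directly into that file.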
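A common reason for drbdadm refusing a resource is that no "on" section matches the local hostname, since DRBD compares those names with the output of uname -n. The check below is a minimal sketch of that comparison; res_has_host is a hypothetical helper name, and the file path in the usage note is the one used in this guide.

```shell
#!/bin/bash
# Check that a DRBD resource file has an "on <hostname>" section for a host.
# res_has_host is a hypothetical helper, not part of DRBD.
# The host defaults to the local node name (uname -n).
res_has_host() {
    local res_file="$1" host="${2:-$(uname -n)}"
    # Accept both "on name {" and "on name{" spellings.
    awk -v h="$host" '
        $1 == "on" {
            name = $2
            sub(/[{].*/, "", name)   # strip a brace glued to the name
            if (name == h) found = 1
        }
        END { exit !found }
    ' "$res_file"
}
```

On a node this could be run as res_has_host /etc/drbd.d/db.res || echo "no matching on-section". The drbdadm dump r0 command, which parses the configuration and prints it back, is a useful second check.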



7. Create the DRBD device and initialize resource r0: (node1,node2)

# mknod /dev/drbd0 b 147 0
# drbdadm create-md r0
Wait a moment; "success" means the DRBD metadata block was created
md_offset 1036290879488
al_offset 1036290846720
bm_offset 1036259221504
Found some data
 ==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (30884 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
success
Note: if "success" does not appear after a long wait, press Enter and wait a little longer.

Run the command again:
# drbdadm create-md r0
r0 is initialized successfully
You want me to create a v08 style flexible-size internal meta data block.
There appears to be a v08 flexible-size internal meta data block
already in place on /dev/centos/db at byte offset 1036290879488
Do you really want to overwrite the existing meta-data?
[need to type 'yes' to confirm] yes
md_offset 1036290879488
al_offset 1036290846720
bm_offset 1036259221504
Found some data
 ==> This might destroy existing data! <==
Do you want to proceed?
[need to type 'yes' to confirm] yes
initializing activity log
initializing bitmap (30884 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.


8. Start the DRBD service: (node1,node2)

# systemctl start drbd
# systemctl status drbd

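Note: the service has to be started on both nodes before it takes effect (with wfc-timeout 0, startup on the first node waits until the peer connects).

9. Check the status: (node1,node2)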

# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by [email protected], 2018-04-26 12:10:42
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1011971896

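Here ro:Secondary/Secondary means both hosts are currently in the secondary (standby) role. ds is the disk state; it shows Inconsistent because DRBD cannot yet tell which side is the primary and whose disk data should be treated as authoritative.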
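The three fields discussed above (cs, ro, ds) can be pulled out of /proc/drbd with a one-line sed expression. This is a minimal sketch; drbd_fields is a hypothetical helper name, and it reads the status text from standard input so that it can be tried on saved output as well as on a live node.

```shell
#!/bin/bash
# Print "cs ro ds" for one DRBD minor, reading /proc/drbd-style text on stdin.
# drbd_fields is a hypothetical helper, not part of DRBD.
drbd_fields() {
    local minor="$1"
    # Match the device line, e.g. " 0: cs:Connected ro:Secondary/Secondary ds:..."
    sed -n "s/^ *${minor}: cs:\([^ ]*\) ro:\([^ ]*\) ds:\([^ ]*\).*/\1 \2 \3/p"
}

# Usage on a live node:
#   drbd_fields 0 < /proc/drbd
```

For the status shown above this prints the connection state, the local/peer roles, and the local/peer disk states on one line, which is easier to compare in scripts than the raw file.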

10. Make drbd1.db.com the primary node: (node1 only; wait until the state shown below appears before moving on to the next step)

# drbdsetup /dev/drbd0 primary --force 
Watch the synchronization progress:
# cat /proc/drbd 
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by [email protected], 2018-04-26 12:10:42
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:286720 nr:0 dw:0 dr:288816 al:8 bm:0 lo:0 pe:1 ua:0 ap:0 ep:1 wo:f oos:1011686200
    [>....................] sync'ed:  0.1% (987972/988252)M
    finish: 6:52:41 speed: 40,812 (40,812) K/sec

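Status after synchronization completes (the sample below is in the table format printed by the DRBD init script's status command and was captured on an 8.4.3 installation):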
(node1)
# cat /proc/drbd
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-05-12 21:05:41
m:res  cs         ro                 ds                 p  mounted     fstype
0:r0   Connected Primary/Secondary UpToDate/UpToDate C
(node2)
# cat /proc/drbd
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-05-12 21:05:46
m:res  cs         ro                 ds                 p  mounted     fstype
0:r0   Connected Secondary/Primary UpToDate/UpToDate C

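Note: ro shows Primary/Secondary on the primary server and Secondary/Primary on the secondary server.
ds showing UpToDate/UpToDate means the primary/secondary setup succeeded (initialization and synchronization take time; wait until the state above appears before running the steps below).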
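Because the initial synchronization can take hours, it may be convenient to poll for the finished state instead of checking by hand. The following is a minimal sketch under a few assumptions: it only looks at minor 0, the function names are hypothetical, and the 30-second default interval is arbitrary.

```shell
#!/bin/bash
# Return 0 when the status file shows minor 0 Connected with both disks UpToDate.
# The file argument defaults to /proc/drbd; another path can be passed for testing.
drbd_synced() {
    local status_file="${1:-/proc/drbd}"
    grep -Eq '^ *0: cs:Connected .* ds:UpToDate/UpToDate( |$)' "$status_file"
}

# Poll until the initial sync has finished, then print a short message.
wait_for_sync() {
    local status_file="${1:-/proc/drbd}" interval="${2:-30}"
    until drbd_synced "$status_file"; do
        sleep "$interval"
    done
    echo "minor 0 is Connected and UpToDate/UpToDate"
}
```

Running wait_for_sync on node1 before step 11 blocks until it is safe to create the filesystem and mount it.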

11. Mount DRBD: (node1 only)
The status above shows the mounted and fstype columns empty, so this step creates a filesystem on the DRBD device and mounts it on /store

# mkfs.ext4 /dev/drbd0
# mount /dev/drbd0 /store
# df -h

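Note: no operation is allowed on the DRBD device on the Secondary node, including mounting. All reads and writes go through the Primary node. Only when the Primary fails is the Secondary promoted to Primary, after which it mounts the DRBD device and carries on (in this setup heartbeat performs the promotion and the mount).

DRBD status after a successful mount: (node1 only)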

# cat /proc/drbd
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2015-05-12 21:05:41
m:res  cs         ro                 ds                 p  mounted     fstype
0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /store      ext4



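IV. Install the application (MySQL, for example) and keep its data on the /store partition
Either change the data directory to /store in the database's configuration file, or use symbolic links; symbolic links are what I use most.

1. Stop the application service and disable it at boot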

# systemctl stop mysqld
# systemctl disable mysqld


2. On node1 (mount /store on node1 first)
Move the data into /store

# mv /usr/local/kkmail/data/mysql/default/kkmail /store
# mv /usr/local/kkmail/data/mysql/default/ibdata1 /store
# mv /usr/local/kkmail/data/mysql/default/ib_logfile0 /store
# mv /usr/local/kkmail/data/mysql/default/ib_logfile1 /store


Create the symbolic links

# ln -s /store/kkmail /usr/local/kkmail/data/mysql/default/kkmail
# ln -s /store/ibdata1 /usr/local/kkmail/data/mysql/default/ibdata1 
# ln -s /store/ib_logfile0 /usr/local/kkmail/data/mysql/default/ib_logfile0 
# ln -s /store/ib_logfile1 /usr/local/kkmail/data/mysql/default/ib_logfile1


Fix ownership

# chown -R kkmail_mysql.kkmail_mysql /usr/local/kkmail/data/mysql/default/kkmail
# chown -R kkmail_mysql.kkmail_mysql /usr/local/kkmail/data/mysql/default/ibdata1 
# chown -R kkmail_mysql.kkmail_mysql /usr/local/kkmail/data/mysql/default/ib_logfile0 
# chown -R kkmail_mysql.kkmail_mysql /usr/local/kkmail/data/mysql/default/ib_logfile1


3. On node2
Rename the existing data out of the way

# mv /usr/local/kkmail/data/mysql/default/kkmail{,_bak}
# mv /usr/local/kkmail/data/mysql/default/ibdata1{,_bak}
# mv /usr/local/kkmail/data/mysql/default/ib_logfile0{,_bak}
# mv /usr/local/kkmail/data/mysql/default/ib_logfile1{,_bak}


Create the symbolic links

# ln -s /store/kkmail /usr/local/kkmail/data/mysql/default/kkmail
# ln -s /store/ibdata1 /usr/local/kkmail/data/mysql/default/ibdata1 
# ln -s /store/ib_logfile0 /usr/local/kkmail/data/mysql/default/ib_logfile0 
# ln -s /store/ib_logfile1 /usr/local/kkmail/data/mysql/default/ib_logfile1

Notes:
ln -s SOURCE LINK_NAME
A symbolic link can point to a file name that does not exist yet
A symbolic link can point to a directory
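The two properties in the notes above are what make the node2 steps work: the links on node2 point into /store before anything is mounted there, and they start resolving once heartbeat mounts the DRBD device. The demonstration below is a small sketch in a throwaway directory; all paths under $demo are illustrative only.

```shell
#!/bin/bash
# Demonstrate "ln -s SOURCE LINK_NAME" in a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/store" "$demo/datadir"

# A symlink may point at a path that does not exist yet (a dangling link).
ln -s "$demo/store/ibdata1" "$demo/datadir/ibdata1"
[ -L "$demo/datadir/ibdata1" ] && [ ! -e "$demo/datadir/ibdata1" ] && echo "dangling link"

# The link resolves as soon as the target appears, e.g. once /store is mounted.
touch "$demo/store/ibdata1"
[ -e "$demo/datadir/ibdata1" ] && echo "link resolves"

# Directories can be linked the same way.
mkdir "$demo/store/kkmail"
ln -s "$demo/store/kkmail" "$demo/datadir/kkmail"
[ -d "$demo/datadir/kkmail" ] && echo "directory link resolves"
```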


V. Heartbeat configuration
1. Build and install heartbeat from source; CentOS 7 has no yum repository for heartbeat (node1,node2)
Package downloads: http://www.linux-ha.org/wiki/Downloads

Install the build prerequisites

# yum install -y bzip2 autoconf automake libtool glib2-devel libxml2-devel bzip2-devel libtool-ltdl-devel asciidoc libuuid-devel psmisc
Install glue
# wget http://hg.linux-ha.org/glue/archive/0a7add1d9996.tar.bz2
# tar jxvf 0a7add1d9996.tar.bz2
# cd Reusable-Cluster-Components-glue--0a7add1d9996/
# groupadd haclient
# useradd -g haclient hacluster
# ./autogen.sh
# ./configure --prefix=/usr/local/heartbeat/
# make
# make install

Install Resource Agents
# wget https://github.com/ClusterLabs/resource-agents/archive/v3.9.6.tar.gz
# tar zxvf v3.9.6.tar.gz
# cd resource-agents-3.9.6/
# ./autogen.sh
# export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
# ./configure --prefix=/usr/local/heartbeat/
# vi /etc/ld.so.conf.d/heartbeat.conf
/usr/local/heartbeat/lib
# ldconfig
# make
# make install

Install Heartbeat
# wget http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/958e11be8686.tar.bz2
# tar jxvf 958e11be8686.tar.bz2
# cd Heartbeat-3-0-958e11be8686
# ./bootstrap
# export CFLAGS="$CFLAGS -I/usr/local/heartbeat/include -L/usr/local/heartbeat/lib"
# ./configure --prefix=/usr/local/heartbeat/
# vi /usr/local/heartbeat/include/heartbeat/glue_config.h
/*#define HA_HBCONF_DIR "/usr/local/heartbeat/etc/ha.d/"*/   (note: comment this line out with /* */)
# make
# make install


2. Copy the configuration files

# cp /usr/local/heartbeat/share/doc/heartbeat/ha.cf  /usr/local/heartbeat/etc/ha.d
# cp /usr/local/heartbeat/share/doc/heartbeat/authkeys /usr/local/heartbeat/etc/ha.d
# cp /usr/local/heartbeat/share/doc/heartbeat/haresources /usr/local/heartbeat/etc/ha.d


3. Configure ha.cf
(node1)
Edit ha.cf and add the following settings:

# vi /usr/local/heartbeat/etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
warntime 10
deadtime 30
initdead 60
udpport 1112
bcast ens192
#ucast must name the peer's address (node2 here)
ucast ens192 192.168.1.52
#baud 19200
auto_failback off
node drbd1.db.com
node drbd2.db.com
ping 192.168.1.3
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail


(node2)
Edit ha.cf and add the following settings:

# vi /usr/local/heartbeat/etc/ha.d/ha.cf
debugfile /var/log/ha-debug
logfile /var/log/ha-log
keepalive 2
warntime 10
deadtime 30
initdead 60
udpport 1112
bcast ens192
#ucast must name the peer's address (node1 here)
ucast ens192 192.168.1.54
#baud 19200
auto_failback off
node drbd1.db.com
node drbd2.db.com
ping 192.168.1.3
respawn hacluster /usr/local/heartbeat/libexec/heartbeat/ipfail
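The four timers in ha.cf are related: keepalive is the interval between heartbeats, warntime is when a late heartbeat is logged, deadtime is when the peer is declared dead, and initdead is the longer grace period used right after startup. The check below is a rough sketch of the usual rules of thumb (keepalive < warntime < deadtime, and initdead at least twice deadtime); check_ha_timers is a hypothetical helper, and it assumes the values are plain integers in seconds.

```shell
#!/bin/bash
# Sanity-check the heartbeat timers in an ha.cf-style file.
# check_ha_timers is a hypothetical helper; values are assumed to be whole seconds.
check_ha_timers() {
    local cf="$1" keepalive warntime deadtime initdead
    keepalive=$(awk '$1 == "keepalive" {print $2}' "$cf")
    warntime=$(awk '$1 == "warntime" {print $2}' "$cf")
    deadtime=$(awk '$1 == "deadtime" {print $2}' "$cf")
    initdead=$(awk '$1 == "initdead" {print $2}' "$cf")
    # Expect keepalive < warntime < deadtime, and initdead >= 2 * deadtime.
    [ "$keepalive" -lt "$warntime" ] &&
    [ "$warntime" -lt "$deadtime" ] &&
    [ "$initdead" -ge $((deadtime * 2)) ]
}
```

With the values used in this guide (2, 10, 30, 60) the check passes; it could be run as check_ha_timers /usr/local/heartbeat/etc/ha.d/ha.cf before starting the service.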


4. Edit authkeys, the file the two nodes use to authenticate each other, and add the following: (node1,node2)

# vi /usr/local/heartbeat/etc/ha.d/authkeys
auth 1
1 crc

Set the file's permissions to 600
# chmod 600 /usr/local/heartbeat/etc/ha.d/authkeys
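The crc method only detects corrupted packets and provides no authentication, so on a shared network the sha1 method with a secret key is the safer choice. The sketch below writes such a file; write_authkeys is a hypothetical helper, and the default path simply follows the install prefix used in this guide.

```shell
#!/bin/bash
# Write an authkeys file that uses sha1 with a random shared secret.
# write_authkeys is a hypothetical helper; the default path follows this guide's prefix.
write_authkeys() {
    local path="${1:-/usr/local/heartbeat/etc/ha.d/authkeys}"
    local key
    # 16 random bytes rendered as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # umask keeps the file private even if it is newly created
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$path" )
    chmod 600 "$path"
}
```

The key has to be identical on both nodes, so generate the file once on node1 and copy it to node2 rather than running the function on each node.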

5. Edit the cluster resource file haresources

# vi /usr/local/heartbeat/etc/ha.d/haresources
(node1)
drbd1.db.com IPaddr::192.168.1.55/24/ens192 drbddisk::r0 Filesystem::/dev/drbd0::/store mysqld
(node2)
drbd2.db.com IPaddr::192.168.1.55/24/ens192 drbddisk::r0 Filesystem::/dev/drbd0::/store mysqld

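The hostname is the node's own, and the IP address is the virtual IP of the hot-standby pair. (The upstream heartbeat documentation asks for haresources to be identical on all nodes, with the first field naming the preferred owner; the per-node variant shown here is what I ran.)
Note: the IPaddr, Filesystem and other scripts named in this file live under /etc/ha.d/resource.d/ (with this guide's prefix, /usr/local/heartbeat/etc/ha.d/resource.d/). Service start scripts (for example mysql, www) can also be placed in that directory; adding the script name to haresources makes heartbeat start the script when it starts.
IPaddr::192.168.1.55/24/ens192: uses the IPaddr script to configure the floating virtual IP that clients connect to
drbddisk::r0: uses the drbddisk script to promote and demote the DRBD resource r0 on the primary and secondary nodes
Filesystem::/dev/drbd0::/store: uses the Filesystem script to mount and unmount the disk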
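Each haresources entry has the form script::arg1::arg2, and heartbeat turns it into a call of that script with the arguments followed by start or stop. The sketch below reproduces that expansion so the resulting commands can be compared with the log; resource_cmd is a hypothetical helper, and the directory is the one from this guide's install prefix.

```shell
#!/bin/bash
# Expand one haresources entry into the command line heartbeat runs for it.
# resource_cmd is a hypothetical helper; the directory follows this guide's prefix.
resource_cmd() {
    local spec="$1" action="$2"
    local dir="/usr/local/heartbeat/etc/ha.d/resource.d"
    # "script::arg1::arg2" becomes "script arg1 arg2"
    echo "$dir/$(printf '%s' "$spec" | sed 's/::/ /g') $action"
}

resource_cmd "IPaddr::192.168.1.55/24/ens192" start
resource_cmd "drbddisk::r0" start
resource_cmd "Filesystem::/dev/drbd0::/store" start
```

The printed lines match the "Running ... start" entries that heartbeat writes to its log during a takeover. Resources are started from left to right and stopped from right to left, which is why the IP, the DRBD promotion, the mount, and the service appear in that order.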


VI. Create the DRBD resource script drbddisk: (node1,node2)

Edit drbddisk and add the script below

# vi /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk
#!/bin/bash
#
# This script is intended to be used as resource script by heartbeat
#
# Copyright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###
 
DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"
 
if [ -f $DEFAULTFILE ]; then
 . $DEFAULTFILE
fi
 
if [ "$#" -eq 2 ]; then
 RES="$1"
 CMD="$2"
else
 RES="all"
 CMD="$1"
fi
 
## EXIT CODES
# since this is a "legacy heartbeat R1 resource agent" script,
# exit codes actually do not matter that much as long as we conform to
#  http://wiki.linux-ha.org/HeartbeatResourceAgent
# but it does not hurt to conform to lsb init-script exit codes,
# where we can.
#  http://refspecs.linux-foundation.org/LSB_3.1.0/
#LSB-Core-generic/LSB-Core-generic/iniscrptact.html
####
 
drbd_set_role_from_proc_drbd()
{
    local out
    if ! test -e /proc/drbd; then
        ROLE="Unconfigured"
        return
    fi

    dev=$( $DRBDADM sh-dev $RES )
    minor=${dev#/dev/drbd}
    if [[ $minor = *[!0-9]* ]] ; then
        # sh-minor is only supported since drbd 8.3.1
        minor=$( $DRBDADM sh-minor $RES )
    fi
    if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
        ROLE=Unknown
        return
    fi

    if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
        set -- $out
        ROLE=${5%/**}
        : ${ROLE:=Unconfigured} # if it does not show up
    else
        ROLE=Unknown
    fi
}

case "$CMD" in
    start)
        # try several times, in case heartbeat deadtime
        # was smaller than drbd ping time
        try=6
        while true; do
            $DRBDADM primary $RES && break
            let "--try" || exit 1 # LSB generic error
            sleep 1
        done
        ;;
    stop)
        # heartbeat (haresources mode) will retry failed stop
        # for a number of times in addition to this internal retry.
        try=3
        while true; do
            $DRBDADM secondary $RES && break
            # We used to lie here, and pretend success for anything != 11,
            # to avoid the reboot on failed stop recovery for "simple
            # config errors" and such. But that is incorrect.
            # Don't lie to your cluster manager.
            # And don't do config errors...
            let --try || exit 1 # LSB generic error
            sleep 1
        done
        ;;
    status)
        if [ "$RES" = "all" ]; then
            echo "A resource name is required for status inquiries."
            exit 10
        fi
        ST=$( $DRBDADM role $RES )
        ROLE=${ST%/**}
        case $ROLE in
            Primary|Secondary|Unconfigured)
                # expected
                ;;
            *)
                # unexpected. whatever...
                # If we are unsure about the state of a resource, we need to
                # report it as possibly running, so heartbeat can, after failed
                # stop, do a recovery by reboot.
                # drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
                # suddenly readonly.  So we retry by parsing /proc/drbd.
                drbd_set_role_from_proc_drbd
        esac
        case $ROLE in
            Primary)
                echo "running (Primary)"
                exit 0 # LSB status "service is OK"
                ;;
            Secondary|Unconfigured)
                echo "stopped ($ROLE)"
                exit 3 # LSB status "service is not running"
                ;;
            *)
                # NOTE the "running" in below message.
                # this is a "heartbeat" resource script,
                # the exit code is _ignored_.
                echo "cannot determine status, may be running ($ROLE)"
                exit 4 #  LSB status "service status is unknown"
                ;;
        esac
        ;;
    *)
        echo "Usage: drbddisk [resource] {start|stop|status}"
        exit 1
        ;;
esac
 
exit 0


Grant 755 (executable) permissions:

# chmod 755 /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk



VII. Start the Heartbeat service

Start the Heartbeat service on both nodes, node1 first and then node2: (node1,node2)

# systemctl start heartbeat
# systemctl enable heartbeat
# systemctl status heartbeat


If it fails to start, run these two commands and then start it again:

# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/



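VIII. Test the hot standby (high availability)

Before testing, it is advisable to turn firewalld back on and allow ports 7788 (TCP, DRBD replication) and 1112 (UDP, heartbeat).
Then reboot node1, shut it down, or stop its heartbeat service, and node2 takes over immediately. Do not reboot both servers at the same time, or both may end up mounting the storage.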
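The two ports come straight from the configuration earlier in this guide: 7788 is the port in the DRBD address lines, and 1112 is the udpport value in ha.cf. The following sketch opens them with firewall-cmd; the run wrapper and the DRY_RUN variable are additions of this sketch so that the commands can be previewed before they are applied.

```shell
#!/bin/bash
# Open the DRBD and heartbeat ports in firewalld.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

open_cluster_ports() {
    run firewall-cmd --permanent --add-port=7788/tcp   # DRBD replication
    run firewall-cmd --permanent --add-port=1112/udp   # heartbeat (udpport 1112)
    run firewall-cmd --reload
}
```

Run open_cluster_ports as root on both nodes with firewalld running; with DRY_RUN=1 it only lists the three commands.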

Note: at this point the DRBD connection state on node2 may be WFConnection. Once node1 is back up it changes to Connected, and ro and ds show Primary/Secondary UpToDate/UpToDate

# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by [email protected], 2018-04-26 12:10:42
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:12 dw:12 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

# cat /proc/drbd
version: 8.4.11-1 (api:1/proto:86-101)
GIT-hash: 66145a308421e9c124ec391a7848ac20203bb03c build by [email protected], 2018-04-26 12:10:42
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:8 nr:12 dw:24 dr:6605 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Note: clients must reach the service through the floating IP (VIP): 192.168.1.55


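IX. Logs and common problems

1. When one node is rebooted or shut down, the surviving node logs the following (a normal failover; in this capture drbd2.db.com went down and drbd1.db.com took over):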

May 18 12:14:39 drbd1.db.com heartbeat: [1243]: info: Received shutdown notice from 'drbd2.db.com'.
May 18 12:14:39 drbd1.db.com heartbeat: [1243]: info: Resources being acquired from drbd2.db.com.
May 18 12:14:39 drbd1.db.com heartbeat: [1812]: info: acquire all HA resources (standby).
ResourceManager(default)[1838]: 2018/05/18_12:14:39 info: Acquiring resource group: drbd1.db.com IPaddr::192.168.1.55/24/ens192 drbddisk::r0 Filesystem::/dev/drbd0::/store
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.55)[1889]:  2018/05/18_12:14:39 INFO:  Resource is stopped
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.55)[1890]:  2018/05/18_12:14:39 INFO:  Resource is stopped
May 18 12:14:39 drbd1.db.com heartbeat: [1813]: info: Local Resource acquisition completed.
ResourceManager(default)[1838]: 2018/05/18_12:14:39 info: Running /usr/local/heartbeat/etc/ha.d/resource.d/IPaddr 192.168.1.55/24/ens192 start
IPaddr(IPaddr_192.168.1.55)[2029]: 2018/05/18_12:14:39 INFO: Using calculated netmask for 192.168.1.55: 255.255.255.0
IPaddr(IPaddr_192.168.1.55)[2029]: 2018/05/18_12:14:39 INFO: eval ifconfig ens192:0 192.168.1.55 netmask 255.255.255.0 broadcast 192.168.1.255
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.55)[2003]:  2018/05/18_12:14:39 INFO:  Success
ResourceManager(default)[1838]: 2018/05/18_12:14:39 info: Running /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk r0 start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[2176]: 2018/05/18_12:14:39 INFO:  Resource is stopped
ResourceManager(default)[1838]: 2018/05/18_12:14:39 info: Running /usr/local/heartbeat/etc/ha.d/resource.d/Filesystem /dev/drbd0 /store start
Filesystem(Filesystem_/dev/drbd0)[2260]:  2018/05/18_12:14:39 INFO: Running start for /dev/drbd0 on /store
Filesystem(Filesystem_/dev/drbd0)[2260]:  2018/05/18_12:14:39 INFO: Starting filesystem check on /dev/drbd0
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[2250]: 2018/05/18_12:14:40 INFO:  Success
May 18 12:14:40 drbd1.db.com heartbeat: [1812]: info: all HA resource acquisition completed (standby).
May 18 12:14:40 drbd1.db.com heartbeat: [1243]: info: Standby resource acquisition done [all].
harc(default)[2338]:  2018/05/18_12:14:40 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/status status
mach_down(default)[2355]: 2018/05/18_12:14:40 info: /usr/local/heartbeat/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[2355]: 2018/05/18_12:14:40 info: mach_down takeover complete for node drbd2.db.com.
May 18 12:14:40 drbd1.db.com heartbeat: [1243]: info: mach_down takeover complete.
harc(default)[2391]:  2018/05/18_12:14:40 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp(default)[2391]: 2018/05/18_12:14:40 received ip-request-resp IPaddr::192.168.1.55/24/ens192 OK yes
ResourceManager(default)[2414]: 2018/05/18_12:14:40 info: Acquiring resource group: drbd1.db.com IPaddr::192.168.1.55/24/ens192 drbddisk::r0 Filesystem::/dev/drbd0::/store
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_192.168.1.55)[2442]:  2018/05/18_12:14:40 INFO:  Running OK
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[2519]: 2018/05/18_12:14:40 INFO:  Running OK
May 18 12:15:02 drbd1.db.com heartbeat: [1243]: info: Heartbeat restart on node drbd2.db.com
May 18 12:15:02 drbd1.db.com heartbeat: [1243]: info: Status update for node drbd2.db.com: status init
May 18 12:15:02 drbd1.db.com heartbeat: [1243]: info: Status update for node drbd2.db.com: status up
May 18 12:15:02 drbd1.db.com ipfail: [1765]: info: Status update: Node drbd2.db.com now has status init
May 18 12:15:02 drbd1.db.com ipfail: [1765]: info: Status update: Node drbd2.db.com now has status up
harc(default)[2574]:  2018/05/18_12:15:02 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/status status
harc(default)[2591]:  2018/05/18_12:15:02 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/status status
May 18 12:15:04 drbd1.db.com heartbeat: [1243]: info: Status update for node drbd2.db.com: status active
May 18 12:15:04 drbd1.db.com ipfail: [1765]: info: Status update: Node drbd2.db.com now has status active
May 18 12:15:04 drbd1.db.com ipfail: [1765]: info: Asking other side for ping node count.
harc(default)[2612]:  2018/05/18_12:15:04 info: Running /usr/local/heartbeat/etc/ha.d/rc.d/status status
May 18 12:15:05 drbd1.db.com heartbeat: [1243]: info: remote resource transition completed.
May 18 12:15:08 drbd1.db.com ipfail: [1765]: info: No giveup timer to abort.


2. Common errors and fixes

The following error is fixed by installing psmisc (yum install psmisc):
ERROR: Setup problem: couldn't find command: fuser
ERROR: Return code 5 from /usr/local/heartbeat/etc/ha.d/resource.d/Filesystem
ERROR:  Program is not installed

The following error is fixed by running these two commands (the error text follows the commands):
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/RAExec/* /usr/local/heartbeat/lib/heartbeat/plugins/RAExec/
# ln -svf /usr/local/heartbeat/lib64/heartbeat/plugins/* /usr/local/heartbeat/lib/heartbeat/plugins/
ERROR: Illegal directive [bcast] in /usr/local/heartbeat/etc/ha.d/ha.cf
ERROR: Illegal directive [ucast] in /usr/local/heartbeat/etc/ha.d/ha.cf
ERROR: Illegal directive [ping] in /usr/local/heartbeat/etc/ha.d/ha.cf
ERROR: Heartbeat not started: configuration error.
ERROR: Configuration error, heartbeat not started.

The following error means the drbddisk script is missing or its permissions are wrong:
# chmod 755 /usr/local/heartbeat/etc/ha.d/resource.d/drbddisk
ERROR: Cannot locate resource script drbddisk
ERROR: Cannot locate resource script drbddisk
ERROR: Cannot locate resource script drbddisk
info: Retrying failed stop operation [drbddisk::r0]
ERROR: Resource script for drbddisk::r0 probably not LSB-compliant.

Creating the DRBD metadata fails with the message below: DRBD wants a clean partition, and the device must not already carry a filesystem
# drbdadm create-md r0
md_offset 1036286685184
al_offset 1036286652416
bm_offset 1036255027200
Found xfs filesystem
  1011998720 kB data area apparently used
  1011967800 kB left usable by current configuration
Device size would be truncated, which
would corrupt data and result in
'access beyond end of device' errors.
You need to either
   * use external meta data (recommended)
   * shrink that filesystem first
   * zero out the device (destroy the filesystem)
Operation refused.

Building DRBD from source hit the errors below, which I could not resolve, so I switched to installing DRBD from yum
In file included from /root/drbd-8.4.3/drbd/drbd_proc.c:34:0:
/root/drbd-8.4.3/drbd/drbd_int.h:2515:0: warning: "idr_for_each_entry" redefined [enabled by default]
 #define idr_for_each_entry(idp, entry, id)     ^
In file included from include/linux/kernfs.h:14:0,
                 from include/linux/sysfs.h:15,
                 from include/linux/kobject.h:21,
                 from include/linux/module.h:16,
                 from /root/drbd-8.4.3/drbd/drbd_proc.c:26:
include/linux/idr.h:132:0: note: this is the location of the previous definition
 #define idr_for_each_entry(idp, entry, id)    ^
/root/drbd-8.4.3/drbd/drbd_proc.c: In function ‘drbd_proc_open’:
/root/drbd-8.4.3/drbd/drbd_proc.c:320:3: error: implicit declaration of function ‘PDE’ [-Werror=implicit-function-declaration]
   return single_open(file, drbd_seq_show, PDE(inode)->data);
   ^
/root/drbd-8.4.3/drbd/drbd_proc.c:320:53: error: invalid type argument of ‘->’ (have ‘int’)
   return single_open(file, drbd_seq_show, PDE(inode)->data);
                                                     ^
cc1: some warnings being treated as errors
make[3]: *** [/root/drbd-8.4.3/drbd/drbd_proc.o] Error 1
make[2]: *** [_module_/root/drbd-8.4.3/drbd] Error 2
make[2]: Leaving directory `/usr/src/kernels/3.10.0-862.2.3.el7.x86_64'
make[1]: *** [kbuild] Error 2
make[1]: Leaving directory `/root/drbd-8.4.3/drbd'
make: *** [module] Error 2



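X. Routine DRBD maintenance

1. What to monitor
(1) Monitor the heartbeat service
(2) Monitor the drbd service and the synchronization state
(3) Monitor whether /store is mounted (only one side may have it mounted at any time)
(4) When application code on one server is upgraded, upgrade the other server as well (this applies to anything stored outside /store)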
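Item (3) is easy to automate by looking for the mount point in /proc/mounts. The function below is a minimal sketch; is_mounted is a hypothetical helper, and its optional second argument exists only so that the check can be tried against a saved copy of the mount table.

```shell
#!/bin/bash
# Report whether a mount point appears in a /proc/mounts-style file.
# is_mounted is a hypothetical helper; the file defaults to /proc/mounts.
is_mounted() {
    local mount_point="$1" mounts_file="${2:-/proc/mounts}"
    # Field 2 of each line is the mount point.
    awk -v mp="$mount_point" '$2 == mp {found = 1} END {exit !found}' "$mounts_file"
}

# Example check for a monitoring job:
#   if is_mounted /store; then echo "mounted"; else echo "not mounted"; fi
```

In a healthy cluster exactly one node reports "mounted"; both nodes reporting it at once is the split-brain symptom to alert on.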

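2. Server maintenance advice:
(1) Do not reboot both servers at the same time, or they may compete for resources (known as split brain); leave about 5 minutes between them.
(2) Do not power on both servers at the same time, for the same reason; leave about 5 minutes between them.
(3) The heartbeat link currently runs over the 192.168.1.0 network. Later on, consider adding a second NIC to each server and connecting the two servers directly with a cable (on a separate IP subnet). This avoids a split brain caused by a failure on the 192.168.1.0 network, and it also gives faster transfers.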

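3. How to tell whether synchronization has a problem:
The most basic method is to run df -h on both servers and check where the storage is mounted:
Normal: one server has it mounted and the other does not, drbd is running on both, and cat /proc/drbd shows a healthy state.
Abnormal case 1: both servers have it mounted. This means a split brain has occurred; contact technical support to resolve it.
Abnormal case 2: one server has it mounted and the other does not, but the drbd service is stopped and cat /proc/drbd shows an abnormal state.
In an abnormal situation the drbd state is usually one of:
(1). Both nodes have the connection state StandAlone
(2). One node has the connection state WFConnection and the other is StandAlone
Check the DRBD state on the primary and standby servers:
cat /proc/drbd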
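For alerting, the connection state can be reduced to a coarse verdict. The mapping below is a sketch; classify_cs and its verdict strings are inventions of this example, and it deliberately treats WFConnection as its own case, because that state is also what a healthy node shows while its peer is simply powered off.

```shell
#!/bin/bash
# Map a DRBD connection state (the cs: field) to a coarse health verdict.
# classify_cs and the verdict strings are hypothetical, chosen for this sketch.
classify_cs() {
    case "$1" in
        Connected)             echo "ok" ;;
        SyncSource|SyncTarget) echo "syncing" ;;
        WFConnection)          echo "waiting-for-peer" ;;
        StandAlone)            echo "standalone" ;;
        *)                     echo "unknown" ;;
    esac
}

# Usage on a live node (drbdadm cstate prints the connection state):
#   classify_cs "$(drbdadm cstate r0)"
```

A "standalone" verdict on either node is the one that needs manual attention, since DRBD does not reconnect on its own from that state.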

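4. Causes of DRBD synchronization problems:
(1). An automatic switchover in the HA environment leads to split brain;
(2). An operator or configuration mistake leads to split brain;
(3). The drbd service has stopped.
My experience is limited; regrettably, the two split-brain causes above are the only ones I have run into.

5. Problems you may hit in operation and how to fix them:
A typical problem state looks like this:
Standby (hlt1):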

# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2016-10-31 10:43:50
m:res  cs            ro                 ds                 p  mounted  fstype
0:r0   WFConnection  Secondary/Unknown  UpToDate/DUnknown  C
# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2016-10-31 10:43:50
 0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:383860


Primary (hlt2):

# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2016-10-31 10:49:30
 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-----
ns:0 nr:0 dw:987208 dr:3426933 al:1388 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1380568204
# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2016-10-31 10:49:30
m:res  cs          ro               ds                 p       mounted  fstype
0:r0   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4


(1) On the standby server (r0 is the resource name):

# drbdadm secondary r0
# drbdadm connect --discard-my-data r0 (if it returns an error, run it once more)


(2) On the primary server:

# drbdadm connect r0
# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by [email protected], 2013-11-03 00:03:40
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:6852 nr:0 dw:264460 dr:8393508 al:39 bm:512 lo:0 pe:2 ua:0 ap:0 ep:1 wo:d oos:257728
    [>....................] sync'ed:  4.7% (257728/264412)K
finish: 0:03:47 speed: 1,112 (1,112) K/sec


(3) Check on both hosts that DRBD is back to normal:
Standby server:

# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2016-10-31 10:43:50
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1455736720 dw:1455736720 dr:0 al:0 bm:140049 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0


Primary server:

# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2016-10-31 10:49:30
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:1455737960 nr:0 dw:85995012 dr:1403665281 al:113720 bm:139737 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0



Further reading on day-to-day DRBD administration:
http://blog.163.com/[email protected]/blog/static/27011089201561411536667/
http://blog.csdn.net/leshami/article/details/49777677
http://www.cnblogs.com/rainy-shurun/p/5335843.html
