[Web Cluster in Practice] 22_Nagios

Tags (space-separated): Web Cluster in Practice


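1. The Nagios monitoring tool and how it works

1.1 Nagios features

  • Monitors network services (HTTP, TCP, PING, SMTP, POP3, etc.)
  • Monitors host resources (CPU, load, I/O, virtual and physical memory, disk utilization, etc.)
  • A simple plugin design lets users easily write checks tailored to their own services
  • Runs service checks in parallel
  • Can describe the network hierarchy through "parent" host definitions; this relationship is used to distinguish hosts that are down from hosts that are merely unreachable
  • Notifies contacts promptly when a service or host problem occurs and when it is resolved (mail/IM/SMS/sound)
  • Supports event handlers that run when a host or service event occurs, which helps with locating and resolving problems
  • Automatic log rotation
  • Supports redundant monitoring hosts (distributed monitoring)
  • Optional web interface for viewing current network status, notification and problem history, log files, etc.

1.2 Nagios components

  • A Nagios deployment generally consists of the main program (Nagios), a plugin package (Nagios-plugins), and some optional add-ons (NRPE, NSClient++, NSCA, and NDOUtils).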

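2. Installing the Nagios server

2.1 Preparing for the Nagios installation

(1) Prepare 3 servers or virtual machines

HOSTNAME        IP              Description
nagios-server   192.168.2.151   Nagios server
web001          192.168.2.152   Monitored client server
web002          192.168.2.144   Monitored client server

(2) Configure the yum repository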

[[email protected] ~]# cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
[[email protected] ~]# wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.163.com/.help/CentOS7-Base-163.repo

(3) Work around Perl build problems

[[email protected] ~]# echo 'export LC_ALL=C'>> /etc/profile
[[email protected] ~]# tail -1 /etc/profile
export LC_ALL=C
[[email protected] ~]# source /etc/profile
[[email protected] ~]# echo $LC_ALL
C

(4) Disable the firewall and SELinux on the Nagios server

[[email protected] ~]# systemctl disable firewalld.service
[[email protected] ~]# systemctl stop firewalld.service
[[email protected] ~]# systemctl status firewalld.service
* firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

[[email protected] ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config  
# Editing the config file makes the change permanent; a reboot is required for it to take effect
[[email protected] ~]# cat /etc/selinux/config|grep SELINUX=disabled
SELINUX=disabled
[[email protected] ~]# getenforce
Disabled

(5) Set up system time synchronization

[[email protected] ~]# echo '#time sync by nagios-server at 2018-09-16' >>/var/spool/cron/root
[[email protected] ~]# echo '*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1' >> /var/spool/cron/root
[[email protected] ~]# crontab -l
#time sync by nagios-server at 2018-09-16
*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1
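
Running the echo commands above a second time appends duplicate entries to the crontab. A minimal sketch of an idempotent variant follows. The helper name add_line_once is hypothetical, and the demonstration writes to a scratch file rather than /var/spool/cron/root:

```shell
#!/bin/bash
# add_line_once FILE LINE: append LINE to FILE only if an identical line
# is not already present. Hypothetical helper, for illustration only.
add_line_once() {
    local file=$1 line=$2
    touch "$file"
    # -x matches whole lines, -F treats the pattern as a fixed string, -q is silent
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstration against a scratch file instead of the real crontab
cron_file=$(mktemp)
entry='*/5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1'
add_line_once "$cron_file" "$entry"
add_line_once "$cron_file" "$entry"   # second call leaves the file unchanged
cat "$cron_file"
```

On the real server the first argument would be /var/spool/cron/root.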

(6) Install the packages the Nagios server needs (LAMP environment)

[[email protected] ~]# yum install gcc glibc glibc-common -y
[[email protected] ~]# yum install gd gd-devel -y
[[email protected] ~]# yum install httpd php php-gd -y
[[email protected] ~]# rpm -qa httpd php
httpd-2.4.6-80.el7.centos.1.x86_64
php-5.4.16-45.el7.x86_64

For MySQL installation, see [Web Cluster in Practice] 12_LNMP: MySQL Installation and Configuration

(7) Create the users and groups the Nagios server needs

[[email protected] ~]# /usr/sbin/useradd nagios
[[email protected] ~]# /usr/sbin/useradd apache -M -s /sbin/nologin
useradd: user 'apache' already exists
[[email protected] ~]# /usr/sbin/groupadd nagcmd
[[email protected] ~]# /usr/sbin/usermod -a -G nagcmd nagios
[[email protected] ~]# /usr/sbin/usermod -a -G nagcmd apache
[[email protected] ~]# id -n -G nagios
nagios nagcmd
[[email protected] ~]# id -n -G apache
apache nagcmd
[[email protected] ~]# groups nagios
nagios : nagios nagcmd
[[email protected] ~]# groups apache
apache : apache nagcmd

(8) Download the required packages

[[email protected] ~]# cd /home/ylt/tools/
[[email protected] tools]# mkdir nagios -p
[[email protected] tools]# cd nagios/

[[email protected] nagios]# wget -O nagios-3.5.1.tar.gz https://sourceforge.net/projects/nagios/files/nagios-3.x/nagios-3.5.1/nagios-3.5.1.tar.gz/download
[[email protected] nagios]# ll
total 1724
-rw-r--r-- 1 root root 1763584 Aug 31  2013 nagios-3.5.1.tar.gz

[[email protected] nagios]# wget https://nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
[[email protected] nagios]# ll nagios-plugins-2.2.1.tar.gz
-rw-r--r-- 1 root root 2728818 Apr 20  2017 nagios-plugins-2.2.1.tar.gz

[[email protected] nagios]# wget -O nrpe-2.12.tar.gz https://sourceforge.net/projects/nagios/files/nrpe-2.x/nrpe-2.12/nrpe-2.12.tar.gz/download
[[email protected] nagios]# ll nrpe-2.12.tar.gz
-rw-r--r-- 1 root root 405725 Mar 11  2008 nrpe-2.12.tar.gz

(9) Start the HTTP service of the LAMP environment

[[email protected] tools]# systemctl start httpd
[[email protected] tools]# lsof -i:80
COMMAND  PID   USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
httpd   1352   root    4u  IPv6  21968      0t0  TCP *:http (LISTEN)
httpd   1353 apache    4u  IPv6  21968      0t0  TCP *:http (LISTEN)
httpd   1354 apache    4u  IPv6  21968      0t0  TCP *:http (LISTEN)
httpd   1355 apache    4u  IPv6  21968      0t0  TCP *:http (LISTEN)
httpd   1356 apache    4u  IPv6  21968      0t0  TCP *:http (LISTEN)
httpd   1357 apache    4u  IPv6  21968      0t0  TCP *:http (LISTEN)

2.2 Installing the Nagios server

[[email protected] nagios]# tar xf nagios-3.5.1.tar.gz
[[email protected] nagios]# ll
total 1728
drwxrwxr-x 15 root root    4096 Aug 31  2013 nagios
-rw-r--r--  1 root root 1763584 Aug 31  2013 nagios-3.5.1.tar.gz
[[email protected] nagios]# cd nagios/
[[email protected] nagios]# ./configure --with-command-group=nagcmd
Review the options above for accuracy.  If they look okay,
type 'make all' to compile the main program and CGIs.

[[email protected] nagios]# make all

Enjoy.

[[email protected] nagios]# make install
  make install-init
     - This installs the init script in /etc/rc.d/init.d

  make install-commandmode
     - This installs and configures permissions on the
       directory for holding the external command file

  make install-config
     - This installs sample config files in /usr/local/nagios/etc

make[1]: Leaving directory `/home/ylt/tools/nagios/nagios'

[[email protected] nagios]# make install-init

*** Init script installed ***

[[email protected] nagios]# make install-commandmode

*** External command directory configured ***

[[email protected] nagios]# make install-config

*** Config files installed ***

(1) Install the Nagios web configuration file and create a login user

  • Install the Nagios web configuration file
[[email protected] nagios]# make install-webconf

*** Nagios/Apache conf file installed ***
  • Create the login user
[[email protected] nagios]# cd ..
[[email protected] nagios]# htpasswd -bc /usr/local/nagios/etc/htpasswd.users nagios nagios
Adding password for user nagios
[[email protected] nagios]# cat /usr/local/nagios/etc/htpasswd.users
nagios:$apr1$l7AGreUZ$LUP7tkFCcLoJ21cACkOvU/
  • Reload the Apache service
[[email protected] nagios]# systemctl reload httpd

(2) Add the email address that receives monitoring alerts

[[email protected] nagios]# sed -i 's#[email protected]#[email protected]#g' /usr/local/nagios/etc/objects/contacts.cfg
[[email protected] nagios]# sed -n '35p' /usr/local/nagios/etc/objects/contacts.cfg
        email                           [email protected]       ; 
  • Send mail through a third-party mail provider
[[email protected] nagios]# tail -2 /etc/mail.rc
set [email protected]
smtp=smtp.qq.com smtp-auth-user=1622320046 smtp-auth-password=password smtp-auth=login

(3) Fix the web user nagios not being authorized to view service resources: change nagiosadmin to nagios

[[email protected] etc]# cat cgi.cfg|grep ^authorized_for
authorized_for_system_information=nagios
authorized_for_configuration_information=nagios
authorized_for_system_commands=nagios
authorized_for_all_services=nagios
authorized_for_all_hosts=nagios
authorized_for_all_service_commands=nagios
authorized_for_all_host_commands=nagios
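
The listing above shows only the resulting cgi.cfg. One way the substitution could be done is sketched below. The helper name grant_cgi_user is hypothetical, GNU sed is assumed, and the demonstration edits a scratch file rather than /usr/local/nagios/etc/cgi.cfg:

```shell
#!/bin/bash
# grant_cgi_user CFG OLD NEW: replace the user OLD with NEW on every
# authorized_for_* line of CFG. Hypothetical helper; assumes GNU sed.
grant_cgi_user() {
    local cfg=$1 old=$2 new=$3
    sed -i "/^authorized_for_/ s/\b${old}\b/${new}/g" "$cfg"
}

# Demonstration on a scratch file with made-up sample content
cfg=$(mktemp)
printf '%s\n' \
    'authorized_for_system_information=nagiosadmin' \
    'authorized_for_all_services=nagiosadmin' \
    'default_user_name=guest' > "$cfg"
grant_cgi_user "$cfg" nagiosadmin nagios
grep '^authorized_for' "$cfg"
```

Restricting the substitution to lines that start with authorized_for_ leaves unrelated settings and comments untouched.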

(4) Configure and start the Apache service

[[email protected] nagios]# systemctl enable httpd
Created symlink from /etc/systemd/system/multi-user.target.wants/httpd.service to /usr/lib/systemd/system/httpd.service.
[[email protected] nagios]# systemctl restart httpd
[[email protected] nagios]# netstat -lntup|grep httpd
tcp6       0      0 :::80                   :::*                    LISTEN      1932/httpd
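  • Open the site in a browser on a client machine

[Figure: username and password prompt window]

(4) Install the Nagios plugins package

  • First install the base dependencies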
[[email protected] nagios]# yum install perl-devel openssl-devel -y
  • Then install the Nagios plugins package
[[email protected] nagios]# tar xf nagios-plugins-2.2.1.tar.gz
[[email protected] nagios]# cd nagios-plugins-2.2.1/
[[email protected] nagios-plugins-2.2.1]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --with-mysql
[[email protected] nagios-plugins-2.2.1]# make
[[email protected] nagios-plugins-2.2.1]# make install
  • Count the installed plugins
[[email protected] nagios-plugins-2.2.1]# ls /usr/local/nagios/libexec/|wc -l
62

(5) Install NRPE

[[email protected] nagios-plugins-2.2.1]# ls /usr/local/nagios/libexec/check_nrpe
ls: cannot access /usr/local/nagios/libexec/check_nrpe: No such file or directory
[[email protected] nagios-plugins-2.2.1]# cd ../
[[email protected] nagios]# tar xf nrpe-2.12.tar.gz
[[email protected] nagios]# cd nrpe-2.12/
[[email protected] nrpe-2.12]# ./configure
[[email protected] nrpe-2.12]# make all
[[email protected] nrpe-2.12]# make install-plugin
[[email protected] nrpe-2.12]# make install-daemon
[[email protected] nrpe-2.12]# make install-daemon-config
  • Check for the check_nrpe plugin
[[email protected] nagios]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
[[email protected] nagios]# ls /usr/local/nagios/libexec/|wc -l
63

(6) Configure and start the Nagios service

  • Enable the Nagios service at boot
[[email protected] nagios]# /sbin/chkconfig nagios on
[[email protected] nagios]# chkconfig --list nagios
nagios          0:off   1:off   2:on    3:on    4:on    5:on    6:off

[[email protected] ~]# echo "/etc/init.d/nagios start" >>/etc/rc.local
[[email protected] ~]# tail -1 /etc/rc.local
/etc/init.d/nagios start
  • Validate the Nagios configuration files
[[email protected] ~]# /etc/init.d/nagios checkconfig
Running configuration check... OK.
[[email protected] ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
  • Edit /etc/init.d/nagios so that the checkconfig command above prints the detailed syntax-check output
[[email protected] ~]# grep 'checkconfig)' -n -A 2 /etc/init.d/nagios
181:    checkconfig)
182-            printf "Running configuration check..."
183-            $NagiosBin -v $NagiosCfgFile > /dev/null 2>&1;
# Remove > /dev/null 2>&1 from the script
[[email protected] ~]# vim /etc/init.d/nagios
[[email protected] ~]# grep 'checkconfig)' -n -A 2 /etc/init.d/nagios
181:    checkconfig)
182-            printf "Running configuration check..."
183-            $NagiosBin -v $NagiosCfgFile;
  • Check the syntax
[[email protected] ~]# /etc/init.d/nagios checkconfig
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
 OK.
  • Start the Nagios service
[[email protected] ~]# /etc/init.d/nagios restart
Restarting nagios (via systemctl):  Warning: nagios.service changed on disk. Run 'systemctl daemon-reload' to reload units.
                                                           [  OK  ]
[[email protected] ~]# systemctl daemon-reload

[[email protected] ~]# /etc/init.d/nagios restart
Restarting nagios (via systemctl):                         [  OK  ]
[[email protected] ~]# ps -ef|grep nagios|grep -v grep
nagios     1408      1  0 15:54 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
[[email protected] ~]# netstat -lntup|grep nagios
# No output: the nagios daemon itself does not listen on a network port
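
The check-then-restart sequence used in this section can be wrapped so that a broken configuration never reaches the running daemon. The sketch below assumes the paths used in the steps above. The function name safe_reload is hypothetical and not part of Nagios:

```shell
#!/bin/bash
# Paths taken from the installation steps; override via the environment if yours differ.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios}

# safe_reload: run the pre-flight check and reload only if it passes.
# Hypothetical wrapper for illustration.
safe_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        "$NAGIOS_INIT" reload
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Usage (as root on the Nagios server):
#   safe_reload
```

When the check fails, running the same nagios -v command by hand prints the detailed error list.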

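3. Installing the Nagios client

3.1 Preparing for the client installation

(1) Prepare 2 servers or virtual machines

HOSTNAME   IP              Description
web001     192.168.2.152   Monitored client server
web002     192.168.2.144   Monitored client server

(2) Prepare the environment with the same steps as on the server

3.2 Installing software on the Nagios client

(1) Download the required packages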

[[email protected] ~]# yum install gcc glibc-common -y

[[email protected] ~]# mkdir /home/ylt/tools/nagios
[[email protected] ~]# cd /home/ylt/tools/nagios
[[email protected] nagios]# wget -O nagios-3.5.1.tar.gz https://sourceforge.net/projects/nagios/files/nagios-3.x/nagios-3.5.1/nagios-3.5.1.tar.gz/download
[[email protected] nagios]# wget https://nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
[[email protected] nagios]# wget -O nrpe-2.12.tar.gz https://sourceforge.net/projects/nagios/files/nrpe-2.x/nrpe-2.12/nrpe-2.12.tar.gz/download

[[email protected] nagios]# ll
total 4792
-rw-r--r-- 1 root root 1763584 Aug 31  2013 nagios-3.5.1.tar.gz
-rw-r--r-- 1 root root 2728818 Apr 20  2017 nagios-plugins-2.2.1.tar.gz
-rw-r--r-- 1 root root  405725 Mar 11  2008 nrpe-2.12.tar.gz

(2) Add the nagios user

[[email protected] nagios]# /usr/sbin/useradd nagios -M -s /sbin/nologin
[[email protected] nagios]# id nagios
uid=1003(nagios) gid=1003(nagios) groups=1003(nagios)

(3) Install the nagios-plugins package

[[email protected] nagios]# yum install perl-devel perl-CPAN openssl-devel -y

[[email protected] nagios]# tar xf nagios-plugins-2.2.1.tar.gz
[[email protected] nagios]# cd nagios-plugins-2.2.1/
[[email protected] nagios-plugins-2.2.1]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules --with-mysql
[[email protected] nagios-plugins-2.2.1]# make
[[email protected] nagios-plugins-2.2.1]# make install
[[email protected] nagios-plugins-2.2.1]# cd ../

[[email protected] nagios]# ls /usr/local/nagios/libexec/|wc -l
62

(4) Install NRPE on the Nagios client

[[email protected] nagios]# tar xf nrpe-2.12.tar.gz
[[email protected] nagios]# cd nrpe-2.12/
[[email protected] nrpe-2.12]# ./configure
[[email protected] nrpe-2.12]# make all
[[email protected] nrpe-2.12]# make install-plugin
[[email protected] nrpe-2.12]# make install-daemon
[[email protected] nrpe-2.12]# make install-daemon-config

(5) Set up the memory and disk I/O monitoring plugin scripts

[[email protected] nagios]# wget https://raw.githubusercontent.com/yanglt7/picture/master/check_iostat
[[email protected] nagios]# wget https://raw.githubusercontent.com/yanglt7/picture/master/check_memory.pl
[[email protected] nagios]# yum install dos2unix -y
[[email protected] nagios]# /bin/cp /home/ylt/tools/nagios/check_memory.pl /usr/local/nagios/libexec/
[[email protected] nagios]# /bin/cp /home/ylt/tools/nagios/check_iostat /usr/local/nagios/libexec/
[[email protected] nagios]# chmod 755 /usr/local/nagios/libexec/check_memory.pl
[[email protected] nagios]# chmod 755 /usr/local/nagios/libexec/check_iostat
[[email protected] nagios]# dos2unix /usr/local/nagios/libexec/check_memory.pl
dos2unix: converting file /usr/local/nagios/libexec/check_memory.pl to Unix format ...
[[email protected] nagios]# dos2unix /usr/local/nagios/libexec/check_iostat
dos2unix: converting file /usr/local/nagios/libexec/check_iostat to Unix format ...
[[email protected] nagios]# chmod a+x /usr/local/nagios/libexec/check_iostat
[[email protected] nagios]# chmod a+x /usr/local/nagios/libexec/check_memory.pl

3.3 Configuring the NRPE service on the Nagios client

  • Set the IP of the Nagios server that is allowed to monitor this client
[[email protected] nagios]# cd /usr/local/nagios/etc/
[[email protected] etc]# sed -n '79p' nrpe.cfg
allowed_hosts=127.0.0.1
[[email protected] etc]# sed -i 's#allowed_hosts=127.0.0.1#allowed_hosts=127.0.0.1,192.168.2.151#g' nrpe.cfg
[[email protected] etc]# sed -n '79p' nrpe.cfg
allowed_hosts=127.0.0.1,192.168.2.151
  • Comment out lines 199 to 203 and add the checks to be monitored
[[email protected] etc]# vim nrpe.cfg
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 10 -c 3
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 15% -c 7% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -s sda -w 30,200,20 -c 50,250,50
  • Start the NRPE daemon on the client
[[email protected] etc]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
  • Check that it started
[[email protected] etc]# netstat -lntup|grep nrpe
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN      3063/nrpe
[[email protected] etc]# ps -ef|grep nrpe|grep -v grep
nagios     3152      1  0 19:18 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
  • Start NRPE at boot
[[email protected] etc]# echo "#nagios nrpe process cmd by ylt at 20181014" >>/etc/rc.local
[[email protected] etc]# echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >>/etc/rc.local
[[email protected] etc]# tail -2 /etc/rc.local
#nagios nrpe process cmd by ylt at 20181014
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
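
Once the daemon is listening, the commands defined in nrpe.cfg can be exercised from the Nagios server before any service definitions exist. The sketch below assumes the check_nrpe path installed earlier and uses the same -H and -c options that appear later in commands.cfg. The function name probe_nrpe is hypothetical:

```shell
#!/bin/bash
# Path of the check_nrpe plugin installed on the server; override if yours differs.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# probe_nrpe HOST COMMAND...: run each named NRPE command against HOST and
# print one result line per command, prefixed with the command name.
# Hypothetical helper for illustration.
probe_nrpe() {
    local host=$1 cmd
    shift
    for cmd in "$@"; do
        printf '%s: ' "$cmd"
        "$CHECK_NRPE" -H "$host" -c "$cmd"
    done
}

# Usage (on nagios-server):
#   probe_nrpe 192.168.2.152 check_load check_mem check_disk check_swap check_iostat
```

A command that is not defined in the client's nrpe.cfg, or a server IP missing from allowed_hosts, shows up here as an error instead of a plugin result.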

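4. Monitoring from the Nagios server

4.1 Basics of server-side monitoring

(1) Core configuration files on the Nagios server

  • The main Nagios configuration file is nagios.cfg, located by default in /usr/local/nagios/etc. That directory contains an objects directory (similar to Nginx's extra directory) that holds the other Nagios configuration files included by nagios.cfg.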
[[email protected] ~]# cd /usr/local/nagios/etc
[[email protected] etc]# tree
.
|-- cgi.cfg
|-- htpasswd.users
|-- nagios.cfg
|-- nrpe.cfg
|-- objects
|   |-- commands.cfg
|   |-- contacts.cfg
|   |-- localhost.cfg
|   |-- printer.cfg
|   |-- switch.cfg
|   |-- templates.cfg
|   |-- timeperiods.cfg
|   `-- windows.cfg
`-- resource.cfg

1 directory, 13 files

(2) Edit the main configuration file nagios.cfg

  • Find the cfg_file entries in nagios.cfg and add the following host and service configuration files
[[email protected] etc]# vim nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_dir=/usr/local/nagios/etc/objects/services
  • Comment out the following line
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
  • After saving, generate the hosts.cfg host file from the existing data
[[email protected] etc]# cd objects/
[[email protected] objects]# head -51 localhost.cfg >hosts.cfg
[[email protected] objects]# chown nagios.nagios /usr/local/nagios/etc/objects/hosts.cfg
  • Create a new, empty services.cfg file
[[email protected] objects]# touch services.cfg
[[email protected] objects]# chown nagios.nagios services.cfg
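  • Create the service configuration directory; every configuration file (*.cfg) placed in it is automatically included by the main configuration file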
[[email protected] objects]# mkdir services
[[email protected] objects]# chown -R nagios.nagios services
  • Check the result
[[email protected] objects]# ls -lrt
total 56
-rw-rw-r-- 1 nagios nagios 10812 Oct 14 14:21 templates.cfg
-rw-rw-r-- 1 nagios nagios  7716 Oct 14 14:21 commands.cfg
-rw-rw-r-- 1 nagios nagios  3208 Oct 14 14:21 timeperiods.cfg
-rw-rw-r-- 1 nagios nagios  5403 Oct 14 14:21 localhost.cfg
-rw-rw-r-- 1 nagios nagios  4019 Oct 14 14:21 windows.cfg
-rw-rw-r-- 1 nagios nagios  3124 Oct 14 14:21 printer.cfg
-rw-rw-r-- 1 nagios nagios  3293 Oct 14 14:21 switch.cfg
-rw-rw-r-- 1 nagios nagios  2165 Oct 14 14:37 contacts.cfg
-rw-r--r-- 1 nagios nagios  1870 Oct 14 19:35 hosts.cfg
-rw-r--r-- 1 nagios nagios     0 Oct 14 19:36 services.cfg
drwxr-xr-x 2 nagios nagios  4096 Oct 14 19:37 services

4.2 Configuring monitored items on the Nagios server

(1) Edit hosts.cfg to define the Nagios client hosts to monitor

[[email protected] objects]# cat hosts.cfg
#
# HOST DEFINITION
#

# Define a host for the local machine

define host{
        use                     linux-server            ; Name of host template to use
                                                        ; This host definition will inherit all variables that are defined
                                                        ; in (or inherited by) the linux-server host template definition.
        host_name               web001
        alias                   web001
        address                 192.168.2.152
        }
define host{
        use                     linux-server            ; Name of host template to use
                                                          ; This host definition will inherit all variables that are defined
                                                          ; in (or inherited by) the linux-server host template definition.
        host_name               web002
        alias                   web002
        address                 192.168.2.144
        }
#
# HOST GROUP DEFINITION
#

# Define an optional hostgroup for Linux machines

define hostgroup{
        hostgroup_name  linux-servers ; The name of the hostgroup
        alias           Linux Servers ; Long name of the group
        members         web001,web002     ; Comma separated list of hosts that belong to this group
        }
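
With more than a couple of clients, writing each host block by hand becomes repetitive. The sketch below generates definitions in the same shape as the hosts.cfg entries above from a list of name and address pairs. The function name gen_host is hypothetical:

```shell
#!/bin/bash
# gen_host NAME ADDRESS: print a host definition that inherits from the
# linux-server template, in the same shape as the entries in hosts.cfg.
# Hypothetical helper for illustration.
gen_host() {
    local name=$1 address=$2
    cat <<EOF
define host{
        use                     linux-server
        host_name               $name
        alias                   $name
        address                 $address
        }
EOF
}

# Read "name address" pairs, one per line, and emit one definition for each
while read -r name address; do
    gen_host "$name" "$address"
done <<'LIST'
web001 192.168.2.152
web002 192.168.2.144
LIST
```

The output can be redirected into hosts.cfg. The hostgroup members line still has to list the same host names.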

(2) Edit services.cfg to define the resources and services to monitor

define service {
        use                     generic-service
        host_name               web001,web002
        service_description     Disk Partition
        check_command                   check_nrpe!check_disk
}
define service {
        use                     generic-service
        host_name               web001,web002
        service_description     Swap Useage
        check_command                   check_nrpe!check_swap
}
define service {
        use                     generic-service
        host_name               web001,web002
        service_description     MEM Useage
        check_command                   check_nrpe!check_mem
}
define service {
        use                     generic-service
        host_name               web001,web002
        service_description     Current Load
        check_command                   check_nrpe!check_load
}
define service {
        use                     generic-service
        host_name               web001,web002
        service_description     Disk Iostat
        check_command                   check_nrpe!check_iostat!5!11
}
define service {
        use                     generic-service
        host_name               web001,web002
        service_description     PING
        check_command                   check_ping!100.0,20%!500.0,60%
}

(3) Edit commands.cfg to add the check_nrpe command definition

[[email protected] objects]# tail -5 commands.cfg
# 'check_nrpe' command definition
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
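
It helps to see how a check_command such as check_nrpe!check_disk turns into the command line that Nagios runs: the text before the first "!" selects the command definition, and the remaining "!"-separated fields become $ARG1$, $ARG2$, and so on. The sketch below is a simplified illustration of that substitution, not Nagios's own implementation. The function name expand_check is hypothetical, and the $USER1$ value is assumed to be the libexec directory of this source install:

```shell
#!/bin/bash
# expand_check CHECK_COMMAND COMMAND_LINE HOSTADDRESS
# Splits CHECK_COMMAND on "!" into the command name and $ARG1$, $ARG2$, ...,
# then substitutes those macros plus $USER1$ and $HOSTADDRESS$ in COMMAND_LINE.
# Simplified illustration: real Nagios supports many more macros.
expand_check() {
    local check=$1 line=$2 host=$3
    local user1=/usr/local/nagios/libexec   # assumed value of $USER1$ (set in resource.cfg)
    local -a parts
    local i
    IFS='!' read -r -a parts <<< "$check"
    line=${line//'$USER1$'/$user1}
    line=${line//'$HOSTADDRESS$'/$host}
    # parts[0] is the command name; parts[1..] map to $ARG1$, $ARG2$, ...
    for ((i = 1; i < ${#parts[@]}; i++)); do
        line=${line//"\$ARG$i\$"/${parts[i]}}
    done
    printf '%s\n' "$line"
}

expand_check 'check_nrpe!check_disk' \
    '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' 192.168.2.152
```

Running the expanded command by hand on the server is a quick way to confirm that a service definition and its command definition agree.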

(4) Check the syntax

[[email protected] objects]# /etc/init.d/nagios checkconfig
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
 OK.
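  • Monitoring result

[Figure: the monitoring server's view of each client's local system status]

(5) Add URL and port monitoring for the HTTP service

  • These checks, such as URL and port monitoring, are initiated from the Nagios server. Services of this kind are usually exposed to external users, so they are generally monitored with active checks.
  • To see what URL monitoring actually does, run the HTTP check from the command line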
[[email protected] ~]# /usr/local/nagios/libexec/check_http -H 192.168.2.152
HTTP OK: HTTP/1.1 200 OK - 258 bytes in 0.001 second response time |time=0.000986s;;;0.000000 size=258B;;;0
  • The following configuration monitors a domain-name URL

  • Edit services.cfg

[[email protected] objects]# sed -n '37,49p' services.cfg
#url examples http://blog.yangyangyang.org
define service {
        use                     generic-service
        host_name               web001
        service_description     blog_url
        check_command                   check_weburl!-H blog.yangyangyang.org
        }
define service {
        use                                     generic-service
        host_name                               web001
        service_description                     blog_url1
        check_command                   check_weburl!-H blog.yangyangyang.org -u /ylt.html
        }
# The path after the domain name goes after -u, so the URL actually checked is http://blog.yangyangyang.org/ylt.html
  • Edit commands.cfg and add the check_weburl command definition
[[email protected] objects]# sed -n '144,154p' commands.cfg
# 'check_http' command definition
define command{
        command_name    check_http
        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
        }

# 'check_weburl' command definition
define command{
        command_name    check_weburl
        command_line    $USER1$/check_http $ARG1$ -w 10 -c 30
        }

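(6) After configuring the URLs, check the Nagios syntax

  • To test domain-name URL monitoring, add the domain to the hosts file on the Nagios server
192.168.2.148 blog.yangyangyang.org
  • If the HTML files for the domain do not exist on the monitored host, the check returns a 403 or 404 error; create them: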
[[email protected] ~]# touch /var/www/html/index.html
[[email protected] ~]# touch /var/www/html/ylt.html
  • Now check the syntax
[[email protected] objects]# /etc/init.d/nagios checkconfig
  • Apply the change
[[email protected] ~]# /etc/init.d/nagios reload
[[email protected] objects]# /usr/local/nagios/libexec/check_http -H blog.yangyangyang.org
HTTP OK: HTTP/1.1 200 OK - 258 bytes in 0.001 second response time |time=0.000952s;;;0.000000 size=258B;;;0

[[email protected] objects]# /usr/local/nagios/libexec/check_http -H blog.yangyangyang.org -u /ylt.html
HTTP OK: HTTP/1.1 200 OK - 258 bytes in 0.001 second response time |time=0.001255s;;;0.000000 size=258B;;;0
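
Everything after the "|" in the check_http output is performance data in label=value form, followed by optional threshold fields separated by ";". The sketch below pulls a single value out of such a line, using one of the outputs shown earlier as sample input. The function name perfdata_value is hypothetical:

```shell
#!/bin/bash
# perfdata_value OUTPUT LABEL: print the numeric value of LABEL from the
# performance data that follows the "|" in a plugin's output line.
# Hypothetical helper for illustration.
perfdata_value() {
    local output=$1 label=$2
    # Keep the part after the first "|", put one label=value item per line,
    # then keep only the leading number (dropping the unit and threshold fields).
    printf '%s\n' "${output#*|}" | tr ' ' '\n' |
        sed -n "s/^${label}=\([0-9.]*\).*/\1/p"
}

line='HTTP OK: HTTP/1.1 200 OK - 258 bytes in 0.001 second response time |time=0.000986s;;;0.000000 size=258B;;;0'
perfdata_value "$line" time
perfdata_value "$line" size
```

These are the same values that the graphing setup in section 5 records once performance data processing is enabled.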

(7) Example: monitoring an arbitrary port

[[email protected] ~]# /usr/local/nagios/libexec/check_tcp -H 192.168.2.152 -p 80
TCP OK - 0.000 second response time on 192.168.2.152 port 80|time=0.000350s;;;0.000000;10.000000
[[email protected] objects]# sed -n '50,60p' services.cfg
define service {
        use                                     generic-service
        host_name                               web001
        service_description                     ssh_52017
        check_command                   check_tcp!52017
        }
define service {
        use                                     generic-service
        host_name                               web001
        service_description                     http_80
        check_command                   check_tcp!80

(8) Monitor the Memcached service

[[email protected] objects]# tail -4 commands.cfg
define command{
        command_name    check_memcached_11211
        command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p 11211 -t 5 -E -s 'stats\r\nquit\r\n' -e 'uptime' -M crit
        }

[[email protected] objects]# tail -6 services.cfg
define service {
        use                                     generic-service
        host_name                               web001
        service_description                     Memcached_11211
        check_command                   check_memcached_11211
        }

  • Final monitoring result

[Figure: monitoring result]

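4.3 Debugging Nagios

(1) Check the Nagios syntax and improve the Nagios init script; see 2.2 Installing the Nagios server, (6) Configure and start the Nagios service
(2) Troubleshoot through the logs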

[[email protected] ~]# tail /usr/local/nagios/var/nagios.log
[1539792000] CURRENT SERVICE STATE: web002;Swap Useage;CRITICAL;HARD;3;Connection refused or timed out
[1539837678] Warning: A system time change of 0d 12h 41m 9s (forwards in time) has been detected.  Compensating...
[1539837700] HOST NOTIFICATION: nagiosadmin;web002;DOWN;notify-host-by-email;CRITICAL - Host Unreachable (192.168.2.144)
[1539838070] SERVICE ALERT: web001;Disk Iostat;CRITICAL;SOFT;1;CHECK_NRPE: Error - Could not complete SSL handshake.
[1539838120] SERVICE ALERT: web001;Current Load;CRITICAL;SOFT;1;CHECK_NRPE: Error - Could not complete SSL handshake.
[1539838190] SERVICE ALERT: web001;Disk Iostat;CRITICAL;SOFT;2;CHECK_NRPE: Error - Could not complete SSL handshake.
[1539838210] SERVICE ALERT: web001;Disk Partition;CRITICAL;SOFT;1;CHECK_NRPE: Error - Could not complete SSL handshake.
[1539838220] SERVICE ALERT: web001;MEM Useage;CRITICAL;SOFT;1;CHECK_NRPE: Error - Could not complete SSL handshake.
[1539838240] SERVICE ALERT: web001;Current Load;CRITICAL;SOFT;2;CHECK_NRPE: Error - Could not complete SSL handshake.
[1539838250] SERVICE ALERT: web001;Swap Useage;CRITICAL;SOFT;1;CHECK_NRPE: Error - Could not complete SSL handshake.
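
The bracketed number at the start of each nagios.log line is a Unix epoch timestamp, which is hard to correlate with /var/log/messages by eye. The sketch below rewrites it as a UTC date. It assumes GNU date (for the "-d @SECONDS" syntax), and the function name humanize_log is hypothetical:

```shell
#!/bin/bash
# humanize_log: read nagios.log lines on stdin and replace the leading
# [epoch] with a UTC timestamp. Assumes GNU date. Hypothetical helper.
humanize_log() {
    local line ts rest
    while IFS= read -r line; do
        if [[ $line =~ ^\[([0-9]+)\]\ (.*)$ ]]; then
            ts=${BASH_REMATCH[1]}
            rest=${BASH_REMATCH[2]}
            printf '[%s] %s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
        else
            # Lines without a leading [epoch] pass through unchanged
            printf '%s\n' "$line"
        fi
    done
}

# Usage on the server:
#   tail /usr/local/nagios/var/nagios.log | humanize_log
```

Dropping the -u flag prints local time instead, which lines up directly with the syslog timestamps.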

[[email protected] ~]# tail /var/log/messages
Oct 18 12:50:20 nagios-server nagios: SERVICE ALERT: web001;MEM Useage;CRITICAL;SOFT;1;CHECK_NRPE: Error - Could not complete SSL handshake.
Oct 18 12:50:35 nagios-server systemd: Started Session 89 of user ylt.
Oct 18 12:50:35 nagios-server systemd-logind: New session 89 of user ylt.
Oct 18 12:50:35 nagios-server systemd: Starting Session 89 of user ylt.
Oct 18 12:50:35 nagios-server dbus[646]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Oct 18 12:50:35 nagios-server dbus[646]: [system] Successfully activated service 'org.freedesktop.problems'
Oct 18 12:50:40 nagios-server nagios: SERVICE ALERT: web001;Current Load;CRITICAL;SOFT;2;CHECK_NRPE: Error - Could not complete SSL handshake.
Oct 18 12:50:50 nagios-server nagios: SERVICE ALERT: web001;Swap Useage;CRITICAL;SOFT;1;CHECK_NRPE: Error - Could not complete SSL handshake.
Oct 18 12:51:10 nagios-server systemd-logind: Removed session 83.
Oct 18 12:51:15 nagios-server su: (to root) ylt on pts/1

5. Graphing and management for Nagios on the server

5.1 Installing PNP on the server to generate graphs

(1) Install the base dependencies for PNP graphing

[[email protected] ~]# yum install cairo pango zlib zlib-devel freetype freetype-devel gd gd-devel -y
[[email protected] ~]# rpm -qa cairo pango zlib zlib-devel freetype freetype-devel gd gd-devel
zlib-1.2.7-17.el7.x86_64
gd-devel-2.0.35-26.el7.x86_64
gd-2.0.35-26.el7.x86_64
freetype-2.4.11-15.el7.x86_64
freetype-devel-2.4.11-15.el7.x86_64
pango-1.40.4-1.el7.x86_64
cairo-1.14.8-2.el7.x86_64
[[email protected] ~]# yum install libart_lgpl libart_lgpl-devel -y
[[email protected] ~]# rpm -qa libart_lgpl libart_lgpl-devel
libart_lgpl-2.3.21-10.el7.x86_64
libart_lgpl-devel-2.3.21-10.el7.x86_64
[[email protected] ~]# yum install rrdtool rrdtool-devel -y
[[email protected] ~]# rpm -qa rrdtool rrdtool-devel
rrdtool-1.4.8-9.el7.x86_64
rrdtool-devel-1.4.8-9.el7.x86_64

[[email protected] ~]# which rrdtool
/bin/rrdtool

(2) Install PNP, the web front end that displays the graphs

[[email protected] ~]# cd /home/ylt/tools/

[[email protected] tools]# wget -O pnp-0.4.14.tar.gz https://sourceforge.net/projects/pnp4nagios/files/PNP/pnp-0.4.14/pnp-0.4.14.tar.gz/download
[[email protected] tools]# yum install perl-Time-HiRes -y
[[email protected] tools]# tar zxf pnp-0.4.14.tar.gz
[[email protected] tools]# cd pnp-0.4.14/
[[email protected] pnp-0.4.14]# ./configure --with-rrdtool --with-perfdata-dir=/usr/local/nagios/share/perfdata
[[email protected] pnp-0.4.14]# make all
[[email protected] pnp-0.4.14]# make install
[[email protected] pnp-0.4.14]# make install-config
[[email protected] pnp-0.4.14]# make install-init

[[email protected] pnp-0.4.14]# ll /usr/local/nagios/libexec/ |grep process
-rwxr-xr-x 1 nagios nagios  31804 Oct 18 18:50 process_perfdata.pl

(3) Nagios configuration for graphing

  • Run vim nagios.cfg +834 and change the parameter's value from 0 to 1
[[email protected] etc]# sed -n '834p' nagios.cfg
process_performance_data=1
  • Continue editing: starting at line 846, uncomment the two parameters
[[email protected] etc]# sed -n '846,847p' nagios.cfg
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
  • Edit commands.cfg to define the commands that collect the graph data
[[email protected] etc]# sed -n '234,245p' objects/commands.cfg
# 'process-host-perfdata' command definition
define command{
        command_name    process-host-perfdata
        command_line    /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
        }


# 'process-service-perfdata' command definition
define command{
        command_name    process-service-perfdata
        command_line    /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
        }
  • Check the syntax and apply the configuration
[[email protected] etc]# /etc/init.d/nagios checkconfig
[[email protected] etc]# /etc/init.d/nagios reload

5.2 Configuring hosts and services to collect status data for graphing

(1) Make the monitored hosts record data

[[email protected] ~]# cd /usr/local/nagios/etc/objects/

[[email protected] objects]# sed -n '23,42p' hosts.cfg
# Define a host for the local machine

define host{
        use                     linux-server            ; Name of host template to use
                                                        ; This host definition will inherit all variables that are defined
                                                        ; in (or inherited by) the linux-server host template definition.
        host_name               web001
        alias                   web001
        address                 192.168.2.152
        process_perf_data       1 ; record performance data for host web001
        }
define host{
        use                     linux-server            ; Name of host template to use
                                                          ; This host definition will inherit all variables that are defined
                                                          ; in (or inherited by) the linux-server host template definition.
        host_name               web002
        alias                   web002
        address                 192.168.2.144
        process_perf_data       1
        }

(2) Make the services of the monitored hosts record data

[[email protected] objects]# head -7 se