Nagios server and client monitoring: installation and detailed configuration, with an explanation of each configuration file


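Nagios installation and deployment ------

1. Preparation

(1) Create the nagios user and group
[root@localhost ~]# groupadd nagios
[root@localhost ~]# useradd -g nagios nagios
[root@localhost ~]# usermod -a -G nagios apache
# The apache account only exists once httpd is installed (step (3) below); if usermod fails here, run it again after that step.

[root@localhost ~]# mkdir /usr/local/nagios
[root@localhost ~]# chown -R nagios:nagios /usr/local/nagios
(2) Enable the system sendmail service
sendmail runs on the Nagios monitoring server so that Nagios can send alert e-mails when it detects a failure.
Most Linux distributions contemporary with this guide ship sendmail by default, so it is enough to enable the service when installing the system.
No further sendmail configuration is needed.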

(3) Install Apache and PHP

yum install httpd httpd-devel php php-mysql php-common php-gd php-mbstring php-mcrypt php-devel php-xml gcc glibc glibc-common gd gd-devel openssl openssl-devel


-----------------------------------------------------------------

2. Compile and install Nagios
Download the Nagios source archive from the project page: http://sourceforge.net/projects/nagios/?source=directory
[root@localhost ~]# tar -zxvf nagios-3.2.0.tar.gz
[root@localhost ~]# cd nagios-3.2.0
[root@localhost nagios-3.2.0]# ./configure --prefix=/usr/local/nagios --with-command-group=nagios
# --prefix sets the installation directory; here Nagios is installed into /usr/local/nagios
[root@localhost nagios-3.2.0]# make all
[root@localhost nagios-3.2.0]# make install
# make install installs the main program, the CGIs and the HTML files
[root@localhost nagios-3.2.0]# make install-init
# make install-init creates the nagios init script under /etc/rc.d/init.d
[root@localhost nagios-3.2.0]# make install-commandmode
# make install-commandmode sets the permissions on the external command directory
[root@localhost nagios-3.2.0]# make install-config
# make install-config installs the sample configuration files, here into /usr/local/nagios/etc
[root@localhost nagios-3.2.0]# make install-webconf
# make install-webconf installs the Apache configuration file for the Nagios web interface
--------------------------------------------------------------------

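3. Nagios directory layout
After installation, the directories under the install prefix and their purposes are:
## bin -- Nagios executables
## etc -- Nagios configuration files
## sbin -- Nagios CGI files, that is, the files needed to run external commands
## share -- Nagios web pages
## libexec -- Nagios plugins
## var -- Nagios log files, the lock file and similar
## var/archives -- automatically archived Nagios logs
## var/rw -- holds the external command file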
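
A quick way to confirm this layout after installation is a small shell function. This is only a sketch: the function name check_nagios_layout is made up for this example, and it assumes the /usr/local/nagios prefix used in this guide. The libexec directory only appears once the plugins from step 4 are installed.

```shell
# Report which of the expected sub-directories exist under a Nagios prefix.
# Usage: check_nagios_layout [prefix]   (prefix defaults to /usr/local/nagios)
check_nagios_layout() {
    local prefix="${1:-/usr/local/nagios}"
    local missing=0
    local dir
    for dir in bin etc sbin share libexec var var/archives var/rw; do
        if [ -d "$prefix/$dir" ]; then
            echo "ok       $prefix/$dir"
        else
            echo "missing  $prefix/$dir"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every directory was found.
    [ "$missing" -eq 0 ]
}

# Example call (commented out so that sourcing this file has no side effects):
# check_nagios_layout /usr/local/nagios
```

The function returns a non-zero status when anything is missing, so it can be used in an `if` before moving on to the configuration steps.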

-----------------------------------------------------------------------

4. Install the Nagios plugins
The version downloaded here is nagios-plugins-1.4.14.
Note: the plugin version is largely independent of the Nagios version.
[root@localhost nagios]#tar -zxvf nagios-plugins-1.4.14.tar.gz
[root@localhost nagios]#cd nagios-plugins-1.4.14
[root@localhost nagios-plugins-1.4.14]#./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
[root@localhost nagios-plugins-1.4.14]# make
[root@localhost nagios-plugins-1.4.14]# make install
After installation, the libexec directory under /usr/local/nagios contains many executables.
These are the plugins that Nagios needs.

--------------------------------------------------------------------------
(Sections 5 and 6 (1)-(2) below are for reference only; they are not required configuration. Using the Chinese-localized build is not recommended.)

5. Install the Nagios Chinese localization package
Download address:
http://sourceforge.net/projects/nagios-cn/files/
Download the package that matches your Nagios version, then install it:
[root@localhost ~]# tar xvfj nagios-cn-3.2.0.tar.bz2
[root@localhost ~]# cd nagios-cn-3.2.0
[root@localhost nagios-cn-3.2.0]# ./configure
[root@localhost nagios-cn-3.2.0]# make all
[root@localhost nagios-cn-3.2.0]# make install

-------------------------------------------------------------------

6. Install and configure Apache and PHP
Apache and PHP are not required to install Nagios, but Nagios provides a web monitoring interface that shows the state of monitored hosts and resources clearly, so installing a web server is well worth it.
Note that from Nagios 3.1.x onward the web interface needs PHP support. The version used here is nagios-3.2.0, so after compiling and installing Apache the PHP module has to be compiled as well; the PHP version chosen here is php 5.3.2.

(1) Install Apache and PHP
First install Apache:
[root@nagiosserver ~]# tar zxvf httpd-2.0.63.tar.gz
[root@nagiosserver ~]# cd httpd-2.0.63
[root@nagiosserver httpd-2.0.63]# ./configure --prefix=/usr/local/apache2
[root@nagiosserver httpd-2.0.63]# make
[root@nagiosserver httpd-2.0.63]# make install
Then install PHP:
[root@nagiosserver ~]# tar zxvf php-5.3.2.tar.gz
[root@nagiosserver ~]# cd php-5.3.2
[root@nagiosserver php-5.3.2]# ./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs
[root@nagiosserver php-5.3.2]# make
[root@nagiosserver php-5.3.2]# make install
As the steps show, Apache is installed in /usr/local/apache2 and PHP in /usr/local/php.
(2) Configure Apache
Open the Apache configuration file /usr/local/apache2/conf/httpd.conf
Find:
User nobody
Group #-1
and change it to:
User nagios
Group nagios
Then find:
DirectoryIndex index.html index.html.var
and change it to:
DirectoryIndex index.html index.php
Then add the following line:
AddType application/x-httpd-php .php

###### For security, the Nagios web interface should normally be reachable only after authentication.
This requires an authentication block, so append the following to the end of httpd.conf:

#setting for nagios
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>


Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>
#############
--------------------------------------------------------------------------
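(3) Create the Apache authentication file
The Apache configuration names an htpasswd authentication file; create it now:
[root@localhost nagios]# /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password: (enter the password)
Re-type new password: (enter it again)
Adding password for user nagiosadmin
This creates the authentication file under /usr/local/nagios/etc. The path given to htpasswd must be the same path that AuthUserFile names in the Apache configuration. (With the yum-installed httpd, the htpasswd binary is on the PATH rather than under /usr/local/apache2/bin.)
From now on, opening http://ip/nagios/ asks for a user name and password.
Finally, restart the web server:
[root@nagiosserver ~]# service httpd restart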

----------------------------------------------------------------------------------

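1. The default Nagios configuration files
After installation the default configuration files are in /usr/local/nagios/etc. Each file or directory means the following:
cgi.cfg -- configuration file that controls CGI access
nagios.cfg -- main Nagios configuration file
resource.cfg -- variable definitions (the resource file); variables defined here, such as $USER1$, can be referenced from the other configuration files
objects -- a directory holding many configuration file templates used to define Nagios objects
objects/commands.cfg -- command definitions; the commands defined here can be referenced from other configuration files
objects/contacts.cfg -- contacts and contact groups
objects/localhost.cfg -- monitoring of the local host
objects/printer.cfg -- template for monitoring printers; not enabled by default
objects/switch.cfg -- template for monitoring routers and switches; not enabled by default
objects/templates.cfg -- host and service templates that other configuration files can reference
objects/timeperiods.cfg -- the time periods used for monitoring
objects/windows.cfg -- template for monitoring Windows hosts; not enabled by default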

---------------------------------------------------------------------------------

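2. How the configuration files relate to each other
Configuring Nagios involves several kinds of definitions: hosts and host groups, services and service groups, contacts and contact groups, time periods, and check commands. As this list suggests, the Nagios configuration files are interlinked and reference one another.
To build a working monitoring system you need to understand which file depends on which.
Four points matter most:
First: define which hosts, host groups, services and service groups are monitored.
Second: define which command implements each check.
Third: define the time period during which monitoring takes place.
Fourth: define the contacts and contact groups to notify when a host or service has a problem.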
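
The cross-references between files can be made visible with a few lines of shell. The sketch below lists every command that a `check_command` line refers to but that no `command_name` line defines in the files given. The function name undefined_commands is invented for this example, and the parsing is deliberately naive (one directive per line, as in the sample files); the authoritative check remains `nagios -v`.

```shell
# Print the names of commands that are referenced by check_command
# but not defined by any command_name in the given config files.
# Usage: undefined_commands file.cfg [more.cfg ...]
undefined_commands() {
    awk '
        # "command_name check_nrpe" defines a command
        $1 == "command_name"  { defined[$2] = 1 }
        # "check_command check_nrpe!check_load" uses one; arguments follow the "!"
        $1 == "check_command" { split($2, parts, "!"); used[parts[1]] = 1 }
        END {
            for (name in used)
                if (!(name in defined))
                    print name
        }
    ' "$@" | sort
}

# Example call (commented out; paths as used in this guide):
# undefined_commands /usr/local/nagios/etc/objects/*.cfg
```

An empty result means every service in those files points at a command that is defined somewhere among them.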

----------------------------------------------------------------------------------

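3. Start configuring Nagios
To keep things clear and easy to maintain, put each kind of Nagios object into its own configuration file:

Create hosts.cfg to define hosts and host groups
Create services.cfg to define services
(these two files can also be combined into one)

Use the default contacts.cfg to define contacts and contact groups
Use the default commands.cfg to define commands
Use the default timeperiods.cfg to define monitoring time periods
Use the default templates.cfg as the template file that the others reference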
——————————————————————————————————————————
Example: the templates.cfg file

Nagios mainly monitors host resources and services, which are called objects in its configuration. To avoid defining the same attributes over and over, Nagios provides a template configuration file in which shared attributes are defined once and then referenced many times. That is the purpose of templates.cfg.
The parameters in templates.cfg mean the following:
define contact{
name generic-contact ; name of this contact template
service_notification_period 24x7 ; time period in which service notifications may be sent; "24x7" is defined in timeperiods.cfg
host_notification_period 24x7 ; time period in which host notifications may be sent
service_notification_options w,u,c,r,f,s ; service states that trigger a notification: w = warning, u = unknown, c = critical, r = recovery, f = flapping, s = scheduled downtime
host_notification_options d,u,r,f,s ; host states that trigger a notification: d = down, u = unreachable, r = recovery, f = flapping, s = scheduled downtime
service_notification_commands notify-service-by-email ; how service notifications are sent (e-mail or SMS; e-mail here); notify-service-by-email is defined in commands.cfg
host_notification_commands notify-host-by-email ; how host notifications are sent (e-mail or SMS; e-mail here)
register 0 ; template only, not a real contact
}


define host{
name generic-host ; name of this template; it is not a real machine's hostname but the name that host definitions refer to
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notification_period 24x7 ; time period in which notifications may be sent
register 0
}

——————————————————————————————————————————————


Add the NRPE plugin on the monitoring server
http://jaist.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz

tar zxvf nrpe-2.13.tar.gz
cd nrpe-2.13
./configure
make all
make install-plugin

Edit /usr/local/nagios/etc/objects/commands.cfg and add:

define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Verify the configuration files:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg


#############################################################

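Adding a monitored host:

On the monitored host:

Add the user and group:
useradd -s /sbin/nologin nagios

Install the Nagios plugins:
http://7.down.119g.com:7766/7/52DB48B15572B98C6FCD8AAEC2EF4D2AAD7640D3/nagios-plugins-1.4.16.tar.gz
(The URL above names version 1.4.16 while the commands below use 1.4.14; use the file name of the archive you actually downloaded.)
tar zxvf nagios-plugins-1.4.14.tar.gz
cd nagios-plugins-1.4.14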
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install

http://jaist.dl.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz
tar zxvf nrpe-2.13.tar.gz
cd nrpe-2.13
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd

Create a script that (re)starts NRPE:
vim nrpe.sh
#!/bin/bash
killall -9 nrpe
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Make it executable: chmod 755 nrpe.sh

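Edit nrpe.cfg
vim /usr/local/nagios/etc/nrpe.cfg

After allowed_hosts= add the IP address of the monitoring server, for example 192.168.34.105 (separate several addresses with a comma "," and no spaces)
server_port=5666 is the port the NRPE daemon listens on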
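
Editing allowed_hosts by hand is easy to get wrong because spaces are not allowed in the list. As a sketch, the helper below rewrites the line in place and strips any spaces. The function name set_allowed_hosts is hypothetical, and the file path in the example call is the one used in this guide.

```shell
# Set the allowed_hosts line of an nrpe.cfg file.
# Usage: set_allowed_hosts /path/to/nrpe.cfg "127.0.0.1,192.168.34.105"
set_allowed_hosts() {
    local cfg="$1"
    local hosts
    local tmp
    # NRPE expects a comma-separated list without spaces.
    hosts=$(printf '%s' "$2" | tr -d ' ')
    tmp=$(mktemp) || return 1
    if grep '^allowed_hosts=' "$cfg" >/dev/null; then
        # Replace the existing line; "|" is the sed delimiter so "/" in values is harmless.
        sed "s|^allowed_hosts=.*|allowed_hosts=$hosts|" "$cfg" > "$tmp"
    else
        # No such line yet: keep the file and append one.
        cat "$cfg" > "$tmp"
        echo "allowed_hosts=$hosts" >> "$tmp"
    fi
    # Write back through redirection so the owner and mode of the original file stay unchanged.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Example call (commented out so that sourcing this file changes nothing):
# set_allowed_hosts /usr/local/nagios/etc/nrpe.cfg "127.0.0.1, 192.168.34.105"
```

NRPE has to be restarted afterwards for the new list to take effect.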

Add the check commands:
command[check_users]=/usr/local/nagios/libexec/check_users -w 10 -c 20
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sd1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda6
command[check_sd2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -w 50 -c 80
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 50 -c 80
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_ip_connets]=/usr/local/nagios/libexec/ip_conn.sh 300 50
command[check_server]=/usr/local/nagios/libexec/check_server.sh
command[check_heartbeat]=/usr/local/nagios/libexec/check_heartbeat.sh

command[check_df]=/usr/local/nagios/libexec/check_disk -x /dev/shm -w 15% -c 5%
command[check_megaraid_sas]=/usr/local/nagios/libexec/check_megaraid_sas
(Monitoring RAID status requires downloading the check_megaraid_sas script; the steps follow.)

————————————————————————————————————————————

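On the host to be monitored:
1. a. Check the server model: dmidecode -s system-product-name
b. Check the RAID controller: dmesg | grep RAID
c. Check whether the tools are already installed:
[root@localhost ~]# rpm -qa | egrep 'Lib_Utils|MegaCli'
Lib_Utils-1.00-09.noarch
MegaCli-8.02.21-1.noarch
If they are not installed yet, download and install the latest MegaCli, which supports monitoring of more SAS disk types. If the installation succeeded, running MegaCli without arguments prints:
[root@localhost ~]# MegaCli
Fatal error - Command Tool invoked with wrong parameters
Exit Code: 0x01
d. Use MegaCli to inspect the controller (optional)
# MegaCli -help (show command help)
# MegaCli -adpCount (show the number of adapters)
# MegaCli -LdGetNum -aALL (show the number of logical drives)
# MegaCli -LdInfo -LALL -aAll (show information about all logical drives)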

2. Install the script

Download check_megaraid_sas, a Nagios plugin written in Perl that collects its data through the MegaCli command.
Download address: http://www.techno-obscura.com/~delgado/code/check_megaraid_sas
Edit the script:
# vi check_megaraid_sas
a. Find line 35:
use lib qw(/usr/lib/nagios/plugins /usr/lib64/nagios/plugins); # possible pathes to your Nagios plugins and utils.pm
and change it to:
use lib qw(/usr/local/nagios/libexec); # possible pathes to your Nagios plugins and utils.pm
Note: /usr/local/nagios/libexec is the plugin directory on the monitored host.

b. Find lines 52-53:
my $megaclibin = '/usr/sbin/MegaCli'; # the full path to your MegaCli binary
my $megacli = "sudo $megaclibin"; # how we actually call MegaCli
and change them to:
my $megaclibin = '/usr/sbin/MegaCli'; # the full path to your MegaCli binary
my $megacli = "$megaclibin"; # how we actually call MegaCli
Note: /usr/sbin/MegaCli is the absolute path of MegaCli.

c. Copy the script into place and make it executable:
# cp check_megaraid_sas /usr/local/nagios/libexec/check_megaraid_sas
# chmod 755 /usr/local/nagios/libexec/check_megaraid_sas
# /usr/local/nagios/libexec/check_megaraid_sas -h (show usage help)
# /usr/local/nagios/libexec/check_megaraid_sas (check the status)
OK: 0:0:RAID-10:4 drives:1.089TB:Optimal Drives:4


————————————————————————————————————————————

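Check that the NRPE port is listening:
netstat -anpt | grep nrpe

Check that NRPE answers:
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 (when testing from the monitoring server, use the monitored host's IP address instead)
A reply such as "NRPE v2.13" (the version number) shows that check_nrpe can talk to the NRPE daemon on the monitored host.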

#######################################################################################

Configuration on the monitoring server:

1. Define how remote hosts and services are checked
Remote Linux hosts are monitored through NRPE with the check_nrpe plugin. Its syntax is:
check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
Example:

define command{
command_name check_swap_nrpe
command_line $USER1$/check_nrpe -H "$HOSTADDRESS$" -c "check_swap"
}


To pass the name of the remote command as a parameter when monitoring remote Linux hosts, define the command like this:
# cd /usr/local/nagios/etc/objects/
# vi commands.cfg (add the following)
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
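
To see what Nagios ends up running for a service such as `check_nrpe!check_load`, the macro substitution can be imitated in shell. This is an illustration of the idea only, not how Nagios implements it: the function name expand_check_command is made up, only $USER1$, $HOSTADDRESS$ and $ARGn$ are handled, and $USER1$ is assumed to be /usr/local/nagios/libexec as in this guide.

```shell
# Expand a command_line template the way a check_command reference fills it in.
# Usage: expand_check_command 'template' 'name!arg1!arg2' host-address [plugin-dir]
expand_check_command() {
    local template="$1"
    local check_command="$2"
    local address="$3"
    local plugin_dir="${4:-/usr/local/nagios/libexec}"
    awk -v t="$template" -v cc="$check_command" -v addr="$address" -v u1="$plugin_dir" '
        BEGIN {
            # parts[1] is the command name, parts[2..n] become $ARG1$..$ARGn-1$
            n = split(cc, parts, "!")
            gsub(/\$USER1\$/, u1, t)
            gsub(/\$HOSTADDRESS\$/, addr, t)
            for (i = 2; i <= n; i++) {
                pattern = "\\$ARG" (i - 1) "\\$"
                gsub(pattern, parts[i], t)
            }
            print t
        }'
}

# Example call (commented out):
# expand_check_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' 'check_nrpe!check_load' 192.168.34.97
```

Running the printed command by hand on the monitoring server is a convenient way to debug a service that stays in UNKNOWN.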

2. Create the configuration file for the monitored host and its services:
For example: vim RHEL.cfg
define host{
use rhel-name
host_name RHEL-97
alias RedHat-97 (192.168.34.97)
address 192.168.34.97
}

define service{
use rhel-sys
host_name RHEL-97
service_description Disk space
check_command check_nrpe!check_df
}

define service{
use rhel-sys
host_name RHEL-97
service_description System load
check_command check_nrpe!check_load
}

define service{
use rhel-raid
host_name RHEL-97
service_description RAID status
check_command check_nrpe!check_megaraid_sas
}



define service{
use generic-service
host_name RHEL-97
service_description CHECK USERS
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name RHEL-97
service_description Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name RHEL-97
service_description SDA1
check_command check_nrpe!check_sd1
}
define service{
use generic-service
host_name RHEL-97
service_description SDA2
check_command check_nrpe!check_sd2
}
define service{
use generic-service
host_name RHEL-97
service_description Zombie
check_command check_nrpe!check_zombie_procs
}
define service{
use generic-service
host_name RHEL-97
service_description total procs
check_command check_nrpe!check_total_procs
}

3. Adding custom monitoring scripts
Checks such as CPU, memory or LVS need scripts of your own. Two things matter: handle the input (arguments and so on) and format the output. As long as the output and the exit code follow the conventions Nagios recognizes (exit 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), the script works as a plugin.

For example:


Memory monitoring:

vi check_mem.sh

#!/bin/bash
# check memory script
# sunny 2008.2.15
# Total memory in MB (second field of the "Mem:" line)
TOTAL=`free -m | head -2 |tail -1 |gawk '{print $2}'`
# Free memory in MB (fourth field of the "Mem:" line)
FREE=`free -m | head -2 |tail -1 |gawk '{print $4}'`
# Free percentage, computed as free * 100 / total (integer arithmetic)
FREETMP=`expr $FREE \* 100 / $TOTAL`
# 15% or more free: OK; 6% to 14%: WARNING; 5% or less: CRITICAL
if [ $FREETMP -ge 15 ]
then
echo "OK: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
exit 0
elif [ $FREETMP -ge 6 ]
then
echo "WARNING: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
exit 1
else
echo "CRITICAL: The total memory is $TOTAL MB,the free memory is $FREE MB($FREETMP%)"
exit 2
fi
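
The script above hard-codes its thresholds, while the nrpe.cfg entry earlier calls check_mem.sh with -w and -c. One possible way to honour those options is sketched below. It is an assumption about how such a script could look, not the script the nrpe.cfg line was written for: here both thresholds are percentages of free memory (defaults 15 and 5, matching the script above), the function names are invented, and the text after the "|" follows the usual plugin performance-data convention.

```shell
#!/bin/bash
# Map a free-memory percentage to a Nagios state name and exit code.
# Usage: mem_status free_pct warn_pct crit_pct
mem_status() {
    local free_pct="$1" warn="$2" crit="$3"
    if [ "$free_pct" -le "$crit" ]; then
        echo "CRITICAL"; return 2
    elif [ "$free_pct" -lt "$warn" ]; then
        echo "WARNING"; return 1
    fi
    echo "OK"; return 0
}

# Parse -w/-c, read memory figures from free(1) and print one status line.
check_mem_main() {
    local warn=15 crit=5 opt total free_mb pct state rc
    OPTIND=1
    while getopts "w:c:" opt; do
        case "$opt" in
            w) warn="$OPTARG" ;;
            c) crit="$OPTARG" ;;
            *) echo "UNKNOWN: usage: check_mem.sh [-w free%] [-c free%]"; return 3 ;;
        esac
    done
    total=$(free -m | awk '/^Mem:/ {print $2}')
    free_mb=$(free -m | awk '/^Mem:/ {print $4}')
    if [ -z "$total" ] || [ "$total" -eq 0 ]; then
        echo "UNKNOWN: cannot read memory totals"; return 3
    fi
    pct=$((free_mb * 100 / total))
    # Capture the state text and its exit code without tripping "set -e".
    state=$(mem_status "$pct" "$warn" "$crit") && rc=0 || rc=$?
    echo "$state: free memory $free_mb MB of $total MB ($pct%) | free_pct=$pct%;$warn;$crit"
    return "$rc"
}

# When saved as a standalone plugin, end the file with:
# check_mem_main "$@"; exit $?
```

Keeping the threshold logic in its own function makes it possible to test the state mapping without depending on the memory of the machine the test runs on.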


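LVS monitoring:

vi check_lvs.sh


MySQL monitoring:

On the MySQL server to be monitored, create a dedicated database and account for Nagios:


mysql> create database nagdb default CHARSET=utf8;
mysql> grant select on nagdb.* to 'nagios'@'192.168.1.100';
mysql> update mysql.user set Password = PASSWORD('nagios') where user='nagios';
mysql> flush privileges;
(flush privileges is needed because the password was changed by updating the grant table directly.)

# /usr/local/nagios/libexec/check_mysql -H 192.168.1.101 -u nagios -d nagdb -p nagios -w 10 -c 30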


memcached monitoring:
This uses a plugin written in Perl that needs several dependency packages, which makes it tedious to set up.

(1) Install the Perl modules

# yum -y install perl-Carp-Clan perl-Cache-Memcached perl-Nagios-Plugin

-- If these cannot be installed
# wget http://dag.wieers.com/rpm/packages/rpmforge-release/rpmforge-release-0.5.2-2.rf.src.rpm
# rpm -ivh rpmforge-release-0.5.2-2.rf.src.rpm
# yum -y install perl-Nagios-Plugin.noarch perl-Carp-Clan.noarch perl-Cache-Memcached.noarch

-- If perl-Nagios-Plugin still cannot be installed
wget http://packages.sw.be/perl-Nagios-Plugin/perl-Nagios-Plugin-0.33-1.el5.rf.noarch.rpm
rpm -ivh perl-Nagios-Plugin-0.33-1.el5.rf.noarch.rpm --force --nodeps

(2) Install the plugin
Download Nagios-Plugins-Memcached-0.02.tar.gz and install it (it has many dependencies; note where the .pm files end up)
#tar xzvf Nagios-Plugins-Memcached-0.02.tar.gz
#cd Nagios-Plugins-Memcached-0.02
#yum -y install perl-CPAN
# perl Makefile.PL

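-- Several prompts appear; answer them as you prefer or press Enter at each one
# make

-- At this point it downloads some modules needed at run time
# make install

-- By default check_memcached is installed as /usr/local/bin/check_memcached
-- That is fine; copy it into the Nagios libexec directory
# cp /usr/local/bin/check_memcached /usr/local/nagios/libexec/
# chown nagios:nagios /usr/local/nagios/libexec/check_memcached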


Add the following to commands.cfg. (Here check_memcached is not installed on the memcached server; the copy on the Nagios server connects directly to port 11211 of the memcached server. You can also install it on the memcached server and fetch the values through check_nrpe.)
define command {
command_name check_memcached_11211
command_line $USER1$/check_memcached -H 192.168.1.101:11211 --size-warning 80 --size-critical 90
}
The command above monitors memcached memory usage as a percentage.
define command {
command_name check_memcached_response_11211
command_line /usr/local/bin/check_memcached -H 192.168.1.101 -w 300 -c 500
}
This one checks whether memcached still responds.
define command {
command_name check_memcached_hit
command_line /usr/local/bin/check_memcached -H 192.168.1.101 --hit-warning 10 --hit-critical 5
}
This one monitors the cache hit ratio.
Manual test from the plugin directory:
./check_memcached -H 192.168.108.96 -w 300 -c 500


——————————————————————————————————————————


Because RHEL.cfg refers to templates such as rhel-raid, rhel-sys, rhel-name and rhel-service, they must be defined in templates.cfg.


vim /usr/local/nagios/etc/objects/templates.cfg

define host{
name rhel-name ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 3 ; Check each Linux host 3 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}


####

define service{
name rhel-sys ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 5 ; Re-check the service up to 5 times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options u,c,r ; Send notifications about unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}


####

define service{
name rhel-raid ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 5 ; Re-check the service up to 5 times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options w,c ; Send notifications about warning and critical events
notification_interval 0 ; Notify once per problem; 0 disables re-notification
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

#####

define service{
name rhel-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 3 ; Check the service every 3 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 10 ; Re-notify about service problems every 10 minutes
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

#####
______________________________________________________________

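Configure e-mail alerts:

vim contacts.cfg

Change the address after email to the mailbox that should receive alerts.

To notify additional contact groups, add the names of the defined groups after contact_groups in the relevant definitions (separate several group names with a comma ",").


Register the newly defined hosts in nagios.cfg:

vim /usr/local/nagios/etc/nagios.cfg

cfg_file=/usr/local/nagios/etc/objects/RHEL/RHEL.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg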
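
When many host files are added over time, appending cfg_file lines by hand tends to produce duplicates. The helper below is a sketch of an idempotent way to register a file; the function name add_cfg_file is hypothetical and the paths in the example call are the ones from this guide.

```shell
# Append "cfg_file=<object file>" to the main config unless that exact line exists.
# Usage: add_cfg_file /path/to/nagios.cfg /path/to/objects/file.cfg
add_cfg_file() {
    local main_cfg="$1"
    local line="cfg_file=$2"
    # -x matches the whole line, -F treats the text literally (no regex).
    if grep -xF "$line" "$main_cfg" >/dev/null; then
        echo "already present: $line"
    else
        echo "$line" >> "$main_cfg"
        echo "added: $line"
    fi
}

# Example call (commented out so that sourcing this file changes nothing):
# add_cfg_file /usr/local/nagios/etc/nagios.cfg /usr/local/nagios/etc/objects/RHEL/RHEL.cfg
```

Running `nagios -v` on the main configuration file afterwards confirms that the newly registered file parses.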

_________________________________________________________

Adding MySQL monitoring
(1) Download
#yum install perl-Class-DBI-mysql
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=174&cf_id=30
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=174&cf_id=36

(2)
# cp check_mysqld.pl /usr/local/nagios/libexec
# chmod 755 /usr/local/nagios/libexec/check_mysqld.pl
# chown nagios.nagios /usr/local/nagios/libexec/check_mysqld.pl
# cp check_mysqld.php /usr/local/pnp4nagios/share/templates.dist
# chown nagios.nagios /usr/local/pnp4nagios/share/templates.dist/check_mysqld.php
# chmod 755 /usr/local/pnp4nagios/share/templates.dist/check_mysqld.php

(3)
# vi commands.cfg
define command{
command_name check_mysqld
command_line $USER1$/check_mysqld.pl -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -D $ARG3$ -a uptime,threads_connected,questions,slow_queries,open_tables -w ',,,,' -c ',,,,' -A $USER21$
}

(4) (this step is optional)
# vi resource.cfg
$USER7$=nagios
$USER21$='com_select,com_update,com_insert,com_insert_select,com_commit,com_delete,com_rollback,aborted_clients,aborted_connects,binlog_cache_disk_use,binlog_cache_use,bytes_received,bytes_sent,connections,created_tmp_disk_tables,created_tmp_files,created_tmp_tables,delayed_errors,delayed_insert_threads,delayed_writes,handler_update,handler_write,handler_delete,handler_read_first,handler_read_key,handler_read_next,handler_read_prev,handler_read_rnd,handler_read_rnd_next,key_blocks_not_flushed,key_blocks_unused,key_blocks_used,key_read_requests,key_reads,key_write_requests,key_writes,max_used_connections,not_flushed_delayed_rows,open_files,open_streams,open_tables,opened_tables,prepared_stmt_count,qcache_free_blocks,qcache_free_memory,qcache_hits,qcache_inserts,qcache_lowmem_prunes,qcache_not_cached,qcache_queries_in_cache,qcache_total_blocks,questions,select_full_join,select_rangle_check,slow_launch_threads,slow_queries,table_locks_immediate,table_locks_waited,threads_cached,threads_connected,threads_created,threads_running'

(5)
vim templates.cfg

define host{
name mysql-server ; The name of this host template
use generic-host ; This template inherits other values from the generic-host template
check_period 24x7 ; By default, Linux hosts are checked round the clock
check_interval 5 ; Actively check the host every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 3 ; Check each Linux host 3 times (max)
check_command check-host-alive ; Default command to check Linux hosts
notification_period workhours ; Linux admins hate to be woken up, so we only notify during the day
; Note that the notification_period variable is being overridden from
; the value that is inherited from the generic-host template!
notification_interval 120 ; Resend notifications every 2 hours
notification_options d,u,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}


# vi mysql.cfg
# mysql-server is a host template, so the service inherits from the service template generic-service only
define service{
use generic-service
host_name mysql
service_description Mysqld_pnp
check_command check_mysqld!nagios!nagios!nagdb
}

A complete monitoring configuration for reference:
vi mysql.cfg
# mysql-server is a host template; the services below inherit from the service template generic-service only
define host{
use linux-server,mysql-server
host_name mysql
alias My mysql Host
address 192.168.34.101
}

define service{
use generic-service
host_name mysql
service_description Mysqld
check_command check_mysql!nagios!nagios!10!60
}
#define service{
# use generic-service
# host_name mysql
# service_description Mysqld_pnp
# check_command check_mysqld!nagios!nagios!nagdb
#}
# Logged-in users, checked through NRPE
define service{
use generic-service
host_name mysql
service_description CHECK USERS
check_command check_nrpe!check_users
}
# System load, checked through NRPE
# Change the host_name to match the name of the host you defined above
define service{
use generic-service
host_name mysql
service_description Load
check_command check_nrpe!check_load
}
# Disk usage of the first partition defined in nrpe.cfg (check_sd1)
define service{
use generic-service
host_name mysql
service_description SDA1
check_command check_nrpe!check_sd1
}
# Disk usage of the second partition defined in nrpe.cfg (check_sd2)
define service{
use generic-service
host_name mysql
service_description SDA2
check_command check_nrpe!check_sd2
}
# Zombie processes
define service{
use generic-service
host_name mysql
service_description Zombie
check_command check_nrpe!check_zombie_procs
}
# Total number of processes
define service{
use generic-service
host_name mysql
service_description total procs
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name mysql
service_description Cpu
check_command check_nrpe!check_cpu
}
define service{
use generic-service
host_name mysql
service_description Mem
check_command check_nrpe!check_mem
}
#define service{
# use generic-service
# host_name mysql
# service_description Http
# check_command check_http!/
# }
define service{
use generic-service
host_name mysql
service_description Ping
check_command check_ping!100.0,20%!500.0,60%
}
#define service{
# use generic-service,mysql-server
# host_name mysql
# service_description check_memcached_11211
# check_command check_memcached_11211!80!100
# }
#define service{
# use generic-service,mysql-server
# host_name mysql
# service_description check_memcached_response_11211
# check_command check_memcached_response_11211!300!500
# }
#define service{
# use generic-service,mysql-server
# host_name mysql
# service_description check_memcached_hit
# check_command check_memcached_hit!10!5
# }





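(At this point all configuration is complete.)
############################################################

Testing:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (verify the configuration files)

From the monitoring server, run a check on the monitored host through NRPE:
# /usr/local/nagios/libexec/check_nrpe -H 192.168.34.97 -c check_megaraid_sas (check_df or any other command defined in nrpe.cfg works too)
If the reply is the normal output of that check, the setup works.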
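
Testing the NRPE commands one at a time gets tedious once nrpe.cfg holds a dozen of them. The loop below is a sketch that runs a list of command names against one host and reports the highest exit code seen. The function name run_nrpe_checks and the CHECK_NRPE override variable are made up for this example; the default plugin path is the one used in this guide.

```shell
# Run several NRPE commands against one host and summarise the results.
# Usage: run_nrpe_checks host command [command ...]
# Set CHECK_NRPE to use a different check_nrpe binary.
run_nrpe_checks() {
    local host="$1"
    shift
    local checker="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"
    local worst=0
    local cmd out rc
    for cmd in "$@"; do
        rc=0
        # Keep the plugin output and remember its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
        out=$("$checker" -H "$host" -c "$cmd" 2>&1) || rc=$?
        printf '%-22s rc=%s  %s\n' "$cmd" "$rc" "$out"
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    # The return value is the highest exit code seen.
    return "$worst"
}

# Example call (commented out; host and command names from this guide):
# run_nrpe_checks 192.168.34.97 check_df check_load check_megaraid_sas
```

A return value of 0 means every listed command answered with OK.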

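If everything is fine, restart the services:
service nagios restart
service httpd restart

/usr/local/nagios/bin/nagios -s /usr/local/nagios/etc/nagios.cfg (show projected check scheduling and start-up timing information)
On the monitored host, NRPE can be restarted with the nrpe.sh script:
vim nrpe.sh
killall -9 nrpe
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d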
____________________________________________________________________

Nagios errors and solutions


Error 1] The Map link in the Nagios web interface fails as soon as it is opened:
The requested URL /nagios/cgi-bin/statusmap.cgi was not found on this server
-- Solution:
statusmap.cgi depends on the gd development package.
Install it with yum, then re-run configure and rebuild and reinstall Nagios (the CGIs in particular):
yum -y install gd gd-devel
./configure --with-gd-lib=/usr/lib --with-gd-inc=/usr/include
make all
make install
make install-init
make install-commandmode
make install-config

2] Ordinary users (every user except nagiosadmin) get the following error when they click links such as Services in the Nagios web interface:
It appears as though you do not have permission to view information for any of the hosts you requested...
If you believe this is an error, check the HTTP server authentication requirements for accessing this CGI
and check the authorization options in your CGI configuration file.
--- Cause:
The authenticated user is not authorized. Edit etc/cgi.cfg: by default only nagiosadmin is listed there, so any newly created user who should see this information has to be added; separate several users with commas.
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
Users other than nagiosadmin are appended at the end, for example: authorized_for_system_information=nagiosadmin,admin
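
Since the same user usually has to be added to all seven lines, the edit can be scripted. The sketch below appends a user to every authorized_for_* line that does not list it yet. The function name grant_cgi_user is invented here, and whether a user should really receive every permission (including the command ones) is a decision to make per site.

```shell
# Append a user to every authorized_for_* line of a cgi.cfg file (idempotent).
# Usage: grant_cgi_user /path/to/cgi.cfg username
grant_cgi_user() {
    local cfg="$1"
    local user="$2"
    local tmp
    tmp=$(mktemp) || return 1
    awk -v user="$user" '
        /^authorized_for_[a-z_]+=/ {
            eq  = index($0, "=")
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            # Look for the user among the comma-separated names already present.
            n = split(val, names, ",")
            found = 0
            for (i = 1; i <= n; i++)
                if (names[i] == user)
                    found = 1
            if (!found)
                $0 = key "=" (val == "" ? user : val "," user)
        }
        { print }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Example call (commented out; "admin" is the example user from the text above):
# grant_cgi_user /usr/local/nagios/etc/cgi.cfg admin
```

The user must also exist in the htpasswd file used by Apache, otherwise the login itself fails before cgi.cfg is consulted.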

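Another possible cause is that the host_name or service_description of a monitored host or service contains Chinese characters.

3] The message "Whoops! Error: Could not read object configuration data!" appears because the Nagios daemon is not running.
Solution: /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

4] Installing NRPE fails with: configure: error: cannot find ssl headers

When compiling NRPE, configure prints "checking for SSL headers... configure: error: Cannot find ssl headers". The openssl-devel package is missing.
yum -y install openssl-devel solves the problem.

5] In the web interface, hosts or services do not show up until the page has been refreshed several times, and clicking a service shows:
Error: Service Status Not Found!

Solution: reboot the monitoring server (Apache, Nagios and so on must be set to start at boot).

6] A new command is defined in nrpe.cfg on the monitored host,
for example: command[check_df]=/usr/local/nagios/libexec/check_disk -x / -w 15% -c 5%
and testing it with /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_df fails with
"DISK CRITICAL - /root/.gvfs is not accessible: Permission denied". If the command after the equals sign is itself correct,
append -A -i '.gvfs' to its arguments, which gives:

command[check_df]=/usr/local/nagios/libexec/check_disk -x / -w 15% -c 5% -A -i '.gvfs'
Restart NRPE afterwards and the problem is solved.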














