
How to approach a Nagios check_oracle_rman_backup_problems alert


I am not an Oracle DBA and I don't know Oracle. But when this alert fired, the ops team refused to touch it and said it was a DBA job. In their eyes MySQL, Oracle, Sybase, Redis and MongoDB all belong to "the DBA" and have nothing to do with them...
1. Open nrpe.cfg, find the check_oracle_rman_backup_problems check, and run the command it defines by hand.
cat /usr/local/nagios/etc/nrpe.cfg
![](http://i2.51cto.com/images/blog/201803/09/6ac77908871d3a4587a289d7f718f8a4.png?x-oss-process=image/watermark,size_16,text_QDUxQ1RP5Y2a5a6i,color_FFFFFF,t_100,g_se,x_10,y_10,shadow_90,type_ZmFuZ3poZW5naGVpdGk=)
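Instead of reading the whole file, the definition can be pulled out directly. The following is a minimal sketch that relies only on the standard NRPE line format `command[<name>]=<command line>`; the function name `nrpe_command` is made up for illustration.

```shell
# Print the command line that NRPE runs for a given check name.
# Usage: nrpe_command <path to nrpe.cfg> <check name>
nrpe_command() {
    cfg_file=$1
    check_name=$2
    # -F treats the pattern as a fixed string, so the brackets are literal.
    # Commented-out definitions are dropped, then everything after the
    # first "=" is printed.
    grep -F "command[${check_name}]=" "$cfg_file" \
        | grep -v '^[[:space:]]*#' \
        | cut -d= -f2-
}
```

For example, `nrpe_command /usr/local/nagios/etc/nrpe.cfg check_oracle_rman_backup_problems` should print the plugin invocation, which can then be run by hand as the nagios user.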
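2. The check is performed by the check_oracle_health script (written in Perl), so open it and see how it obtains the value it monitors.
Searching for rman-backup-problems leads to an entry in the @mode array.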

Further down is the following code. The SQL in it is what we are ultimately looking for: it is the query that monitors RMAN backup status.
elsif ($params{mode} =~ /server::instance::rman::backup::problems/) {
  # Count BACKUP operations started in the last 3 days that are
  # neither completed nor still running.
  $self->{rman_backup_problems} = $self->{handle}->fetchrow_array(q{
      SELECT COUNT(*) FROM v$rman_status
      WHERE
        operation = 'BACKUP'
      AND
        status != 'COMPLETED'
      AND
        status != 'RUNNING'
      AND
        start_time > sysdate-3
  });
# (second excerpt: the branch that reports the result to Nagios)
} elsif ($params{mode} =~ /server::instance::rman::backup::problems/) {
  # Default thresholds: warning 1, critical 2
  $self->add_nagios(
      $self->check_thresholds($self->{rman_backup_problems}, 1, 2),
      sprintf "rman had %d problems during the last 3 days",
          $self->{rman_backup_problems});
  $self->add_perfdata(sprintf "rman_backup_problems=%d;%d;%d",
      $self->{rman_backup_problems},
      $self->{warningrange}, $self->{criticalrange});
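To make the effect of `check_thresholds(..., 1, 2)` concrete, here is a rough sketch of how a problem count could map to a Nagios state. It assumes the thresholds are plain numbers and follow the usual Nagios plugin range convention, where a bare number N alerts only when the value is greater than N. That is an assumption about check_thresholds, not something shown in the excerpt, so confirm it in your copy of the script. The function name `rman_state` is hypothetical.

```shell
# Map a problem count to a Nagios state name.
# Usage: rman_state <count> [warning] [critical]
rman_state() {
    count=$1
    warning=${2:-1}     # default from check_thresholds(..., 1, 2)
    critical=${3:-2}
    # Assumption: strict "greater than", as in the Nagios range convention.
    if [ "$count" -gt "$critical" ]; then
        echo "CRITICAL"
    elif [ "$count" -gt "$warning" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}
```

Under that assumption, two failed backups within three days would raise a warning and three or more would be critical.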
So the alert is caused by the RMAN backup. Running the SQL and checking the backup log turned up the following error:
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle


Control File Copy 69 2017-12-20 11:22:41 /data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_1 channel at 03/06/2018 01:15:28
ORA-19606: Cannot copy or restore to snapshot control file
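Backup logs can be long, so it helps to list just the error codes first. The sketch below assumes the log is a plain text file in which error lines start with an RMAN- or ORA- code, as in the output above; `rman_error_codes` is a made-up helper name, and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
# List the distinct RMAN-/ORA- codes that start a line in a log, with counts.
# Usage: rman_error_codes <log file>
rman_error_codes() {
    # -o prints only the matched code; LC_ALL=C keeps the sort order stable.
    grep -oE '^(RMAN|ORA)-[0-9]+' "$1" \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{print $2, $1}'
}
```

RMAN-00569 and RMAN-00571 are only the banner lines of the error stack; the codes after them (here RMAN-03009 and ORA-19606) are the ones worth searching for.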
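Once the error is known, it is easy to fix. The obsolete copy that RMAN tried to delete has the same path as the configured snapshot control file, so the delete fails. The workaround, summarized from a web search, is to point the snapshot control file at a different name temporarily, clean up the stale record, and then restore the setting:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f_bak';
crosscheck controlfilecopy '/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f';
delete expired controlfilecopy '/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f';
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/data/ora11g/product/11.2.0/db_1/dbs/snapcf_oradb2.f';
CONFIGURE SNAPSHOT CONTROLFILE NAME clear;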
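If the same cleanup is needed on several hosts, the commands can be generated from the snapshot control file path rather than edited by hand. This is only a sketch: `snapcf_fix_commands` is a hypothetical helper, it reproduces the first four commands above, and the `_bak` suffix is simply the temporary name used in this post.

```shell
# Print the RMAN workaround commands for a given snapshot control file path.
# Usage: snapcf_fix_commands <snapshot control file path>
snapcf_fix_commands() {
    snapcf=$1
    # Unquoted heredoc delimiter: ${snapcf} is expanded, quotes stay literal.
    cat <<EOF
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '${snapcf}_bak';
CROSSCHECK CONTROLFILECOPY '${snapcf}';
DELETE EXPIRED CONTROLFILECOPY '${snapcf}';
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '${snapcf}';
EOF
}
```

The output can be reviewed, saved to a file such as /tmp/fix_snapcf.rman (an arbitrary name), and then passed to RMAN, for example with `rman target / cmdfile=/tmp/fix_snapcf.rman`.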
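Summary: you need to be able to read object-oriented Perl here. package xxx is roughly a class declaration, and the new function is the usual constructor. I think not knowing something is nothing to be afraid of, because you can always learn it. I also picked up a bit of Perl along the way, so it was worth the effort.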
