Preface

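Yesterday zabbix-server ran into a problem: it kept restarting. I fiddled with it until the evening without finding the cause. I tried everything suggested online and changed CacheSize, Timeout, StartPollers and so on; some posts also mentioned Include, but did not even paste the relevant logs. I planned to deal with it first thing this morning, and then a production incident happened.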

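Our hyperconverged infrastructure happens to have been unstable lately. In the early hours, a production server (the registry and the config center) rebooted for no apparent reason, which set off a chain of problems. I won't go into that here; this post is about the Zabbix issue.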

Environment

CentOS Linux release 7.6.1810 (Core)
mysql 5.7 # started with Docker, data persisted to disk

Zabbix was installed by following the official documentation: the 5.0 LTS + CentOS 7 + MySQL + Nginx variant.

zabbix_server (Zabbix) 5.0.5
Revision eaa427cf19 26 October 2020, compilation time: Oct 26 2020 12:20:11
Copyright (C) 2020 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.
This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).
Compiled with OpenSSL 1.0.2k-fips 26 Jan 2017
Running with OpenSSL 1.0.2k-fips 26 Jan 2017

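PS: I don't know Zabbix in depth. I can install and configure it by following the official and online documentation, and I have set up some custom monitoring of my own.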

Problem

zabbix-server kept restarting and the login page would not open either. zabbix-server.log showed the following errors:

2148:20210603:143421.801 Starting Zabbix Server. Zabbix 5.0.5 (revision eaa427cf19).
2148:20210603:143421.801 ****** Enabled features ******
2148:20210603:143421.801 SNMP monitoring: YES
2148:20210603:143421.801 IPMI monitoring: YES
2148:20210603:143421.801 Web monitoring: YES
2148:20210603:143421.801 VMware monitoring: YES
2148:20210603:143421.801 SMTP authentication: YES
2148:20210603:143421.801 ODBC: YES
2148:20210603:143421.801 SSH support: YES
2148:20210603:143421.801 IPv6 support: YES
2148:20210603:143421.801 TLS support: YES
2148:20210603:143421.801 ******************************
2148:20210603:143421.801 using configuration file: /etc/zabbix/zabbix_server.conf
...
...
2179:20210603:143423.081 ================================
2179:20210603:143423.081 Please consider attaching a disassembly listing to your bug report.
2179:20210603:143423.081 This listing can be produced with, e.g., objdump -DSswx zabbix_server.
2179:20210603:143423.081 ================================
2148:20210603:143423.082 One child process died (PID:2179,exitcode/signal:1). Exiting ...
zabbix_server [2148]: Error waiting for process with PID 2179: [10] No child processes
2148:20210603:143423.088 syncing history data...
2148:20210603:143423.097 syncing history data... 100.000000%
2148:20210603:143423.097 syncing history data done
2148:20210603:143423.097 syncing trend data...
2148:20210603:143423.102 syncing trend data done
2148:20210603:143423.102 Zabbix Server stopped. Zabbix 5.0.5 (revision eaa427cf19).
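
In a log like the one above, the useful clue is which child process died and what it wrote before the parent gave up. The following is a minimal sketch, not part of the original troubleshooting, that pulls this out of a log text. It assumes the `pid:date:time message` line prefix and the "One child process died" wording seen in this excerpt; the function name `crashed_children` and the `SAMPLE` text are illustrative only.

```python
import re

# Zabbix log lines in this excerpt look like "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>".
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")
# The parent's notice, e.g. "One child process died (PID:2179,exitcode/signal:1). Exiting ..."
DIED_RE = re.compile(r"One child process died \(PID:(\d+),exitcode/signal:(\d+)\)")


def crashed_children(log_text):
    """Return {pid: [messages logged by that pid]} for each child reported as dead."""
    by_pid = {}
    dead = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue  # lines without the pid:date:time prefix are skipped
        pid, message = int(m.group(1)), m.group(4)
        by_pid.setdefault(pid, []).append(message)
        d = DIED_RE.search(message)
        if d:
            dead.append(int(d.group(1)))
    # The child's own lines usually come before the parent's notice,
    # so the lookup happens only after the whole text has been read.
    return {pid: by_pid.get(pid, []) for pid in dead}


# A few lines copied from the excerpt above.
SAMPLE = """\
2179:20210603:143423.081 Please consider attaching a disassembly listing to your bug report.
2148:20210603:143423.082 One child process died (PID:2179,exitcode/signal:1). Exiting ...
zabbix_server [2148]: Error waiting for process with PID 2179: [10] No child processes
"""

for pid, messages in crashed_children(SAMPLE).items():
    print(pid, messages)
```

On a full log, the lines written by the dead PID just before the "Please consider attaching a disassembly listing" banner are normally the ones worth reading first; they are elided in the excerpt above.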

Troubleshooting

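The log showed no sign of memory, cache, or MySQL problems, so I searched all over the web and tried all sorts of things: restarting the whole stack, changing CacheSize, checking whether child processes were deadlocked, and cleaning out the database.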

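After that I re-initialized MySQL outright. zabbix-server stayed up for a few minutes and then went back to restarting nonstop. The login page also reported "Database error Connection timed out", yet zabbix_server.conf looked fine. I then went back to the official installation documentation and realized that the Zabbix frontend and server are separate components... Hmm, at this point it looked like I had found the problem.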

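Checking the frontend configuration, I found that the MySQL settings in /etc/zabbix/web/zabbix.conf.php were actually wrong. WTF! I fixed them right away and then restarted the services: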

systemctl restart zabbix-server zabbix-agent rh-nginx116-nginx rh-php72-php-fpm
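
The mismatch between the server's and the frontend's database settings can also be checked mechanically. The sketch below is an illustration added here, not the procedure used at the time. It assumes the usual `DBHost`/`DBPort`/`DBName`/`DBUser`/`DBPassword` keys in zabbix_server.conf and `$DB['SERVER']`-style assignments in zabbix.conf.php, compares only keys set in both files, and treats a frontend port of `'0'` as "use the default" (an assumption). Function names are hypothetical.

```python
import re

# Server-side keys paired with the frontend's $DB[...] keys.
KEY_PAIRS = [
    ("DBHost", "SERVER"),
    ("DBPort", "PORT"),
    ("DBName", "DATABASE"),
    ("DBUser", "USER"),
    ("DBPassword", "PASSWORD"),
]

# Matches simple assignments such as: $DB['SERVER'] = 'localhost';
# Values containing escaped quotes are not handled by this sketch.
DB_ASSIGN_RE = re.compile(
    r"""^\s*\$DB\[\s*['"](\w+)['"]\s*\]\s*=\s*['"]([^'"]*)['"]\s*;""",
    re.MULTILINE,
)


def parse_server_conf(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def parse_frontend_conf(text):
    """Collect $DB['KEY'] = 'value'; assignments from the PHP config text."""
    return {m.group(1): m.group(2) for m in DB_ASSIGN_RE.finditer(text)}


def db_mismatches(server_text, frontend_text):
    """Return (server_key, frontend_key) pairs whose values differ.

    Values are deliberately not returned, so passwords are never printed.
    """
    server = parse_server_conf(server_text)
    frontend = parse_frontend_conf(frontend_text)
    diffs = []
    for s_key, f_key in KEY_PAIRS:
        s_val, f_val = server.get(s_key), frontend.get(f_key)
        if f_key == "PORT" and f_val == "0":
            f_val = None  # assumption: '0' means "default port" in the frontend
        if s_val is not None and f_val is not None and s_val != f_val:
            diffs.append((s_key, f_key))
    return diffs
```

To use it, read /etc/zabbix/zabbix_server.conf and /etc/zabbix/web/zabbix.conf.php into strings and pass them to `db_mismatches`. An empty list only means the explicitly set values agree; a key left at its default in one file is not compared.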
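A few minutes later zabbix-server started restarting again. Then I remembered an article I had seen online: in the mail media type configuration, change "Connection security" to STARTTLS (an extension to plain-text communication protocols). That finally brought it back.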
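
This post does not establish why the media type change stopped the restarts, but whatever is chosen under "Connection security" has to match what the mail server actually offers. The snippet below is a minimal sketch, using Python's standard `smtplib`, that asks an SMTP server whether it advertises STARTTLS. The function name and the host name in the usage note are placeholders, not values from this setup.

```python
import smtplib


def supports_starttls(host, port=25, timeout=10):
    """Return True if the SMTP server at host:port advertises STARTTLS in its EHLO reply."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()  # ask the server for its list of extensions
        return smtp.has_extn("starttls")
```

For example, `supports_starttls("smtp.example.com", 587)` would open a connection to that placeholder host and report whether STARTTLS is offered; it raises an exception if the host cannot be reached.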

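PS:

When using open-source software, it pays to learn more about the architecture of the software itself; maintenance becomes much easier once you do.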

Special thanks:

https://blog.csdn.net/liuxiangyang_/article/details/100024641

https://yunwei365.blog.csdn.net/article/details/103677447

https://blog.csdn.net/h106140873/article/details/104311586