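[Ops] Log Monitoring and Alerting with ELK
ELK stands for Elasticsearch + Logstash + Kibana. I recently used ELK to process some platform logs. This post records the deployment process, using MySQL connection count monitoring as the example.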
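Background
The platform had no alerting on the MySQL connection count, and once the MySQL connections are exhausted the platform becomes unusable. In addition, there was neither a visual interface for working with the logs nor an effective real-time monitoring strategy.
Benefits
1. When an anomaly is triggered, the people responsible are notified promptly by SMS, email, or similar channels
2. A log visualization interface makes log analysis more convenient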
1. Software versions
Software | Version |
---|---|
Logstash | v2.3.4 |
Filebeat | v1.3.1 |
ElasticSearch | v2.3.3 |
Kibana | v4.5.1 |
ElastAlert | v0.1.4 |
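2. Solution
2.1. Monitoring architecture diagram
2.2. Querying the MySQL connection count
A MySQL request normally occupies one connection. If a request (insert, delete, update, select) takes a long time to finish, connections pile up and the database's connection limit is quickly exhausted. The production database of the ph platform currently allows at most 1000 connections.
A shell script monitors the MySQL connection count continuously: it queries the count once a minute and writes it to a log file.
Log sample: see "MySQL connection count log sample"
Shell script: see "MySQL connection count query script"
Polling mechanism: a crontab job that runs once a minute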
# query mysql connection
* * * * * /bin/bash /home/disk5/query_mysql_connection_log.sh > /dev/null
MySQL connection count log sample
2017-01-20 00:01:01 machine_0001=4
2017-01-20 00:01:01 machine_0002=56
2017-01-20 00:01:01 machine_0003=13
2017-01-20 00:01:01 machine_0004=87
2017-01-20 00:01:01 total_connection_number=160
==========
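As a quick sanity check on this format, the sketch below pulls the total_connection_number records out of such a log and prints the timestamp and value. The function name `extract_totals` is made up for illustration, and the code only assumes the line layout shown in the sample above.

```shell
# Hypothetical helper: print "<date> <time> <total>" for every
# total_connection_number record read from stdin.
extract_totals() {
    awk '
        # Skip separator lines such as "=========="
        /^===/ { next }
        {
            # $3 looks like "name=value"; split it on "="
            n = split($3, kv, "=")
            if (n == 2 && kv[1] == "total_connection_number") {
                print $1, $2, kv[2]
            }
        }
    '
}

# Usage with records taken from the sample above
printf '%s\n' \
    '2017-01-20 00:01:01 machine_0004=87' \
    '2017-01-20 00:01:01 total_connection_number=160' \
    '==========' | extract_totals
```

For the sample the per-machine values 4, 56, 13, and 87 add up to the reported total of 160, so the total line can be cross-checked against the machine lines in the same way.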
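2.3. Filebeat
Configuration
For the Filebeat configuration file, see: Appendix - Filebeat configuration file
Filebeat monitors the log produced by the MySQL connection count query (see the MySQL connection count log sample) and forwards every line that does not start with === to Logstash.
Configuration details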
Setting | Value |
---|---|
Merge multiple lines | No |
Polling interval | 120s |
Document type | mysql_connection_log |
Monitored path | /home/disk5/logs/mysql_connection_* |
Filter rule | Lines that do not start with === |
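To preview which lines would be shipped under this filter rule, an inverted grep with the same regular expression gives a rough approximation. This is only a sketch: `preview_shipped_lines` is a hypothetical name, and it mimics the `^===` exclusion rather than running Filebeat itself.

```shell
# Hypothetical helper: keep only the lines that do not start with "===",
# mirroring the exclusion regex "^===". Reads the given files, or stdin
# when no file is passed.
preview_shipped_lines() {
    grep -v '^===' "$@"
}

# Usage with two lines from the log sample
printf '%s\n' \
    '2017-01-20 00:01:01 total_connection_number=160' \
    '==========' | preview_shipped_lines
```

Passing a real log file matched by /home/disk5/logs/mysql_connection_* as an argument works the same way.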
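2.4. Logstash
Configuration
For the Logstash configuration file, see: Appendix - Logstash configuration file
The regular expression was debugged with the Grok Debugger tool.
Description
Logstash collects the log lines sent by Filebeat and extracts the log time, the machine, and its connection count.
Input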
2017-01-20 10:18:01 machine_0001=62
Pattern
%{TIMESTAMP_ISO8601:time}\s+%{USER:machine}=%{NUMBER:connection_num}
Here TIMESTAMP_ISO8601, USER, and NUMBER are Logstash grok patterns.
The match yields:
time field: log time
machine field: machine host
connection_num field: number of MySQL connections held by the machine
Output
time = 2017-01-20 10:18:01
machine = machine_0001
connection_num = 62
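The same extraction can be tried from the command line without Logstash. The sketch below uses sed with a simplified extended regular expression. It is not the grok engine, and the expression is narrower than the real TIMESTAMP_ISO8601, USER, and NUMBER patterns (for example, it only accepts unsigned integers). The function name `parse_connection_line` is hypothetical.

```shell
# Hypothetical helper: emulate the grok match with a simplified ERE.
# Lines that do not match produce no output.
parse_connection_line() {
    sed -n -E 's/^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})[[:space:]]+([A-Za-z0-9._-]+)=([0-9]+)$/time = \1\
machine = \2\
connection_num = \3/p'
}

# Usage with the input line shown above
echo '2017-01-20 10:18:01 machine_0001=62' | parse_connection_line
```

Separator lines such as ========== do not match the expression, so they are silently skipped.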
2.5. Elasticsearch
Configuration
Template name: template_mysql_connection_log
ES index: mysql-connection-log-%{+YYYY.MM.dd}
ES type: mysql_connection_log
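Logstash fills in %{+YYYY.MM.dd} from the event's @timestamp, which is stored in UTC, so an event lands in the index for its UTC date rather than its local date. As a small illustration, the sketch below builds the index name for a given UTC date. The helper name `index_for_utc_date` is an assumption made for this example.

```shell
# Hypothetical helper: build the daily index name for a UTC date given
# as YYYY-MM-DD, mirroring the pattern mysql-connection-log-%{+YYYY.MM.dd}.
index_for_utc_date() {
    printf 'mysql-connection-log-%s\n' "$(printf '%s' "$1" | sed 's/-/./g')"
}

# Usage
index_for_utc_date 2017-01-20
```

This is handy when querying or deleting one day's index by hand.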
Fields
Field | Type | Notes |
---|---|---|
message | string | Raw log line |
tags | string | |
@timestamp | date | Time the log line was produced |
host | string | |
count | long | |
source | string | |
input_type | string | |
type | string | |
offset | long | |
@version | string | |
machine | string | Machine host |
connection_num | long | Number of MySQL connections held by the machine |
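2.6. Kibana
View the connection count trend for each machine
Trend chart
2.7. ElastAlert
Configuration
For the ElastAlert configuration file, see: Appendix - ElastAlert configuration file
ElastAlert polls the Elasticsearch index mysql-connection-log-* every 10 seconds. If the total MySQL connection count exceeds 750 more than twice within 10 minutes, it sends an alert SMS to the people responsible.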
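To get a feel for when such a rule fires, the sketch below replays log lines through a simplified frequency check in awk. It is a rough simulation and not ElastAlert's implementation: it assumes all records fall on the same day and arrive in time order, it only looks at total_connection_number lines, and it restarts the count after each alert. The function name and the sample values are invented for illustration.

```shell
# Hypothetical simulation: alert when at least num_events records with
# total_connection_number >= threshold occur within `window` seconds.
simulate_frequency_rule() {
    awk -v threshold=750 -v num_events=3 -v window=600 '
        {
            split($3, kv, "=")
            if (kv[1] != "total_connection_number" || kv[2] + 0 < threshold) next

            # Convert HH:MM:SS to seconds since midnight
            split($2, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]

            # Record the hit, then drop hits that fell out of the window
            hits[++tail] = now
            while (head < tail && now - hits[head + 1] > window) head++

            if (tail - head >= num_events) {
                print "ALERT at " $1 " " $2 ": connection_num=" kv[2]
                head = tail   # start counting afresh after an alert
            }
        }
    '
}

# Usage with invented sample values
printf '%s\n' \
    '2017-01-20 10:01:01 total_connection_number=760' \
    '2017-01-20 10:02:01 total_connection_number=800' \
    '2017-01-20 10:03:01 total_connection_number=120' \
    '2017-01-20 10:04:01 total_connection_number=910' | simulate_frequency_rule
```

With these four records the third value at or above 750 arrives at 10:04:01, inside the 10-minute window, so a single alert line is printed for that record.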
Appendix
MySQL connection count query script
#!/bin/bash
source /etc/profile
# Title: Online Query Mysql Connection
# Author: ouyangyewei
#
# Create: ouyangyewei, 2017/01/18
# Update: ouyangyewei, 2017/01/19, add total_connection_number

# Temp-file name derived from the path of this script
FID=$(readlink -f "$0" | md5sum | awk '{print $1}')
LOG_FILE=/home/disk5/logs/mysql_connection_$(date +"%Y-%m-%d").log
# ----------------------------------------------
# Count current connections per client host; writes "<count> <host>" lines to /tmp/$FID
function get_process_list() {
    mysql -uroot \
        -pxxx \
        -hxxx \
        -P3306 \
        -e 'show processlist' \
        --silent \
        --skip-column-names | awk '
    {
        # A user name containing a space (e.g. "system user") shifts the host to column 4
        if ($3=="user" && $4!="NULL") {
            split($4, machine, ":");
            print machine[1];
        }
        # Usual case: column 3 holds host:port
        if ($3!="user" && $4!="user") {
            split($3, machine, ":");
            print machine[1];
        }
    }' | sort | uniq -c > "/tmp/$FID"
}
function run() {
    # get current mysql connection status
    get_process_list
    TIMESTAMP=$(date +"%F %T")
    if [[ -f /tmp/$FID ]]; then
        sum=0
        while read -r line
        do
            machine=$(echo "$line" | awk '{print $2}')
            connect_number=$(echo "$line" | awk '{print $1}')
            sum=$((sum + connect_number))
            echo "$TIMESTAMP $machine=$connect_number" >> "$LOG_FILE"
        done < "/tmp/$FID"
        echo "$TIMESTAMP total_connection_number=$sum" >> "$LOG_FILE"
        # separator line; it starts with === so the Filebeat exclude_lines rule drops it
        echo "==========" >> "$LOG_FILE"
        # remove tmp file
        rm -f "/tmp/$FID"
    fi
}
# ----------------------------------------------
# startup
run
Filebeat configuration file
filebeat:
  prospectors:
    -
      paths:
        - /home/disk5/logs/mysql_connection_*
      input_type: log
      document_type: mysql_connection_log
      ignore_older: 84h
      scan_frequency: 120s
      exclude_lines: ["^==="]
output:
  logstash:
    hosts: ["xxx:8044"]
logging:
  level: debug
  to_files: true
  to_syslog: false
  files:
    path: /var/log/mybeat
    name: mybeat.log
    keepfiles: 7
Logstash configuration file
input {
  beats {
    port => 8044
  }
}
filter {
  if [type] == "mysql_connection_log" {
    grok {
      patterns_dir => ["/conf/patterns"]
      match => {
        "message" => "%{TIMESTAMP_ISO8601:time}\s+%{USER:machine}=%{NUMBER:connection_num}"
      }
      remove_field => ["beat"]
    }
    # Use the log time as @timestamp; the log writes a four-digit year
    date {
      match => ["time", "yyyy-MM-dd HH:mm:ss"]
      remove_field => ["time"]
    }
  }
}
output {
  if [type] == "mysql_connection_log" {
    elasticsearch {
      hosts => ["xxx:8096","xxx:8096"]
      index => "mysql-connection-log-%{+YYYY.MM.dd}"
    }
  }
}
Elasticsearch template
curl -XPUT 'localhost:9200/_template/template_mysql_connection_log?pretty' -d'
{
  "order": 0,
  "template": "mysql-connection-log-*",
  "settings": {},
  "mappings": {
    "mysql_connection_log": {
      "properties": {
        "tags": {
          "index": "not_analyzed",
          "type": "string"
        },
        "message": {
          "index": "not_analyzed",
          "type": "string"
        },
        "@version": {
          "type": "string"
        },
        "@timestamp": {
          "format": "strict_date_optional_time||epoch_millis",
          "type": "date"
        },
        "source": {
          "index": "not_analyzed",
          "type": "string"
        },
        "offset": {
          "type": "long"
        },
        "type": {
          "index": "not_analyzed",
          "type": "string"
        },
        "input_type": {
          "index": "not_analyzed",
          "type": "string"
        },
        "count": {
          "type": "long"
        },
        "host": {
          "index": "not_analyzed",
          "type": "string"
        },
        "machine": {
          "index": "not_analyzed",
          "type": "string"
        },
        "connection_num": {
          "index": "not_analyzed",
          "type": "long"
        }
      }
    }
  },
  "aliases": {}
}'
ElastAlert configuration file
# Alert when the rate of events exceeds a threshold
# (Optional)
# Elasticsearch host
es_host: xx.xx.xx.xx
# (Optional)
# Elasticsearch port
es_port: 8096
# (Optional) Connect with SSL to Elasticsearch
#use_ssl: True
# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword
# (Required)
# Rule name, must be unique
name: MysqlConnectionRule
# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur within timeframe
type: frequency
# type: any
# (Required)
# Index to search, wildcard supported
index: mysql-connection-log-*
# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 3
# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  minutes: 10
  # hours: 1
# (Required)
# A list of Elasticsearch filters used to find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
filter:
- range:
    connection_num:
      from: 750
# (Required)
# The alert is used when a match is found
alert:
- command
# Each list item is passed to curl as one argument, so -X and POST are separate items
command: [
  "curl",
  "-X", "POST",
  "-d",
  '{"appId":"xxx", "token":"xxx", "alertList":[{"channel":"sms", "description":"MySQL connection alert: current total connection count is %(connection_num)s!", "receiver":"ouyangyew"}]}',
  "http://xxx.baidu.com/alert/push"
]
# (required, email specific)
# a list of email addresses to send alerts to
# email:
# - "[email protected]"