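[Ops] Log Monitoring and Alerting with ELK

ELK stands for Elasticsearch + Logstash + Kibana. I recently used ELK to process some platform logs; this post records the deployment steps, using "MySQL connection-count monitoring" as the example.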

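Background

The platform has no alerting on the number of MySQL connections. Once the connections are exhausted, the platform becomes unusable. In addition, there is neither a visual interface for working with log data nor an effective real-time monitoring strategy.

Benefits
1. When an anomaly is triggered, the responsible staff are notified promptly by SMS, email, and similar channels.
2. A log visualization interface makes log analysis more convenient.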

1. Software versions

Software        Version
Logstash        v2.3.4
Filebeat        v1.3.1
Elasticsearch   v2.3.3
Kibana          v4.5.1
ElastAlert      v0.1.4

2. Solution

2.1. Monitoring architecture diagram

[Figure: mysql_connection_monitor]

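2.2. Querying the MySQL connection count

A MySQL connection is normally held by one request at a time. If a request (insert, delete, update, select) takes a long time to finish, connections pile up and quickly exhaust the database's connection limit. The ph platform's production database currently allows at most 1000 connections.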
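To put a connection count in context, it helps to express it as a share of the limit. The sketch below is only an illustration: the helper name usage_percent is hypothetical, and the two MySQL statements in the comments are one common way to read the inputs by hand.

```shell
#!/bin/bash
# Inputs can be read from MySQL by hand, for example:
#   SHOW VARIABLES LIKE 'max_connections';   -- the configured limit
#   SHOW STATUS LIKE 'Threads_connected';    -- connections currently open

# usage_percent CURRENT MAX: print the integer percentage of the limit in use.
# (Hypothetical helper; not part of the original script.)
usage_percent() {
  local current=$1 max=$2
  echo $(( current * 100 / max ))
}

usage_percent 160 1000   # prints 16
usage_percent 750 1000   # prints 75
```

With a limit of 1000, the alert threshold of 750 used later in this post corresponds to 75% utilization.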

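A shell script monitors the MySQL connection count continuously: once a minute it queries the number of connections and appends the result to a log file.

References:

Log sample: see "MySQL connection-count log sample" below
Shell script: see Appendix, "MySQL connection-count query script"
Polling mechanism: crontab job, run once per minute

# query mysql connection
* * * * * /bin/bash /home/disk5/query_mysql_connection_log.sh > /dev/null

MySQL connection-count log sample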

2017-01-20 00:01:01 machine_0001=4
2017-01-20 00:01:01 machine_0002=56
2017-01-20 00:01:01 machine_0003=13
2017-01-20 00:01:01 machine_0004=87
2017-01-20 00:01:01 total_connection_number=160
==========
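The sample above can be reproduced without a database. The following sketch isolates the tally step: it reads "count machine" pairs (the shape that `uniq -c` emits) and prints one line per machine plus the total. The function name format_counts and the fixed timestamp are illustrative assumptions, and the separator line is left out on purpose.

```shell
#!/bin/bash
# format_counts TIMESTAMP: read "count machine" pairs from stdin and print
# "<timestamp> <machine>=<count>" lines followed by the overall total.
# (Hypothetical helper that mirrors the loop in the appendix script.)
format_counts() {
  local timestamp=$1 sum=0 count machine
  while read -r count machine; do
    [ -z "$count" ] && continue     # skip blank lines
    sum=$((sum + count))
    echo "$timestamp $machine=$count"
  done
  echo "$timestamp total_connection_number=$sum"
}

printf '%s\n' '4 machine_0001' '56 machine_0002' '13 machine_0003' '87 machine_0004' |
  format_counts '2017-01-20 00:01:01'
```

Fed with the four per-machine counts from the sample, the last line reports a total of 160, matching the log.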

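2.3. Filebeat configuration

For the Filebeat configuration file, see Appendix, "Filebeat configuration file".

Filebeat watches the log produced by the MySQL connection-count query (see the MySQL connection-count log sample) and ships every line that does not start with === to Logstash.

Configuration

Setting                 Value
Merge multiple lines    No
Scan interval           120s
Document type           mysql_connection_log
Watched path            /home/disk5/logs/mysql_connection_*
Filter rule             lines that do not start with ===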
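The effect of the filter rule can be previewed locally before Filebeat is deployed. This is a rough stand-in, not Filebeat itself: it assumes that `grep -v` with the same regular expression drops the same lines as `exclude_lines: ["^==="]`, and the function name filter_separators is hypothetical.

```shell
#!/bin/bash
# filter_separators: drop every stdin line that begins with "===",
# approximating Filebeat's exclude_lines: ["^==="] setting.
filter_separators() {
  grep -v '^==='
}

printf '%s\n' \
  '2017-01-20 00:01:01 machine_0001=4' \
  '2017-01-20 00:01:01 total_connection_number=160' \
  '==========' | filter_separators
```

Only the two data lines are printed; the separator line is discarded.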

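2.4. Logstash configuration

For the Logstash configuration file, see Appendix, "Logstash configuration file".

The grok pattern was debugged with a grok debugger tool.

Description

Collect the log lines sent by Filebeat and extract the log time, the machine, and the connection count.

Input

2017-01-20 10:18:01 machine_0001=62

Grok pattern

%{TIMESTAMP_ISO8601:time}\s+%{USER:machine}=%{NUMBER:connection_num}

Here TIMESTAMP_ISO8601, USER, and NUMBER are built-in Logstash grok patterns.

This yields:
time field: the log time
machine field: the machine host
connection_num field: the number of MySQL connections held by that machine

Output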

time = 2017-01-20 10:18:01
machine = machine_0001
connection_num = 62
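For readers without a grok debugger at hand, the same extraction can be approximated with a plain regular expression. The sketch below is a simplified stand-in: the real TIMESTAMP_ISO8601, USER, and NUMBER patterns accept more variants than this regex does, and parse_line is a hypothetical name.

```shell
#!/bin/bash
# parse_line LINE: split one log line into time, machine and connection_num.
# Returns non-zero when the line does not match (for example a separator line).
parse_line() {
  local re='^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2})[[:space:]]+([A-Za-z0-9._-]+)=([0-9]+)$'
  if [[ $1 =~ $re ]]; then
    echo "time=${BASH_REMATCH[1]}"
    echo "machine=${BASH_REMATCH[2]}"
    echo "connection_num=${BASH_REMATCH[3]}"
  else
    return 1
  fi
}

parse_line '2017-01-20 10:18:01 machine_0001=62'
```

Note that the total_connection_number line has the same shape, so it is parsed too, with total_connection_number landing in the machine field.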

2.5. Elasticsearch configuration

Template name: template_mysql_connection_log
ES index: mysql-connection-log-%{+YYYY.MM.dd}
ES type: mysql_connection_log
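Logstash expands %{+YYYY.MM.dd} from the event's @timestamp, which is kept in UTC, so one index is created per UTC day. The sketch below derives the index name from a timestamp string; index_for_timestamp is a hypothetical helper, and it assumes the input is already expressed in UTC.

```shell
#!/bin/bash
# index_for_timestamp "YYYY-MM-DD HH:MM:SS": print the daily index name
# that an event with this (UTC) timestamp would be written to.
index_for_timestamp() {
  local day=${1:0:10}                       # keep "YYYY-MM-DD"
  echo "mysql-connection-log-${day//-/.}"   # dashes become dots
}

index_for_timestamp '2017-01-20 10:18:01'   # prints mysql-connection-log-2017.01.20
```

Because the date comes from UTC, a line logged shortly after local midnight in a timezone ahead of UTC ends up in the previous day's index.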

Fields

Field            Type     Notes
message          string   raw log line
tags             string
@timestamp       date     time the log line was produced
host             string
count            long
source           string
input_type       string
type             string
offset           long
@version         string
machine          string   machine host
connection_num   long     number of MySQL connections held by the machine

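2.6. Viewing per-machine connection trends in Kibana

Trend chart
[Figure: ph_mysql_connection_line_chart_sample]

2.7. ElastAlert configuration

For the ElastAlert configuration file, see Appendix, "ElastAlert configuration file".

ElastAlert polls the mysql-connection-log-* indices in Elasticsearch every 10 seconds. If the total MySQL connection count is at or above 750 on more than 2 occasions within 10 minutes (that is, at least 3 matching events), an alert SMS is sent to the responsible staff.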
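The rule can be pictured as a sliding-window count over matching events. The sketch below is a simplified model for reasoning about the thresholds, not ElastAlert's actual implementation. It assumes input lines of the form "epoch_seconds connection_num" sorted by time, and the function name should_alert is hypothetical.

```shell
#!/bin/bash
# should_alert THRESHOLD NUM_EVENTS WINDOW_SECONDS
# Reads "epoch_seconds connection_num" lines from stdin (sorted by time).
# Exits 0 if at least NUM_EVENTS readings >= THRESHOLD fall inside any
# window of WINDOW_SECONDS, and 1 otherwise.
should_alert() {
  awk -v threshold="$1" -v num_events="$2" -v window="$3" '
    $2 >= threshold {
      times[++n] = $1
      # forget matches older than the window that ends at the newest match
      while (times[head + 1] < $1 - window) head++
      if (n - head >= num_events) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Three readings at or above 750 within ten minutes: the rule would fire.
printf '%s\n' '0 800' '60 760' '120 900' |
  should_alert 750 3 600 && echo "alert"
```

With num_events set to 3 and a 10-minute timeframe, two high readings are not enough, and three high readings spread over more than 10 minutes do not trigger either.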

Appendix

MySQL connection-count query script

#!/bin/bash
source /etc/profile

# Title:    Online Query Mysql Connection
# Author:   ouyangyewei
#
# Create:   ouyangyewei, 2017/01/18
# Update:   ouyangyewei, 2017/01/19, add total_connection_number

FID=`readlink -f $0 | md5sum | awk '{print $1}'`
LOG_FILE=/home/disk5/logs/mysql_connection_$(date +"%Y-%m-%d").log
# ----------------------------------------------

function get_process_list() {
  mysql -uroot \
  -pxxx \
  -hxxx \
  -P3306 \
  -e 'show processlist' \
  --silent \
  --skip-column-names | awk '
    {
      if ($3=="user" && $4!="NULL") {
        split($4, machine, ":");
        print machine[1];
      }
      if ($3!="user" && $4!="user"){
        split($3, machine, ":");
        print machine[1];
      }
    }' | sort | uniq -c > /tmp/$FID
}

function run() {
  # get current mysql connection status
  get_process_list;

  TIMESTAMP=`date +"%F %T"`
  if [[ -f /tmp/$FID ]]; then
    sum=0
    while read line
    do
      machine=`echo $line | awk '{print $2}'`
      connect_number=`echo $line | awk '{print $1}'`
      sum=$(($sum+$connect_number))
      echo "$TIMESTAMP $machine=$connect_number" >> $LOG_FILE
    done < /tmp/$FID
    echo "$TIMESTAMP total_connection_number=$sum" >> $LOG_FILE
    echo "==========" >> $LOG_FILE

    # remove tmp file
    rm -rf /tmp/$FID
  fi
}
# ----------------------------------------------

# startup
run

Filebeat configuration file

filebeat:
  prospectors:
    -
      paths:
        - /home/disk5/logs/mysql_connection_*
      input_type: log
      document_type: mysql_connection_log
      ignore_older: 84h
      scan_frequency: 120s
      exclude_lines: ["^==="]

output:
  logstash:
    hosts: ["xxx:8044"]

logging:
  level: debug
  to_files: true
  to_syslog: false
  files:
    path: /var/log/mybeat
    name: mybeat.log
    keepfiles: 7

Logstash configuration file

input {
  beats {
    port => 8044
  }
}
filter {
  if [type] == "mysql_connection_log" {
    grok {
      patterns_dir => ["/conf/patterns"]
      match => {
        "message" => "%{TIMESTAMP_ISO8601:time}\s+%{USER:machine}=%{NUMBER:connection_num}"
      }
      remove_field => ["beat"]
    }
    date {
      match => ["time", "yyyy-MM-dd HH:mm:ss"]
      remove_field => ["time"]
    }
  }
}
output {
  if [type]=="mysql_connection_log" {
    elasticsearch {
      hosts => ["xxx:8096","xxx:8096"]
      index => "mysql-connection-log-%{+YYYY.MM.dd}"
    }
  }
}

Elasticsearch template

curl -XPUT 'localhost:9200/_template/template_mysql_connection_log?pretty' -d'
{
  "order": 0,
  "template": "mysql-connection-log-*",
  "settings": {},
  "mappings": {
    "mysql_connection_log": {
      "properties": {
        "tags": {
          "index": "not_analyzed",
          "type": "string"
        },
        "message": {
          "index": "not_analyzed",
          "type": "string"
        },
        "@version": {
          "type": "string"
        },
        "@timestamp": {
          "format": "strict_date_optional_time||epoch_millis",
          "type": "date"
        },
        "source": {
          "index": "not_analyzed",
          "type": "string"
        },
        "offset": {
          "type": "long"
        },
        "type": {
          "index": "not_analyzed",
          "type": "string"
        },
        "input_type": {
          "index": "not_analyzed",
          "type": "string"
        },
        "count": {
          "type": "long"
        },
        "host": {
          "index": "not_analyzed",
          "type": "string"
        },
        "machine": {
          "index": "not_analyzed",
          "type": "string"
        },
        "connection_num": {
          "index": "not_analyzed",
          "type": "long"
        }
      }
    }
  },
  "aliases": {}
}'

ElastAlert configuration file

# Alert when the rate of events exceeds a threshold

# (Optional)
# Elasticsearch host
es_host: xx.xx.xx.xx

# (Optional)
# Elasticsearch port
es_port: 8096

# (Optional) Connect with SSL to Elasticsearch
#use_ssl: True

# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword

# (Required)
# Rule name, must be unique
name: MysqlConnectionRule

# (Required)
# Type of alert.
# the frequency rule type alerts when num_events events occur within timeframe
type: frequency
# type: any

# (Required)
# Index to search, wildcard supported
index: mysql-connection-log-*

# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 3

# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
    minutes: 10
    # hours: 1

# (Required)
# A list of Elasticsearch filters used to find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
filter:
- range:
    connection_num:
      from: 750

# (Required)
# The alert is used when a match is found
alert:
- command
command: [
  "curl",
  "-X", "POST",
  "-d",
  '{"appId":"xxx", "token":"xxx", "alertList":[{"channel":"sms", "description":"MySQL connection alert: the current total connection count is %(connection_num)s!", "receiver":"ouyangyew"}]}',
  "http://xxx.baidu.com/alert/push"
]

# (required, email specific)
# a list of email addresses to send alerts to
# email:
# - "[email protected]"
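The command alert above embeds a %(connection_num)s placeholder, which ElastAlert fills in from the matching document before running the command. The sketch below only imitates that substitution so the resulting text can be previewed. The function name render_placeholder and the sample message are hypothetical, and the real formatting is done by ElastAlert in Python, not by this shell code.

```shell
#!/bin/bash
# render_placeholder TEMPLATE VALUE: replace every literal occurrence of
# %(connection_num)s in TEMPLATE with VALUE and print the result.
render_placeholder() {
  local template=$1 value=$2
  local placeholder='%(connection_num)s'
  local rendered
  # Quoting the pattern makes bash match it literally, not as a glob.
  rendered=${template//"$placeholder"/$value}
  echo "$rendered"
}

render_placeholder 'current total connections: %(connection_num)s' 812
```

Previewing the message this way helps catch a misspelled field name, which would otherwise surface only when an alert fires.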