Monitoring Docker with cAdvisor + Prometheus + Grafana

Prometheus Grafana Docker · Published 2019-03-17 12:37:00


I. cAdvisor (install on every host you want to monitor)

Official repository: https://github.com/google/cadvisor

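cAdvisor is an open-source tool developed by Google for analyzing the resource usage and performance metrics of running containers. It runs as a daemon that collects, aggregates, processes, and exports information about running containers.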

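Note: the latest cAdvisor image has a bug (confirmed after some searching); pinning the image to google/cadvisor:v0.24.1 works around it. cAdvisor listens on port 8080 inside the container by default; the host port it is published on can be changed (8090 in the command below).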

sudo docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:ro \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--volume=/dev/disk/:/dev/disk:ro \
--publish=8090:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:v0.24.1

cAdvisor exposes a web UI at its port:

http://<hostname>:<port>/

The cAdvisor web UI refreshes its data in real time but cannot store it.

View the raw metrics (served in the Prometheus text exposition format, not JSON):

http://192.168.247.212:8090/metrics
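To make the shape of that response concrete, here is a minimal sketch that parses a few lines of the text exposition format. The metric names are real cAdvisor metrics, but the label values and numbers in SAMPLE are invented, and the parser is a simplified illustration (it leaves escape sequences in label values as-is and ignores optional timestamps), not a full implementation of the format.

```python
import re

# Hypothetical excerpt of a cAdvisor /metrics response. The metric names are
# real cAdvisor metrics; the label values and numbers are made up.
SAMPLE = '''\
# HELP container_memory_usage_bytes Current memory usage in bytes.
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/docker/abc",name="cadvisor"} 5.24288e+07
container_cpu_usage_seconds_total{id="/docker/abc",name="cadvisor"} 12.5
'''

# metric_name{optional labels} value [optional timestamp, ignored here]
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# key="value" pairs inside the braces
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith('#'):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples
```

Against a live host, the text could be fetched with urllib.request.urlopen("http://192.168.247.212:8090/metrics").read().decode() and passed to parse_metrics.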

II. Prometheus

Official site: https://prometheus.io/

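With the rapid growth of container technology, Kubernetes has become the most sought-after container cluster management system. Prometheus, an important member of the Cloud Native Computing Foundation (CNCF) ecosystem, is second only to Kubernetes in activity and is now widely used to monitor Kubernetes clusters. This article briefly introduces the components and concepts of Prometheus and walks through installing, configuring, and using it, so that developers and cloud platform operators can get up to speed quickly.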

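Introduction to Prometheus

Prometheus is an open-source system monitoring and alerting framework. Inspired by Google's Borgmon monitoring system, it was started in 2012 by former Google employees working at SoundCloud, developed as a community open-source project, and publicly released in 2015. In 2016 Prometheus joined the Cloud Native Computing Foundation as its second hosted project, after Kubernetes.

As a new-generation monitoring framework, Prometheus has the following features: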

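A powerful multi-dimensional data model:

Time series are identified by a metric name and a set of key-value pairs (labels).
Any metric can carry arbitrary multi-dimensional labels.
The data model is loose: names do not have to be encoded as dot-separated strings.
Data can be aggregated, sliced, and diced.
Sample values are double-precision floats, and label values can be full Unicode.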
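The data model above can be sketched in a few lines. This is an illustrative toy, not how Prometheus stores data: the samples are invented, and the helper names series_id, select, and sum_by are hypothetical. It shows that a series is identified by its metric name plus its full label set, and that slicing and aggregating are just operations over labels.

```python
from collections import defaultdict

# Invented samples: (metric name, labels, value). The `instance` label is the
# one Prometheus attaches to every scraped target.
SAMPLES = [
    ("container_memory_usage_bytes", {"instance": "192.168.247.211:8090", "name": "web"}, 100.0),
    ("container_memory_usage_bytes", {"instance": "192.168.247.211:8090", "name": "db"}, 300.0),
    ("container_memory_usage_bytes", {"instance": "192.168.247.212:8090", "name": "web"}, 50.0),
]


def series_id(metric, labels):
    """Hashable identity of a series: metric name plus sorted label pairs."""
    return (metric, tuple(sorted(labels.items())))


def select(samples, metric, **matchers):
    """Slice: keep samples of one metric whose labels equal every matcher."""
    return [
        (m, labels, value)
        for m, labels, value in samples
        if m == metric and all(labels.get(k) == v for k, v in matchers.items())
    ]


def sum_by(samples, label):
    """Aggregate: sum values grouped by one label, similar to PromQL `sum by`."""
    totals = defaultdict(float)
    for _metric, labels, value in samples:
        totals[labels.get(label, "")] += value
    return dict(totals)
```

For example, sum_by(SAMPLES, "instance") groups memory usage per host, while select(SAMPLES, "container_memory_usage_bytes", name="web") keeps only the series for the container named web.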

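A flexible and powerful query language (PromQL): a single query can multiply, add, join, and compute quantiles across multiple metrics.

Easy to operate: the Prometheus server is a single binary that works locally and does not depend on distributed storage.

Efficient: each sample takes only about 3.5 bytes on average, and a single Prometheus server can handle millions of time series.

Time series are collected with a pull model over HTTP, which makes local testing easy and keeps a misbehaving server from pushing bad metrics.

Jobs that cannot be scraped can push their time series to a push gateway, which the Prometheus server then scrapes.

Targets are found through service discovery or static configuration.

Multiple graphing and dashboard options are supported.

Easy to scale.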

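Note that samples can be lost during collection, so Prometheus is not suitable when the collected data must be 100% accurate. For recording time series, however, Prometheus offers strong query capabilities, and it is a good fit for microservice architectures.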

Prometheus components and architecture

The Prometheus ecosystem consists of several components, many of which are optional:

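Prometheus Server: scrapes and stores time series data.
Client Library: instruments the service to be monitored, generating metrics and exposing them to the Prometheus server. When the server pulls, the library returns the current state of the metrics.
Push Gateway: mainly for short-lived jobs. Such jobs may exit before Prometheus gets a chance to pull from them, so they push their metrics to the Push Gateway instead, and the Prometheus server scrapes the gateway. This approach is intended for service-level metrics; for machine-level metrics, use node exporter.
Exporters: expose the metrics of existing third-party services to Prometheus.
Alertmanager: receives alerts from the Prometheus server, deduplicates and groups them, and routes them to the matching receiver to send the notification. Common receivers include email, PagerDuty, OpsGenie, and webhooks.
Various other supporting tools.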
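As a rough sketch of the Push Gateway flow described above, the function below builds the HTTP request a short-lived job could send before exiting. The gateway address, job name, and metric name are hypothetical (this article does not install a Push Gateway); the URL shape /metrics/job/<job> and the plain-text body follow the Pushgateway's documented push API, and 9091 is its default port.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job, metrics):
    """Build (but do not send) a request that pushes `metrics` for `job`.

    `metrics` maps metric names to numeric values. The body uses the text
    exposition format: one "name value" pair per line, newline-terminated.
    """
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    url = f"{gateway.rstrip('/')}/metrics/job/{urllib.parse.quote(job, safe='')}"
    # POST replaces previously pushed metrics of the same name in this group.
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
```

A batch job would then call urllib.request.urlopen(build_push_request("http://192.168.247.211:9091", "nightly_backup", {"backup_duration_seconds": 42.5})) as its last step, and Prometheus would pick the value up the next time it scrapes the gateway.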

Prometheus architecture diagram

Installation steps:

wget https://github.com/prometheus/prometheus/releases/download/v2.8.0/prometheus-2.8.0.linux-amd64.tar.gz
tar -xf prometheus-2.8.0.linux-amd64.tar.gz
cd prometheus-2.8.0.linux-amd64
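Edit the configuration file prometheus.yml so that scrape_configs contains the following (the 'docker' job points at the cAdvisor instances):
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['192.168.247.211:9090']
  - job_name: 'docker'
    static_configs:
      - targets:
          - "192.168.247.211:8090"
          - "192.168.247.212:8090"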
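The 'docker' job above lists each cAdvisor host by hand. If the host list grows, the same fragment could be generated instead. The helper below is a hypothetical convenience, not part of Prometheus; it only formats text and assumes the result is pasted under scrape_configs.

```python
def render_scrape_job(job_name, targets):
    """Return the YAML text for one static scrape job under scrape_configs."""
    lines = [
        f"  - job_name: '{job_name}'",
        "    static_configs:",
        "      - targets:",
    ]
    # One quoted "host:port" entry per cAdvisor instance.
    lines += [f'          - "{target}"' for target in targets]
    return "\n".join(lines) + "\n"
```

Calling render_scrape_job("docker", ["192.168.247.211:8090", "192.168.247.212:8090"]) reproduces the 'docker' job from the fragment above.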

cp prometheus promtool /usr/local/bin/

Start:
nohup prometheus --config.file=./prometheus.yml &

My complete, minimal prometheus.yml:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing the endpoints to scrape:
# Prometheus itself, plus the cAdvisor instances.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['192.168.247.211:9090']
  - job_name: 'docker'
    static_configs:
      - targets:
          - "192.168.247.211:8090"
          - "192.168.247.212:8090"

Open the Prometheus UI at http://192.168.247.211:9090
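Besides the web UI, the same server answers PromQL queries over its HTTP API at /api/v1/query. The sketch below builds such a query URL and reads the result of the built-in `up` metric to see which targets are being scraped successfully. The function names are hypothetical, and the code assumes the response has the documented instant-vector shape.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def targets_up(payload):
    """Map each instance to True/False from the response to an `up` query."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus reported a failed query")
    # Each result looks like {"metric": {...labels...}, "value": [ts, "1"]};
    # sample values are returned as strings.
    return {
        item["metric"].get("instance", ""): item["value"][1] == "1"
        for item in payload["data"]["result"]
    }


def query(base_url, promql, timeout=5):
    """Run an instant query against a live Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)
```

With the server from this article running, targets_up(query("http://192.168.247.211:9090", 'up{job="docker"}')) should report both cAdvisor instances; an expression such as sum by (name) (rate(container_cpu_usage_seconds_total[5m])) can be passed to query in the same way.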

III. Grafana

Official site: https://grafana.com/

Installation steps:

wget https://dl.grafana.com/oss/release/grafana-6.0.1-1.x86_64.rpm
sudo yum localinstall grafana-6.0.1-1.x86_64.rpm -y
systemctl daemon-reload
systemctl start grafana-server
systemctl status grafana-server
# Enable the systemd service so that Grafana starts at boot.
sudo systemctl enable grafana-server.service

1. Open http://192.168.247.211:3000/login in a browser.

Default login (username/password): admin/admin