Prometheus in Practice and Source Code Analysis: the API and Federation
Before walking through the source code, a few aspects of Prometheus configuration and usage need explaining. First, the API: Prometheus exposes a set of HTTP endpoints, for example:
curl 'http://localhost:9090/api/v1/query?query=go_goroutines' | python -m json.tool
{
    "data": {
        "result": [
            {
                "metric": {
                    "__name__": "go_goroutines",
                    "instance": "localhost:9090",
                    "job": "prometheus"
                },
                "value": [
                    1493347106.901,
                    "119"
                ]
            },
            {
                "metric": {
                    "__name__": "go_goroutines",
                    "instance": "10.39.0.45:9100",
                    "job": "node"
                },
                "value": [
                    1493347106.901,
                    "13"
                ]
            },
            {
                "metric": {
                    "__name__": "go_goroutines",
                    "instance": "10.39.0.53:9100",
                    "job": "node"
                },
                "value": [
                    1493347106.901,
                    "11"
                ]
            }
        ],
        "resultType": "vector"
    },
    "status": "success"
}
The above demonstrates an instant query for the go_goroutines metric; the query_range endpoint additionally accepts start and end timestamps to query over a time range.
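A range query against /api/v1/query_range takes query, start, end, and step parameters. A minimal sketch of building such a request URL in Python (the host and time values are placeholders, not from a live server):

```python
from urllib.parse import urlencode

def build_range_query_url(base, expr, start, end, step):
    """Build a /api/v1/query_range URL for a PromQL expression."""
    params = urlencode({
        "query": expr,
        "start": start,  # Unix timestamp
        "end": end,      # Unix timestamp
        "step": step,    # query resolution, e.g. "15s"
    })
    return "%s/api/v1/query_range?%s" % (base, params)

url = build_range_query_url("http://localhost:9090",
                            "go_goroutines", 1493347000, 1493347600, "15s")
print(url)
```

The resulting URL can be fetched with curl exactly like the instant query above.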
More powerfully, the series endpoint accepts multiple match[] selectors, which are ORed together:
[root@slave3 ~]# curl -g 'http://localhost:9090/api/v1/series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}' | python -m json.tool
{
    "data": [
        {
            "__name__": "up",
            "instance": "10.39.0.53:9100",
            "job": "node"
        },
        {
            "__name__": "up",
            "instance": "localhost:9090",
            "job": "prometheus"
        },
        {
            "__name__": "up",
            "instance": "10.39.0.45:9100",
            "job": "node"
        },
        {
            "__name__": "process_start_time_seconds",
            "instance": "localhost:9090",
            "job": "prometheus"
        }
    ],
    "status": "success"
}
This queries which series exist; in Prometheus 1.x, series can also be deleted by sending DELETE to the same endpoint. Remember the jobs and targets configured in the previous post? They can be queried through the API as well:
curl http://localhost:9090/api/v1/label/job/values
{"status":"success","data":["node","prometheus"]}
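For the series deletion mentioned above, Prometheus 1.x accepts a DELETE on the same /api/v1/series endpoint. A sketch that only constructs the request, without sending it (the match[] selector is an example value):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Build (but do not send) a DELETE request against the series endpoint.
params = urlencode({"match[]": 'up{job="node"}'})
req = Request("http://localhost:9090/api/v1/series?" + params,
              method="DELETE")
print(req.get_method(), req.full_url)
```

Sending it with `urlopen(req)` would drop the matched series from local storage.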
The scrape targets themselves can also be listed:
curl http://localhost:9090/api/v1/targets | python -m json.tool
{
    "data": {
        "activeTargets": [
            {
                "discoveredLabels": {
                    "__address__": "10.39.0.53:9100",
                    "__metrics_path__": "/metrics",
                    "__scheme__": "http",
                    "job": "node"
                },
                "health": "up",
                "labels": {
                    "instance": "10.39.0.53:9100",
                    "job": "node"
                },
                "lastError": "",
                "lastScrape": "2017-04-28T02:47:40.871586825Z",
                "scrapeUrl": "http://10.39.0.53:9100/metrics"
            },
            {
                "discoveredLabels": {
                    "__address__": "10.39.0.45:9100",
                    "__metrics_path__": "/metrics",
                    "__scheme__": "http",
                    "job": "node"
                },
                "health": "up",
                "labels": {
                    "instance": "10.39.0.45:9100",
                    "job": "node"
                },
                "lastError": "",
                "lastScrape": "2017-04-28T02:47:45.144032466Z",
                "scrapeUrl": "http://10.39.0.45:9100/metrics"
            },
            {
                "discoveredLabels": {
                    "__address__": "localhost:9090",
                    "__metrics_path__": "/metrics",
                    "__scheme__": "http",
                    "job": "prometheus"
                },
                "health": "up",
                "labels": {
                    "instance": "localhost:9090",
                    "job": "prometheus"
                },
                "lastError": "",
                "lastScrape": "2017-04-28T02:47:44.079111193Z",
                "scrapeUrl": "http://localhost:9090/metrics"
            }
        ]
    },
    "status": "success"
}
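The activeTargets payload is easy to consume programmatically, for example to flag targets that are not healthy. A minimal sketch, using a trimmed sample of the response above (with one target artificially marked down for illustration):

```python
import json

# Trimmed sample of an /api/v1/targets response; the "down" entry is
# fabricated here to demonstrate filtering.
sample = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"health": "up",   "labels": {"instance": "10.39.0.53:9100", "job": "node"}},
      {"health": "down", "labels": {"instance": "10.39.0.45:9100", "job": "node"}}
    ]
  }
}
""")

def unhealthy(targets_payload):
    """Return the instance labels of targets whose health is not 'up'."""
    return [t["labels"]["instance"]
            for t in targets_payload["data"]["activeTargets"]
            if t["health"] != "up"]

print(unhealthy(sample))  # ['10.39.0.45:9100']
```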
Alertmanagers can be queried in the same way, via /api/v1/alertmanagers. For Prometheus's (1.x) local storage, there are also some key self-metrics worth monitoring:
prometheus_local_storage_memory_series: the number of series currently held in memory.
prometheus_local_storage_open_head_chunks: the number of open head chunks.
prometheus_local_storage_chunks_to_persist: the number of memory chunks still waiting to be persisted to disk.
prometheus_local_storage_memory_chunks: the number of chunks currently in memory. Subtracting the previous two gives the number of persisted chunks (which are evictable if no query is currently using them).
prometheus_local_storage_series_chunks_persisted: a histogram of the number of chunks persisted per batch.
prometheus_local_storage_rushed_mode: 1 if Prometheus is in "rushed mode", 0 otherwise. Can be used to calculate the percentage of time Prometheus spends in rushed mode.
prometheus_local_storage_checkpoint_last_duration_seconds: how long the last checkpoint took.
prometheus_local_storage_checkpoint_last_size_bytes: the size of the last checkpoint in bytes.
prometheus_local_storage_checkpointing: 1 while Prometheus is checkpointing, 0 otherwise. Can be used to calculate the percentage of time Prometheus spends checkpointing.
prometheus_local_storage_inconsistencies_total: a counter of storage inconsistencies found. If it is greater than 0, restart the server to trigger recovery.
prometheus_local_storage_persist_errors_total: a counter of persist errors.
prometheus_local_storage_memory_dirty_series: the current number of dirty series.
process_resident_memory_bytes: broadly speaking, the physical memory occupied by the Prometheus process.
go_memstats_alloc_bytes: the Go heap size (allocated objects in use, plus allocated objects no longer in use but not yet garbage-collected).
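The relationships described above can be expressed directly in PromQL; a sketch, using the 1.x local-storage metric names listed above:

# Number of persisted (evictable) chunks:
prometheus_local_storage_memory_chunks
  - prometheus_local_storage_chunks_to_persist
  - prometheus_local_storage_open_head_chunks

# Fraction of time spent in rushed mode over the last hour:
avg_over_time(prometheus_local_storage_rushed_mode[1h])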
Another advanced Prometheus feature is federation: deploy one Prometheus in each data center as a slave, then aggregate their data through a federating master.
scrape_configs:
  - job_name: dc_prometheus
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"^job:.*"}' # Request all job-level time series
    static_configs:
      - targets:
        - dc1-prometheus:9090
        - dc2-prometheus:9090
And if a single server cannot store the full volume, scraping itself can be sharded:
global:
external_labels:
slave: 1 # This is the 2nd slave. This prevents clashes between slaves.
scrape_configs:
- job_name: some_job
# Add usual service discovery here, such as static_configs
relabel_configs:
- source_labels: [__address__]
modulus: 4 # 4 slaves
target_label: __tmp_hash
action: hashmod
- source_labels: [__tmp_hash]
regex: ^1$ # This is the 2nd slave
action: keep
The configuration above uses hashmod relabeling to decide which targets each Prometheus slave is responsible for.
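The hashmod action can be sketched in Python. Prometheus hashes the concatenated source label values with MD5 and takes the modulus, so the following is an illustrative approximation of how targets fan out across 4 slaves (the target addresses are sample values):

```python
import hashlib

def hashmod(value, modulus):
    """Approximate Prometheus's hashmod relabel action: MD5-hash the
    source label value and take the result modulo `modulus`."""
    digest = hashlib.md5(value.encode()).digest()
    # Prometheus derives a uint64 from the MD5 sum before taking the mod.
    return int.from_bytes(digest[8:], "big") % modulus

targets = ["10.39.0.45:9100", "10.39.0.53:9100", "10.39.0.60:9100"]
slave_number = 1  # corresponds to the `regex: ^1$` keep rule above
mine = [t for t in targets if hashmod(t, 4) == slave_number]
print(mine)
```

Every slave evaluates the same hash, so each target is kept by exactly one of them.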
scrape_configs:
  - job_name: slaves
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"^slave:.*"}' # Request all slave-level time series
    static_configs:
      - targets:
        - slave0:9090
        - slave1:9090
        - slave2:9090
        - slave3:9090
This master configuration federates from the multiple slaves defined above, so the data is stored in shards across them.