Fixing Prometheus Not Collecting Metrics from Windows Nodes in a Kubernetes Cluster

A Xin • Published: 2018-12-12

Background

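Following on from the previous post on quickly setting up Windows Kubernetes, we found that Kubernetes on Windows behaves differently from Kubernetes on Linux in a few places (pitfalls, in other words), hostAliases being one example. If we really want to run Windows in production, we need monitoring in addition to the basics of Pods, Volumes, Services and logs. Monitoring is usually done with Prometheus, with Grafana for dashboards, but the Prometheus Node Exporter is designed for *nix, so on Windows we have to find our own way. The Prometheus Node Exporter project recommends the WMI exporter for Windows, and interested readers can try it. This article instead approaches the problem from first principles, to understand how a Prometheus exporter is written.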

Prerequisites

A Windows Kubernetes cluster
A Prometheus environment

Steps

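First, we need to find the format of the data the Kubelet exposes on Windows. Because cAdvisor does not support Windows, a community member wrote a relatively simple implementation to fill the gap. It keeps the same interface as on Linux and exposes metrics at <Node_IP>:10255/stats/summary; this is also where metrics-server and kubectl top get their data. The output looks roughly like this: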

{
  "node": {
   "nodeName": "35598k8s9001",
   "startTime": "2018-08-26T07:25:08Z",
   "cpu": {
    "time": "2018-09-10T01:44:52Z",
    "usageCoreNanoSeconds": 8532520000000
   },
   "memory": {
    "time": "2018-09-10T01:44:52Z",
    "availableBytes": 14297423872,
    "usageBytes": 1978798080,
    "workingSetBytes": 734490624,
    "rssBytes": 0,
    "pageFaults": 0,
    "majorPageFaults": 0
   },
   "fs": {
    "time": "2018-09-10T01:44:52Z",
    "availableBytes": 15829303296,
    "capacityBytes": 32212250624,
    "usedBytes": 16382947328
   },
   "runtime": {
    "imageFs": {
     "time": "2018-09-10T01:44:53Z",
     "availableBytes": 15829303296,
     "capacityBytes": 32212250624,
     "usedBytes": 16382947328,
     "inodesUsed": 0
    }
   }
  },
  "pods": [
   {
    "podRef": {
     "name": "stdlogserverwin-5fbcc5648d-ztqsq",
     "namespace": "default",
     "uid": "f461a0b4-ab36-11e8-93c4-0017fa0362de"
    },
    "startTime": "2018-08-29T02:55:15Z",
    "containers": [
     {
      "name": "stdlogserverwin",
      "startTime": "2018-08-29T02:56:24Z",
      "cpu": {
       "time": "2018-09-10T01:44:54Z",
       "usageCoreNanoSeconds": 749578125000
      },
      "memory": {
       "time": "2018-09-10T01:44:54Z",
       "workingSetBytes": 83255296
      },
      "rootfs": {
       "time": "2018-09-10T01:44:54Z",
       "availableBytes": 15829303296,
       "capacityBytes": 32212250624,
       "usedBytes": 0
      },
      "logs": {
       "time": "2018-09-10T01:44:53Z",
       "availableBytes": 15829303296,
       "capacityBytes": 32212250624,
       "usedBytes": 16382947328,
       "inodesUsed": 0
      },
      "userDefinedMetrics": null
     }
    ],
    "cpu": {
     "time": "2018-08-29T02:56:24Z",
     "usageNanoCores": 0,
     "usageCoreNanoSeconds": 749578125000
    },
    "memory": {
     "time": "2018-09-10T01:44:54Z",
     "availableBytes": 0,
     "usageBytes": 0,
     "workingSetBytes": 83255296,
     "rssBytes": 0,
     "pageFaults": 0,
     "majorPageFaults": 0
    },
    "volume": [
     {
      "time": "2018-08-29T02:55:16Z",
      "availableBytes": 17378648064,
      "capacityBytes": 32212250624,
      "usedBytes": 14833602560,
      "inodesFree": 0,
      "inodes": 0,
      "inodesUsed": 0,
      "name": "default-token-wv5fc"
     }
    ],
    "ephemeral-storage": {
     "time": "2018-09-10T01:44:54Z",
     "availableBytes": 15829303296,
     "capacityBytes": 32212250624,
     "usedBytes": 16382947328
    }
   }
  ]
}
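As a quick sanity check on the shape of this payload, the sketch below parses a trimmed copy of the node section above (values copied from the sample) and derives filesystem utilization. In this sample, usedBytes + availableBytes equals capacityBytes exactly; that is an observation about this one payload, not a guarantee documented for the kubelet. The function name fs_usage_percent is hypothetical.

```python
import json

# Trimmed copy of the "node" section of the sample /stats/summary payload above.
SAMPLE_SUMMARY = """
{
  "node": {
    "nodeName": "35598k8s9001",
    "cpu": {"usageCoreNanoSeconds": 8532520000000},
    "memory": {"usageBytes": 1978798080, "workingSetBytes": 734490624},
    "fs": {
      "availableBytes": 15829303296,
      "capacityBytes": 32212250624,
      "usedBytes": 16382947328
    }
  },
  "pods": []
}
"""


def fs_usage_percent(summary: dict) -> float:
    """Return node filesystem usage as a percentage of capacity."""
    fs = summary["node"]["fs"]
    return fs["usedBytes"] / fs["capacityBytes"] * 100


summary = json.loads(SAMPLE_SUMMARY)
fs = summary["node"]["fs"]
print(summary["node"]["nodeName"], round(fs_usage_percent(summary), 2))
# prints: 35598k8s9001 50.86
```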

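As shown above, the payload contains metrics for the node itself and for each pod. It offers less than cAdvisor does, but it is enough for basic monitoring. Next we need a small program that converts this data into a format Prometheus can parse. Here is a small Python example; first, declare the stats objects to expose (saved as stats.py, which the next script imports):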

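# stats.py: simple containers for the values we export
class Node:
    def __init__(self, name, cpu, memory):
        self.name = name
        self.cpu = cpu        # cumulative CPU time (usageCoreNanoSeconds)
        self.memory = memory  # bytes

class Pod:
    def __init__(self, name, namespace, cpu, memory):
        self.name = name
        self.namespace = namespace
        self.cpu = cpu        # cumulative CPU time (usageCoreNanoSeconds)
        self.memory = memory  # working-set bytes

class Stats:
    def __init__(self, node, pods):
        self.node = node
        self.pods = pods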

Then use the Prometheus Python client (prometheus_client) to write a polling program that converts the kubelet stats.

from urllib.request import urlopen
import asyncio
import json
import logging

import prometheus_client as prom

# Node, Pod and Stats are the classes defined in stats.py above
from stats import Node, Pod, Stats


def getMetrics(url):
    # Fetch the kubelet summary and parse the JSON document
    with urlopen(url, timeout=5) as response:
        json_obj = json.loads(response.read().decode('utf-8'))
    # Map the fields we need onto the stats objects defined earlier
    node = Node(name=json_obj['node']['nodeName'],
                cpu=json_obj['node']['cpu']['usageCoreNanoSeconds'],
                memory=json_obj['node']['memory']['usageBytes'])

    pods_list = []
    for item in json_obj.get('pods') or []:
        pods_list.append(Pod(name=item['podRef']['name'],
                             namespace=item['podRef']['namespace'],
                             cpu=item['cpu']['usageCoreNanoSeconds'],
                             memory=item['memory']['workingSetBytes']))

    return Stats(node, pods_list)

# A simple log output format
log_format = "%(asctime)s - %(levelname)s [%(name)s] %(threadName)s %(message)s"
logging.basicConfig(level=logging.INFO, format=log_format)
# Declare the metrics to export and the labels used to query them later
g1 = prom.Gauge('node_cpu_usageCoreNanoSeconds', 'Cumulative CPU usage of the node', labelnames=['node_name'])
g2 = prom.Gauge('node_mem_usageBytes', 'Memory usage of the node', labelnames=['node_name'])
g3 = prom.Gauge('pod_cpu_usageCoreNanoSeconds', 'Cumulative CPU usage of the pod', labelnames=['pod_name', 'pod_namespace'])
g4 = prom.Gauge('pod_mem_usageBytes', 'Working-set memory of the pod', labelnames=['pod_name', 'pod_namespace'])

async def expose_stats(url):
    while True:
        try:
            stats = getMetrics(url)
        except (OSError, ValueError, KeyError, TypeError) as e:
            # kubelet unreachable or unexpected payload: log it and retry next round
            logging.warning("failed to read %s: %s", url, e)
        else:
            # Log the node's own CPU value as an example
            logging.info("nodename: {} value {}".format(stats.node.name, stats.node.cpu))
            # Set the values for this polling round
            g1.labels(node_name=stats.node.name).set(stats.node.cpu)
            g2.labels(node_name=stats.node.name).set(stats.node.memory)
            for item in stats.pods:
                g3.labels(pod_name=item.name, pod_namespace=item.namespace).set(item.cpu)
                g4.labels(pod_name=item.name, pod_namespace=item.namespace).set(item.memory)
        await asyncio.sleep(1)

if __name__ == '__main__':
    # Start an HTTP server on port 8000 that Prometheus can scrape
    prom.start_http_server(8000)
    # Run one copy on each Windows node, or point the URL at a remote kubelet
    url = 'http://localhost:10255/stats/summary'
    try:
        asyncio.run(expose_stats(url))
    except KeyboardInterrupt:
        pass
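The program above polls the kubelet every second and caches values in module-level gauges, so a pod that has been deleted keeps reporting its last value until the process restarts. An alternative that prometheus_client supports is a custom collector: the kubelet is queried only when Prometheus scrapes, and each scrape reports exactly the pods that exist at that moment. The sketch below assumes the same metric names as above; KubeletSummaryCollector, fetch_summary and the fetch parameter are hypothetical names, while GaugeMetricFamily, CollectorRegistry and generate_latest come from prometheus_client.

```python
import json
from urllib.request import urlopen

from prometheus_client import CollectorRegistry, generate_latest
from prometheus_client.core import GaugeMetricFamily


def fetch_summary(url="http://localhost:10255/stats/summary"):
    """Read and parse the kubelet summary (same endpoint as above)."""
    with urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


class KubeletSummaryCollector:
    """Hypothetical collector: converts /stats/summary on every scrape."""

    def __init__(self, fetch=fetch_summary):
        self._fetch = fetch

    def describe(self):
        # Returning no descriptions stops the registry from calling collect()
        # (and therefore hitting the kubelet) at registration time.
        return []

    def collect(self):
        summary = self._fetch()
        node = summary["node"]
        node_cpu = GaugeMetricFamily("node_cpu_usageCoreNanoSeconds",
                                     "Cumulative CPU usage of the node", labels=["node_name"])
        node_cpu.add_metric([node["nodeName"]], node["cpu"]["usageCoreNanoSeconds"])
        node_mem = GaugeMetricFamily("node_mem_usageBytes",
                                     "Memory usage of the node", labels=["node_name"])
        node_mem.add_metric([node["nodeName"]], node["memory"]["usageBytes"])
        pod_cpu = GaugeMetricFamily("pod_cpu_usageCoreNanoSeconds",
                                    "Cumulative CPU usage of the pod",
                                    labels=["pod_name", "pod_namespace"])
        pod_mem = GaugeMetricFamily("pod_mem_usageBytes",
                                    "Working-set memory of the pod",
                                    labels=["pod_name", "pod_namespace"])
        for pod in summary.get("pods") or []:
            labels = [pod["podRef"]["name"], pod["podRef"]["namespace"]]
            pod_cpu.add_metric(labels, pod["cpu"]["usageCoreNanoSeconds"])
            pod_mem.add_metric(labels, pod["memory"]["workingSetBytes"])
        yield from (node_cpu, node_mem, pod_cpu, pod_mem)
```

To serve it, register the collector on a registry and pass that registry to the HTTP server, e.g. registry = CollectorRegistry(); registry.register(KubeletSummaryCollector()); start_http_server(8000, registry=registry), then keep the process alive. The trade-off is that every scrape now waits on the kubelet request.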

Once the polling script is in place, start it; opening port 8000 on that machine shows the exported metrics.
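Besides opening port 8000 in a browser, the rendered output can be checked without a running kubelet. This sketch registers a gauge with the same name and labels as g4 in the script on a private CollectorRegistry (so it does not clash with the default global registry), sets a value taken from the pod in the sample payload, and prints the text that a scrape would return. The variable names are illustrative only.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# A private registry keeps this check independent of the default global one.
registry = CollectorRegistry()
pod_mem = Gauge("pod_mem_usageBytes", "Working-set memory of the pod",
                labelnames=["pod_name", "pod_namespace"], registry=registry)

# Value copied from the sample payload (workingSetBytes of stdlogserverwin).
pod_mem.labels(pod_name="stdlogserverwin-5fbcc5648d-ztqsq",
               pod_namespace="default").set(83255296)

# generate_latest renders the registry in the text exposition format.
exposition = generate_latest(registry).decode("utf-8")
print(exposition)
```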

Next, add a scrape job to the Prometheus configuration (under scrape_configs), for example:

- job_name: python_app
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /
  scheme: http
  static_configs:
  - targets:
    - localhost:8000

With this job in place, the metrics can be queried from the Prometheus web UI.
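The same data can also be read through the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at http://localhost:9090 (its default port; adjust as needed) and uses the instant-query endpoint /api/v1/query, whose JSON response contains a vector of {metric, value} entries where value is [timestamp, "string value"]. The helper names build_query_url, parse_vector and query are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(prometheus_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: dict) -> dict:
    """Map each series' pod_name label to its float value from a vector result."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return {
        series["metric"].get("pod_name", ""): float(series["value"][1])
        for series in body["data"]["result"]
    }


def query(prometheus_url: str, promql: str) -> dict:
    """Run an instant query and return {pod_name: value}."""
    with urlopen(build_query_url(prometheus_url, promql), timeout=10) as response:
        return parse_vector(json.loads(response.read().decode("utf-8")))
```

For example, query("http://localhost:9090", 'pod_mem_usageBytes{job="python_app"}') would return the working-set memory per pod, using the job label that Prometheus attaches at scrape time.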

Question

How should the usageNanoCores and usageCoreNanoSeconds values returned by the kubelet be converted into the familiar CPU usage percentage?
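One common way to read these two fields, assuming they follow the kubelet summary API semantics documented for Linux: usageCoreNanoSeconds is cumulative CPU time (summed over all cores) in core-nanoseconds, and usageNanoCores is the average usage over a recent sampling window, in billionths of a core. Either value is divided by the CPU time available in the interval. This is a hypothetical sketch; in particular, the core count is not part of the payload above and has to come from elsewhere, for example the node's CPU capacity reported by the API server. In the sample, pod-level usageNanoCores is 0, so the cumulative field may be the more usable one on these nodes.

```python
NANOS_PER_SECOND = 1_000_000_000


def percent_from_cumulative(prev_ns: int, curr_ns: int, elapsed_s: float, cores: int) -> float:
    """CPU % from two usageCoreNanoSeconds samples taken elapsed_s seconds apart."""
    if elapsed_s <= 0 or cores <= 0:
        raise ValueError("elapsed_s and cores must be positive")
    used_core_seconds = (curr_ns - prev_ns) / NANOS_PER_SECOND
    return used_core_seconds / (elapsed_s * cores) * 100


def percent_from_nanocores(usage_nano_cores: int, cores: int) -> float:
    """CPU % from usageNanoCores, i.e. billionths of a core in use."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return usage_nano_cores / NANOS_PER_SECOND / cores * 100
```

For example, if the counter grows by 6,000,000,000 between two samples 15 seconds apart on a 4-core node, that is 6 core-seconds out of 60 available, or about 10%. With the gauges exported above, a PromQL expression along the lines of rate(node_cpu_usageCoreNanoSeconds[5m]) / 1e9 / <cores> * 100 expresses the same calculation, since the underlying value only increases.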
