
Monitoring K8S with Prometheus


Prometheus is a pull-based monitoring database. In K8S, the data behind Grafana (which renders the graphs) needs to be persisted; ideally that means a distributed file system plus dynamic PVs, but this test environment uses local disk instead. The agents that collect the data are deployed with DaemonSets; the defining property of a DaemonSet is that it runs one service process on every node, and all of this is deployed automatically.
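To illustrate the DaemonSet point, here is a minimal sketch (the name is hypothetical, purely for illustration; the real collector manifests appear later in this post). Note there is no replica count: the controller runs exactly one pod per node.

apiVersion: extensions/v1beta1   # API group appropriate for the K8S release used here
kind: DaemonSet
metadata:
  name: example-agent            # hypothetical name, for illustration only
  namespace: monitoring
spec:
  template:
    metadata:
      labels:
        app: example-agent
    spec:
      containers:
      - name: agent
        image: prom/node-exporter:v0.14.0   # the exporter image this post deploys later
        ports:
        - containerPort: 9100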

This post only covers how to use Prometheus to monitor a K8S cluster; for Prometheus itself, refer to the official documentation.

Prerequisite: have the required files ready

Prometheus/prometheus:/data/Prometheus/prometheus# ls -l 
total 28
drwxr-xr-x 2 root root 4096 Jan 15 02:53 grafana
drwxr-xr-x 2 root root 4096 Jan 15 03:11 kube-state-metrics
-rw-r--r-- 1 root root   60 Jan 14 06:48 namespace.yaml
drwxr-xr-x 2 root root 4096 Jan 15 03:22 node-directory-size-metrics
drwxr-xr-x 2 root root 4096 Jan 15 03:02 node-exporter
drwxr-xr-x 2 root root 4096 Jan 15 02:55 prometheus
drwxr-xr-x 2 root root 4096 Jan 15 02:37 rbac

$ ls grafana/
grafana-configmap.yaml  grafana-core-deployment.yaml  grafana-import-dashboards-job.yaml  grafana-pvc-claim.yaml  grafana-pvc-volume.yaml  grafana-service.yaml

$ ls prometheus/
configmap.yaml  deployment.yaml  prometheus-rules.yaml  service.yaml

grafana and prometheus hold the deployment files; node-exporter, kube-state-metrics, and node-directory-size-metrics are the three collectors, effectively Prometheus's agents.

With the files ready, deploy step by step:

1. Create the required Namespace

All of the Deployments, Pods, and Services for Prometheus live in the monitoring namespace, so it needs to be created first.

 $ cat namespace.yaml 
 apiVersion: v1
 kind: Namespace
 metadata:
  name: monitoring
  
 $ kubectl create -f namespace.yaml 
 namespace "monitoring" created
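The manifest keeps things reproducible, but kubectl can also create the namespace directly, without a file:

 $ kubectl create namespace monitoring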

2. Create the PV and PVC for Grafana

grafana# cat grafana-pvc-volume.yaml 
kind: PersistentVolume
apiVersion: v1
metadata:
  name: grafana-pv-volume
  labels:
    type: local
spec:
  storageClassName: grafana-pv-volume
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: "/data/volume/grafana"
    
grafana# cat grafana-pvc-claim.yaml 
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: grafana-pvc-volume
  namespace: "monitoring"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: grafana-pv-volume
  
$ kubectl create -f grafana/grafana-pvc-volume.yaml -f grafana/grafana-pvc-claim.yaml 
persistentvolume "grafana-pv-volume" created
persistentvolumeclaim "grafana-pvc-volume" created

$ kubectl get pvc -n monitoring
NAME          STATUS           VOLUME       CAPACITY   ACCESS MODES   STORAGECLASS     AGE
grafana-pvc-volume   Bound     grafana-pv-volume   10Gi       RWO     grafana-pv-volume   52s
 
The STATUS is Bound: the claim has bound to grafana-pv-volume. Note the CAPACITY column shows 10Gi even though the claim requested only 5Gi; binding is one-to-one, so the PVC takes the whole PV.
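This can be confirmed from the PV side as well; its STATUS should likewise read Bound, with monitoring/grafana-pvc-volume listed under CLAIM:

$ kubectl get pv grafana-pv-volume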

3. Create the Grafana application. These applications are third-party, each with its own configuration, defined through ConfigMaps.

grafana# ls
grafana-configmap.yaml  grafana-core-deployment.yaml  grafana-import-dashboards-job.yaml  grafana-pvc-claim.yaml  grafana-pvc-volume.yaml  grafana-service.yaml
grafana# kubectl create -f ./    # create all the files in the grafana directory
configmap "grafana-import-dashboards" created
deployment "grafana-core" created
job "grafana-import-dashboards" created
service "grafana" created 


grafana# kubectl get deployment,pod -n monitoring 
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/grafana-core   1         1         1            0           1m

NAME                              READY     STATUS              RESTARTS   AGE
po/grafana-core-9c7f66868-7q8lx   0/1       ContainerCreating   0          1m
When the grafana-core pod starts, it pulls the image grafana/grafana:4.2.0.

A brief description, in my own words, of what was created for Grafana:

      grafana-pv-volume=/data/volume/grafana =10G    
      grafana-pvc-volume=5G--->grafana-pv-volume
      ---configmap=grafana-import-dashboards     
      Job=grafana-import-dashboards
                  
      Deployment=grafana-core     replicas: 1  containers=grafana-core   mount:  grafana-pvc-volume:/var
      service=grafana     port: 3000  = nodePort: 30161     (3000 is Grafana's default port)
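Reconstructed from that summary (and the service listing shown later), grafana-service.yaml amounts to roughly the following sketch; the actual file may differ in detail:

apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - port: 3000          # Grafana's default port
    nodePort: 30161     # fixed NodePort exposed on every node
  selector:
    app: grafana        # selector matches the `kubectl get svc -o wide` output below
    component: core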

4. The Grafana core is now deployed; next, deploy Prometheus's RBAC

prometheus/rbac# ls
grant_serviceAccount.sh  prometheus_rbac.yaml
# create the RBAC resources first:
prometheus/rbac# kubectl create -f prometheus_rbac.yaml 
clusterrolebinding "prometheus-k8s" created
clusterrolebinding "kube-state-metrics" created
clusterrole "kube-state-metrics" created
serviceaccount "kube-state-metrics" created
clusterrolebinding "prometheus" created
clusterrole "prometheus" created
serviceaccount "prometheus-k8s" created
prometheus/rbac#
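prometheus_rbac.yaml is not listed in full here; the prometheus ClusterRole/ClusterRoleBinding portion plausibly looks like the sketch below. The rules are an assumption, inferred from what the scrape jobs need: read-only access to nodes, services, endpoints, and pods, plus the /metrics endpoint.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus-k8s       # the ServiceAccount created above
  namespace: monitoring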

5. Create the Prometheus Deployment and Service

prometheus/prometheus# ls
configmap.yaml  deployment.yaml  prometheus-rules.yaml  service.yaml
prometheus/prometheus# 
Note about configmap.yaml: since Kubernetes 1.7, cAdvisor metrics for pods are obtained through the kubelet's port 4194.
Pay attention to the section below: it scrapes the cAdvisor metrics, which must go through the kubelet's port 4194, so the kubelet must be listening there; cAdvisor serves the container metrics of each pod on 4194.
prometheus/prometheus# cat configmap.yaml
 # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L37
      - job_name: 'kubernetes-nodes'
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:10255'
            target_label: __address__
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc.cluster.local:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}:4194/proxy/metrics

      # https://github.com/prometheus/prometheus/blob/master/documentation/examples/prometheus-kubernetes.yml#L79
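To make the relabeling concrete: for a node named 10.3.1.16, the kubernetes-cadvisor job rewrites __address__ to kubernetes.default.svc.cluster.local:443 and sets __metrics_path__ to /api/v1/nodes/10.3.1.16:4194/proxy/metrics, so the scrape actually goes to https://kubernetes.default.svc.cluster.local/api/v1/nodes/10.3.1.16:4194/proxy/metrics. Prometheus reaches each kubelet's cAdvisor port through the API server proxy rather than hitting the nodes directly.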

prometheus-rules.yaml is Prometheus's rules file (recording/alerting rules).

deployment.yaml and service.yaml are the deployment files; it is advisable to raise the resource limits in the Deployment somewhat.
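For example, the container's resource block might be raised to something like this (the numbers are placeholders; size them to your cluster):

        resources:
          requests:
            cpu: 500m
            memory: 500Mi
          limits:
            cpu: "1"
            memory: 1Gi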

Now deploy all the files in the prometheus directory:

prometheus/prometheus# kubectl create -f ./
configmap "prometheus-core" created
deployment "prometheus-core" created
configmap "prometheus-rules" created
service "prometheus" created
prometheus/prometheus# 

prometheus/prometheus# kubectl get deployment,pod -n monitoring 
NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/grafana-core      1         1         1            1           16m
deploy/prometheus-core   1         1         1            1           1m

NAME                                  READY     STATUS    RESTARTS   AGE
po/grafana-core-9c7f66868-wm68j       1/1       Running   0          16m
po/prometheus-core-6dc6777c5b-5nc7j   1/1       Running   0          1m

For the Prometheus deployment, a brief description of what was created:

    Deployment= prometheus-core   replicas: 1    containers=prometheus   image: prom/prometheus:v1.7.0    containerPort: 9090(webui)
    Service    name: prometheus   NodePort-->port: 9090 -webui
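Pulling those pieces together, the Deployment is roughly the sketch below. The -config.file path and mount layout are assumptions; the real deployment.yaml carries more detail, e.g. the resource limits mentioned earlier.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: prometheus-core
  namespace: monitoring
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
        component: core
    spec:
      serviceAccountName: prometheus-k8s     # the RBAC ServiceAccount from step 4
      containers:
      - name: prometheus
        image: prom/prometheus:v1.7.0
        args:
        - '-config.file=/etc/prometheus/prometheus.yaml'   # Prometheus 1.x uses single-dash flags
        ports:
        - containerPort: 9090                # web UI
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
      volumes:
      - name: config-volume
        configMap:
          name: prometheus-core              # the ConfigMap created above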

6. Prometheus itself is deployed; now deploy its agents, i.e. the collectors:

Prometheus/prometheus# ls node-directory-size-metrics/
daemonset.yaml
Prometheus/prometheus# ls kube-state-metrics/
deployment.yaml  service.yaml
Prometheus/prometheus# ls node-exporter/
exporter-daemonset.yaml  exporter-service.yaml
Prometheus/prometheus# 
# two of the three use a DaemonSet

Prometheus/prometheus# kubectl create -f node-exporter/ -f kube-state-metrics/ -f node-directory-size-metrics/
daemonset "prometheus-node-exporter" created
service "prometheus-node-exporter" created
deployment "kube-state-metrics" created
service "kube-state-metrics" created
daemonset "node-directory-size-metrics" created
Prometheus/prometheus# 

Prometheus/prometheus# kubectl get deploy,pod,svc -n monitoring 
NAME                        DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/grafana-core         1         1         1            1           26m
deploy/kube-state-metrics   2         2         2            2           1m
deploy/prometheus-core      1         1         1            1           11m

NAME                                     READY     STATUS    RESTARTS   AGE
po/grafana-core-9c7f66868-wm68j          1/1       Running   0          26m
po/kube-state-metrics-694fdcf55f-bqcp8   1/1       Running   0          1m
po/kube-state-metrics-694fdcf55f-nnqqd   1/1       Running   0          1m
po/node-directory-size-metrics-n9wx7     2/2       Running   0          1m
po/node-directory-size-metrics-ppscw     2/2       Running   0          1m
po/prometheus-core-6dc6777c5b-5nc7j      1/1       Running   0          11m
po/prometheus-node-exporter-kchmb        1/1       Running   0          1m
po/prometheus-node-exporter-lks5m        1/1       Running   0          1m

NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
svc/grafana                    NodePort    10.254.231.25   <none>        3000:30161/TCP   26m
svc/kube-state-metrics         ClusterIP   10.254.156.51   <none>        8080/TCP         1m
svc/prometheus                 NodePort    10.254.239.90   <none>        9090:37318/TCP   10m
svc/prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         1m
Prometheus/prometheus#

--------
Prometheus/prometheus# kubectl get pod -o wide -n monitoring 
NAME                                  READY     STATUS    RESTARTS   AGE       IP             NODE
prometheus-node-exporter-kchmb        1/1       Running   0          4m        10.3.1.16      10.3.1.16
prometheus-node-exporter-lks5m        1/1       Running   0          4m        10.3.1.17      10.3.1.17

# These two are the exporters; as a DaemonSet they run on each of the two nodes, so data can be collected from every node.

Deployment is complete as shown above; below is a brief description in my own words:

 The node-exporter/exporter-daemonset.yaml file:
       DaemonSet=prometheus-node-exporter   
          containers: name: prometheus-node-exporter    image: prom/node-exporter:v0.14.0
          containerPort: 9100   hostPort: 9100  hostNetwork: true    # it uses the host's port 9100
      
		Prometheus/prometheus/node-exporter# kubectl get  daemonset,pod -n monitoring 
		NAME                             DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
		ds/node-directory-size-metrics   2         2         2         2            2           <none>          16h
		ds/prometheus-node-exporter      2         2         2         2            2           <none>          16h
           Because it is a DaemonSet, two corresponding prometheus-node-exporter pods are running.

      Service=prometheus-node-exporter   clusterIP: None   port: 9100  type: ClusterIP   # headless: it has no clusterIP
                  
	# kubectl get  service -n monitoring 
	NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
	prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         16h
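Reconstructed from that summary, exporter-service.yaml is roughly the sketch below. clusterIP: None makes the Service headless, so Prometheus's endpoint discovery sees each node's exporter individually instead of a single load-balanced VIP:

apiVersion: v1
kind: Service
metadata:
  name: prometheus-node-exporter
  namespace: monitoring
spec:
  clusterIP: None          # headless: endpoints resolve to the per-node pods
  ports:
  - name: metrics
    port: 9100
  selector:
    app: prometheus
    component: node-exporter   # selector matches the svc -o wide output shown later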
The kube-state-metrics/deployment.yaml file:
      Deployment=kube-state-metrics replicas: 2   containers-->name: kube-state-metrics  image: gcr.io/google_containers/kube-state-metrics:v0.5.0 
                 containerPort: 8080
       
      Service     name: kube-state-metrics   port: 8080  # ClusterIP only, no NodePort mapping
                                 #kubectl get deployment,pod,svc -n monitoring                               
			NAME                        DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
			deploy/kube-state-metrics   2         2         2            2           16h
			
			NAME                                     READY     STATUS    RESTARTS   AGE
			po/kube-state-metrics-694fdcf55f-2mmd5   1/1       Running   0          11h
			po/kube-state-metrics-694fdcf55f-bqcp8   1/1       Running   0          16h
			
			NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
			svc/kube-state-metrics         ClusterIP   10.254.156.51   <none>        8080/TCP         16h
The node-directory-size-metrics/daemonset.yaml file:
        # as a DaemonSet it defines no replica count and runs directly on every node, but it creates no Service
      DaemonSet : name: node-directory-size-metrics  
                  containers-->name: read-du  image: giantswarm/tiny-tools   mountPath: /mnt/var   mountPath: /tmp
                  containers--> name: caddy    image: dockermuenster/caddy:0.9.3 containerPort: 9102
                               mountPath: /var/www   hostPath /var
                            
		kubectl get daemonset,pod,svc -n monitoring 
		NAME                             DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
		ds/node-directory-size-metrics   2         2         2         2            2           <none>          16h

		
		NAME                                     READY     STATUS    RESTARTS   AGE
		po/node-directory-size-metrics-n9wx7     2/2       Running   0          16h
		po/node-directory-size-metrics-ppscw     2/2       Running   0          16h
		
		NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
                     (no Service exists for node-directory-size-metrics)
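With no Service, Prometheus has to discover these pods directly. A common convention (and presumably what this daemonset.yaml relies on; this is an assumption, since the file is not listed here) is pod-template annotations that a kubernetes-pods scrape job matches on:

  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'   # assumed annotation convention for pod-level discovery
        prometheus.io/port: '9102'     # the caddy container's metrics port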


That completes the Prometheus deployment; finally, look at the ports it exposes:

Prometheus/prometheus# kubectl get svc -o wide -n monitoring 
NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE       SELECTOR
grafana                    NodePort    10.254.231.25   <none>        3000:30161/TCP   31m       app=grafana,component=core
kube-state-metrics         ClusterIP   10.254.156.51   <none>        8080/TCP         6m        app=kube-state-metrics
prometheus                 NodePort    10.254.239.90   <none>        9090:37318/TCP   16m       app=prometheus,component=core
prometheus-node-exporter   ClusterIP   None            <none>        9100/TCP         6m        app=prometheus,component=node-exporter
Prometheus/prometheus#

7. Access and use Prometheus

As shown above, Grafana's NodePort is 30161, so NodeIP:30161 opens Grafana; the default credentials are admin/admin.
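Prometheus's own web UI is likewise reachable on its NodePort (37318 in the listing above) and is worth a sanity check before wiring up Grafana, e.g. by querying the up metric through the HTTP API:

$ curl 'http://NodeIP:37318/api/v1/query?query=up'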


After logging in, add a data source.


Add Prometheus as the data source and fill in the connection parameters; since Grafana runs in the same namespace, the URL can be the Service name, e.g. http://prometheus:9090.


After adding the data source, import the dashboard template files.


Deployment complete.
