
.NET Core Service Monitoring and Alerting: Reporting Metrics to Prometheus + Grafana

## **Preface**

A simple integration of Prometheus + Grafana that covers reporting, collecting, and visualizing metrics.

## **Prometheus**

`Prometheus` is a monitoring platform. It collects metrics from each monitored target by scraping an HTTP endpoint on that target. In a microservice architecture, its multi-dimensional data collection is very powerful.

First, download and install `Prometheus` and `node_exporter`. `node_exporter` monitors machine-level information such as CPU, memory, disk, and I/O.

* [Prometheus download](https://prometheus.io/download/)
* [node_exporter download](https://github.com/prometheus/node_exporter/releases/)

After the download finishes, extract the archive and run `prometheus.exe` as administrator. If `http://localhost:9090/` opens the Prometheus web UI, it has started successfully.

## **Getting Metrics from .NET Core**

With `Prometheus` running, we still need an endpoint from which `Prometheus` can fetch monitoring data. Create a new Web API project, add the `prometheus-net.AspNetCore` package, and add the `UseMetricServer` middleware in `Configure`:

``` csharp
using Prometheus;

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    // Exposes the collected metrics in Prometheus text format at /metrics (the default path).
    app.UseMetricServer();
}
```

Start the project and visit `http://localhost:5000/metrics` to see some basic monitoring information, including the thread count, the handle count, and the collection counts for the three GC generations.

``` text
# HELP process_num_threads Total number of threads
# TYPE process_num_threads gauge
process_num_threads 29
# HELP process_working_set_bytes Process working set
# TYPE process_working_set_bytes gauge
process_working_set_bytes 44441600
# HELP process_private_memory_bytes Process private memory size
# TYPE process_private_memory_bytes gauge
process_private_memory_bytes 69660672
# HELP dotnet_total_memory_bytes Total known allocated memory
# TYPE dotnet_total_memory_bytes gauge
dotnet_total_memory_bytes 2464584
# HELP dotnet_collection_count_total GC collection count
# TYPE dotnet_collection_count_total counter
dotnet_collection_count_total{generation="1"} 0
dotnet_collection_count_total{generation="0"} 0
dotnet_collection_count_total{generation="2"} 0
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1592448124.2853072
# HELP process_open_handles Number of open handles
# TYPE process_open_handles gauge
process_open_handles 413
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
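# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2225187631104
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.171875
```

`HELP` is the description of a collected metric, and `TYPE` is the metric's type.

But an HTTP application should also have HTTP monitoring and request counts. Adding the `UseHttpMetrics` middleware is enough to monitor and count HTTP requests. Note that `UseHttpMetrics` should be placed between `UseRouting` and `UseEndpoints`, so that the route data it uses for its controller/action labels is already available:

```csharp
public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    app.UseMetricServer();
    app.UseRouting();
    // Must come after UseRouting so route data (controller/action) is available for labels.
    app.UseHttpMetrics();
    app.UseEndpoints(endpoints =>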
    {
        endpoints.MapControllers();
    });
}
```

Start the project and visit `http://localhost:5000/metrics` again:

``` text
# HELP http_requests_in_progress The number of requests currently in progress in the ASP.NET Core pipeline. One series without controller/action label values counts all in-progress requests, with separate series existing for each controller-action pair.
# TYPE http_requests_in_progress gauge
```

The HTTP metrics are now present. Send a few requests to the service and it will record the total duration, the total number of requests, and the duration of each individual request.
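The `# HELP` and `# TYPE` comment lines in this output follow a fixed layout: the marker, the metric name, then either the description or the type. The sketch below is a hypothetical helper (`ExpositionComments.Parse` is not part of prometheus-net) that groups those two lines per metric. It assumes the plain text format shown above and, for brevity, does not unescape `\\` or `\n` sequences inside HELP text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of prometheus-net: collects the "# HELP" and
// "# TYPE" comment lines of Prometheus text-format output, keyed by metric name.
public static class ExpositionComments
{
    public static Dictionary<string, (string Help, string Type)> Parse(string text)
    {
        var result = new Dictionary<string, (string Help, string Type)>();
        foreach (var rawLine in text.Split('\n'))
        {
            // Trim also removes a trailing '\r' from Windows line endings.
            var line = rawLine.Trim();
            bool isHelp = line.StartsWith("# HELP ", StringComparison.Ordinal);
            bool isType = line.StartsWith("# TYPE ", StringComparison.Ordinal);
            // Sample lines (e.g. "process_num_threads 29") and other comments are skipped.
            if (!isHelp && !isType) continue;

            // Both markers are 7 characters long; what follows is "<name> <value>".
            var rest = line.Substring(7);
            int space = rest.IndexOf(' ');
            var name = space < 0 ? rest : rest.Substring(0, space);
            var value = space < 0 ? "" : rest.Substring(space + 1);

            // Merge with whatever was already seen for this metric (HELP usually precedes TYPE).
            result.TryGetValue(name, out var entry);
            result[name] = isHelp ? (value, entry.Type) : (entry.Help, value);
        }
        return result;
    }
}
```

Fed the `http_requests_in_progress` lines above, it would return that metric's description together with the type `gauge`. Metrics that have HELP but no TYPE line (or the reverse) simply keep `null` for the missing part.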
However, the data above alone may not be enough to track down some of the stranger problems. In that case we can also collect `Runtime` data, and doing so is just as easy. Add the `prometheus-net.DotNetRuntime` package, which exposes the following metrics:

* Garbage collection frequency and duration
* Heap size used by the service
* Bytes allocated on the object heaps
* JIT compilation and the share of CPU consumed by the JIT
* Thread pool size, scheduling latency, and the reasons it grows or shrinks
* Lock contention

All we need to do is start the collector in the `Main` method of `Program`:

``` csharp
using Prometheus.DotNetRuntime;

public static void Main(string[] args)
{
    // Start collecting .NET runtime metrics with the default set of collectors.
    DotNetRuntimeStatsBuilder.Default().StartCollecting();
    CreateHostBuilder(args).Build().Run();
}
```

Start the project and visit `http://localhost:5000/metrics` again to test it:

``` text
# HELP dotnet_collection_count_total GC collection count
# TYPE dotnet_collection_count_total counter
dotnet_collection_count_total{generation="1"} 0
dotnet_collection_count_total{generation="0"} 0
dotnet_collection_count_total{generation="2"} 0
# HELP process_private_memory_bytes Process private memory size
# TYPE process_private_memory_bytes gauge
process_private_memory_bytes 75141120
# HELP dotnet_gc_pause_ratio The percentage of time the process spent paused for garbage collection
# TYPE dotnet_gc_pause_ratio gauge
dotnet_gc_pause_ratio 0
# HELP http_requests_received_total Provides the count of HTTP requests that have been processed by the ASP.NET Core pipeline.
# TYPE http_requests_received_total counter
# HELP dotnet_gc_collection_seconds The amount of time spent running garbage collections
# TYPE dotnet_gc_collection_seconds histogram
dotnet_gc_collection_seconds_sum 0
dotnet_gc_collection_seconds_count 0
dotnet_gc_collection_seconds_bucket{le="0.001"} 0
dotnet_gc_collection_seconds_bucket{le="0.01"} 0
dotnet_gc_collection_seconds_bucket{le="0.05"} 0
dotnet_gc_collection_seconds_bucket{le="0.1"} 0
dotnet_gc_collection_seconds_bucket{le="0.5"} 0
dotnet_gc_collection_seconds_bucket{le="1"} 0
dotnet_gc_collection_seconds_bucket{le="10"} 0
dotnet_gc_collection_seconds_bucket{le="+Inf"} 0
# HELP dotnet_total_memory_bytes Total known allocated memory
# TYPE dotnet_total_memory_bytes gauge
dotnet_total_memory_bytes 4925936
# HELP dotnet_threadpool_num_threads The number of active threads in the thread pool
# TYPE dotnet_threadpool_num_threads gauge
dotnet_threadpool_num_threads 0
# HELP dotnet_threadpool_scheduling_delay_seconds A breakdown of the latency experienced between an item being scheduled for execution on the thread pool and it starting execution.
# TYPE dotnet_threadpool_scheduling_delay_seconds histogram
dotnet_threadpool_scheduling_delay_seconds_sum 0.015556
dotnet_threadpool_scheduling_delay_seconds_count 10
dotnet_threadpool_scheduling_delay_seconds_bucket{le="0.001"} 0
dotnet_threadpool_scheduling_delay_seconds_bucket{le="0.01"} 10
dotnet_threadpool_scheduling_delay_seconds_bucket{le="0.05"} 10
dotnet_threadpool_scheduling_delay_seconds_bucket{le="0.1"} 10
dotnet_threadpool_scheduling_delay_seconds_bucket{le="0.5"} 10
dotnet_threadpool_scheduling_delay_seconds_bucket{le="1"} 10
dotnet_threadpool_scheduling_delay_seconds_bucket{le="10"} 10
dotnet_threadpool_scheduling_delay_seconds_bucket{le="+Inf"} 10
# HELP process_working_set_bytes Process working set
# TYPE process_working_set_bytes gauge
process_working_set_bytes 50892800
# HELP process_num_threads Total number of threads
# TYPE process_num_threads gauge
process_num_threads 32
# HELP dotnet_jit_method_seconds_total Total number of seconds spent in the JIT compiler
# TYPE dotnet_jit_method_seconds_total counter
dotnet_jit_method_seconds_total 0
dotnet_jit_method_seconds_total{dynamic="false"} 0.44558800000000004
dotnet_jit_method_seconds_total{dynamic="true"} 0.004122000000000001
# HELP dotnet_gc_pinned_objects The number of pinned objects
# TYPE dotnet_gc_pinned_objects gauge
dotnet_gc_pinned_objects 0
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1592449942.6063592
# HELP dotnet_gc_heap_size_bytes The current size of all heaps (only updated after a garbage collection)
# TYPE dotnet_gc_heap_size_bytes gauge
# HELP http_request_duration_seconds The duration of HTTP requests processed by an ASP.NET Core application.
# TYPE http_request_duration_seconds histogram
# HELP dotnet_contention_seconds_total The total amount of time spent contending locks
# TYPE dotnet_contention_seconds_total counter
dotnet_contention_seconds_total 0
# HELP dotnet_gc_pause_seconds The amount of time execution was paused for garbage collection
# TYPE dotnet_gc_pause_seconds histogram
dotnet_gc_pause_seconds_sum 0
dotnet_gc_pause_seconds_count 0
dotnet_gc_pause_seconds_bucket{le="0.001"} 0
dotnet_gc_pause_seconds_bucket{le="0.01"} 0
dotnet_gc_pause_seconds_bucket{le="0.05"} 0
dotnet_gc_pause_seconds_bucket{le="0.1"} 0
dotnet_gc_pause_seconds_bucket{le="0.5"} 0
dotnet_gc_pause_seconds_bucket{le="1"} 0
dotnet_gc_pause_seconds_bucket{le="10"} 0
dotnet_gc_pause_seconds_bucket{le="+Inf"} 0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2225201872896
# HELP dotnet_gc_finalization_queue_length The number of objects waiting to be finalized
# TYPE dotnet_gc_finalization_queue_length gauge
dotnet_gc_finalization_queue_length 0
# HELP dotnet_threadpool_io_num_threads The number of active threads in the IO thread pool
# TYPE dotnet_threadpool_io_num_threads gauge
dotnet_threadpool_io_num_threads 3
# HELP process_open_handles Number of open handles
# TYPE process_open_handles gauge
process_open_handles 436
# HELP dotnet_gc_collection_reasons_total A tally of all the reasons that lead to garbage collections being run
# TYPE dotnet_gc_collection_reasons_total counter
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.890625
# HELP http_requests_in_progress The number of requests currently in progress in the ASP.NET Core pipeline. One series without controller/action label values counts all in-progress requests, with separate series existing for each controller-action pair.
# TYPE http_requests_in_progress gauge
# HELP dotnet_threadpool_adjustments_total The total number of changes made to the size of the thread pool, labeled by the reason for change
# TYPE dotnet_threadpool_adjustments_total counter
# HELP dotnet_jit_cpu_ratio The amount of total CPU time consumed spent JIT'ing
# TYPE dotnet_jit_cpu_ratio gauge
dotnet_jit_cpu_ratio 0.5728901224489797
# HELP process_cpu_count The number of processor cores available to this process.
# TYPE process_cpu_count gauge
process_cpu_count 8
# HELP dotnet_build_info Build information about prometheus-net.DotNetRuntime and the environment
# TYPE dotnet_build_info gauge
dotnet_build_info{version="3.3.1.0",target_framework=".NETCoreApp,Version=v5.0",runtime_version=".NET Core 5.0.0-preview.2.20160.6",os_version="Microsoft Windows 10.0.18363",process_architecture="X64"} 1
# HELP dotnet_jit_method_total Total number of methods compiled by the JIT compiler
# TYPE dotnet_jit_method_total counter
dotnet_jit_method_total{dynamic="false"} 830
dotnet_jit_method_total{dynamic="true"} 30
# HELP dotnet_gc_cpu_ratio The percentage of process CPU time spent running garbage collections
# TYPE dotnet_gc_cpu_ratio gauge
dotnet_gc_cpu_ratio 0
# HELP dotnet_threadpool_scheduled_total The total number of items the thread pool has been instructed to execute
# TYPE dotnet_threadpool_scheduled_total counter
dotnet_threadpool_scheduled_total 16
# HELP dotnet_gc_allocated_bytes_total The total number of bytes allocated on the small and large object heaps (updated every 100KB of allocations)
# TYPE dotnet_gc_allocated_bytes_total counter
dotnet_gc_allocated_bytes_total{gc_heap="soh"} 3008088
dotnet_gc_allocated_bytes_total{gc_heap="loh"} 805392
# HELP dotnet_contention_total The number of locks contended
# TYPE dotnet_contention_total counter
dotnet_contention_total 0
```

That is a lot of information. When we don't need this many metrics, we can choose the collectors ourselves:

``` csharp
public static void Main(string[] args)
{
    DotNetRuntimeStatsBuilder
        .Customize()
        .WithContentionStats()
        .WithJitStats()
        .WithThreadPoolSchedulingStats()
        .WithThreadPoolStats()
        .WithGcStats()
        .StartCollecting();
    CreateHostBuilder(args).Build().Run();
}
```

Monitoring JIT, GC, and thread pool events costs a small amount of performance, so the sampling frequency can be controlled through the `sampleRate` parameter, whose value is a member of the `SampleEvery` enum:

```csharp
public static void Main(string[] args)
{
    DotNetRuntimeStatsBuilder
        .Customize()
        // Sample one out of every 5 events
        .WithContentionStats(sampleRate: SampleEvery.FiveEvents)
        // Sample one out of every 10 events
        .WithJitStats(sampleRate: SampleEvery.TenEvents)
        // Sample one out of every 100 events
        .WithThreadPoolSchedulingStats(sampleRate: SampleEvery.HundredEvents)
        .WithThreadPoolStats()
        .WithGcStats()
        .StartCollecting();
    CreateHostBuilder(args).Build().Run();
}
```

With these metrics in place, we need `Prometheus` to collect our API's metrics. Just add a scrape job to the `prometheus.yml` file and restart `Prometheus`:

```yaml
scrape_configs:
  - job_name: mydemo
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - localhost:5000
```

Start the API project and `Prometheus`, select `dotnet_collection_count_total` in the Prometheus UI, and click `Execute`. You should see that the API's metrics are being reported correctly.
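Once Prometheus is scraping the API, histogram metrics such as `dotnet_threadpool_scheduling_delay_seconds` (its `_sum`, `_count`, and cumulative `_bucket{le="..."}` series) can be summarized as an average and approximate percentiles. The sketch below is a hypothetical helper, not part of prometheus-net or Prometheus. It follows the same linear-interpolation idea as PromQL's `histogram_quantile()` and assumes the finite bucket bounds are passed in ascending order, with the `+Inf` bucket's count supplied as the total.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: summarizes one Prometheus histogram from its text-format values.
public static class HistogramSummary
{
    // Average observation = _sum / _count (NaN when nothing has been observed yet).
    public static double Mean(double sum, double count) => count == 0 ? double.NaN : sum / count;

    // Estimates the q-quantile (0 <= q <= 1) by linear interpolation inside the bucket
    // that contains rank q * totalCount.
    // bounds:     finite "le" upper bounds, ascending.
    // cumulative: bucket counts for those bounds; each already includes all smaller buckets.
    // totalCount: the "+Inf" bucket count (equal to _count).
    public static double EstimateQuantile(double q, IReadOnlyList<double> bounds,
                                          IReadOnlyList<double> cumulative, double totalCount)
    {
        if (totalCount == 0) return double.NaN;
        double rank = q * totalCount;
        double lowerBound = 0, lowerCount = 0; // the first bucket is assumed to start at 0
        for (int i = 0; i < bounds.Count; i++)
        {
            if (cumulative[i] >= rank)
            {
                double inBucket = cumulative[i] - lowerCount;
                if (inBucket == 0) return bounds[i];
                // Assume observations are spread evenly between lowerBound and bounds[i].
                return lowerBound + (bounds[i] - lowerBound) * (rank - lowerCount) / inBucket;
            }
            lowerBound = bounds[i];
            lowerCount = cumulative[i];
        }
        // The rank falls in the +Inf bucket: return the highest finite bound.
        return lowerBound;
    }
}
```

With the `dotnet_threadpool_scheduling_delay_seconds` values from the runtime output above (sum 0.015556, count 10, all 10 observations in the `le="0.01"` bucket), `Mean` gives about 0.0015556 s, while `EstimateQuantile(0.5, ...)` gives about 0.0055 s. The gap shows that a bucket-based estimate can be no more precise than the bucket layout. Inside Prometheus itself, the equivalent would be a query along the lines of `histogram_quantile(0.5, rate(dotnet_threadpool_scheduling_delay_seconds_bucket[5m]))`.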
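Now that Prometheus has the data, we need a good-looking UI to display the reported metrics.

## **Grafana**

Grafana is an open-source application written in Go for visualizing metric data, and it is one of the most popular tools for displaying time-series data today. Download it (on Windows, simply download and run the installer); once `http://localhost:3000/` opens, the installation has succeeded.

* [Download](https://grafana.com/grafana/download?platform=windows)

First add a data source: choose `Prometheus` as the data source type and configure it (for the local setup above, Prometheus listens on `http://localhost:9090`).

Next, add a dashboard: paste the following JSON into `Import via panel json` and click `Load`.

* [Dashboard JSON](https://github.com/djluck/prometheus-net.DotNetRuntime/blob/master/examples/NET_runtime_metrics_dashboard.json)

Select the data source and click `Import` to see the dashboard.

You can also browse many ready-made dashboards [here](https://grafana.com/grafana/dashboards?orderBy=name&direction=asc); copy a dashboard's ID to import it.

## **References**

* [prometheus-net](https://github.com/prometheus-net/prometheus-net)
* [Series: system monitoring and alerting with Prometheus on .NET Core](https://www.cnblogs.com/liyouming/p/992889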