1. 程式人生 > >Eureka Server prometheus監控服務健康狀態

Eureka Server prometheus監控服務健康狀態

背景

  服務程序監控一般都有相關元件處理了,早期業務出現特定服務使用的DB資源超過額配量,導致健康檢測失敗,服務陸續從Eureka下線了,業務監控在沒路由到特定節點時候,或者路由到特定節點但沒有碰到閾值場景不會觸發告警,意味著業務短暫性正常,服務陸續下線;Eureka server 作為註冊中心可以較早感知到服務註冊狀態,例項節點掛了(註冊上的例項少了)、節點狀態非UP 場景

監控方案

  • Eureka定時採集註冊資訊,例項節點數、例項節點狀態資訊
  • prometheus 定時採集Eureka server 採集到的資料
  • grafana 查詢及對資料告警

Eureka註冊資訊資料採集

metric 資料結構定義

  • 統計節點狀態

    type:Gauge

eureka_instance_status{client="{client}",status="{status}"}

client: eureka client application name

status 列舉

狀態列舉值
UP1
DOWN5
STARTING2
OUT_OF_SERVICE3
UNKNOW4

最近n時間內平均值大於1,表示異常,執行告警

  • 統計節點數量

    type:Gauge

eureka_instance_count{client="{client}",count="{count}"}

client: eureka client application name

count: client count

java pom 依賴

<!-- boot2.x 相容-->
<!-- The client -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Hotspot JVM metrics-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_hotspot</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Exposition HTTPServer-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_httpserver</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Pushgateway exposition-->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_pushgateway</artifactId>
    <version>0.6.0</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.1.4</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>1.1.4</version>
</dependency>

java程式碼

@Component
public class InstanceStateCollector {

    @Autowired
    PeerAwareInstanceRegistry registry;

    private static final Logger log = LoggerFactory.getLogger(InstanceStateCollector.class);

    @Scheduled(cron = "*/5 * * * * ?")
    public void collect() {

        Applications applications = registry.getApplications();

        applications.getRegisteredApplications().forEach((registeredApplication) -> {
            Integer count = registeredApplication.size();
            String client = registeredApplication.getName();

            log.debug("client :{}, count :{}", client, count);
            PrometheusMetricsUtils.metricInstanceCount(client, count);

            registeredApplication.getInstances().forEach((instance) -> {
                String instanceId = instance.getInstanceId();
                log.debug("client :{}, instance :{}, status :{}", client, instanceId, instance.getStatus());
                PrometheusMetricsUtils.metricInstanceStatus(client, instanceId, instance.getStatus());

            });
        });
    }
}
@Service
public class PrometheusMetricsService {

    /**
     * 例項狀態統計
     * eureka_instance_status{client="{client}",status="{status}"}
     */
    private static final String EUREKA_INSTANCE_STATUS = "mall_eureka_instance_status";

    /**
     * 例項數量統計
     * eureka_instance_count{client="{client}",count="{count}"}
     */
    private static final String EUREKA_INSTANCE_COUNT = "mall_eureka_instance_count";

    private static final String LABEL_CLIENT = "client";

    private final Gauge instanceStatusGauge;
    private final Gauge instanceCountGauge;


    public PrometheusMetricsService(CollectorRegistry registry) {
        instanceStatusGauge = Gauge
                .build(EUREKA_INSTANCE_STATUS, "instance status")
                .labelNames(LABEL_CLIENT)
                .register(registry);

        instanceCountGauge = Gauge
                .build(EUREKA_INSTANCE_COUNT, "instance count")
                .labelNames(LABEL_CLIENT)
                .register(registry);
    }

    /**
     * 例項狀態埋點
     *
     * @param client   client name || application name
     * @param statusValue   status
     */
    void metricInstanceStatus(String client, Integer statusValue) {
        instanceStatusGauge.labels(client).set(statusValue);
    }

    /**
     * 例項數量埋點
     *
     * @param client client name || application name
     * @param count  count
     */
    void metricInstanceCount(String client, Integer count) {
        instanceCountGauge.labels(client).set(count);
    }



}

Prometheus採集Eureka server資料

prometheus.yml

  - job_name: 'mgmall-eureka'
    scrape_interval: 10s 
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['10.124.129.42:19110']

Grafana報表維護

報表

mall_eureka_instance_count{client="MGMALL-CONFIG"}
.....

![image-20190531140258528](/Users/yugj/Library/Application Support/typora-user-images/image-20190531140258528.png)

監控

avg() query(A,10s,now) is below 1

![image-20190531140319350](/Users/yugj/Library/Application Support/typora-user-images/image-2