.NET Core 3.1: Understanding HealthCheck Through Its Source Code (Part 2)

Preface

The previous article covered the theory behind HealthCheck. This article looks at how it is applied in practice.

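  • Health checks can monitor the usage of memory, disk, and other physical server resources to determine whether they are in a healthy state.
  • Health checks can test an app's dependencies, such as databases and external service endpoints, to confirm that they are available and working normally.
  • Health probes can be used by container orchestrators and load balancers to check an app's status.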

Source code walkthrough

To add HealthCheck to an application, you typically configure the Startup file as follows:

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddHealthChecks();
    }

    public void Configure(IApplicationBuilder app)
    {
        app.UseRouting();

        app.UseEndpoints(endpoints =>
        {
            endpoints.MapHealthChecks("/health");
        });
    }

Here, services.AddHealthChecks(); takes us into the extension method that registers HealthCheckService. The code is as follows:

    public static class HealthCheckServiceCollectionExtensions
    {
        public static IHealthChecksBuilder AddHealthChecks(this IServiceCollection services)
        {
            services.TryAddSingleton<HealthCheckService, DefaultHealthCheckService>();
            services.TryAddEnumerable(ServiceDescriptor.Singleton<IHostedService, HealthCheckPublisherHostedService>());
            return new HealthChecksBuilder(services);
        }
    }

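This extension method tries to register HealthCheckService as a singleton. HealthCheckService itself is an abstract class. It contains one abstract method, which runs the health checks and returns the aggregated health status. The abstract method is shown below: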

    public abstract Task<HealthReport> CheckHealthAsync(
        Func<HealthCheckRegistration, bool> predicate,
        CancellationToken cancellationToken = default);

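HealthCheckService has one default derived class, DefaultHealthCheckService. Its constructor checks whether any health check names are duplicated and throws an exception if so. The name comparison is case-insensitive. The abstract method that this class implements is the core of the health check feature, and its internals are fairly involved.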
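The duplicate-name validation described above can be sketched with plain LINQ. This is a minimal illustration rather than the framework's actual code: `RegistrationNameValidator`, its method names, the exception type, and the message text are all assumptions made for the example. Only the case-insensitive comparison comes from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the framework.
public static class RegistrationNameValidator
{
    // Returns the names that occur more than once, compared case-insensitively.
    public static IReadOnlyList<string> FindDuplicates(IEnumerable<string> names)
    {
        return names
            .GroupBy(name => name, StringComparer.OrdinalIgnoreCase)
            .Where(group => group.Count() > 1)
            .Select(group => group.Key)
            .ToList();
    }

    // Throws if any name is duplicated (exception type and message are assumptions).
    public static void EnsureUnique(IEnumerable<string> names)
    {
        var duplicates = FindDuplicates(names);
        if (duplicates.Count > 0)
        {
            throw new ArgumentException(
                "Duplicate health check names: " + string.Join(", ", duplicates),
                nameof(names));
        }
    }
}
```

With this sketch, `EnsureUnique(new[] { "db", "DB" })` throws, because "db" and "DB" are treated as the same name.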

First, let's look at the source code of that method's implementation:

    public override async Task<HealthReport> CheckHealthAsync(
        Func<HealthCheckRegistration, bool> predicate,
        CancellationToken cancellationToken = default)
    {
        var registrations = _options.Value.Registrations;
        if (predicate != null)
        {
            registrations = registrations.Where(predicate).ToArray();
        }

        var totalTime = ValueStopwatch.StartNew();
        Log.HealthCheckProcessingBegin(_logger);

        var tasks = new Task<HealthReportEntry>[registrations.Count];
        var index = 0;
        using (var scope = _scopeFactory.CreateScope())
        {
            foreach (var registration in registrations)
            {
                tasks[index++] = Task.Run(() => RunCheckAsync(scope, registration, cancellationToken), cancellationToken);
            }

            await Task.WhenAll(tasks).ConfigureAwait(false);
        }

        index = 0;
        var entries = new Dictionary<string, HealthReportEntry>(StringComparer.OrdinalIgnoreCase);
        foreach (var registration in registrations)
        {
            entries[registration.Name] = tasks[index++].Result;
        }

        var totalElapsedTime = totalTime.GetElapsedTime();
        var report = new HealthReport(entries, totalElapsedTime);
        Log.HealthCheckProcessingEnd(_logger, report.Status, totalElapsedTime);
        return report;
    }

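1. The method is well instrumented. It logs throughout and times the health checks, recording not only the elapsed time for the whole set of checks but also the elapsed time of each check by name.

2. The health checks run concurrently: each one is started with Task.Run, and await Task.WhenAll(tasks).ConfigureAwait(false); waits for all of them to finish. Keep in mind that too many health check tasks will degrade system performance, so this is a trade-off to weigh.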
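The two observations above, per-check timing and concurrent execution, can be illustrated with a small sketch that uses only the base class library. `ConcurrentCheckRunner`, `CheckOutcome`, and the delegate-based check shape are hypothetical names introduced for the example. The real implementation works with HealthCheckRegistration and HealthReportEntry as shown in the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result type: the outcome of one check plus how long it took.
public sealed class CheckOutcome
{
    public bool Healthy { get; set; }
    public TimeSpan Duration { get; set; }
}

// Hypothetical runner, not part of the framework.
public static class ConcurrentCheckRunner
{
    public static async Task<Dictionary<string, CheckOutcome>> RunAllAsync(
        IReadOnlyList<KeyValuePair<string, Func<CancellationToken, Task<bool>>>> checks,
        CancellationToken cancellationToken = default)
    {
        var tasks = new Task<CheckOutcome>[checks.Count];
        for (var i = 0; i < checks.Count; i++)
        {
            var check = checks[i].Value; // copy so the lambda does not capture the loop variable
            tasks[i] = Task.Run(async () =>
            {
                var stopwatch = Stopwatch.StartNew(); // times this check only
                var healthy = await check(cancellationToken).ConfigureAwait(false);
                return new CheckOutcome { Healthy = healthy, Duration = stopwatch.Elapsed };
            }, cancellationToken);
        }

        // All checks are already running; this only waits for them to finish.
        await Task.WhenAll(tasks).ConfigureAwait(false);

        var results = new Dictionary<string, CheckOutcome>(StringComparer.OrdinalIgnoreCase);
        for (var i = 0; i < checks.Count; i++)
        {
            results[checks[i].Key] = tasks[i].Result;
        }
        return results;
    }
}
```

Because every check is started before any of them is awaited, the total wall-clock time tends toward that of the slowest check rather than the sum of all checks, at the cost of occupying more thread-pool threads at once.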

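CheckHealthAsync also calls a private method, RunCheckAsync, which is where each health check is actually executed. When RunCheckAsync finishes, it creates a HealthReportEntry and returns it to CheckHealthAsync, which assembles the entries into a HealthReport. At that point the abstract method has finished executing.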

Below is the source code of the RunCheckAsync method:

    private async Task<HealthReportEntry> RunCheckAsync(IServiceScope scope, HealthCheckRegistration registration, CancellationToken cancellationToken)
    {
        cancellationToken.ThrowIfCancellationRequested();

        var healthCheck = registration.Factory(scope.ServiceProvider);

        using (_logger.BeginScope(new HealthCheckLogScope(registration.Name)))
        {
            var stopwatch = ValueStopwatch.StartNew();
            var context = new HealthCheckContext { Registration = registration };

            Log.HealthCheckBegin(_logger, registration);

            HealthReportEntry entry;
            CancellationTokenSource timeoutCancellationTokenSource = null;
            try
            {
                HealthCheckResult result;

                var checkCancellationToken = cancellationToken;
                if (registration.Timeout > TimeSpan.Zero)
                {
                    timeoutCancellationTokenSource = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
                    timeoutCancellationTokenSource.CancelAfter(registration.Timeout);
                    checkCancellationToken = timeoutCancellationTokenSource.Token;
                }

                result = await healthCheck.CheckHealthAsync(context, checkCancellationToken).ConfigureAwait(false);

                var duration = stopwatch.GetElapsedTime();

                entry = new HealthReportEntry(
                    status: result.Status,
                    description: result.Description,
                    duration: duration,
                    exception: result.Exception,
                    data: result.Data,
                    tags: registration.Tags);

                Log.HealthCheckEnd(_logger, registration, entry, duration);
                Log.HealthCheckData(_logger, registration, entry);
            }
            catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested)
            {
                var duration = stopwatch.GetElapsedTime();
                entry = new HealthReportEntry(
                    status: HealthStatus.Unhealthy,
                    description: "A timeout occured while running check.",
                    duration: duration,
                    exception: ex,
                    data: null);

                Log.HealthCheckError(_logger, registration, ex, duration);
            }

            // Allow cancellation to propagate if it's not a timeout.
            catch (Exception ex) when (ex as OperationCanceledException == null)
            {
                var duration = stopwatch.GetElapsedTime();
                entry = new HealthReportEntry(
                    status: HealthStatus.Unhealthy,
                    description: ex.Message,
                    duration: duration,
                    exception: ex,
                    data: null);

                Log.HealthCheckError(_logger, registration, ex, duration);
            }

            finally
            {
                timeoutCancellationTokenSource?.Dispose();
            }

            return entry;
        }
    }
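The timeout handling in RunCheckAsync relies on a linked CancellationTokenSource and two exception filters. The sketch below isolates that pattern using only the base class library. `TimeoutRunner`, `RunWithTimeoutAsync`, and the string results are hypothetical and stand in for the HealthReportEntry values built in the listing above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper that isolates the timeout pattern; not part of the framework.
public static class TimeoutRunner
{
    public static async Task<string> RunWithTimeoutAsync(
        Func<CancellationToken, Task> check,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        CancellationTokenSource timeoutSource = null;
        try
        {
            var checkToken = cancellationToken;
            if (timeout > TimeSpan.Zero)
            {
                // The linked source is cancelled by the caller's token OR by the timer.
                timeoutSource = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
                timeoutSource.CancelAfter(timeout);
                checkToken = timeoutSource.Token;
            }

            await check(checkToken).ConfigureAwait(false);
            return "Healthy";
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // The caller did not cancel, so the cancellation came from the timeout.
            return "Unhealthy (timeout)";
        }
        catch (Exception ex) when (!(ex is OperationCanceledException))
        {
            // Any other failure is reported; caller-initiated cancellation still propagates.
            return "Unhealthy (" + ex.Message + ")";
        }
        finally
        {
            timeoutSource?.Dispose();
        }
    }
}
```

The first filter separates a timeout from a caller-initiated cancellation by inspecting the caller's original token. If the caller did cancel, neither filter matches and the OperationCanceledException propagates.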

Scenarios from the official documentation

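  • Database probe: for example, run select 1 from tableName and judge health from the database's response.
  • Entity Framework Core DbContext probe: the DbContext check confirms that the app can communicate with the database configured for an EF Core DbContext.
  • Separate readiness and liveness probes: in some hosting scenarios initialization is time-consuming, so the app may be running normally yet not be ready to process requests and respond.
  • Metric-based probe with a custom response writer: for example, check whether memory usage exceeds a limit, whether CPU usage is too high, or whether the number of connections has reached its cap.
  • Filter by port: restrict health checks to a specified port. This is typically used in container environments, where the endpoint responds on the port configured when the container starts.
  • Distribute a health check library: implement the check interface in a standalone class, obtain parameters through dependency injection, and write the check logic based on those parameters.
  • Health check publisher: if an IHealthCheckPublisher is added to DI, the health check system periodically runs the health checks and calls PublishAsync with the result. This suits monitoring systems that expect health data to be pushed to them, rather than systems that poll the app for it.
  • Restrict health checks with MapWhen: use MapWhen to conditionally branch the request pipeline for health check endpoints.
  • For more, see https://docs.microsoft.com/zh-cn/aspnet/core/host-and-deploy/health-checks?view=aspnetcore-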
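As a rough illustration of the metric-based scenario in the list above, here is a sketch of a memory check and its registration. It assumes the Microsoft.Extensions.Diagnostics.HealthChecks package is referenced. The class names, the "memory" check name, the "metrics" tag, and the 1 GiB threshold are all hypothetical values chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical check: reports the registration's failure status when
// managed memory exceeds a fixed threshold.
public class MemoryHealthCheck : IHealthCheck
{
    private const long ThresholdBytes = 1024L * 1024L * 1024L; // assumed 1 GiB limit

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        var allocated = GC.GetTotalMemory(forceFullCollection: false);
        var data = new Dictionary<string, object> { ["AllocatedBytes"] = allocated };

        var status = allocated < ThresholdBytes
            ? HealthStatus.Healthy
            : context.Registration.FailureStatus;

        return Task.FromResult(new HealthCheckResult(
            status,
            description: $"Managed memory in use: {allocated} bytes.",
            exception: null,
            data: data));
    }
}

public static class HealthCheckRegistrationSketch
{
    // Intended to be called from Startup.ConfigureServices.
    public static void Register(IServiceCollection services)
    {
        services.AddHealthChecks()
            .AddCheck<MemoryHealthCheck>("memory", tags: new[] { "metrics" });
    }
}
```

Once registered this way, the check is built through registration.Factory and executed by RunCheckAsync like any other registration, and the tag can be used with the predicate parameter of CheckHealthAsync to select it.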