Flink Source Code Series: How the TaskManager Handles SubmitTask

阿新 • Published: 2019-02-10

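This post continues from "Flink Source Code Series: How the JobManager Handles SubmitJob". Once the JobManager has sent SubmitTask to the TaskManager, we pick up the analysis on the TaskManager side.
TaskManager is an Actor that mixes in the LeaderSessionMessageFilter trait. When it receives a wrapped message of type JobManagerMessages.LeaderSessionMessage[TaskMessages.SubmitTask[TaskDeploymentDescriptor]] from the JobManager, the message first goes through the receive method of LeaderSessionMessageFilter, which filters it as follows: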

abstract override def receive: Receive = {
  case leaderMessage @ LeaderSessionMessage(msgID, msg) =>
    leaderSessionID match {
      case Some(leaderId) =>
        if (leaderId.equals(msgID)) {
          super.receive(msg)
        } else {
          handleDiscardedMessage(leaderId, leaderMessage)
        }
      case None =>
        handleNoLeaderId(leaderMessage)
    }
  case msg: RequiresLeaderSessionID =>
    throw new Exception(s"Received a message $msg without a leader session ID, even though" +
      s" the message requires a leader session ID.")
  case msg =>
    super.receive(msg)
}

The logic breaks down as follows:

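a. The message is a LeaderSessionMessage

a.1. The TaskManager currently has a leaderSessionID

a.1.1. The session ID of the JobManager that this TaskManager belongs to matches the session ID in the message: the unwrapped message is passed to the parent's receive method
a.1.2. The two session IDs differ: the message is stale and is discarded

a.2. The TaskManager has no leaderSessionID: it only writes a log entry and does nothing else

b. The message is a RequiresLeaderSessionID: it needs a leaderSessionID but was not wrapped in a LeaderSessionMessage. This is an error case, so an exception is thrown

c. Any other message is passed to the parent's receive method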
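
The branches above can be sketched in plain Java to make the decision table concrete. This is only an illustration under stated assumptions: `LeaderSessionFilterSketch`, `Wrapped`, `RequiresSession` and `Outcome` are hypothetical names, not Flink or Akka types, and the real trait forwards to `super.receive` instead of returning a value.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical names throughout; this models only the branching, not Flink's API.
final class LeaderSessionFilterSketch {

    enum Outcome { DELIVERED, DISCARDED_STALE, DROPPED_NO_LEADER }

    /** Stand-in for LeaderSessionMessage(msgID, msg). */
    static final class Wrapped {
        final UUID sessionId;
        final Object payload;

        Wrapped(UUID sessionId, Object payload) {
            this.sessionId = sessionId;
            this.payload = payload;
        }
    }

    /** Stand-in for RequiresLeaderSessionID: payloads that must arrive wrapped. */
    interface RequiresSession {}

    static Outcome filter(Optional<UUID> currentLeader, Object message) {
        if (message instanceof Wrapped) {
            Wrapped wrapped = (Wrapped) message;
            if (!currentLeader.isPresent()) {
                return Outcome.DROPPED_NO_LEADER;          // case a.2
            }
            return currentLeader.get().equals(wrapped.sessionId)
                ? Outcome.DELIVERED                        // case a.1.1
                : Outcome.DISCARDED_STALE;                 // case a.1.2
        }
        if (message instanceof RequiresSession) {          // case b
            throw new IllegalStateException(
                "Received " + message + " without a leader session ID");
        }
        return Outcome.DELIVERED;                          // case c
    }

    public static void main(String[] args) {
        UUID leader = UUID.randomUUID();
        System.out.println(filter(Optional.of(leader), new Wrapped(leader, "SubmitTask")));
        System.out.println(filter(Optional.of(leader), new Wrapped(UUID.randomUUID(), "SubmitTask")));
        System.out.println(filter(Optional.empty(), new Wrapped(leader, "SubmitTask")));
    }
}
```

Returning an `Outcome` instead of forwarding the message is a choice made for the sketch so that each branch can be observed directly.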

After this filtering, the message received from the JobManager has been unwrapped to TaskMessages.SubmitTask[TaskDeploymentDescriptor] and is passed to handleMessage. SubmitTask is a subclass of TaskMessage, so handleMessage processes it as follows:

override def handleMessage: Receive = {
  ...

  case message: TaskMessage => handleTaskMessage(message)

  ...
}

That leads into the handleTaskMessage method:

private def handleTaskMessage(message: TaskMessage): Unit = {
    ...

    case SubmitTask(tdd) => submitTask(tdd)

    ...
}

After these two dispatch steps, control reaches the submitTask method, with the TaskDeploymentDescriptor as its argument.

The code of submitTask() is long, but the logic is not complicated. It is explained block by block below:

/** Get the actor of the current JobManager */
val jobManagerActor = currentJobManager match {
  case Some(jm) => jm
  case None =>
    throw new IllegalStateException("TaskManager is not associated with a JobManager.")
}

/** Get the library cache manager */
val libCache = libraryCacheManager match {
  case Some(manager) => manager
  case None => throw new IllegalStateException("There is no valid library cache manager.")
}

/** Get the blobCache */
val blobCache = this.blobCache match {
  case Some(manager) => manager
  case None => throw new IllegalStateException("There is no valid BLOB cache.")
}

/** Validate the slot number */
val slot = tdd.getTargetSlotNumber
if (slot < 0 || slot >= numberOfSlots) {
  throw new IllegalArgumentException(s"Target slot $slot does not exist on TaskManager.")
}

/** Get the connection-related utilities */
val (checkpointResponder,
  partitionStateChecker,
  resultPartitionConsumableNotifier,
  taskManagerConnection) = connectionUtils match {
  case Some(x) => x
  case None => throw new IllegalStateException("The connection utils have not been " +
                                                 "initialized.")
}

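This part obtains the handles needed later and throws an exception if any of them is missing. It also checks that the task's target slot number is within the valid range and fetches the connection utilities.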
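
As a rough Java analogue of these fail-fast checks, the sketch below unwraps an optional handle or throws, and validates a slot index against the slot count. `SubmitPreconditionsSketch`, `require` and `checkSlot` are hypothetical helper names introduced for illustration; the original code is Scala and pattern-matches on `Option` directly.

```java
import java.util.Optional;

// Hypothetical helpers; they mirror the Some/None matches and the slot range check.
final class SubmitPreconditionsSketch {

    /** Returns the wrapped handle, or throws if it has not been set up yet. */
    static <T> T require(Optional<T> handle, String errorMessage) {
        return handle.orElseThrow(() -> new IllegalStateException(errorMessage));
    }

    /** Valid slot numbers are 0 .. numberOfSlots - 1. */
    static int checkSlot(int slot, int numberOfSlots) {
        if (slot < 0 || slot >= numberOfSlots) {
            throw new IllegalArgumentException(
                "Target slot " + slot + " does not exist on TaskManager.");
        }
        return slot;
    }

    public static void main(String[] args) {
        String jobManager = require(
            Optional.of("jobManagerActor"),
            "TaskManager is not associated with a JobManager.");
        System.out.println(jobManager + ", slot " + checkSlot(1, 2));
    }
}
```

Failing before any Task object is built keeps a bad submission from leaving partially initialized state behind.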

/** Build the gateway to the JobManager */
val jobManagerGateway = new AkkaActorGateway(jobManagerActor, leaderSessionID.orNull)

/** Some data may be too large to send over RPC; it is persisted first and loaded back here */
try {
  tdd.loadBigData(blobCache.getPermanentBlobService);
} catch {
  case e @ (_: IOException | _: ClassNotFoundException) =>
    throw new IOException("Could not deserialize the job information.", e)
}

/** Get the jobInformation */
val jobInformation = try {
  tdd.getSerializedJobInformation.deserializeValue(getClass.getClassLoader)
} catch {
  case e @ (_: IOException | _: ClassNotFoundException) =>
    throw new IOException("Could not deserialize the job information.", e)
}

/** Validate the job ID */
if (tdd.getJobId != jobInformation.getJobId) {
  throw new IOException(
    "Inconsistent job ID information inside TaskDeploymentDescriptor (" +
    tdd.getJobId + " vs. " + jobInformation.getJobId + ")")
}

/** Get the taskInformation */
val taskInformation = try {
  tdd.getSerializedTaskInformation.deserializeValue(getClass.getClassLoader)
} catch {
  case e @ (_: IOException | _: ClassNotFoundException) =>
    throw new IOException("Could not deserialize the job vertex information.", e)
}

/** Metrics */
val taskMetricGroup = taskManagerMetricGroup.addTaskForJob(
  jobInformation.getJobId,
  jobInformation.getJobName,
  taskInformation.getJobVertexId,
  tdd.getExecutionAttemptId,
  taskInformation.getTaskName,
  tdd.getSubtaskIndex,
  tdd.getAttemptNumber)

val inputSplitProvider = new TaskInputSplitProvider(
  jobManagerGateway,
  jobInformation.getJobId,
  taskInformation.getJobVertexId,
  tdd.getExecutionAttemptId,
  new FiniteDuration(
    config.getTimeout().getSize(),
    config.getTimeout().getUnit()))

/** Build the task */
val task = new Task(
  jobInformation,
  taskInformation,
  tdd.getExecutionAttemptId,
  tdd.getAllocationId,
  tdd.getSubtaskIndex,
  tdd.getAttemptNumber,
  tdd.getProducedPartitions,
  tdd.getInputGates,
  tdd.getTargetSlotNumber,
  tdd.getTaskStateHandles,
  memoryManager,
  ioManager,
  network,
  bcVarManager,
  taskManagerConnection,
  inputSplitProvider,
  checkpointResponder,
  blobCache,
  libCache,
  fileCache,
  config,
  taskMetricGroup,
  resultPartitionConsumableNotifier,
  partitionStateChecker,
  context.dispatcher)

log.info(s"Received task ${task.getTaskInfo.getTaskNameWithSubtasks()}")

This part is still gathering data. Its main purpose is to build a Task instance from the values obtained above.

val execId = tdd.getExecutionAttemptId
// Add the task to the map
val prevTask = runningTasks.put(execId, task)
if (prevTask != null) {
  // A task already exists for this ID: restore it and report an error
  runningTasks.put(execId, prevTask)
  throw new IllegalStateException("TaskManager already contains a task for id " + execId)
}

// All good: start the task and let it begin its own initialization
task.startTaskThread()

sender ! decorateMessage(Acknowledge.get())

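Here the new task is added to the runningTasks map. If a running task already exists for the same execId, the put is rolled back and an exception is thrown.
If everything goes well, the task is started and an ack message is sent to the sender.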
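
The put-then-restore pattern can be sketched in Java as follows. `RunningTasksSketch`, `register` and `get` are hypothetical names, and a plain `HashMap` keyed by `String` stands in for the real map keyed by the execution attempt ID.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry; models only the put / restore / throw sequence.
final class RunningTasksSketch {

    private final Map<String, Runnable> runningTasks = new HashMap<>();

    void register(String execId, Runnable task) {
        // put() returns the previous value, or null if the key was absent.
        Runnable previous = runningTasks.put(execId, task);
        if (previous != null) {
            // Undo the overwrite so the task that was already registered stays in place.
            runningTasks.put(execId, previous);
            throw new IllegalStateException(
                "TaskManager already contains a task for id " + execId);
        }
    }

    Runnable get(String execId) {
        return runningTasks.get(execId);
    }

    public static void main(String[] args) {
        RunningTasksSketch registry = new RunningTasksSketch();
        registry.register("exec-1", () -> System.out.println("first"));
        try {
            registry.register("exec-1", () -> System.out.println("second"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        registry.get("exec-1").run(); // the original task is still the registered one
    }
}
```

In Java, `Map.putIfAbsent` would express the same check without the restore step. The sketch keeps put-then-restore only to stay close to the snippet above.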

Starting the task means starting the thread held in the Task instance's executingThread field.

public void startTaskThread() {
   executingThread.start();
}

The executingThread field is initialized at the very end of the Task constructor.

executingThread = new Thread(TASK_THREADS_GROUP, this, taskNameWithSubtask);

The Task instance itself is passed as the thread's target. Task implements the Runnable interface, so what ultimately runs is Task's run() method.
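
A minimal Java sketch of this arrangement is shown below: the object is its own Runnable, builds its thread in the constructor, and starts it only on request. `TaskThreadSketch`, `awaitFinish` and `hasRun` are hypothetical names added for the example, and the thread group argument is left out for brevity.

```java
// Hypothetical class; shows "create the thread in the constructor, start it later".
final class TaskThreadSketch implements Runnable {

    private final Thread executingThread;
    private volatile boolean ran = false;

    TaskThreadSketch(String taskNameWithSubtask) {
        // The thread is created here but not started; `this` is its Runnable target.
        this.executingThread = new Thread(this, taskNameWithSubtask);
    }

    void startTaskThread() {
        executingThread.start();
    }

    /** Blocks until run() has returned (helper for the demo only). */
    void awaitFinish() {
        try {
            executingThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    boolean hasRun() {
        return ran;
    }

    @Override
    public void run() {
        ran = true;
        System.out.println("running in thread " + Thread.currentThread().getName());
    }

    public static void main(String[] args) {
        TaskThreadSketch task = new TaskThreadSketch("demo-task (1/1)");
        task.startTaskThread();
        task.awaitFinish();
    }
}
```

Separating construction from start lets the caller register the task first and start its thread only once registration has succeeded.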
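run() begins by initializing the state: it enters a while loop and acts on the current state. It may exit the loop normally and continue, return directly, or throw an exception. The logic is as follows: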

while (true) {
   ExecutionState current = this.executionState;
   if (current == ExecutionState.CREATED) {
      /** In CREATED state, switch to DEPLOYING and then exit the loop */
      if (transitionState(ExecutionState.CREATED, ExecutionState.DEPLOYING)) {
         /** On success, we can start our work */
         break;
      }
   }
   else if (current == ExecutionState.FAILED) {
      /** In FAILED state, fail immediately: tell the TaskManager we have reached a final state, then return */
      notifyFinalState();
      if (metrics != null) {
         metrics.close();
      }
      return;
   }
   else if (current == ExecutionState.CANCELING) {
      if (transitionState(ExecutionState.CANCELING, ExecutionState.CANCELED)) {
         /** In CANCELING state, tell the TaskManager we have reached a final state, then return */
         notifyFinalState();
         if (metrics != null) {
            metrics.close();
         }
         return;
      }
   }
   else {
      /** Any other state: throw an exception */
      if (metrics != null) {
         metrics.close();
      }
      throw new IllegalStateException("Invalid state for beginning of operation of task " + this + '.');
   }
}
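
The loop only makes sense if transitionState can fail when another thread changes the state between the read and the write. The Java sketch below assumes compare-and-set semantics for that method and uses an `AtomicReference` to get them. `StartupStateSketch` and `enterDeploying` are hypothetical names, and the notification and metrics calls are omitted.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the start-up loop; assumes transitionState is a compare-and-set.
final class StartupStateSketch {

    enum State { CREATED, DEPLOYING, RUNNING, CANCELING, CANCELED, FAILED }

    private final AtomicReference<State> state;

    StartupStateSketch(State initial) {
        this.state = new AtomicReference<>(initial);
    }

    State current() {
        return state.get();
    }

    /** Succeeds only if the state is still `expected` at the moment of the swap. */
    boolean transitionState(State expected, State next) {
        return state.compareAndSet(expected, next);
    }

    /** Returns true if the task may proceed, false if it already reached a final state. */
    boolean enterDeploying() {
        while (true) {
            State current = state.get();
            if (current == State.CREATED) {
                if (transitionState(State.CREATED, State.DEPLOYING)) {
                    return true;
                }
                // Lost a race (for example against a concurrent cancel): loop and re-read.
            } else if (current == State.FAILED) {
                return false;
            } else if (current == State.CANCELING) {
                if (transitionState(State.CANCELING, State.CANCELED)) {
                    return false;
                }
            } else {
                throw new IllegalStateException(
                    "Invalid state for beginning of operation: " + current);
            }
        }
    }

    public static void main(String[] args) {
        StartupStateSketch fresh = new StartupStateSketch(State.CREATED);
        System.out.println(fresh.enterDeploying() + " " + fresh.current());       // true DEPLOYING
        StartupStateSketch canceling = new StartupStateSketch(State.CANCELING);
        System.out.println(canceling.enterDeploying() + " " + canceling.current()); // false CANCELED
    }
}
```

A failed swap falls through to the next iteration, which is why the original wraps the checks in `while (true)` instead of a single if/else chain.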

After exiting this while loop normally, execution continues into a try-catch-finally structure.

The analysis here focuses on the logic in the try block.

1. Task bootstrap

// activate safety net for task thread
LOG.info("Creating FileSystem stream leak safety net for task {}", this);
FileSystemSafetyNet.initializeSafetyNetForThread();

blobService.getPermanentBlobService().registerJob(jobId);

/**
 * First, get a user-code class loader.
 * This may involve downloading the job's JAR files and/or classes.
 */
LOG.info("Loading JAR files for task {}.", this);

userCodeClassLoader = createUserCodeClassloader();
final ExecutionConfig executionConfig = serializedExecutionConfig.deserializeValue(userCodeClassLoader);

if (executionConfig.getTaskCancellationInterval() >= 0) {
   /** Interval between two consecutive attempts to cancel the task, in milliseconds */
   taskCancellationInterval = executionConfig.getTaskCancellationInterval();
}

if (executionConfig.getTaskCancellationTimeout() >= 0) {
   /** Timeout for canceling the task; can be overridden in the Flink configuration */
   taskCancellationTimeout = executionConfig.getTaskCancellationTimeout();
}

/**
 * Instantiate the concrete subclass of AbstractInvokable
 * {@see StreamGraph#addOperator}
 * {@see StoppableSourceStreamTask}
 * {@see SourceStreamTask}
 * {@see OneInputStreamTask}
 */
invokable = loadAndInstantiateInvokable(userCodeClassLoader, nameOfInvokableClass);

/** Throw an exception if the current state is 'CANCELING', 'CANCELED' or 'FAILED' */
if (isCanceledOrFailed()) {
   throw new CancelTaskException();
}

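This part loads the JAR files, reads settings such as the cancellation interval and timeout, and then instantiates a concrete subclass of AbstractInvokable. At present the main ones are StoppableSourceStreamTask, SourceStreamTask and OneInputStreamTask.
It also checks the state: if the task is in 'CANCELING', 'CANCELED' or 'FAILED', a CancelTaskException is thrown.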

2. Registration

LOG.info("Registering task at network: {}.", this);

network.registerTask(this);

// add metrics for buffers
this.metrics.getIOMetricGroup().initializeBufferMetrics(this);

// register detailed network metrics, if configured
if (taskManagerConfig.getConfiguration().getBoolean(TaskManagerOptions.NETWORK_DETAILED_METRICS)) {
   // similar to MetricUtils.instantiateNetworkMetrics() but inside this IOMetricGroup
   MetricGroup networkGroup = this.metrics.getIOMetricGroup().addGroup("Network");
   MetricGroup outputGroup = networkGroup.addGroup("Output");
   MetricGroup inputGroup = networkGroup.addGroup("Input");

   // output metrics
   for (int i = 0; i < producedPartitions.length; i++) {
      ResultPartitionMetrics.registerQueueLengthMetrics(
         outputGroup.addGroup(i), producedPartitions[i]);
   }

   for (int i = 0; i < inputGates.length; i++) {
      InputGateMetrics.registerQueueLengthMetrics(
         inputGroup.addGroup(i), inputGates[i]);
   }
}

/** Next, kick off the background copying of files for the distributed cache */
try {
   for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
         DistributedCache.readFileInfoFromConfig(jobConfiguration))
   {
      LOG.info("Obtaining local cache file for '{}'.", entry.getKey());
      Future<Path> cp = fileCache.createTmpFile(entry.getKey(), entry.getValue(), jobId);
      distributedCacheEntries.put(entry.getKey(), cp);
   }
}
catch (Exception e) {
   throw new Exception(
      String.format("Exception while adding files to distributed cache of task %s (%s).", taskNameWithSubtask, executionId),
      e);
}

/** Check the state again */
if (isCanceledOrFailed()) {
   throw new CancelTaskException();
}

This part also ends with a state check, so that a cancellation can take effect quickly.

3. User code initialization

TaskKvStateRegistry kvStateRegistry = network
      .createKvStateTaskRegistry(jobId, getJobVertexId());

Environment env = new RuntimeEnvironment(
   jobId, vertexId, executionId, executionConfig, taskInfo,
   jobConfiguration, taskConfiguration, userCodeClassLoader,
   memoryManager, ioManager, broadcastVariableManager,
   accumulatorRegistry, kvStateRegistry, inputSplitProvider,
   distributedCacheEntries, writers, inputGates,
   checkpointResponder, taskManagerConfig, metrics, this);

/** Let the task code create its readers and writers */
invokable.setEnvironment(env);

// the very last thing before the actual execution starts running is to inject
// the state into the task. the state is non-empty if this is an execution
// of a task that failed but had backuped state from a checkpoint

if (null != taskStateHandles) {
   if (invokable instanceof StatefulTask) {
      StatefulTask op = (StatefulTask) invokable;
      op.setInitialState(taskStateHandles);
   } else {
      throw new IllegalStateException("Found operator state for a non-stateful task invokable");
   }
   // be memory and GC friendly - since the code stays in invoke() for a potentially long time,
   // we clear the reference to the state handle
   //noinspection UnusedAssignment
   taskStateHandles = null;
}

4. Actual execution

/** By the time we switch to 'RUNNING', the invokable must be accessible to the cancel() call */
this.invokable = invokable;

/** Switch from 'DEPLOYING' to 'RUNNING'. If this fails, a canceled/failed operation happened in the meantime. */
if (!transitionState(ExecutionState.DEPLOYING, ExecutionState.RUNNING)) {
   throw new CancelTaskException();
}

/** Tell everyone that we switched to 'RUNNING' */
notifyObservers(ExecutionState.RUNNING, null);
taskManagerActions.updateTaskExecutionState(new TaskExecutionState(jobId, executionId, ExecutionState.RUNNING));

/** Set the thread context class loader */
executingThread.setContextClassLoader(userCodeClassLoader);

/** Run: this is where the actual processing logic starts executing */
invokable.invoke();

/** Make sure that if the task left invoke() because it was canceled, we enter the catch block */
if (isCanceledOrFailed()) {
   throw new CancelTaskException();
}

The call invokable.invoke() is where the real logic starts running. Execution normally blocks here until the task finishes, is canceled, or hits an exception.

5. Wrap-up

/** Finish the produced partitions. If this fails, we treat the task execution as failed */
for (ResultPartition partition : producedPartitions) {
   if (partition != null) {
      partition.finish();
   }
}

/**
 * Try to change the state from 'RUNNING' to 'FINISHED'.
 * If that fails, the task was canceled/failed in the meantime.
 */
if (transitionState(ExecutionState.RUNNING, ExecutionState.FINISHED)) {
   notifyObservers(ExecutionState.FINISHED, null);
}
else {
   throw new CancelTaskException();
}

This part does the wrap-up work: it switches the state from 'RUNNING' to 'FINISHED' and notifies the observers.
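
To see why the final transition can fail, the Java sketch below lets a cancel arrive while the user code is still inside its invoke step. Everything here is an assumption made for illustration: `FinishOrCancelSketch`, `CancelSignal` and `cancel` are hypothetical, the catch handling is a guess (the walkthrough above covers only the try block), and the partition and observer calls are left out.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of steps 4 and 5; not Flink's Task class.
final class FinishOrCancelSketch {

    enum State { DEPLOYING, RUNNING, FINISHED, CANCELING, CANCELED }

    /** Stand-in for CancelTaskException. */
    static final class CancelSignal extends RuntimeException {}

    private final AtomicReference<State> state = new AtomicReference<>(State.DEPLOYING);

    State current() {
        return state.get();
    }

    /** Assumed behaviour of a concurrent cancel: move a non-final state to CANCELING. */
    void cancel() {
        state.updateAndGet(s ->
            (s == State.DEPLOYING || s == State.RUNNING) ? State.CANCELING : s);
    }

    State execute(Runnable invoke) {
        try {
            if (!state.compareAndSet(State.DEPLOYING, State.RUNNING)) {
                throw new CancelSignal();
            }
            invoke.run(); // blocks until the user code returns
            if (!state.compareAndSet(State.RUNNING, State.FINISHED)) {
                throw new CancelSignal();
            }
        } catch (CancelSignal e) {
            // Assumed handling: complete the cancellation.
            state.compareAndSet(State.CANCELING, State.CANCELED);
        }
        return state.get();
    }

    public static void main(String[] args) {
        FinishOrCancelSketch normal = new FinishOrCancelSketch();
        System.out.println(normal.execute(() -> {}));        // FINISHED

        FinishOrCancelSketch raced = new FinishOrCancelSketch();
        System.out.println(raced.execute(raced::cancel));    // CANCELED
    }
}
```

In the second run the cancel happens during the invoke step, so the swap from RUNNING to FINISHED fails and the task ends up CANCELED instead.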
