Flink Study Notes: 2. Introduction to Flink

2. Introduction to Flink

Some of you might already be using Apache Spark in your day-to-day work and might be wondering: if I have Spark, why do I need Flink? The question is quite expected and the comparison is natural. Let me try to answer that briefly. The very first thing to understand is that Flink is based on the streaming-first principle, which means it is a true stream processing engine, not a fast batch engine that collects streams as mini-batches. Flink considers batch processing as a special case of streaming, whereas Spark does the opposite: Spark's stream processing is essentially micro-batching.

2.1 History

Flink started as a research project named Stratosphere, with the goal of building a next-generation big data analytics platform, at universities in the Berlin area. It was accepted as an Apache Incubator project on April 16, 2014.

From version 0.6, Stratosphere was renamed Flink. The latest versions of Flink focus on supporting various features such as batch processing, stream processing, graph processing, machine learning, and so on. Flink 0.7 introduced Flink's most important feature: its streaming API. The initial release only had a Java API; later releases added a Scala API as well. Now let's look at the current architecture of Flink in the next section.

2.2 Architecture

Flink 1.X's architecture consists of various components such as deployment, core processing, and APIs. The following diagram shows the components, APIs, and libraries:
[Figure: Flink 1.X component stack, showing deployment modes, the runtime, APIs, and libraries]
Flink has a layered architecture where each component is a part of a specific layer. Each layer is built on top of the others for clear abstraction. Flink is designed to run on local machines, in a YARN cluster, or on the cloud. Runtime is Flink’s core data processing engine that receives the program through APIs in the form of JobGraph. JobGraph is a simple parallel data flow with a set of tasks that produce and consume data streams.

The DataStream and DataSet APIs are the interfaces programmers use to define a job. JobGraphs are generated by these APIs when the programs are compiled. Once compiled, the DataSet API lets the optimizer generate an optimal execution plan, while the DataStream API uses a stream builder to produce efficient execution plans.

The optimized JobGraph is then submitted to the executors according to the deployment model. You can choose a local, remote, or YARN mode of deployment. If you already have a Hadoop cluster running, it is generally better to use the YARN deployment mode.
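
As a brief, minimal sketch of what this choice looks like in code (the host, port, and JAR path below are hypothetical placeholders), the Scala API lets you request a local or a remote execution environment explicitly, while getExecutionEnvironment picks the appropriate one based on how the program is launched:

import org.apache.flink.api.scala._

object EnvironmentChoice {
  def main(args: Array[String]): Unit = {
    // Picks a local environment when run from an IDE, or the cluster's environment
    // when the program is submitted to a running cluster.
    val autoEnv = ExecutionEnvironment.getExecutionEnvironment

    // Explicitly local: everything runs inside the current JVM.
    val localEnv = ExecutionEnvironment.createLocalEnvironment()

    // Explicitly remote: submit to a Job Manager at a hypothetical address,
    // shipping the user-code JAR along with the job.
    val remoteEnv = ExecutionEnvironment.createRemoteEnvironment(
      "jobmanager-host", 6123, "/path/to/user-code.jar")
  }
}

In a YARN deployment, the same program is typically submitted through Flink's YARN client rather than by hard-coding a remote environment.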

2.3 Distributed execution

Flink's distributed execution consists of two important types of process: master and worker. When a Flink program is executed, various processes take part in the execution, namely the Job Manager, Task Managers, and the Job Client.

The following diagram shows the Flink program execution:
[Figure: Flink program execution, with the Job Client submitting the job to the Job Manager, which distributes tasks to Task Managers]

The Flink program needs to be submitted to a Job Client. The Job Client then submits the job to the Job Manager. It is the Job Manager's responsibility to orchestrate resource allocation and job execution. The very first thing it does is allocate the required resources. Once resource allocation is done, the task is submitted to the respective Task Manager. On receiving a task, the Task Manager initiates a thread to start the execution. While the execution is in progress, the Task Managers keep reporting state changes to the Job Manager. There can be various states such as starting the execution, in progress, or finished. Once the job execution is complete, the results are sent back to the client.

2.3.1 Job Manager

The master processes, also known as Job Managers, coordinate and manage the execution of the program. Their main responsibilities include scheduling tasks, managing checkpoints, failure recovery, and so on.

There can be many masters running in parallel, sharing these responsibilities. This helps in achieving high availability. One of the masters needs to be the leader. If the leader node goes down, a standby master node will be elected as the new leader.

The Job Manager consists of the following important components: the actor system, the scheduler, and checkpointing.

Flink internally uses the Akka actor system for communication between the Job Managers and the Task Managers.

2.3.2 Actor system

An actor system is a container for actors with various roles. It provides services such as scheduling, configuration, logging, and so on. It also contains a thread pool from which all actors are initiated. All actors reside in a hierarchy. Each newly created actor is assigned to a parent. Actors talk to each other using a messaging system. Each actor has its own mailbox from which it reads all of its messages. If the actors are local, the messages are shared through shared memory, but if the actors are remote, then messages are passed through RPC calls.

Each parent is responsible for the supervision of its children. If any error occurs in a child, the parent gets notified. If an actor can solve the problem itself, it can restart its children. If it cannot solve the problem, it can escalate the issue to its own parent:
[Figure: actor supervision hierarchy, in which errors are reported to parent actors that either restart their children or escalate the issue]

In Flink, an actor is a container having state and behavior. An actor's thread sequentially processes the messages it receives in its mailbox. The state and the behavior are determined by the messages it has received.
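
To make the actor model concrete, here is a small, generic Akka sketch (the WordCounter actor and its messages are invented for illustration and are not part of Flink's internal protocol): an actor keeps private state, and its behavior is the receive function that processes one mailbox message at a time:

import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical messages, for illustration only.
case class Add(word: String)
case object Report

class WordCounter extends Actor {
  // Private state, touched only by this actor's own message processing.
  private var counts = Map.empty[String, Int]

  // Behavior: messages from the mailbox are handled sequentially, one at a time.
  def receive: Receive = {
    case Add(word) => counts += word -> (counts.getOrElse(word, 0) + 1)
    case Report    => sender() ! counts   // reply to whoever sent the message
  }
}

object ActorDemo {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("demo")                              // the actor system (container)
    val counter = system.actorOf(Props[WordCounter], "counter")   // created as a child in the hierarchy
    counter ! Add("flink")                                        // messages land in the actor's mailbox
    counter ! Add("flink")
    Thread.sleep(500)     // crude wait so the messages are processed (demo only)
    system.terminate()
  }
}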

2.3.3 Scheduler

Executors in Flink are defined as task slots. Each Task Manager needs to manage one or more task slots. Internally, Flink decides which tasks need to share a slot and which tasks must be placed into a specific slot. It defines that through the SlotSharingGroup and CoLocationGroup.
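
As a small illustration of how slot sharing surfaces in the user-facing DataStream API (the source, operators, and group name below are arbitrary), operators can be assigned to a named slot-sharing group; operators in the same group may share a task slot, while a different group forces them into separate slots:

import org.apache.flink.streaming.api.scala._

object SlotSharingSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env.socketTextStream("localhost", 9999)   // hypothetical source
      .map(_.toLowerCase)                     // stays in the "default" slot-sharing group
      .filter(_.nonEmpty)
      .slotSharingGroup("analytics")          // the filter (and operators downstream of it) use a separate group
      .print()

    env.execute("Slot sharing sketch")
  }
}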

2.3.4 Checkpointing

Checkpointing is Flink's backbone for providing consistent fault tolerance. It keeps taking consistent snapshots of distributed data streams and executor states. It is inspired by the Chandy-Lamport algorithm, but has been modified for Flink's tailored requirements.

The fault-tolerance mechanism keeps creating lightweight snapshots of the data flows, so they can continue functioning without any significant overhead. Generally, the state of the data flow is kept in a configured location such as HDFS.

In case of any failure, Flink stops the executors, resets them, and starts executing from the latest available checkpoint.

Stream barriers are core elements of Flink's snapshots. They are injected into the data streams without affecting the flow. Barriers never overtake the records. They group sets of records into a snapshot. Each barrier carries a unique ID. The following diagram shows how the barriers are injected into the data stream for snapshots:
[Figure: checkpoint barriers injected into the data stream, separating records into consecutive snapshots]

Each snapshot state is reported to the Flink Job Manager's checkpoint coordinator. While drawing snapshots, Flink handles the alignment of records in order to avoid re-processing the same records because of a failure. This alignment generally takes some milliseconds, but for some intense applications, where even millisecond latency is unacceptable, there is an option to trade exactly-once processing for lower latency. By default, Flink processes each record exactly once. If an application needs low latency and can live with at-least-once delivery, we can switch off that trigger. This will skip the alignment and improve the latency.
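
A minimal sketch of how these knobs are exposed to the user in the Flink 1.x DataStream API (the interval, the HDFS path, and the source are illustrative values): checkpointing is enabled on the execution environment, the state backend can point to a durable location such as HDFS, and the checkpointing mode can be relaxed from exactly-once to at-least-once to skip barrier alignment:

import org.apache.flink.runtime.state.filesystem.FsStateBackend
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

object CheckpointingSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Take a checkpoint every 10 seconds (illustrative interval).
    env.enableCheckpointing(10000)

    // Keep checkpointed state in a durable, configured location such as HDFS.
    env.setStateBackend(new FsStateBackend("hdfs://namenode:9000/flink/checkpoints"))

    // The default is EXACTLY_ONCE; AT_LEAST_ONCE skips barrier alignment for lower latency.
    env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE)

    env.socketTextStream("localhost", 9999)   // hypothetical source
      .map(_.length)
      .print()

    env.execute("Checkpointing sketch")
  }
}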

2.3.5 Task manager

Task Managers are worker nodes that execute the tasks in one or more threads in the JVM. Parallelism of task execution is determined by the task slots available on each Task Manager. Each task slot represents a fixed share of the Task Manager's resources. For example, if a Task Manager has four slots, it will allocate 25% of its memory to each slot. One or more threads can run in a task slot. Threads in the same slot share the same JVM. Tasks in the same JVM share TCP connections and heartbeat messages:
[Figure: a Task Manager JVM with multiple task slots, each running task threads and sharing TCP connections and heartbeat messages]
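
To make the relationship between slots and parallelism concrete, here is a brief sketch (the values are illustrative): the number of slots per Task Manager is configured in flink-conf.yaml via taskmanager.numberOfTaskSlots, while the parallelism of the job or of a single operator, that is, how many parallel sub-tasks occupy those slots, is set in the program:

import org.apache.flink.streaming.api.scala._

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Default parallelism for every operator of this job (illustrative value);
    // each parallel sub-task of an operator occupies a task slot.
    env.setParallelism(4)

    env.socketTextStream("localhost", 9999)   // hypothetical source
      .map(_.toUpperCase)
      .setParallelism(2)                      // this map operator alone runs with 2 parallel sub-tasks
      .print()

    env.execute("Parallelism sketch")
  }
}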

2.3.6 Job client

The Job Client is not an internal part of Flink's program execution, but it is the starting point of the execution. The Job Client is responsible for accepting the program from the user, creating a data flow from it, and then submitting the data flow to the Job Manager for further execution. Once the execution is completed, the Job Client provides the results back to the user. A data flow is a plan of execution. Consider a very simple word count program:

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val text = env.readTextFile("input.txt")                                     // Source
val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
  .map { (_, 1) }
  .groupBy(0)
  .sum(1)                                                                    // Transformation
counts.writeAsCsv("output.txt", "\n", " ")                                   // Sink
env.execute("WordCount")

When a client accepts the program from the user, it then transforms it into a data flow. The data flow for the aforementioned program may look like this:
[Figure: the word count program transformed into a data flow of source, flatMap/map, groupBy/sum, and sink operators]
The preceding diagram shows how a program gets transformed into a data flow. Flink data flows are parallel and distributed by default. For parallel data processing, Flink partitions the operators and the streams. Operator partitions are called sub-tasks. Streams can distribute the data in a one-to-one or a redistributing manner.

The data flows directly from the source to the map operators as there is no need to shuffle the data. But for a GroupBy operation Flink may need to redistribute the data by keys in order to get the correct results:
[Figure: one-to-one forwarding from the source to the map sub-tasks, and key-based redistribution before the grouping operator]
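
The streaming counterpart of this behaviour can be sketched as follows (the source and operators are illustrative): forwarding operators such as flatMap and filter keep the one-to-one distribution, while keyBy redistributes the records by key so that all records with the same key reach the same sub-task:

import org.apache.flink.streaming.api.scala._

object RedistributionSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env.socketTextStream("localhost", 9999)       // hypothetical source
      .flatMap(_.toLowerCase.split("\\W+"))       // one-to-one forwarding from the source
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .keyBy(0)                                   // redistributes records by key (hash partitioning)
      .sum(1)                                     // all counts for the same word meet in one sub-task
      .print()

    env.execute("Redistribution sketch")
  }
}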

2.4 Features

In the earlier sections, we tried to understand the Flink architecture and its execution model. Because of its robust architecture, Flink is full of various features.

2.4.1 High performance

Flink is designed to achieve high performance and low latency. Unlike other streaming frameworks such as Spark, you don’t need to do many manual configurations to get the best performance. Flink’s pipelined data processing gives better performance compared to its counterparts.

2.4.2 Exactly-once stateful computation

As we discussed in the previous section, Flink's distributed checkpoint processing helps to guarantee that each record is processed exactly once. In the case of high-throughput applications, Flink provides us with a switch that allows at-least-once processing instead.

2.4.3 Flexible streaming windows

Flink supports data-driven windows. This means we can design a window based on time, counts, or sessions. A window can also be customized, which allows us to detect specific patterns in event streams.
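
A minimal sketch of the three kinds of windows mentioned above in the Flink 1.x DataStream API (the sizes, the gap, and the socket source are illustrative; the session window here uses processing time for brevity):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time

object WindowSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val counts = env.socketTextStream("localhost", 9999)   // hypothetical source
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .keyBy(0)

    // Each of the following is an independent branch of the data flow.
    val byTime    = counts.timeWindow(Time.seconds(10)).sum(1)                                  // time window
    val byCount   = counts.countWindow(100).sum(1)                                              // count window
    val bySession = counts.window(ProcessingTimeSessionWindows.withGap(Time.minutes(1))).sum(1) // session window

    byTime.print()
    env.execute("Window sketch")
  }
}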

2.4.4 Fault tolerance

Flink’s distributed, lightweight snapshot mechanism helps in achieving a great degree of fault tolerance. It allows Flink to provide high-throughput performance with guaranteed delivery.

2.4.5 Memory management

Flink comes with its own memory management inside the JVM, which makes it independent of Java's default garbage collector. It manages memory efficiently by using hashing, indexing, caching, and sorting.

2.4.6 Optimizer

Flink’s batch data processing API is optimized in order to avoid memory-consuming operations such as shuffle, sort, and so on. It also makes sure that caching is used in order to avoid heavy disk IO operations.

2.4.7 Stream and batch in one platform

Flink provides APIs for both batch and stream data processing. So once you set up the Flink environment, it can easily host both stream and batch processing applications. In fact, Flink works on the streaming-first principle and considers batch processing as a special case of streaming.
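
As a brief sketch of what this looks like from the programmer's side (the input file and socket address are placeholders), the same installation serves both APIs: the DataSet API for bounded, batch data and the DataStream API for unbounded streams:

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment

object BatchAndStreaming {
  def main(args: Array[String]): Unit = {
    // Batch: the DataSet API processes bounded data, such as a file.
    val batchEnv = ExecutionEnvironment.getExecutionEnvironment
    batchEnv.readTextFile("input.txt").first(10).print()   // print() triggers the batch job

    // Streaming: the DataStream API processes unbounded data, such as a socket.
    val streamEnv = StreamExecutionEnvironment.getExecutionEnvironment
    streamEnv.socketTextStream("localhost", 9999).print()
    streamEnv.execute("Streaming part")
  }
}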

2.4.8 Libraries

Flink has a rich set of libraries to do machine learning, graph processing, relational data processing, and so on. Because of its architecture, it is very easy to perform complex event processing and alerting. We are going to see more about these libraries in subsequent chapters.

2.4.9 Event time semantics

Flink supports event time semantics. This helps in processing streams where events arrive out of order. Sometimes events may also arrive delayed. Flink's architecture allows us to define windows based on time, counts, and sessions, which helps in dealing with such scenarios.
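
A minimal sketch of event time in the Flink 1.x DataStream API (the SensorReading type, its timestamp field, and the five-second bound on out-of-orderness are assumptions made for illustration): the time characteristic is switched to event time, timestamps are extracted from the records, and watermarks tell Flink how much out-of-orderness to tolerate:

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical event carrying its own timestamp (epoch milliseconds).
case class SensorReading(sensorId: String, timestamp: Long, value: Double)

object EventTimeSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val readings = env.fromElements(
      SensorReading("s1", 1000L, 20.5),
      SensorReading("s1", 3000L, 21.0),
      SensorReading("s1", 2000L, 20.7)   // arrives out of order, still lands in the right window
    )

    readings
      // Extract event timestamps and emit watermarks that tolerate 5 seconds of lateness.
      .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor[SensorReading](Time.seconds(5)) {
          override def extractTimestamp(r: SensorReading): Long = r.timestamp
        })
      .keyBy(_.sensorId)
      .timeWindow(Time.seconds(10))                        // an event-time window because of the time characteristic
      .reduce((a, b) => if (a.value > b.value) a else b)   // keep the highest reading per window
      .print()

    env.execute("Event time sketch")
  }
}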