Hadoop 叢集基準測試

Hadoop · 發表 2018-10-05 07:23:15

摘要：生產環境中，如何對 Hadoop 叢集進行 Benchmark Test？如何進行服務所需的機器選型？如何快速對比出不同叢集的效能？本文將通過 Hadoop 自帶的 Benchmark 測試程式：TestDFSIO 和 TeraSort，簡單介紹如何進行 Hadoop 的讀寫 &...

生產環境中，如何對 Hadoop 叢集進行 Benchmark Test？如何進行服務所需的機器選型？如何快速對比出不同叢集的效能？

本文將通過 Hadoop 自帶的 Benchmark 測試程式：TestDFSIO 和 TeraSort，簡單介紹如何進行 Hadoop 的讀寫 & 計算效能的壓測。

回顧上篇文章： ofollow,noindex"> 認識多佇列網絡卡中斷繫結

（本文使用 2.6.0 的 hadoop 版本進行測試，基準測試被打包在測試程式 JAR 檔案中，通過無參呼叫 bin/hadoop jar ./share/hadoop/mapreduce/xxx.jar 可以得到其列表）

使用 TestDFSIO

進行叢集的 I/O 效能測試處

TestDFSIO :

org.apache.hadoop.fs.TestDFSIO

TestDFSIO 程式原理：

使用多個 Map Task 模擬多路的併發讀寫。通過自己的 Mapper class 用來讀寫資料，生成統計資訊；通過自己的 Reduce Class 來收集並彙總各個 Map Task 的統計資訊，主要涉及到三個檔案: AccumulatingReducer.java, IOMapperBase.java, TestDFSIO.java。

TestDFSIO 大致執行過程：

根據 Map Task 的數量將相應個數的 Control 控制檔案寫入 HDFS，這些控制檔案僅包含一行內容：<資料檔名，資料檔案大小> ;
啟動 MapReduceJob，IOMapperBase Class 中的 Map 方法將 Control 檔案作為輸入檔案，讀取內容，將資料檔名和大小作為引數傳遞給自定義的 doIO 函式，進行實際的資料讀寫工作。而後將資料大小和 doIO 執行的時間傳遞給自定義的 collectStatus 函式，進行統計資料的輸出工作 ;
doIO 的實現：TestDFSIO 過載並實現 doIO 函式，將指定大小的資料寫入 HDFS 檔案系統;
collectStatus 的實現：TestDFSIO 過載並實現 collectStatus 函式，將任務數量，以及資料大小，完成時間等相關資料作為 Map Class 的結果輸出;
統計資料用不同的字首標識，例如 l: (stand for long), s: (stand for string) etc;
執行唯一的一個 Reduce 任務，收集各個 Map Class 的統計資料，使用 AccumulatingReducer 進行彙總統計;
最後當 MapReduceJob 完成以後，呼叫 analyzeResult 函式讀取最終的統計資料並輸出到控制檯和本地的 Log 檔案中;

那麼 MR 任務測試叢集讀寫效能是否會因為資料傳輸影響到結果判斷呢？

可以看整個過程中，實際通過 MR 框架進行讀寫 Shuffle 的只是 Control 檔案，資料量非常小，所以 MR 框架本身的資料傳輸對測試的影響很小，可以忽略不計，測試結果基本是取決於 HDFS 的讀寫效能的。

瞭解到原理後，我們將執行 TestDFSIO 進行測試

測試叢集版本：hadoop-2.6.0-mdh3.11

測試叢集的機器情況：5 個 slave(dn/nm) 節點，每個節點機器為 32 核，128g 記憶體，12*4THdd 磁碟的物理機。

測試資料：5 個檔案，每個檔案大小為 1TB。

環境要求：叢集保證完全空閒，無其他干擾任務。

1. 寫測試：

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-mdh3.11-jre8-SNAPSHOT.jar TestDFSIO -write -nrFiles 5 -size 1TB
# 檢視測試結果

cat TestDFSIO_results.log

----- TestDFSIO ----- : write
Date & time: Mon Jun 04 16:44:25 CST 2018
Number of files: 5
Total MBytes processed: 5242880.0
Throughput mb/sec: 213.10459447844454
Average IO rate mb/sec: 213.11135864257812
 IO rate std deviation: 1.1965074234796487
Test exec time sec: 4972.91

2. 讀測試：

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-mdh3.11-jre8-SNAPSHOT.jar TestDFSIO -read -nrFiles 5 -size 1TB
# 檢視測試結果

cat TestDFSIO_results.log

----- TestDFSIO ----- : read
Date & time: Mon Jun 04 18:48:48 CST 2018
Number of files: 5
Total MBytes processed: 5242880.0
Throughput mb/sec: 164.327389903222
Average IO rate mb/sec: 164.33087158203125
 IO rate std deviation: 0.7560928117328837
Test exec time sec: 6436.246

以上測試資料解釋：

Throughput mb/sec和 Average IO rate mb/sec 是兩個最重要的效能衡量指標：Throughput mb/sec 衡量每個 map task 的平均吞吐量，Average IO rate mb/sec 衡量每個檔案的平均 IO 速度。

IO rate std deviation：標準差，高標準差表示資料散佈在一個大的值域中，這可能意味著群集中某個節點存在效能相關的問題，這可能和硬體或軟體有關。

使用 TeraSort

進行叢集的計算效能測試

TeraSort: org.apache.hadoop.examples.terasort.TeraSort

TeraSort 程式原理：

對輸入檔案按 Key 進行全域性排序。TeraSort 針對的是大批量的資料，在實現過程中為了保證 Reduce 階段各個 Reduce Job 的負載平衡，以保證全域性運算的速度，TeraSort 對資料進行了預取樣分析。

TeraSort 大致執行過程：

從 job 框架上看，為了保證 Reduce 階段的負載平衡，使用 jobConf.setPartitionerClass 自定義了 Partitioner Class 用來對資料進行分割槽，在 map 和 reduce 階段對資料不做額外處理。Job 流程如下：

對資料進行分段取樣：例如將輸入檔案最多分割為 10 段，每段讀取最多 100,000 行資料作為樣本，統計各個 Key 值出現的頻率並對 Key 值使用內建的 QuickSort 進行快速排序（這一步是 JobClient 在單個節點上執行的，取樣的運算量不能太大）;
將樣本統計結果中位於樣本統計平均分段處的 Key 值（例如 n/10 處 n=[1..10]）做為分割槽的依據以 DistributedCache 的方式寫入檔案，這樣在 MapReduce 階段的各個節點都能夠 Access 這個檔案。如果全域性資料的 Key 值分佈與樣本類似的話，這也就代表了全域性資料的平均分割槽的位置;
在 MapReduceJob 執行過程中，自定義的 Partitioner 會讀取這個樣本統計檔案，根據分割槽邊界 Key 值建立一個兩級的索引樹用來快速定位特定 Key 值對應的分割槽（這個兩級索引樹是根據 TeraSort 規定的輸入資料的特點定製的，對普通資料不一定具有普遍適用性，比如 Hadoop 內建的 TotalPartitioner 就採用了更通用的二分查詢法來定位分割槽）;

總結：

TeraSort 使用了 Hadoop 預設的 IdentityMapper 和 IdentityReducer。IdentityMapper 和 IdentityReducer 對它們的輸入不做任何處理，將輸入 k,v 直接輸出；也就是說是完全是為了走框架的流程而空跑。這正是 Hadoop 的 TeraSort 的巧妙所在，它沒有為排序而實現自己的 mapper 和 reducer，而是完全利用 Hadoop 的 Map Reduce 框架內的機制實現了排序。而也正因為如此，我們可以在叢集上利用 TeraSort 來測試 Hadoop。

瞭解到原理後，我們將執行 TeraSort 進行測試

測試叢集版本：hadoop-2.6.0-mdh3.11

測試叢集的機器情況：

5 個 slave(dn/nm) 節點，每個節點機器為 32 核，128g 記憶體，12*4THdd 磁碟的物理機。

測試資料：

hadoop 自帶的生成資料工具 TeraGen，輸入檔案是由一行行 100 位元組的記錄組成，每行記錄包括一個 10 位元組的 Key；以 Key 來對記錄排序。

環境要求：

叢集保證完全空閒，無其他干擾任務。

1

測試資料生成

按照 SortBenchmark 要求的輸入資料規則（需要 gensort 工具生成輸入資料）：輸入檔案是由一行行 100 位元組的記錄組成，每行記錄包括一個 10 位元組的 Key；以 Key 來對記錄排序。（具體可參考 http://www.ordinal.com/gensort.html）

Hadoop 的 TeraSort 實現的生成資料工具 TeraGen，演算法與 gensort 一致，我們將使用 TeraGen 生成測試資料：

（測試資料量為 1T，由於 100 位元組一行，則設定行數為 10000000000）

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-mdh3.11-jre8-SNAPSHOT.jar teragen 10000000000 /terasort/input1TB
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=248548
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=173
HDFS: Number of bytes written=1000000000000
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=32792925
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=10930975
Total vcore-seconds taken by all map tasks=10930975
Total megabyte-seconds taken by all map tasks=8394988800
Map-Reduce Framework
Map input records=10000000000
Map output records=10000000000
Input split bytes=173
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=193112
CPU time spent (ms)=14325820
Physical memory (bytes) snapshot=916639744
Virtual memory (bytes) snapshot=12308406272
Total committed heap usage (bytes)=712507392
HeapUsageGroup
HeapUsageCounter=30947608
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=3028416809717741100
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1000000000000
 # 檢視生成的資料
bin/hadoop dfs -ls /terasort/input1TB
Found 3 items
-rw-r--r--3 hdfs_admin supergroup0 2018-06-05 11:49 /terasort/input1TB/_SUCCESS
-rw-r--r--3 hdfs_admin supergroup 500000000000 2018-06-05 11:45 /terasort/input1TB/part-m-00000
-rw-r--r--3 hdfs_admin supergroup 500000000000 2018-06-05 11:49 /terasort/input1TB/part-m-00001

2

執行 TeraSort 測試程式

測試資料生成好後，我們將執行 TeraSort 測試程式：

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-mdh3.11-jre8-SNAPSHOT.jar terasort /terasort/input1TB /terasort/output1TB
18/06/06 03:50:08 INFO mapreduce.Job: Counters: 52
File System Counters
FILE: Number of bytes read=5189229479006
FILE: Number of bytes written=6238290771828
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1000000856980
HDFS: Number of bytes written=1000000000000
HDFS: Number of read operations=22359
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=7453
Launched reduce tasks=1
Data-local map tasks=4424
Rack-local map tasks=3029
Total time spent by all maps in occupied slots (ms)=356530188
Total time spent by all reduces in occupied slots (ms)=224698152
Total time spent by all map tasks (ms)=118843396
Total time spent by all reduce tasks (ms)=56174538
Total vcore-seconds taken by all map tasks=118843396
Total vcore-seconds taken by all reduce tasks=56174538
Total megabyte-seconds taken by all map tasks=91271728128
Total megabyte-seconds taken by all reduce tasks=57522726912
Map-Reduce Framework
Map input records=10000000000
Map output records=10000000000
Map output bytes=1020000000000
Map output materialized bytes=1040000044712
Input split bytes=856980
Combine input records=0
Combine output records=0
Reduce input groups=10000000000
Reduce shuffle bytes=1040000044712
Reduce input records=10000000000
Reduce output records=10000000000
Spilled Records=59896435961
Shuffled Maps =7452
Failed Shuffles=0
Merged Map outputs=7452
GC time elapsed (ms)=14193819
CPU time spent (ms)=179564830
Physical memory (bytes) snapshot=3104994074624
Virtual memory (bytes) snapshot=46362045841408
Total committed heap usage (bytes)=2586227769344
HeapUsageGroup
HeapUsageCounter=896956972576
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1000000000000
File Output Format Counters
Bytes Written=1000000000000
18/06/06 03:50:08 INFO terasort.TeraSort: done
# 檢視輸出
bin/hadoop dfs -ls /terasort/output1TB
Found 3 items
-rw-r--r--1 hdfs_admin supergroup0 2018-06-06 03:50 /terasort/output1TB/_SUCCESS
-rw-r--r--10 hdfs_admin supergroup0 2018-06-05 11:52 /terasort/output1TB/_partition.lst
-rw-r--r--1 hdfs_admin supergroup 1000000000000 2018-06-06 03:50 /terasort/output1TB/part-r-00000

通過 Job Counters 等指標我們可以看出整個 TeraSort 的執行情況，可以通過這些資料對比出當前框架的計算效能。

3

結果的校驗：TeraValidate

TeraSort 自帶校驗程式 TeraValidate，用來檢驗排序輸出結果是否是有序的：

bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-mdh3.11-jre8-SNAPSHOT.jar teravalidate /terasort/output1TB /terasort/validate1TB

如果有錯誤，log 記錄會放在輸出目錄裡。

總結

Hadoop 自帶的 Benchmark 測試程式看起來微不足道，如果我們能夠多多挖掘，便可發揮更大的價值；既可以用來對叢集上線前的測試校驗，又可以用來進行叢集調優測試，通過舉一反三可以用到更多地地方。

參考文獻

《Hadoop 權威指南》

Benchmarking and Stress Testing an Hadoop Cluster with TeraSort, TestDFSIO & Co.

Hadoop 叢集基準測試

使用 TestDFSIO

進行叢集的 I/O 效能測試處

使用 TeraSort

進行叢集的計算效能測試

1 測試資料生成

2 執行 TeraSort 測試程式

3 結果的校驗：TeraValidate

您可能也會喜歡…

1

測試資料生成

2

執行 TeraSort 測試程式

3

結果的校驗：TeraValidate