
Distributed Resource Scheduling Framework: YARN

1 Why YARN Was Created

  • Problems with MapReduce 1.x: the JobTracker is a single point of failure, and the heavy load on it makes the cluster hard to scale
  • In Hadoop 1.x, MapReduce used a Master/Slave architecture: one JobTracker driving multiple TaskTrackers
  • JobTracker: responsible for resource management and job scheduling
  • TaskTracker: periodically reports its node's health, resource usage, and task progress to the JobTracker (JT); accepts commands from the JT, such as starting a task
  • YARN: different computing frameworks can share the data on a single HDFS cluster and benefit from unified, cluster-wide resource scheduling

2 YARN Architecture

http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.7.0/hadoop-yarn/hadoop-yarn-site/YARN.html

  • ResourceManager (RM): only one RM serves the cluster at any given time; it manages and schedules cluster resources centrally, and handles client requests such as submitting or killing a job; it also monitors the NMs, and if an NM goes down, the tasks that were running on it must be reported to the corresponding AM
  • NodeManager (NM): many per cluster; manages and uses the resources of its own node, and reports that node's resource usage to the RM on a schedule; receives and handles commands from the RM, such as starting a Container, as well as commands from AMs; in short, per-node resource management
  • ApplicationMaster (AM): manages a single application, one AM per application (an MR job, a Spark job, ...); requests resources (cores, memory) from the RM on behalf of the application and hands them to its internal tasks; talks to NMs to start and stop tasks; the tasks run inside Containers, and so does the AM itself
  • Container: wraps resources such as CPU and memory; an abstraction of a task's execution environment
  • Client: submits jobs and checks their progress (a minimal YarnClient sketch follows this list)
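To make the Client-to-RM interaction concrete, here is a minimal sketch (my addition, not from the original post) that uses the YarnClient API from hadoop-yarn-client to ask the RM for the applications it currently tracks. It assumes a yarn-site.xml pointing at the cluster is on the classpath; the class name ListYarnApps is made up.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ListYarnApps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // picks up yarn-site.xml from the classpath
        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();
        // Every submitted job (MR, Spark, ...) shows up as one application on the RM
        for (ApplicationReport app : client.getApplications()) {
            System.out.printf("%s  %s  %s%n",
                    app.getApplicationId(), app.getName(), app.getYarnApplicationState());
        }
        client.stop();
    }
}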

3 Setting Up YARN

3.1 mapred-site.xml

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>
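Note: if mapred-site.xml does not exist yet, Hadoop 2.x tarballs ship a mapred-site.xml.template in the same etc/hadoop directory that can be copied as a starting point. Setting mapreduce.framework.name to yarn makes MapReduce jobs run on YARN instead of the default local runner.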

3.2 yarn-site.xml

<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
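As a quick sanity check (my own addition), the snippet below loads the two files we just edited and prints the values that were set; the absolute paths are an assumption based on the CDH layout used elsewhere in this post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CheckConf {
    public static void main(String[] args) {
        Configuration conf = new Configuration(false);  // skip the built-in defaults, load only our files
        conf.addResource(new Path("/home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/etc/hadoop/mapred-site.xml"));
        conf.addResource(new Path("/home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/etc/hadoop/yarn-site.xml"));
        System.out.println(conf.get("mapreduce.framework.name"));      // expect: yarn
        System.out.println(conf.get("yarn.nodemanager.aux-services")); // expect: mapreduce_shuffle
    }
}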

3.3 Starting YARN

[hadoop@node1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-resourcemanager-node1.out
node1: starting nodemanager, logging to /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/logs/yarn-hadoop-nodemanager-node1.out
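Once start-yarn.sh returns, running jps on node1 should show a ResourceManager and a NodeManager process alongside the HDFS daemons.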

Open http://node1:8088 in a browser to reach the ResourceManager web UI.

4 Submitting a MapReduce Job to YARN

The examples that ship with Hadoop live under /home/hadoop/apps/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce2, in hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar. Running hadoop jar with no arguments prints its usage:

[hadoop@node1 mapreduce2]$ hadoop jar
RunJar jarFile [mainClass] args...

[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
[hadoop@node1 mapreduce2]$
[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi
Usage: org.apache.hadoop.examples.QuasiMonteCarlo <nMaps> <nSamples>
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

[hadoop@node1 mapreduce2]$
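The generic options listed above (-D, -files, -libjars, ...) are parsed by Hadoop's ToolRunner before your program sees its own arguments. A minimal sketch of a driver that opts into them (my own illustration; the class name MyTool is made up):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyTool extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Generic options like -D key=value have already been folded into getConf()
        System.out.println("framework = " + getConf().get("mapreduce.framework.name"));
        System.out.println("remaining args = " + java.util.Arrays.toString(args));
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options and passes the rest to run()
        System.exit(ToolRunner.run(new MyTool(), args));
    }
}

Launched as, say, hadoop jar mytool.jar MyTool -D some.key=value extraArg, the -D pair lands in the Configuration and only extraArg reaches run().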

[hadoop@node1 mapreduce2]$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.7.0.jar pi 2 3
Number of Maps  = 2
Samples per Map = 3
18/10/29 22:19:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
18/10/29 22:19:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/10/29 22:19:03 INFO input.FileInputFormat: Total input paths to process : 2
18/10/29 22:19:04 INFO mapreduce.JobSubmitter: number of splits:2
18/10/29 22:19:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1540822729980_0001
18/10/29 22:19:04 INFO impl.YarnClientImpl: Submitted application application_1540822729980_0001
18/10/29 22:19:04 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1540822729980_0001/
18/10/29 22:19:04 INFO mapreduce.Job: Running job: job_1540822729980_0001
18/10/29 22:19:16 INFO mapreduce.Job: Job job_1540822729980_0001 running in uber mode : false
18/10/29 22:19:16 INFO mapreduce.Job:  map 0% reduce 0%
18/10/29 22:19:26 INFO mapreduce.Job:  map 50% reduce 0%
18/10/29 22:19:27 INFO mapreduce.Job:  map 100% reduce 0%
18/10/29 22:19:32 INFO mapreduce.Job:  map 100% reduce 100%
18/10/29 22:19:33 INFO mapreduce.Job: Job job_1540822729980_0001 completed successfully
18/10/29 22:19:33 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=335472
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=522
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters 
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=15859
		Total time spent by all reduces in occupied slots (ms)=4321
		Total time spent by all map tasks (ms)=15859
		Total time spent by all reduce tasks (ms)=4321
		Total vcore-seconds taken by all map tasks=15859
		Total vcore-seconds taken by all reduce tasks=4321
		Total megabyte-seconds taken by all map tasks=16239616
		Total megabyte-seconds taken by all reduce tasks=4424704
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=286
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=245
		CPU time spent (ms)=1260
		Physical memory (bytes) snapshot=458809344
		Virtual memory (bytes) snapshot=8175378432
		Total committed heap usage (bytes)=262033408
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=236
	File Output Format Counters 
		Bytes Written=97
Job Finished in 30.938 seconds
Estimated value of Pi is 4.00000000000000000000
[hadoop@node1 mapreduce2]$
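A note on that last result: the pi example estimates π as 4 × (points inside the quarter circle) / (total points). With only 2 maps × 3 samples = 6 sample points, all six happened to land inside, which yields exactly 4.0; rerunning with larger arguments (for example pi 10 1000) brings the estimate much closer to 3.14159.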