
Big Data (3): Verifying MapReduce in a Hadoop Environment, and Common HDFS Commands

I. MapReduce Verification

  1. Create a test.txt file locally
    vim test.txt
    Enter a few English sentences, for example:
Beijing is the capital of China
I love Beijing
I love China
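If you prefer a non-interactive alternative to vim, the same file can be created with a heredoc:

```shell
# Create the sample input file without opening an editor
cat > test.txt <<'EOF'
Beijing is the capital of China
I love Beijing
I love China
EOF

# Quick sanity check: wc should report 3 lines
wc -l test.txt
```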
  2. Upload test.txt to the /user/input directory in HDFS
  • hdfs dfs -mkdir /user
  • hdfs dfs -mkdir /user/input
  • hdfs dfs -put test.txt /user/input
  3. Run the MapReduce example program
  • Replace /hadoop/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar with the corresponding path on your own machine.
hadoop jar /hadoop/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /user/input/test.txt output
  • A normal run prints output like the following: map progress goes from 0% to 100%, then reduce goes from 0% to 100%.
18/11/08 16:18:48 INFO client.RMProxy: Connecting to ResourceManager at master.hadoop/172.16.16.15:8032
18/11/08 16:18:49 INFO input.FileInputFormat: Total input files to process : 1
18/11/08 16:18:49 INFO mapreduce.JobSubmitter: number of splits:1
18/11/08 16:18:49 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/11/08 16:18:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541665050140_0001
18/11/08 16:18:49 INFO impl.YarnClientImpl: Submitted application application_1541665050140_0001
18/11/08 16:18:49 INFO mapreduce.Job: The url to track the job: http://master.hadoop:8088/proxy/application_1541665050140_0001/
18/11/08 16:18:49 INFO mapreduce.Job: Running job: job_1541665050140_0001
18/11/08 16:18:55 INFO mapreduce.Job: Job job_1541665050140_0001 running in uber mode : false
18/11/08 16:18:55 INFO mapreduce.Job:  map 0% reduce 0%
18/11/08 16:19:00 INFO mapreduce.Job:  map 100% reduce 0%
18/11/08 16:19:05 INFO mapreduce.Job:  map 100% reduce 100%
18/11/08 16:19:06 INFO mapreduce.Job: Job job_1541665050140_0001 completed successfully
18/11/08 16:19:06 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=106
		FILE: Number of bytes written=404235
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=175
		HDFS: Number of bytes written=64
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=1945
		Total time spent by all reduces in occupied slots (ms)=1574
		Total time spent by all map tasks (ms)=1945
		Total time spent by all reduce tasks (ms)=1574
		Total vcore-milliseconds taken by all map tasks=1945
		Total vcore-milliseconds taken by all reduce tasks=1574
		Total megabyte-milliseconds taken by all map tasks=1991680
		Total megabyte-milliseconds taken by all reduce tasks=1611776
	Map-Reduce Framework
		Map input records=3
		Map output records=12
		Map output bytes=107
		Map output materialized bytes=106
		Input split bytes=116
		Combine input records=12
		Combine output records=9
		Reduce input groups=9
		Reduce shuffle bytes=106
		Reduce input records=9
		Reduce output records=9
		Spilled Records=18
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=88
		CPU time spent (ms)=890
		Physical memory (bytes) snapshot=513982464
		Virtual memory (bytes) snapshot=4242575360
		Total committed heap usage (bytes)=316145664
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=59
	File Output Format Counters 
		Bytes Written=64

  • View the results
    hdfs dfs -ls output
Found 2 items
-rw-r--r--   1 root supergroup          0 2018-11-08 16:19 output/_SUCCESS
-rw-r--r--   1 root supergroup         64 2018-11-08 16:19 output/part-r-00000

hdfs dfs -cat output/part-r-00000

Beijing	2
China	2
I	2
capital	1
is	1
love	2
of	1
the	1

  • Download to a local output directory and view the results
hdfs dfs -get output output
cat output/part-r-00000
This prints the same result.
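The wordcount result above can be cross-checked locally with standard Unix tools. The following sketch reproduces the same per-word counts from test.txt (assuming the file from step 1 is in the current directory):

```shell
# Split the text into one word per line, then count occurrences.
# This mirrors what the wordcount example computes: the map phase
# tokenizes each line, and the reduce phase sums counts per word.
tr -s ' ' '\n' < test.txt | sort | uniq -c | sort -k2
```

The counts printed (Beijing 2, China 2, I 2, capital 1, is 1, love 2, of 1, the 1) match the contents of part-r-00000 above.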

II. Common HDFS Commands

The help output of `hdfs dfs` lists many command options; the screenshot of that output is not reproduced here. Table 4-1 below lists the supported command options in full.

| Option | Usage | Meaning |
|---|---|---|
| -ls | -ls \<path\> | List the contents of the given path |
| -lsr | -lsr \<path\> | List the directory tree of the given path recursively |
| -du | -du \<path\> | Show the size of each file under the path |
| -dus | -dus \<path\> | Show the aggregate size of files/directories under the path |
| -count | -count [-q] \<path\> | Count the files and directories |
| -mv | -mv \<src\> \<dst\> | Move |
| -cp | -cp \<src\> \<dst\> | Copy |
| -rm | -rm [-skipTrash] \<path\> | Delete a file or empty directory |
| -rmr | -rmr [-skipTrash] \<path\> | Delete recursively |
| -put | -put \<local files...\> \<hdfs path\> | Upload files |
| -copyFromLocal | -copyFromLocal \<local files...\> \<hdfs path\> | Copy from local |
| -moveFromLocal | -moveFromLocal \<local files...\> \<hdfs path\> | Move from local |
| -getmerge | -getmerge \<src\> \<local path\> | Merge files into one local file |
| -cat | -cat \<hdfs path\> | Print file contents |
| -text | -text \<hdfs path\> | Print file contents |
| -copyToLocal | -copyToLocal [-ignoreCrc] [-crc] [hdfs src] [local dst] | Copy to local |
| -moveToLocal | -moveToLocal [-crc] \<hdfs src\> \<local dst\> | Move to local |
| -mkdir | -mkdir \<hdfs path\> | Create an empty directory |
| -setrep | -setrep [-R] [-w] \<replicas\> \<path\> | Change the replication factor |
| -touchz | -touchz \<file path\> | Create an empty file |
| -stat | -stat [format] \<path\> | Show file statistics |
| -tail | -tail [-f] \<file\> | Show the tail of a file |
| -chmod | -chmod [-R] \<mode\> [path] | Change permissions |
| -chown | -chown [-R] [owner][:[group]] path | Change owner |
| -chgrp | -chgrp [-R] group path | Change group |
| -help | -help [option] | Show help |

Note: the paths in the table above include both HDFS paths and Linux paths. Where ambiguity could arise, "Linux path" or "HDFS path" is called out explicitly; if not stated, an HDFS path is meant.
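As one example from the table, -getmerge concatenates every file under an HDFS source directory into a single local file, which is handy when a job writes multiple reducers' part files. Its effect can be sketched with local files (the part-file names below are illustrative, mirroring typical MapReduce output):

```shell
# Simulate an HDFS job output directory with two part files
mkdir -p job_output
printf 'Beijing\t2\nChina\t2\n' > job_output/part-r-00000
printf 'love\t2\nof\t1\n'       > job_output/part-r-00001

# Local equivalent of: hdfs dfs -getmerge job_output merged.txt
cat job_output/part-r-* > merged.txt

# The merged file holds all four records from both part files
wc -l merged.txt
```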