Big Data (3): Verifying MapReduce in a Hadoop Environment and Common HDFS Commands
阿新 • Published: 2018-11-17
1. Verifying MapReduce
- Create a test.txt file locally
vim test.txt
Enter a few English sentences, such as:
Beijing is the capital of China
I love Beijing
I love China
- Upload test.txt to the /user/input directory in HDFS
- hdfs dfs -mkdir /user
- hdfs dfs -mkdir /user/input
- hdfs dfs -put test.txt /user/input
- Run the MapReduce example program
- Replace /hadoop/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar with the path that matches your own installation.
hadoop jar /hadoop/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /user/input/test.txt output
- A successful run produces output like the following: map progresses from 0% to 100%, then reduce progresses from 0% to 100%.
18/11/08 16:18:48 INFO client.RMProxy: Connecting to ResourceManager at master.hadoop/172.16.16.15:8032
18/11/08 16:18:49 INFO input.FileInputFormat: Total input files to process : 1
18/11/08 16:18:49 INFO mapreduce.JobSubmitter: number of splits:1
18/11/08 16:18:49 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/11/08 16:18:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541665050140_0001
18/11/08 16:18:49 INFO impl.YarnClientImpl: Submitted application application_1541665050140_0001
18/11/08 16:18:49 INFO mapreduce.Job: The url to track the job: http://master.hadoop:8088/proxy/application_1541665050140_0001/
18/11/08 16:18:49 INFO mapreduce.Job: Running job: job_1541665050140_0001
18/11/08 16:18:55 INFO mapreduce.Job: Job job_1541665050140_0001 running in uber mode : false
18/11/08 16:18:55 INFO mapreduce.Job:  map 0% reduce 0%
18/11/08 16:19:00 INFO mapreduce.Job:  map 100% reduce 0%
18/11/08 16:19:05 INFO mapreduce.Job:  map 100% reduce 100%
18/11/08 16:19:06 INFO mapreduce.Job: Job job_1541665050140_0001 completed successfully
18/11/08 16:19:06 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=106
		FILE: Number of bytes written=404235
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=175
		HDFS: Number of bytes written=64
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=1945
		Total time spent by all reduces in occupied slots (ms)=1574
		Total time spent by all map tasks (ms)=1945
		Total time spent by all reduce tasks (ms)=1574
		Total vcore-milliseconds taken by all map tasks=1945
		Total vcore-milliseconds taken by all reduce tasks=1574
		Total megabyte-milliseconds taken by all map tasks=1991680
		Total megabyte-milliseconds taken by all reduce tasks=1611776
	Map-Reduce Framework
		Map input records=3
		Map output records=12
		Map output bytes=107
		Map output materialized bytes=106
		Input split bytes=116
		Combine input records=12
		Combine output records=9
		Reduce input groups=9
		Reduce shuffle bytes=106
		Reduce input records=9
		Reduce output records=9
		Spilled Records=18
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=88
		CPU time spent (ms)=890
		Physical memory (bytes) snapshot=513982464
		Virtual memory (bytes) snapshot=4242575360
		Total committed heap usage (bytes)=316145664
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=59
	File Output Format Counters
		Bytes Written=64
- View the results
hdfs dfs -ls output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2018-11-08 16:19 output/_SUCCESS
-rw-r--r-- 1 root supergroup 64 2018-11-08 16:19 output/part-r-00000
hdfs dfs -cat output/part-r-00000
Beijing 2
China 2
I 2
capital 1
is 1
love 2
of 1
the 1
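The counts above can be reproduced locally with a short sketch of the wordcount map/combine/reduce logic. This is a plain-Python simulation of what the example job does, not Hadoop code; the variable names are illustrative only:

```python
from collections import Counter

# The three lines of test.txt from above.
lines = [
    "Beijing is the capital of China",
    "I love Beijing",
    "I love China",
]

# Map phase: emit a (word, 1) pair for every word.
# len(pairs) is 12, matching the "Map output records=12" counter.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle + reduce phase: group pairs by key and sum the counts.
counts = Counter()
for word, n in pairs:
    counts[word] += n

# MapReduce sorts Text keys byte-wise, so uppercase words come first,
# which is why part-r-00000 lists "Beijing" before "capital".
for word in sorted(counts):
    print(word, counts[word])
```

Note how the log's counters line up with this sketch: three input records (lines), twelve map output records (words), which the combiner and reducer collapse into one count per distinct word.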
- Download to a local output directory and view the results
hdfs dfs -get output output
cat output/part-r-00000
This prints the same results.
2. Common HDFS Commands
Running hdfs dfs with no arguments prints usage information for many command options. Table 4-1 below lists the supported options in full.
Option | Format | Meaning |
---|---|---|
-ls | -ls <path> | List the contents of the given path |
-lsr | -lsr <path> | List the directory tree under the path recursively |
-du | -du <path> | Show the size of each file under the path |
-dus | -dus <path> | Show the aggregate size of the files and directories under the path |
-count | -count [-q] <path> | Count the files and directories under the path |
-mv | -mv <source path> <destination path> | Move |
-cp | -cp <source path> <destination path> | Copy |
-rm | -rm [-skipTrash] <path> | Delete a file or an empty directory |
-rmr | -rmr [-skipTrash] <path> | Delete recursively |
-put | -put <local file(s)> <hdfs path> | Upload files |
-copyFromLocal | -copyFromLocal <local file(s)> <hdfs path> | Copy from local |
-moveFromLocal | -moveFromLocal <local file(s)> <hdfs path> | Move from local |
-getmerge | -getmerge <source path> <linux path> | Merge files and download to local |
-cat | -cat <hdfs path> | Print file contents |
-text | -text <hdfs path> | Print file contents |
-copyToLocal | -copyToLocal [-ignoreCrc] [-crc] [hdfs source path] [linux destination path] | Copy to local |
-moveToLocal | -moveToLocal [-crc] <hdfs source path> <linux destination path> | Move to local |
-mkdir | -mkdir <hdfs path> | Create an empty directory |
-setrep | -setrep [-R] [-w] <replicas> <path> | Change the replication factor |
-touchz | -touchz <file path> | Create an empty file |
-stat | -stat [format] <path> | Show file statistics |
-tail | -tail [-f] <file> | Show the tail of a file |
-chmod | -chmod [-R] <permission mode> [path] | Change permissions |
-chown | -chown [-R] [owner][:[group]] path | Change the owner |
-chgrp | -chgrp [-R] group path | Change the group |
-help | -help [command option] | Show help |
Note: the paths in the table above may be either HDFS paths or Linux paths. Where ambiguity could arise, "Linux path" or "HDFS path" is called out explicitly; where nothing is specified, an HDFS path is meant.
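When scripting against HDFS, it can help to assemble these `hdfs dfs` invocations programmatically instead of concatenating strings. The sketch below is a hypothetical helper (the `build_hdfs_cmd` name and structure are my own, not part of Hadoop) that builds an argument list from one of the options in the table:

```python
from typing import List

def build_hdfs_cmd(option: str, *args: str) -> List[str]:
    """Assemble an `hdfs dfs` command line for the given option.

    `option` is one of the table entries above, e.g. "-put" or "-ls";
    `args` are its path arguments, kept in order. Passing a list (rather
    than a shell string) avoids quoting problems with odd file names.
    """
    return ["hdfs", "dfs", option, *args]

# Examples mirroring the steps in part 1:
print(build_hdfs_cmd("-mkdir", "/user/input"))
print(build_hdfs_cmd("-put", "test.txt", "/user/input"))
print(build_hdfs_cmd("-cat", "output/part-r-00000"))
```

On a machine with a Hadoop client configured, each list could then be executed with `subprocess.run(cmd, check=True)`.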