
Setting up a single-node Hadoop environment on a Mac with Docker

1. Download docker.dmg and install it.

2. Pull the CentOS image with docker pull and start a container:

docker pull centos:centos7

docker run -it centos:centos7 /bin/bash
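A slightly fuller run command is worth considering here (my suggestion, not part of the original steps): `-h` gives the container the `yarn001` hostname that the configs edited later expect, and `-p` publishes the default NameNode (50070) and ResourceManager (8088) web UI ports so they are reachable from the Mac:

```shell
# Suggested flags, not in the original post: -h sets the hostname the Hadoop
# configs below expect; -p publishes the default NameNode/ResourceManager web UIs.
RUN_CMD="docker run -it -h yarn001 -p 50070:50070 -p 8088:8088 centos:centos7 /bin/bash"
echo "$RUN_CMD"   # print the command; run it in a terminal where docker is available
```

50070 is the default `dfs.namenode.http-address` port and 8088 the default `yarn.resourcemanager.webapp.address` port in Hadoop 2.x.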

3. Create a hadoop user inside the container.
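A sketch of this step; adding the user to wheel for sudo is my assumption (the post installs sudo in the next step but does not say how the user gets it):

```shell
# Run as root inside the container; wheel membership (for sudo) is an assumption.
USER_CMDS="useradd -m hadoop && passwd hadoop && usermod -aG wheel hadoop"
echo "$USER_CMDS"
```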

4. Remember to install wget, vim, sudo, telnet, the openssh server and client, and initscripts.
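The package list as a single yum line, using CentOS 7 package names (an assumption on my part — in particular, openssh-server/openssh-clients are what the ssh steps below actually need; openssl is kept because the post mentions it for the ssh fix):

```shell
# CentOS 7 package names (assumed); run as root inside the container.
PKGS="wget vim sudo telnet openssh-server openssh-clients openssl initscripts"
echo "yum -y install $PKGS"
```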

5. Download Hadoop (CDH 5.9.1):

wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.9.1.tar.gz

6. Download JDK 8:

wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u91-b14/jdk-8u91-linux-x64.tar.gz
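Both tarballs then need unpacking. The `/home/hadoop` prefix here is an assumption, chosen to match the JAVA_HOME (`/home/hadoop/jdk1.8.0_91`) set in hadoop-env.sh later:

```shell
# Hypothetical install prefix /home/hadoop, matching the JAVA_HOME used below.
UNPACK="tar -zxf jdk-8u91-linux-x64.tar.gz -C /home/hadoop && tar -zxf hadoop-2.6.0-cdh5.9.1.tar.gz -C /home/hadoop"
echo "$UNPACK"
```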

7. After unpacking and installing both, format HDFS.

The first problem I hit:

[root@58d9fc9eb3de hadoop]# bin/hadoop namenode-format

Error: Could not find or load main class namenode-format

The cause: there has to be a space between namenode and -format — the correct command is bin/hadoop namenode -format.

After that I hit an ssh connection failure, fixed by installing openssh via yum, along with:

yum install initscripts  # fixes the "functions" file not found error

8. Set up passwordless ssh login:

ssh-keygen -t rsa

cat id_rsa.pub >> authorized_keys
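sshd is strict about permissions on ~/.ssh, so it is worth tightening them as well. A runnable sketch of the whole step — it uses a scratch directory so it is safe to dry-run; on the container the directory is ~/.ssh and the commands run as the hadoop user:

```shell
SSH_DIR=$(mktemp -d)                                # stand-in for ~/.ssh on the container
ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa" -q     # no passphrase, no prompts
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"                                # sshd rejects overly open directories
chmod 600 "$SSH_DIR/authorized_keys"
```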

9. Edit the configuration files: /etc/hosts, then the Hadoop configs below.

Edit slaves:

localhost

yarn001
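For /etc/hosts, yarn001 has to resolve to the container's own address. 172.17.0.2 is a typical first address on Docker's default bridge, but that is an assumption — check yours with `hostname -i` inside the container. A dry-run sketch against a scratch file:

```shell
HOSTS_FILE=$(mktemp)                          # stand-in for /etc/hosts
echo "172.17.0.2 yarn001" >> "$HOSTS_FILE"    # hypothetical IP; verify with `hostname -i`
grep yarn001 "$HOSTS_FILE"
```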

Edit yarn-site.xml:

 <property>

   <name>yarn.nodemanager.aux-services</name>

   <value>mapreduce_shuffle</value>

 </property>

Edit hadoop-env.sh:

export JAVA_HOME=/home/hadoop/jdk1.8.0_91

Edit mapred-site.xml:

 <property>

   <name>mapreduce.framework.name</name>

   <value>yarn</value>

 </property>

 <property>

   <name>mapred.job.tracker</name>

   <value>hdfs://yarn001:9001</value>

 </property>

Edit core-site.xml (fs.default.name is the deprecated name for fs.defaultFS, but it still works in Hadoop 2.x):

 <property>

   <name>fs.default.name</name>

   <value>hdfs://yarn001:8020</value>

 </property>

Edit hdfs-site.xml (the fs.default.name property below just repeats core-site.xml):

 <property>

   <name>dfs.replication</name>

   <value>1</value>

 </property>

 <property>

   <name>fs.default.name</name>

   <value>hdfs://yarn001:8020</value>

 </property>

 <property>

   <name>dfs.namenode.name.dir</name>

   <value>/etc/hadoop/dfs/name</value>

 </property>

 <property>

   <name>dfs.datanode.data.dir</name>

   <value>/etc/hadoop/dfs/data</value>

 </property>
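The dfs.namenode.name.dir and dfs.datanode.data.dir paths above must exist and be writable by the hadoop user before formatting. A dry-run sketch under a scratch prefix — drop $PREFIX on the real container:

```shell
PREFIX=$(mktemp -d)   # scratch prefix so this is safe to dry-run anywhere
mkdir -p "$PREFIX/etc/hadoop/dfs/name" "$PREFIX/etc/hadoop/dfs/data"
ls -d "$PREFIX/etc/hadoop/dfs/name" "$PREFIX/etc/hadoop/dfs/data"
```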


10. Start HDFS and YARN.

Make sure sshd is running first.
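A plain CentOS container has no systemd running, so sshd has to be launched by hand; `ssh-keygen -A` fills in any missing host keys under /etc/ssh first. The commands, to run as root inside the container:

```shell
# No systemd in the container: generate host keys, then start the daemon directly.
SSHD_CMDS="ssh-keygen -A && /usr/sbin/sshd"
echo "$SSHD_CMDS"
```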

sbin/start-all.sh

[root@58d9fc9eb3de hadoop]# jps

1536 SecondaryNameNode

2356 Jps

1189 NameNode

2281 DataNode

1771 NodeManager

779 ResourceManager

11. Run a test MapReduce job:

[root@58d9fc9eb3de hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.9.1.jar pi 2 100

Number of Maps  = 2

Samples per Map = 100

17/01/30 16:21:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Wrote input for Map #0

Wrote input for Map #1

Starting Job

17/01/30 16:21:46 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

17/01/30 16:21:47 INFO input.FileInputFormat: Total input paths to process : 2

17/01/30 16:21:47 INFO mapreduce.JobSubmitter: number of splits:2

17/01/30 16:21:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1485792264282_0001

17/01/30 16:21:48 INFO impl.YarnClientImpl: Submitted application application_1485792264282_0001

17/01/30 16:21:48 INFO mapreduce.Job: The url to track the job: http://58d9fc9eb3de:8088/proxy/application_1485792264282_0001/

17/01/30 16:21:48 INFO mapreduce.Job: Running job: job_1485792264282_0001

17/01/30 16:21:56 INFO mapreduce.Job: Job job_1485792264282_0001 running in uber mode : false

17/01/30 16:21:56 INFO mapreduce.Job:  map 0% reduce 0%

17/01/30 16:22:03 INFO mapreduce.Job:  map 50% reduce 0%

17/01/30 16:22:04 INFO mapreduce.Job:  map 100% reduce 0%

17/01/30 16:22:12 INFO mapreduce.Job:  map 100% reduce 100%

17/01/30 16:22:12 INFO mapreduce.Job: Job job_1485792264282_0001 completed successfully

17/01/30 16:22:12 INFO mapreduce.Job: Counters: 49

File System Counters

FILE: Number of bytes read=50

FILE: Number of bytes written=352557

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=522

HDFS: Number of bytes written=215

HDFS: Number of read operations=11

HDFS: Number of large read operations=0

HDFS: Number of write operations=3

Job Counters 

Launched map tasks=2

Launched reduce tasks=1

Data-local map tasks=2

Total time spent by all maps in occupied slots (ms)=10732

Total time spent by all reduces in occupied slots (ms)=5086

Total time spent by all map tasks (ms)=10732

Total time spent by all reduce tasks (ms)=5086

Total vcore-seconds taken by all map tasks=10732

Total vcore-seconds taken by all reduce tasks=5086

Total megabyte-seconds taken by all map tasks=10989568

Total megabyte-seconds taken by all reduce tasks=5208064

Map-Reduce Framework

Map input records=2

Map output records=4

Map output bytes=36

Map output materialized bytes=56

Input split bytes=286

Combine input records=0

Combine output records=0

Reduce input groups=2

Reduce shuffle bytes=56

Reduce input records=4

Reduce output records=0

Spilled Records=8

Shuffled Maps =2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms)=247

CPU time spent (ms)=1700

Physical memory (bytes) snapshot=611356672

Virtual memory (bytes) snapshot=7846543360

Total committed heap usage (bytes)=489684992

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters 

Bytes Read=236

File Output Format Counters 

Bytes Written=97

Job Finished in 25.87 seconds

Estimated value of Pi is 3.12000000000000000000