Hadoop Installation Notes (2)
阿新 • Published 2018-04-23
Tags: hadoop, installation, distributed mode

I. Distributed Mode
1. Environment preparation
Prepare four nodes: master1 is the control node (NameNode, SecondaryNameNode, ResourceManager), and master2-4 serve as data nodes (DataNode, NodeManager). Also set up NTP time synchronization across all nodes.
1.1 Configure the Java environment on every node
[root@master1 ~]# vim /etc/profile.d/java.sh
export JAVA_HOME=/usr
[root@master1 ~]# scp /etc/profile.d/java.sh root@master2:/etc/profile.d/
[root@master1 ~]# scp /etc/profile.d/java.sh root@master3:/etc/profile.d/
[root@master1 ~]# scp /etc/profile.d/java.sh root@master4:/etc/profile.d/
Install java-devel on every node:
[root@master1 ~]# yum install -y java-1.7.0-openjdk-devel
[root@master2 ~]# yum install -y java-1.7.0-openjdk-devel
[root@master3 ~]# yum install -y java-1.7.0-openjdk-devel
[root@master4 ~]# yum install -y java-1.7.0-openjdk-devel
Configure the Hadoop environment variables (note the standard variable is HADOOP_MAPRED_HOME):
[root@master1 ~]# vim /etc/profile.d/hadoop.sh
export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
[root@master1 ~]# source /etc/profile.d/hadoop.sh
[root@master1 ~]# scp /etc/profile.d/hadoop.sh master2:/etc/profile.d/hadoop.sh
[root@master1 ~]# scp /etc/profile.d/hadoop.sh master3:/etc/profile.d/hadoop.sh
[root@master1 ~]# scp /etc/profile.d/hadoop.sh master4:/etc/profile.d/hadoop.sh
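The profile-script step above can be rehearsed locally before touching /etc/profile.d/. A minimal sketch (the temp directory stands in for /etc/profile.d/; the paths are the ones chosen above):

```shell
# Sketch: write the two profile scripts to a scratch directory and verify
# they export what we expect. On real nodes, write to /etc/profile.d/ instead.
ENVDIR=$(mktemp -d)

cat > "$ENVDIR/java.sh" <<'EOF'
export JAVA_HOME=/usr
EOF

cat > "$ENVDIR/hadoop.sh" <<'EOF'
export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
EOF

# Source them the way a login shell would, then check the result
. "$ENVDIR/java.sh"
. "$ENVDIR/hadoop.sh"
echo "JAVA_HOME=$JAVA_HOME HADOOP_PREFIX=$HADOOP_PREFIX"
```

This is only a dry-run harness; the scp loop above still distributes the real files.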
1.2 Prepare the hosts file on every node; this lab addresses the nodes by alias
[root@master1 ~]# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
10.201.106.131  master1 master1.com master
10.201.106.132  master2 master2.com
10.201.106.133  master3 master3.com
10.201.106.134  master4 master4.com
Same on master2, master3 and master4.
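Typing the same four host lines on four nodes invites typos; a small loop can generate them instead. A sketch assuming the 10.201.106.131-134 range used above:

```shell
# Generate the cluster's /etc/hosts entries; append the output to /etc/hosts
# on every node. master1 additionally answers to the bare alias "master".
gen_hosts() {
  for i in 1 2 3 4; do
    ip="10.201.106.$((130 + i))"
    if [ "$i" -eq 1 ]; then extra=" master"; else extra=""; fi
    printf '%s master%s master%s.com%s\n' "$ip" "$i" "$i" "$extra"
  done
}
gen_hosts
```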
1.3 Create the group and user
[root@master1 ~]# groupadd hadoop
[root@master1 ~]# useradd -g hadoop hadoop
(the hadoop group must exist before useradd -g can reference it)
Set the user's password:
echo 'hadoop' | passwd --stdin hadoop
Same on master2, master3 and master4:
for i in `seq 2 4`;do ssh root@master${i} "echo 'hadoop' | passwd --stdin hadoop";done
1.4 Allow the hadoop user on master1 (the control node) to log in to master1-4 with an SSH key
[root@master1 ~]# su - hadoop
Generate the key pair (a non-empty passphrase such as 'hadoop' will make the start scripts prompt on every connection; use -P '' for a passphrase-less key if you want fully unattended logins):
[hadoop@master1 ~]$ ssh-keygen -t rsa -P 'hadoop'
Copy master1's public key to master1-4:
[hadoop@master1 ~]$ for i in `seq 1 4`;do ssh-copy-id -i .ssh/id_rsa.pub hadoop@master${i};done
2. Install Hadoop
2.1 Create directories and set permissions
[root@master1 ~]# mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}
[root@master1 ~]# chown -R hadoop:hadoop /data/hadoop/hdfs/
Unpack Hadoop:
[root@master1 ~]# tar xf hadoop-2.6.2.tar.gz -C /bdapps/
Create a symlink:
[root@master1 ~]# cd /bdapps/
[root@master1 bdapps]# ln -sv hadoop-2.6.2 hadoop
Create the log directory and make it group-writable:
[root@master1 ~]# cd /bdapps/hadoop
[root@master1 hadoop]# mkdir logs
[root@master1 hadoop]# chmod g+w logs
Change ownership of the Hadoop install directory:
[root@master1 hadoop]# chown -R hadoop:hadoop ./*
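The directory layout from section 2.1 can be captured in one small script. A sketch, parameterized with a PREFIX so it can be dry-run in a scratch directory; on real nodes run it as root with PREFIX set to the empty string:

```shell
# Recreate the layout from section 2.1 under a prefix. With PREFIX="" and root
# privileges this produces the real /bdapps and /data trees used above.
PREFIX=$(mktemp -d)                       # scratch dry-run; set PREFIX="" on real nodes
mkdir -p "$PREFIX/bdapps" \
         "$PREFIX/data/hadoop/hdfs/nn" \
         "$PREFIX/data/hadoop/hdfs/snn" \
         "$PREFIX/data/hadoop/hdfs/dn"
mkdir -p "$PREFIX/bdapps/hadoop/logs"     # normally created inside the unpacked tarball
chmod g+w "$PREFIX/bdapps/hadoop/logs"    # the hadoop group must be able to write logs
# chown -R hadoop:hadoop "$PREFIX/data/hadoop/hdfs" "$PREFIX/bdapps/hadoop"  # needs root
ls "$PREFIX/data/hadoop/hdfs"
```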
2.2 Configure the master node (master1)
[root@master1 ~]# cd /bdapps/hadoop/etc/hadoop/
[root@master1 hadoop]# vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
<final>true</final>
</property>
</configuration>
[root@master1 hadoop]# vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>10.201.106.131:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
</configuration>
[root@master1 hadoop]# vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:///data/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:///data/hadoop/hdfs/snn</value>
</property>
</configuration>
[root@master1 hadoop]# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
[root@master1 hadoop]# vim slaves
master2
master3
master4
2.3 Configure the three slave nodes (shown for master2; repeat on master3 and master4)
[root@master2 ~]# mkdir -pv /bdapps /data/hadoop/hdfs/dn
[root@master2 ~]# chown -R hadoop:hadoop /data/hadoop/hdfs/
[root@master2 ~]# tar xf hadoop-2.6.2.tar.gz -C /bdapps/
[root@master2 bdapps]# ln -sv hadoop-2.6.2 hadoop
[root@master2 bdapps]# cd hadoop
[root@master2 hadoop]# mkdir logs
[root@master2 hadoop]# chmod g+w logs
[root@master2 hadoop]# chown -R hadoop:hadoop ./*
Copy the configuration files from master1:
[root@master1 hadoop]# su - hadoop
[hadoop@master1 ~]$ scp /bdapps/hadoop/etc/hadoop/* master2:/bdapps/hadoop/etc/hadoop/
(repeat for master3 and master4)
2.4 Format the filesystem
[hadoop@master1 ~]$ hdfs namenode -format
2.5 Start the HDFS cluster
Start the NameNode and DataNodes:
[hadoop@master1 ~]$ start-dfs.sh
Starting namenodes on [master]
The authenticity of host 'master (10.201.106.131)' can't be established.
ECDSA key fingerprint is 5e:5d:4d:d2:3f:73:fb:5c:c4:26:c7:c4:85:10:c9:75.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master' (ECDSA) to the list of known hosts.
master: starting namenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-namenode-master1.com.out
master2: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-master2.com.out
master4: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-master4.com.out
master3: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-master3.com.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is 5e:5d:4d:d2:3f:73:fb:5c:c4:26:c7:c4:85:10:c9:75.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-secondarynamenode-master1.com.out
Check the processes started on each node:
[hadoop@master1 ~]$ jps
4977 NameNode
5324 Jps
5155 SecondaryNameNode
[root@master2 hadoop]# su - hadoop
Last login: Sun Apr 22 11:52:57 CST 2018 from master1 on pts/1
[hadoop@master2 ~]$
[hadoop@master2 ~]$
[hadoop@master2 ~]$ jps
9972 DataNode
10131 Jps
Confirm that the master node is connected to the three slave nodes:
[root@master1 ~]# netstat -tanp | grep 8020
tcp 0 0 10.201.106.131:8020 0.0.0.0:* LISTEN 4977/java
tcp 0 0 10.201.106.131:8020 10.201.106.134:51956 ESTABLISHED 4977/java
tcp 0 0 10.201.106.131:8020 10.201.106.133:36426 ESTABLISHED 4977/java
tcp 0 0 10.201.106.131:8020 10.201.106.132:37988 ESTABLISHED 4977/java
Upload a file to test:
[hadoop@master1 ~]$ hdfs dfs -mkdir /test
[hadoop@master1 ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hadoop@master1 ~]$ hdfs dfs -ls /test/fstab
-rw-r--r-- 2 hadoop supergroup 1065 2018-04-23 03:00 /test/fstab
Actual on-disk path of the block files on the DataNodes:
[hadoop@master2 logs]$ cat /data/hadoop/hdfs/dn/current/BP-1262978243-10.201.106.131-1524421803827/current/finalized/subdir0/subdir0/blk_1073741827
[hadoop@master4 ~]$ cat /data/hadoop/hdfs/dn/current/BP-1262978243-10.201.106.131-1524421803827/current/finalized/subdir0/subdir0/blk_1073741827
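A block on a DataNode is just an ordinary file, which is why cat on the blk_* file above prints the uploaded fstab verbatim. The same reassembly idea can be demonstrated locally without a cluster; here split's 40 KB pieces stand in for HDFS blocks:

```shell
# Split a file into fixed-size "blocks", then reassemble by concatenating the
# pieces in order - exactly how a multi-block HDFS file maps onto blk_* files.
src=$(mktemp)
dd if=/dev/urandom of="$src" bs=1024 count=100 2>/dev/null   # 100 KB sample file
blkdir=$(mktemp -d)
split -b 40960 "$src" "$blkdir/blk_"      # three pieces: 40K + 40K + 20K
cat "$blkdir"/blk_* > "$blkdir/reassembled"
cmp -s "$src" "$blkdir/reassembled" && echo "reassembled copy is identical"
```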
2.6 Start the YARN cluster
[hadoop@master1 ~]$ start-yarn.sh
# The master node started a ResourceManager
starting yarn daemons
starting resourcemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-resourcemanager-master1.com.out
master3: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-master3.com.out
master4: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-master4.com.out
master2: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-master2.com.out
[hadoop@master1 ~]$ jps
5919 ResourceManager
4977 NameNode
5155 SecondaryNameNode
6190 Jps
The slave nodes started NodeManagers:
[hadoop@master2 logs]$ jps
10243 DataNode
10508 Jps
10405 NodeManager
[hadoop@master3 ~]$ jps
9380 DataNode
9696 NodeManager
9796 Jps
2.7 Check cluster status in the web UI
Open in a browser: http://10.201.106.131:8088 (YARN ResourceManager)
Open in a browser: http://10.201.106.131:50070 (HDFS NameNode)
3. Other operations
3.1 Upload a large file and observe block splitting
Generate a 200 MB file:
[hadoop@master1 ~]$ dd if=/dev/zero of=test bs=1M count=200
(Screenshots omitted: the file listing before and after upload. Files larger than the HDFS block size, 128 MB by default in Hadoop 2.x, are split into multiple blocks on upload.)
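How many blocks a file becomes is simply its size divided by dfs.blocksize, rounded up (128 MB by default in Hadoop 2.x; the old 1.x default was 64 MB). A quick sketch of the arithmetic:

```shell
# Ceiling division: number of HDFS blocks a file of a given size occupies.
blocks_for() {   # blocks_for <file_bytes> <block_bytes>
  echo $(( ($1 + $2 - 1) / $2 ))
}
blocks_for $((200 * 1024 * 1024)) $((128 * 1024 * 1024))   # 200 MB file, 128 MB blocks
blocks_for $((200 * 1024 * 1024)) $(( 64 * 1024 * 1024))   # same file, 64 MB blocks
```

With the 2.x default the 200 MB test file occupies 2 blocks (128 MB + 72 MB); with 64 MB blocks it would occupy 4. `hdfs fsck /test/test -files -blocks` shows the actual split on the cluster.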
3.2 View logs through the browser
Visit: http://10.201.106.131:50070/logs/
3.3 Run a test job
List the examples available in the test jar:
[hadoop@master1 ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar
Count the words in the files:
[hadoop@master1 ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /test/fstab /test/functions /test/wordout
(Screenshots omitted: job submission and task progress in the web UI.)
View the result:
[hadoop@master1 ~]$ hdfs dfs -ls /test/wordout
Found 2 items
-rw-r--r-- 2 hadoop supergroup 0 2018-04-23 06:56 /test/wordout/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 7855 2018-04-23 06:56 /test/wordout/part-r-00000
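Conceptually, the wordcount job is the classic Unix pipeline below; running it on a small sample shows the same word-and-count shape that lands in part-r-00000. The pipeline is only an analogy - the real job runs as distributed map and reduce tasks:

```shell
# tr splits lines into words (map), sort groups identical words (shuffle),
# uniq -c counts each group (reduce), awk reorders to "word<TAB>count".
wordcount() {
  tr -s ' \t' '\n\n' | sort | uniq -c | awk '{print $2 "\t" $1}'
}
printf 'hdfs yarn hdfs mapreduce hdfs\n' | wordcount
```

On the sample line this prints hdfs 3, mapreduce 1, yarn 1, tab-separated, one word per line.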
4. YARN cluster administration commands
4.1 List all yarn subcommands
[hadoop@master1 ~]$ yarn
4.2 application
4.2.1 View jobs
List active jobs:
[hadoop@master1 ~]$ yarn application -list
18/04/23 07:23:47 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
List all jobs:
[hadoop@master1 ~]$ yarn application -list -appStates=all
18/04/23 07:24:48 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1524424153008_0001 word count MAPREDUCE hadoop default FINISHED SUCCEEDED 100% http://master2:19888/jobhistory/job/job_1524424153008_0001
[hadoop@master1 ~]$
Check a job's status:
[hadoop@master1 ~]$ yarn application -status application_1524424153008_0001
18/04/23 07:28:32 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Application Report :
Application-Id : application_1524424153008_0001
Application-Name : word count
Application-Type : MAPREDUCE
User : hadoop
Queue : default
Start-Time : 1524437422005
Finish-Time : 1524437801216
Progress : 100%
State : FINISHED
Final-State : SUCCEEDED
Tracking-URL : http://master2:19888/jobhistory/job/job_1524424153008_0001
RPC Port : 40927
AM Host : master2
Aggregate Resource Allocation : 1326835 MB-seconds, 909 vcore-seconds
Diagnostics :
[hadoop@master1 ~]$
4.3 node
List the nodes:
[hadoop@master1 ~]$ yarn node -list
18/04/23 07:33:37 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
master4:47410 RUNNING master4:8042 0
master3:55126 RUNNING master3:8042 0
master2:54307 RUNNING master2:8042 0
List all nodes, including ones that have failed or gone offline:
[hadoop@master1 ~]$ yarn node -list -all
Check the status of a specific node:
[hadoop@master1 ~]$ yarn node -status master2:54307
18/04/23 07:41:01 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8032
Node Report :
Node-Id : master2:54307
Rack : /default-rack
Node-State : RUNNING
Node-Http-Address : master2:8042
Last-Health-Update : Sun 22/Apr/18 10:06:49:900CST
Health-Report :
Containers : 0
Memory-Used : 0MB
Memory-Capacity : 8192MB
CPU-Used : 0 vcores
CPU-Capacity : 8 vcores
Node-Labels :
4.4 logs
To use log aggregation, set the yarn.log-aggregation-enable property to true in yarn-site.xml.
The cluster must be restarted for the change to take effect.
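The fragment to add inside the <configuration> element of yarn-site.xml:

```xml
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
```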
View a job's logs:
[hadoop@master1 ~]$ yarn logs -applicationId application_1524424153008_0001
4.5 classpath
Print the classpath used by Hadoop/YARN:
[hadoop@master1 ~]$ yarn classpath
/bdapps/hadoop/etc/hadoop:/bdapps/hadoop/etc/hadoop:/bdapps/hadoop/etc/hadoop:/bdapps/hadoop/share/hadoop/common/lib/*:/bdapps/hadoop/share/hadoop/common/*:/bdapps/hadoop/share/hadoop/hdfs:/bdapps/hadoop/share/hadoop/hdfs/lib/*:/bdapps/hadoop/share/hadoop/hdfs/*:/bdapps/hadoop/share/hadoop/yarn/lib/*:/bdapps/hadoop/share/hadoop/yarn/*:/bdapps/hadoop/share/hadoop/mapreduce/lib/*:/bdapps/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/bdapps/hadoop/share/hadoop/yarn/*:/bdapps/hadoop/share/hadoop/yarn/lib/*
yarn's administrative subcommand:
4.6 rmadmin
4.6.1 Command help
Get help for the command:
[hadoop@master1 ~]$ yarn rmadmin -help
4.6.2 Refresh node status information
[hadoop@master1 ~]$ yarn rmadmin -refreshNodes
18/04/23 07:54:48 INFO client.RMProxy: Connecting to ResourceManager at master/10.201.106.131:8033
4.7 運行YARN Application流程
1、Application初始化及體檢;
2、分配內存並啟動AM;
3、AM註冊及資源分配;
4、啟動並監控容易;
5、Application進度報告;
6、Application運行完成;
5. Miscellaneous
5.1 Tool for automated Hadoop installation and deployment: Ambari