Hadoop Installation Notes (1)
阿新 · Published 2018-04-22
Tags: hadoop, installation, pseudo-distributed mode, basics
I. Hadoop Basics
1. Pseudo-Distributed Mode (Single Node)
1.1 Configure environment variables for the default JDK 1.7 on CentOS 7
[root@master1 ~]# vim /etc/profile.d/java.sh
export JAVA_HOME=/usr
[root@master1 ~]# source /etc/profile.d/java.sh
Install the JDK devel package:
[root@master1 ~]# yum install java-1.7.0-openjdk-devel.x86_64
1.2 Create the /bdapps directory and unpack Hadoop into it
[root@master1 ~]# mkdir /bdapps
[root@master1 ~]# tar xf hadoop-2.6.2.tar.gz -C /bdapps/
[root@master1 ~]# cd /bdapps/
Create a symlink:
[root@master1 bdapps]# ln -sv hadoop-2.6.2 hadoop
1.3 Set the Hadoop environment variables
[root@master1 hadoop]# vim /etc/profile.d/hadoop.sh
export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
Reload the file:
[root@master1 ~]# source /etc/profile.d/hadoop.sh
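The profile fragment can be sanity-checked before logging out. The sketch below (the temp-dir staging is my own addition; the paths are the ones used in this article) writes the same exports to a scratch file, sources it, and verifies that all four component homes resolve to the same prefix:

```shell
#!/bin/sh
# Stage the profile fragment in a scratch dir and verify it (a sketch;
# the real file lives at /etc/profile.d/hadoop.sh as shown above).
tmp=$(mktemp -d)
cat > "$tmp/hadoop.sh" <<'EOF'
export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
EOF
. "$tmp/hadoop.sh"
# All component homes should point at the same installation prefix.
for v in "$HADOOP_YARN_HOME" "$HADOOP_MAPRED_HOME" "$HADOOP_COMMON_HOME" "$HADOOP_HDFS_HOME"; do
  [ "$v" = "$HADOOP_PREFIX" ] || { echo "mismatch: $v"; exit 1; }
done
echo "hadoop env ok"
```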
1.4 Create the user accounts and directories for the Hadoop processes
Create the group:
[root@master1 ~]# groupadd hadoop
Create the users and add them to the hadoop group:
[root@master1 ~]# useradd -g hadoop yarn
[root@master1 ~]# useradd -g hadoop hdfs
[root@master1 ~]# useradd -g hadoop mapred
Create the data directories:
[root@master1 ~]# mkdir -pv /data/hadoop/hdfs/{nn,snn,dn}
Grant ownership of the data directories:
[root@master1 ~]# chown -R hdfs:hadoop /data/hadoop/hdfs
[root@master1 ~]# ll /data/hadoop/hdfs
total 0
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 dn
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 nn
drwxr-xr-x 2 hdfs hadoop 6 Apr 19 08:44 snn
Create the log directory and set its permissions (inside the installation directory):
[root@master1 ~]# cd /bdapps/hadoop
[root@master1 hadoop]# mkdir logs
[root@master1 hadoop]# chmod g+w logs/
[root@master1 hadoop]# chown -R yarn:hadoop logs
[root@master1 hadoop]# ll | grep log
drwxrwxr-x 2 yarn hadoop 6 Apr 19 08:47 logs
Change the owner and group of the installation directory contents:
[root@master1 hadoop]# chown -R yarn:hadoop ./*
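The directory steps above can be rehearsed before touching the real system. This sketch (the scratch root is my own addition, so it runs without root) recreates the same layout and the group-write bit on logs:

```shell
#!/bin/sh
# Rehearse the directory layout under a scratch root (my addition); on the
# real node the paths are /data/... and /bdapps/... and the chown lines
# from the article apply on top of this.
root=$(mktemp -d)
mkdir -p "$root/data/hadoop/hdfs/nn" "$root/data/hadoop/hdfs/snn" "$root/data/hadoop/hdfs/dn"
mkdir -p "$root/bdapps/hadoop/logs"
# The logs directory must be group-writable so all three users can log to it.
chmod g+w "$root/bdapps/hadoop/logs"
ls "$root/data/hadoop/hdfs"
```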
1.5 Configure Hadoop
Configure the name service (the default filesystem):
[root@master1 hadoop]# pwd
/bdapps/hadoop/etc/hadoop
[root@master1 hadoop]# vim core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
<final>true</final>
</property>
</configuration>
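Before moving on, it is worth confirming the file carries the intended address. A minimal sketch (staged in a temp dir so it runs anywhere; on the node you would point the grep at /bdapps/hadoop/etc/hadoop/core-site.xml):

```shell
#!/bin/sh
# Stage a copy of core-site.xml and extract the configured default FS
# (a sketch; run the grep against the real file on the node).
tmp=$(mktemp -d)
cat > "$tmp/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
    <final>true</final>
  </property>
</configuration>
EOF
grep -o 'hdfs://[^<]*' "$tmp/core-site.xml"
```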
Configure the HDFS-related properties:
[root@master1 hadoop]# vim hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/dn</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>file:///data/hadoop/hdfs/snn</value>
</property>
<property>
<name>fs.checkpoint.edits.dir</name>
<value>file:///data/hadoop/hdfs/snn</value>
</property>
</configuration>
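Every file:// path in hdfs-site.xml must match a directory created earlier, or the daemons will fail at startup. A quick check (the heredoc copy is for illustration; run the pipeline against the real file on the node):

```shell
#!/bin/sh
# Verify every file:// path in hdfs-site.xml falls under /data/hadoop/hdfs
# (sketch with a staged copy; point it at the real file on the node).
tmp=$(mktemp -d)
cat > "$tmp/hdfs-site.xml" <<'EOF'
<configuration>
  <property><name>dfs.namenode.name.dir</name><value>file:///data/hadoop/hdfs/nn</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:///data/hadoop/hdfs/dn</value></property>
  <property><name>fs.checkpoint.dir</name><value>file:///data/hadoop/hdfs/snn</value></property>
  <property><name>fs.checkpoint.edits.dir</name><value>file:///data/hadoop/hdfs/snn</value></property>
</configuration>
EOF
# Count configured paths that do NOT live under the expected data root.
stray=$(grep -o 'file://[^<]*' "$tmp/hdfs-site.xml" | grep -v '^file:///data/hadoop/hdfs/' | wc -l)
echo "paths outside /data/hadoop/hdfs: $stray"
```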
Configure mapred (MapReduce):
[root@master1 hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master1 hadoop]# vim mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configure YARN:
[root@master1 hadoop]# vim yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>localhost:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>10.201.106.131:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
</configuration>
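yarn-site.xml is easy to break with copy-pasted duplicate <property> blocks, and Hadoop silently keeps only one occurrence. The sketch below shows a quick duplicate check; the staged sample deliberately contains a repeated name so the check has something to find (on the node, run the grep pipeline against /bdapps/hadoop/etc/hadoop/yarn-site.xml):

```shell
#!/bin/sh
# Detect duplicated property names in a YARN config file (a sketch; the
# sample config below intentionally repeats one property).
tmp=$(mktemp -d)
cat > "$tmp/yarn-site.xml" <<'EOF'
<configuration>
  <property><name>yarn.resourcemanager.address</name><value>localhost:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>localhost:8030</value></property>
  <property><name>yarn.resourcemanager.address</name><value>localhost:8032</value></property>
</configuration>
EOF
# Any name printed here appears more than once in the file.
dups=$(grep -o '<name>[^<]*</name>' "$tmp/yarn-site.xml" | sort | uniq -d)
echo "${dups:-no duplicates}"
```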
1.6 Define the slave nodes; in pseudo-distributed mode the node itself is the default slave, so nothing needs to be changed
[root@master1 hadoop]# cat slaves
localhost
1.7 Format HDFS
Switch to the hdfs user:
[root@master1 ~]# su - hdfs
View the help for the hdfs command:
[hdfs@master1 ~]$ hdfs --help
Format the namenode:
[hdfs@master1 ~]$ hdfs namenode -format
Inspect the generated files:
[hdfs@master1 ~]$ ls /data/hadoop/hdfs/nn/current/
fsimage_0000000000000000000 seen_txid
fsimage_0000000000000000000.md5 VERSION
1.8 Start Hadoop
1.8.1 Start the HDFS daemons
Start the following processes as the hdfs user.
Start the namenode:
[hdfs@master1 ~]$ hadoop-daemon.sh start namenode
Check the running Java processes:
[hdfs@master1 ~]$ jps
9127 NameNode
9220 Jps
Show detailed Java process information:
[hdfs@master1 ~]$ jps -v
Start the secondary namenode:
[hdfs@master1 ~]$ hadoop-daemon.sh start secondarynamenode
Start the datanode:
[hdfs@master1 ~]$ hadoop-daemon.sh start datanode
Test by uploading a file:
[hdfs@master1 ~]$ hdfs dfs -mkdir /test
[hdfs@master1 ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hdfs@master1 ~]$ hdfs dfs -ls /test
Found 1 items
-rw-r--r-- 1 hdfs supergroup 1065 2018-04-20 15:04 /test/fstab
This is the fstab file that was just uploaded; the datanode stores its block as a plain file that can be read directly:
[hdfs@master1 ~]$ cat /data/hadoop/hdfs/dn/current/BP-908063675-10.201.106.131-1524136482474/current/finalized/subdir0/subdir0/blk_1073741825
The directory on the local host's filesystem where the data is stored:
[hdfs@master1 ~]$ ls /data/hadoop/hdfs/dn/current/
BP-908063675-10.201.106.131-1524136482474 VERSION
1.8.2 Start the YARN daemons
Switch to the yarn user:
[root@master1 ~]# su - yarn
Start the resourcemanager:
[yarn@master1 ~]$ yarn-daemon.sh start resourcemanager
Start the nodemanager:
[yarn@master1 ~]$ yarn-daemon.sh start nodemanager
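After starting the daemons it can take a moment before they show up in jps. A small retry helper (my own sketch, not part of Hadoop) avoids checking too early:

```shell
#!/bin/sh
# wait_for: retry a check command up to N times, one second apart
# (helper sketch; not part of Hadoop).
wait_for() {
  tries=$1; shift
  while [ "$tries" -gt 0 ]; do
    if "$@"; then return 0; fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
# Portable demo; on the node you would use something like:
#   wait_for 30 sh -c 'jps | grep -q ResourceManager'
wait_for 3 true && echo "daemon up"
```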
1.9 Check Hadoop's status
Visit the HDFS NameNode web UI in a browser: http://10.201.106.131:50070
Visit the YARN ResourceManager web UI in a browser: http://10.201.106.131:8088
1.10 Submit and run a program on Hadoop
1.10.1 Run a MapReduce example program
Switch user:
[root@master1 mapreduce]# su - hdfs
Run the example program (without arguments, the jar lists the available examples):
[hdfs@master1 ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar
Count the word occurrences in the uploaded file:
[hdfs@master1 ~]$ yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /test/fstab /test/fstab.out
View the result:
[hdfs@master1 ~]$ hdfs dfs -cat /test/fstab.out/part-r-00000
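The part-r-00000 output is tab-separated "word<TAB>count" lines, one per reducer key, so it pipes cleanly into standard text tools. The sketch below ranks words by count on a fabricated sample (the words and counts are made up for illustration); on the cluster you would pipe `hdfs dfs -cat /test/fstab.out/part-r-00000` into the same sort:

```shell
#!/bin/sh
# Rank wordcount output by count (sample data is fabricated; on the node,
# replace the file with the output of hdfs dfs -cat).
tmp=$(mktemp -d)
printf '%s\t%s\n' defaults 6 UUID 3 swap 1 > "$tmp/part-r-00000"
# Sort numerically on the count column, descending; show the top word.
sort -k2,2nr "$tmp/part-r-00000" | head -n 1
```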