
Hadoop 1.0 Cluster Setup


VirtualBox Virtual Machines

Download

Download the base installer that matches your operating system
Download the extension pack (not OS-specific)

http://www.oracle.com/technetwork/cn/server-storage/virtualbox/downloads/index.html

Install the base package

Follow the installer prompts.

Install the extension pack

1. Install the base package first.

2. Install the extension pack.

Open VirtualBox -> File -> Preferences -> Extensions -> click the + icon on the right -> select the downloaded extension pack file and follow the prompts.
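
Alternatively, the extension pack can be installed from the command line with VBoxManage (a sketch; the file name below is a placeholder for whatever version was downloaded):

VBoxManage extpack install Oracle_VM_VirtualBox_Extension_Pack-&lt;version&gt;.vbox-extpack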


Installing CentOS 7 in VirtualBox

Download the minimal ISO

http://isoredirect.centos.org/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-1804.iso

Pick a mirror close to you from the mirror list; I used the NetEase (163) mirror.

VM configuration plan

CPU: 2 cores, RAM: 1 GB, swap: 2 GB, disk: 40 GB (dynamically allocated, not a pre-allocated fixed size)

Create the VM

New -> follow the prompts.

Install the CentOS 7 operating system

Attach a virtual optical disk containing the OS ISO

Select the VM just created -> Settings -> Storage -> click the + icon, then select the CentOS 7 ISO file.

Install the operating system

Start the VM and the graphical installer appears. Follow the prompts through the series of settings; for disk partitioning I chose automatic partitioning.
Once everything is configured, click Install and wait. Installation takes some time; on my machine it took roughly ten-odd minutes. :)

Configure VM networking

NAT network mode is the preferred choice.

Install required packages

Because this is the minimal ISO, some basic packages are not included.

yum install gcc wget lrzsz vim

Issues

1. With NAT networking, the VM can ping the host, but the host cannot ping the VM.

None of the following fixed it (PS: it used to work):

1. Turning off the firewall on both the host and the VM
2. Reinstalling VirtualBox and the VM

After several hours of troubleshooting I could not find the cause, so for now I switched to bridged mode; I will revisit the cause later.
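
For reference, when debugging this kind of connectivity problem it helps to check the address and routes from inside the guest (a minimal sketch; the gateway address is only an example):

ip addr show            # which IP did the VM actually get?
ip route                # what is the default gateway?
ping -c 3 192.168.1.1   # example: can the VM reach the gateway / host side?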


Hadoop Cluster Environment Setup

Cluster plan

3 virtual machines are used:
1 master, IP address: 192.168.1.15
2 slaves, slave1 IP address: 192.168.1.16, slave2 IP address: 192.168.1.17

In a production environment, the namenode should be given more memory and the datanodes more disk space.

Steps on the master VM

Install Java

Download

I chose Java 8:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Extract

tar -xvzf jdk-8u181-linux-x64.tar.gz

Set the environment variables

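# Append the following to ~/.bashrc (they take effect via `source ~/.bashrc` below)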
export JAVA_HOME=/usr/local/src/jdk1.8.0_181
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin

Apply the environment variables

source ~/.bashrc
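
To confirm the JDK is picked up (a quick check; with the settings above this should report version 1.8.0_181):

java -version
echo $JAVA_HOME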

Issues

1. What do x86 and x64 mean?
   x86: 32-bit; x64: 64-bit

2. tar.gz or rpm?
   A matter of personal preference. I chose tar.gz, which means the Java environment variables have to be configured manually.

Installing Hadoop 1.2.1

Extract

[root@localhost src]# tar -xvzf hadoop-1.2.1-bin.tar.gz

Create the tmp directory

[root@localhost src]# cd hadoop-1.2.1
[root@localhost hadoop-1.2.1]# mkdir tmp

Configuration

Go into the conf directory

[root@localhost hadoop-1.2.1]# cd conf
[root@localhost conf]# pwd
/usr/local/src/hadoop-1.2.1/conf
1. Configure masters
[root@localhost conf]# vim masters

master
2. Configure slaves
[root@localhost conf]# vim slaves

slave1
slave2
3. Configure core-site.xml
vim core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/src/hadoop-1.2.1/tmp</value>
</property>
<property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.15:9000</value>
        </property>
</configuration>
4. Configure mapred-site.xml
[root@localhost conf]# vim mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>http://192.168.1.15:9001</value>
        </property>
</configuration>
5. Configure hdfs-site.xml
[root@localhost conf]# vim hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
</configuration>
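
Note: this plan has only two datanodes, so a replication factor of 3 leaves blocks under-replicated; setting dfs.replication to 2 (or adding a datanode) avoids that. Once the cluster is running, replication status can be checked with, for example:

./hadoop fsck / -files -blocks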

6. Configure hadoop-env.sh

[root@localhost conf]# vim hadoop-env.sh

# Add the following line
export JAVA_HOME=/usr/local/src/jdk1.8.0_181
7. Configure /etc/hosts
[root@localhost conf]# vim /etc/hosts
192.168.1.15 master
192.168.1.16 slave1
192.168.1.17 slave2
8. Configure the hostname
[root@localhost conf]# hostnamectl set-hostname master
[root@localhost conf]# hostnamectl status
   Static hostname: master
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 8751162d551a426393cd5e5c2fadf3d3
           Boot ID: 4d3093f75e514da399ff522bea8b420f
    Virtualization: kvm
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-862.el7.x86_64
      Architecture: x86-64

Steps on the slave1 VM

Create the VM

Clone it from the master (power off the master VM before cloning)

Select the master VM -> right-click -> Clone -> set the VM name (check the option to reinitialize the MAC addresses of all network cards) -> follow the prompts for the remaining steps

Set the hostname

hostnamectl set-hostname slave1

Steps on the slave2 VM

Same as for slave1, except the hostname is set to slave2.

Set up mutual trust between the VMs for passwordless SSH login

1. Generate an RSA key pair on each of the three machines

# master
[wadeyu@master ~]$ su root
Password: 
[root@master wadeyu]# ssh-keygen

# slave1
[wadeyu@slave1 ~]$ su root
Password: 
[root@slave1 wadeyu]# ssh-keygen

# slave2
[wadeyu@slave2 ~]$ su root
Password: 
[root@slave2 wadeyu]# ssh-keygen

2. Collect the public keys into ~/.ssh/authorized_keys

# on the master

[root@master wadeyu]# cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

# Append slave1's and slave2's public keys to the file
[root@master wadeyu]# scp slave1:~/.ssh/id_rsa.pub ~/slave1_id_rsa.pub
[root@master wadeyu]# scp slave2:~/.ssh/id_rsa.pub ~/slave2_id_rsa.pub
[root@master wadeyu]# cat ~/slave1_id_rsa.pub >> ~/.ssh/authorized_keys 
[root@master wadeyu]# cat ~/slave2_id_rsa.pub >> ~/.ssh/authorized_keys

# Copy ~/.ssh/authorized_keys to slave1 and slave2
[root@master wadeyu]# scp ~/.ssh/authorized_keys slave1:~/.ssh
root@slave1's password: 
authorized_keys                                                                                              100% 1179   458.2KB/s   00:00    
[root@master wadeyu]# scp ~/.ssh/authorized_keys slave2:~/.ssh
root@slave2's password: 
authorized_keys 
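
After distributing authorized_keys, it is worth checking that logins between the nodes no longer prompt for a password, for example from the master:

[root@master wadeyu]# ssh slave1 hostname   # should print "slave1" with no password prompt
[root@master wadeyu]# ssh slave2 hostname   # should print "slave2" with no password prompt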

Other steps (on every VM)

To keep system configuration from interfering with the cluster, the firewall and SELinux are turned off in this learning environment.

1. Stop the firewall

[root@master wadeyu]# systemctl stop firewalld
[root@master wadeyu]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Sat 2018-09-01 11:26:29 CST; 5s ago
     Docs: man:firewalld(1)
  Process: 635 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 635 (code=exited, status=0/SUCCESS)

Sep 01 10:23:16 master systemd[1]: Starting firewalld - dynamic firewall daemon...
Sep 01 10:23:18 master systemd[1]: Started firewalld - dynamic firewall daemon.
Sep 01 11:26:21 master systemd[1]: Stopping firewalld - dynamic firewall daemon...
Sep 01 11:26:29 master systemd[1]: Stopped firewalld - dynamic firewall daemon.
2. Disable SELinux
[root@master wadeyu]# getenforce
Enforcing
[root@master wadeyu]# setenforce 0
[root@master wadeyu]# getenforce
Permissive
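
Both changes above only last until the next reboot: stopping firewalld does not disable the service, and setenforce 0 only switches SELinux to permissive for the running session. To make them persistent in this learning environment, something like the following can be used:

[root@master wadeyu]# systemctl disable firewalld
[root@master wadeyu]# sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config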

Start the cluster

Run these on the master node, from the hadoop bin directory.

1. The first start requires formatting the namenode
[root@master wadeyu]# cd /usr/local/src/hadoop-1.2.1/bin
[root@master bin]# ./hadoop namenode -format
18/09/01 11:37:07 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.1.15
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.8.0_181
************************************************************/
18/09/01 11:37:08 INFO util.GSet: Computing capacity for map BlocksMap
18/09/01 11:37:08 INFO util.GSet: VM type       = 64-bit
18/09/01 11:37:08 INFO util.GSet: 2.0% max memory = 1013645312
18/09/01 11:37:08 INFO util.GSet: capacity      = 2^21 = 2097152 entries
18/09/01 11:37:08 INFO util.GSet: recommended=2097152, actual=2097152
18/09/01 11:37:08 INFO namenode.FSNamesystem: fsOwner=root
18/09/01 11:37:08 INFO namenode.FSNamesystem: supergroup=supergroup
18/09/01 11:37:08 INFO namenode.FSNamesystem: isPermissionEnabled=true
18/09/01 11:37:08 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
18/09/01 11:37:08 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
18/09/01 11:37:08 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
18/09/01 11:37:08 INFO namenode.NameNode: Caching file names occuring more than 10 times 
18/09/01 11:37:09 INFO common.Storage: Image file /usr/local/src/hadoop-1.2.1/tmp/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
18/09/01 11:37:09 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/local/src/hadoop-1.2.1/tmp/dfs/name/current/edits
18/09/01 11:37:09 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/local/src/hadoop-1.2.1/tmp/dfs/name/current/edits
18/09/01 11:37:09 INFO common.Storage: Storage directory /usr/local/src/hadoop-1.2.1/tmp/dfs/name has been successfully formatted.
18/09/01 11:37:09 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.15
************************************************************/

2. Start all nodes

[root@master bin]# ./start-all.sh 
starting namenode, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-wadeyu-namenode-master.out
slave2: starting datanode, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-slave2.out
slave1: starting datanode, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-slave1.out
The authenticity of host 'master (192.168.1.15)' can't be established.
ECDSA key fingerprint is SHA256:8DvdHBlcz1qInlLa9k2iYyd4Ip7auPhcb0mjHbEwZmo.
ECDSA key fingerprint is MD5:9e:33:01:d2:fb:9c:dc:4f:40:30:90:fe:37:6e:1f:33.
Are you sure you want to continue connecting (yes/no)? yes
master: Warning: Permanently added 'master,192.168.1.15' (ECDSA) to the list of known hosts.
master: starting secondarynamenode, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-master.out
starting jobtracker, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-wadeyu-jobtracker-master.out
slave1: starting tasktracker, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-slave1.out
slave2: starting tasktracker, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-slave2.out

3. Check the cluster status

# master
[root@master bin]# jps
2116 JobTracker
2232 Jps
1883 NameNode
2044 SecondaryNameNode

# slave1
[root@master bin]# ssh slave1
Last login: Sat Sep  1 11:20:05 2018 from slave2
[root@slave1 ~]# jps
3936 Jps
1617 TaskTracker
1538 DataNode

# slave2
[root@slave1 ~]# exit
logout
Connection to slave1 closed.
[root@master bin]# ssh slave2
Last login: Sat Sep  1 11:20:24 2018 from slave1
[root@slave2 ~]# jps
3774 TaskTracker
3695 DataNode
3871 Jps

4. Hadoop file operation examples

# list the root directory
[root@master bin]# ./hadoop fs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2018-09-01 11:38 /usr

# upload a file
[root@master bin]# ./hadoop fs -put /etc/passwd /
[root@master bin]# ./hadoop fs -ls /
Found 2 items
-rw-r--r--   3 root supergroup        847 2018-09-01 11:44 /passwd
drwxr-xr-x   - root supergroup          0 2018-09-01 11:38 /usr

# view the file contents
[root@master bin]# ./hadoop fs -cat /passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:998:User for polkitd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
wadeyu:x:1000:1000:wadeyu:/home/wadeyu:/bin/bash
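
A few other commonly used commands, for reference (the paths here are just examples); the daemons can be shut down again from the same directory with ./stop-all.sh:

# download a file from HDFS to the local filesystem
[root@master bin]# ./hadoop fs -get /passwd /tmp/passwd.copy

# delete a file
[root@master bin]# ./hadoop fs -rm /passwd

# report datanode status and capacity
[root@master bin]# ./hadoop dfsadmin -report

# stop all daemons
[root@master bin]# ./stop-all.sh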

Notes

1. The VMs use bridged networking. I added MAC-address-to-IP bindings for the VMs on my router, so no static IP needs to be configured inside the VMs.

