
Installing Hadoop 2.7.1 in Pseudo-Distributed Mode on Ubuntu 14.04 with JDK 1.7.0

Task 1-1

1. Create the hadoop user

sudo useradd -m hadoop   # create the user (add -s /bin/bash if you want bash as its login shell)

sudo passwd hadoop  # set its password

2. Install and configure SSH

Install the SSH server: sudo apt-get install openssh-server

cd ~/.ssh/       # if this directory does not exist, run ssh localhost once first

ssh-keygen -t rsa         # press Enter at every prompt

cat id_rsa.pub >> authorized_keys  # authorize the key

Try ssh localhost to confirm you can now log in without a password.
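If ssh localhost still prompts for a password, the usual culprit is permissions: sshd ignores keys whose files are group- or world-accessible:

chmod 700 ~/.ssh

chmod 600 ~/.ssh/authorized_keys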

3. Install and configure the JDK

cd /usr/lib/  # go to /usr/lib

sudo mkdir jvm  # create the jvm directory

sudo tar zxvf ~/下載/jdk-8u91-linux-x64.tar.gz -C /usr/lib/jvm   # ~/下載 is the Downloads directory on a Chinese-locale desktop

Set JAVA_HOME:

sudo gedit ~/.bashrc

Add the line export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91, then save and exit.

Apply it immediately: source ~/.bashrc

Check that JAVA_HOME is set correctly; if the command prints the path configured above, it worked:

echo $JAVA_HOME

Alternatively, install OpenJDK 7 from the package archives and point JAVA_HOME at it instead:

sudo apt-get install openjdk-7-jdk

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

java -version

4. Install Hadoop 2.7.1

sudo tar zxvf  ~/下載/hadoop-2.7.1.tar.gz -C /usr/local

cd /usr/local/

sudo mv ./hadoop-2.7.1/ ./hadoop    # rename the directory to hadoop

sudo chown -R hadoop ./hadoop      # change the owner to the current user (here, hadoop)

sudo gedit ~/.bashrc

In the editor, add the following lines after the JAVA_HOME entry configured earlier:

export HADOOP_INSTALL=/usr/local/hadoop

export PATH=$PATH:$HADOOP_INSTALL/bin

export PATH=$PATH:$HADOOP_INSTALL/sbin

export HADOOP_MAPRED_HOME=$HADOOP_INSTALL

export HADOOP_COMMON_HOME=$HADOOP_INSTALL

export HADOOP_HDFS_HOME=$HADOOP_INSTALL

export YARN_HOME=$HADOOP_INSTALL

Apply immediately: source ~/.bashrc
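If the new variables took effect, the hadoop command should resolve from any directory:

hadoop version   # should report Hadoop 2.7.1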

5. Configure pseudo-distributed mode

Switch to the configuration directory: cd /usr/local/hadoop/etc/hadoop

sudo gedit core-site.xml

<configuration>

    <property>

        <name>hadoop.tmp.dir</name>

        <value>file:/usr/local/hadoop/tmp</value>

        <description>Abase for other temporary directories.</description>

    </property>

    <property>

        <name>fs.defaultFS</name>

        <value>hdfs://localhost:9000</value>

    </property>

</configuration>

sudo gedit hdfs-site.xml

<configuration>

    <property>

        <name>dfs.replication</name>

        <value>1</value>

    </property>

    <property>

        <name>dfs.namenode.name.dir</name>

        <value>file:/usr/local/hadoop/tmp/dfs/name</value>

    </property>

    <property>

        <name>dfs.datanode.data.dir</name>

        <value>file:/usr/local/hadoop/tmp/dfs/data</value>

    </property>

</configuration>

sudo gedit yarn-site.xml

<configuration>

  <property>

    <name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

  </property>

  <property>

    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

    <value>org.apache.hadoop.mapred.ShuffleHandler</value>

  </property>

</configuration>

mv mapred-site.xml.template mapred-site.xml   # rename the template

sudo gedit mapred-site.xml

<configuration>

  <property>

    <name>mapreduce.framework.name</name>

    <value>yarn</value>

  </property>

</configuration>
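If starting the daemons later fails with a JAVA_HOME error, set the variable explicitly in hadoop-env.sh; the startup scripts do not always inherit it from the login shell:

sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # or /usr/lib/jvm/jdk1.8.0_91, matching your install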

6. Start/stop Hadoop

Format the NameNode (needed once, before the first start):

hdfs namenode -format
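If you ever need to reformat, stop the daemons and clear the data directory first; otherwise the DataNode keeps the old clusterID and refuses to join the freshly formatted NameNode:

stop-all.sh

rm -rf /usr/local/hadoop/tmp   # wipes all HDFS data; only do this when a clean reformat is intended

hdfs namenode -format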

start-all.sh  # start all Hadoop daemons: NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager

stop-all.sh  # stop all Hadoop daemons

start-dfs.sh  # start the HDFS daemons: NameNode, SecondaryNameNode and DataNode

stop-dfs.sh  # stop the HDFS daemons

start-yarn.sh  # start the YARN daemons: ResourceManager and NodeManager

stop-yarn.sh  # stop the YARN daemons

hadoop-daemon.sh start namenode  # start only the NameNode daemon

hadoop-daemon.sh stop namenode  # stop only the NameNode daemon

hadoop-daemon.sh start datanode  # start only the DataNode daemon

hadoop-daemon.sh stop datanode  # stop only the DataNode daemon

hadoop-daemon.sh start secondarynamenode  # start only the SecondaryNameNode daemon

hadoop-daemon.sh stop secondarynamenode  # stop only the SecondaryNameNode daemon

yarn-daemon.sh start resourcemanager  # start only the ResourceManager daemon

yarn-daemon.sh stop resourcemanager  # stop only the ResourceManager daemon

yarn-daemon.sh start nodemanager  # start only the NodeManager daemon

yarn-daemon.sh stop nodemanager  # stop only the NodeManager daemon

(The JobTracker/TaskTracker daemons and the start-mapred.sh/stop-mapred.sh scripts belong to Hadoop 1.x; in 2.7.1 MapReduce runs on YARN, so the ResourceManager/NodeManager commands above are their replacements.)

jps  # list the running Java processes

With everything running, the full process list looks like:

2583 DataNode

2970 ResourceManager

3461 Jps

3177 NodeManager

2361 NameNode

2840 SecondaryNameNode
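If one of these daemons is missing, its log usually says why. The HDFS daemon logs live under /usr/local/hadoop/logs and follow the naming pattern hadoop-&lt;user&gt;-&lt;daemon&gt;-&lt;hostname&gt;.log, for example:

tail -n 50 /usr/local/hadoop/logs/hadoop-hadoop-datanode-*.log   # last lines of the DataNode log (wildcard covers the hostname)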

If running jps instead prints:

The program 'jps' can be found in the following packages:

* default-jdk

* ecj

* gcj-4.6-jdk

* openjdk-6-jdk

* gcj-4.5-jdk

* openjdk-7-jdk

Try: sudo apt-get install <selected package>

then run the following commands to register the JDK manually as the system default (adjust the jdk1.7.0_79 paths to your own JDK location):

sudo update-alternatives --install /usr/bin/jps jps /usr/lib/jvm/jdk1.7.0_79/bin/jps 1

sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.7.0_79/bin/javac 300

sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.7.0_79/bin/java 300

Running jps again no longer shows the prompt.

Task 1-2

Start Hadoop, then:

hdfs dfs -mkdir -p /user/hadoop   # use your current username for the HDFS home directory

hdfs dfs -mkdir -p /input   # create the /input directory in HDFS

hdfs dfs -put ~/下載/dat0102.dat /input/  # upload the local file dat0102.dat into the HDFS /input directory

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /input/dat0102.dat /output "HDFS"

Run the grep example from the Hadoop examples jar to count how many times "HDFS" appears in dat0102.dat, saving the result under /output.

hdfs dfs -cat /output/part-r-00000  # print the number of times "HDFS" occurred
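You can also list the whole output directory; a successful job leaves a _SUCCESS marker next to the part file:

hdfs dfs -ls /output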

Task 1-3

Performance tuning of the Hadoop platform

sudo gedit yarn-site.xml

<property>

  <name>yarn.scheduler.maximum-allocation-mb</name>

  <value>2048</value>

</property>

sudo gedit mapred-site.xml

The -Xmx heap sizes below are set to roughly 75% of their containers (768m inside the 1024 MB map container, 1536m inside the 2048 MB reduce container), leaving headroom for non-heap JVM memory so YARN does not kill the containers for exceeding their limits.

<property>

  <name>mapreduce.map.memory.mb</name>

  <value>1024</value>

</property>

<property>

  <name>mapreduce.reduce.memory.mb</name>

  <value>2048</value>

</property>

<property>

  <name>mapreduce.map.java.opts</name>

  <value>-Xmx768m</value>

</property>

<property>

  <name>mapreduce.reduce.java.opts</name>

  <value>-Xmx1536m</value>

</property>
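These memory settings only take effect after the YARN daemons are restarted:

stop-yarn.sh

start-yarn.sh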

Task 2-4

1. Install Hive 2.1.1

sudo tar -zxvf ~/下載/apache-hive-2.1.1-bin.tar.gz -C /usr/local

    cd /usr/local/

sudo mv apache-hive-2.1.1-bin hive       # rename the directory to hive

sudo chown -R hadoop:hadoop hive   # change the owner

sudo chmod -R 774 hive   # adjust the permissions

2. Configure the Hive environment

sudo apt-get install vim   # install vim

vim ~/.bashrc

export HIVE_HOME=/usr/local/hive

export PATH=$PATH:$HIVE_HOME/bin

source ~/.bashrc
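If the variables are in effect, the hive command should now resolve from any directory:

hive --version   # should report Hive 2.1.1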

3. Configure Hive

cd /usr/local/hive/conf

mv hive-env.sh.template hive-env.sh

mv hive-default.xml.template hive-site.xml

mv hive-log4j2.properties.template hive-log4j2.properties

mv hive-exec-log4j2.properties.template hive-exec-log4j2.properties

4. Edit hive-env.sh

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_79    ## Java path (Oracle JDK tarball install)

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64    ## Java path (OpenJDK install; keep only the line that matches your JDK)

export HADOOP_HOME=/usr/local/hadoop   ## Hadoop install path

export HIVE_HOME=/usr/local/hive    ## Hive install path

export HIVE_CONF_DIR=/usr/local/hive/conf    ## Hive configuration path

5. Edit hive-site.xml

The renamed template still contains every default property; replace its contents with this minimal configuration:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>

    <name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>

    <description>JDBC connect string for a JDBC metastore</description>

  </property>

  <property>

    <name>javax.jdo.option.ConnectionDriverName</name>

    <value>com.mysql.jdbc.Driver</value>

    <description>Driver class name for a JDBC metastore</description>

  </property>

  <property>

    <name>javax.jdo.option.ConnectionUserName</name>

    <value>hive</value>

    <description>username to use against metastore database</description>

  </property>

  <property>

    <name>javax.jdo.option.ConnectionPassword</name>

    <value>hive</value>

    <description>password to use against metastore database</description>

  </property>

</configuration>

6. Install and configure MySQL

sudo apt-get install mysql-server  # install MySQL

sudo service mysql start  # start MySQL

sudo service mysql stop  # stop MySQL

sudo netstat -tap | grep mysql  # check that MySQL is listening

mysql -u root -p  # enter the MySQL shell

7. Create a hive database to store the Hive metadata, with both the username and the password for database access set to hive

mysql> CREATE DATABASE hive;

mysql> USE hive;

mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';

mysql> GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';

mysql> GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';

mysql> FLUSH PRIVILEGES;

mysql> quit;

8. Install the MySQL JDBC driver

tar -zxvf ~/下載/mysql-connector-java-5.1.40.tar.gz -C /usr/local/hive   # unpack (use the same version as the jar below)

cp /usr/local/hive/mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib   # copy mysql-connector-java-5.1.40-bin.jar into /usr/local/hive/lib

9. Initialize the metastore schema before the first run

schematool -initSchema -dbType mysql
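If the initialization succeeded, the metastore tables now exist in MySQL. A quick check, using the hive/hive credentials created above:

mysql -u hive -phive -e "USE hive; SHOW TABLES;"   # should list metastore tables such as DBS and TBLS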

10. Start Hadoop

start-all.sh

11. Start Hive

1. Create a result directory on HDFS

hdfs dfs -mkdir -p /result

2. Create a Hive table (named movie)

create table movie(name string,time string,score string)

row format delimited fields terminated by ',';

3. Load the data

load data local inpath '/home/hadoop/Downloads/dat0204.log' into table movie;

4. Query the data

select * from movie where time>='2014.1.1' and time<='2014.12.31' order by time;

5. Export into the result directory on HDFS

insert overwrite directory "/result"

row format delimited fields terminated by ',' select * from movie;
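To confirm the export, list what Hive wrote under /result; insert overwrite directory produces one or more plain files, typically named 000000_0:

hdfs dfs -ls /result

hdfs dfs -cat /result/000000_0   # print the exported rows (the exact file name may vary)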

6. Run the job with the Hadoop Streaming jar

hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
-file ~/Downloads/ans0203map.py \
-mapper 'python ans0203map.py' \
-file ~/Downloads/ans0203reduce.py \
-reducer 'python ans0203reduce.py' \
-input /input/dat0203.log \
-output /output

(The -output directory must not already exist; if a previous job left one behind, remove it first with hdfs dfs -rm -r /output.)
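A streaming mapper or reducer is just a program that reads lines on stdin and writes lines on stdout, so ordinary shell tools make a quick smoke test. This sketch is the classic cat/wc example from the streaming documentation; the /output_count path is a placeholder:

# mapper: /bin/cat passes every input line through unchanged
# reducer: /usr/bin/wc receives the mapped lines and emits their line/word/byte counts
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
-input /input/dat0203.log \
-output /output_count \
-mapper /bin/cat \
-reducer /usr/bin/wc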