Ubuntu 14.04 + Hadoop 2.7.1 + JDK 1.7.0: Pseudo-Distributed Installation
Task 1-1
1. Create a hadoop user
sudo useradd -m hadoop   # create the user
sudo passwd hadoop       # set its password
2. Install and configure SSH
Install the SSH server: sudo apt-get install openssh-server
cd ~/.ssh/             # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa      # press Enter at every prompt
cat id_rsa.pub >> authorized_keys  # authorize the key
Run ssh localhost to check that you can now log in without a password.
3. Install and configure the JDK
cd /usr/lib/    # change into /usr/lib
sudo mkdir jvm  # create the jvm directory
sudo tar zxvf ~/Downloads/jdk-8u91-linux-x64.tar.gz -C /usr/lib/jvm
Set JAVA_HOME:
sudo gedit ~/.bashrc
Add export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_91, then save and exit.
Apply it immediately: source ~/.bashrc
Check that JAVA_HOME is set; if it prints the path configured above, it worked:
echo $JAVA_HOME
Alternatively, install OpenJDK 7 and point JAVA_HOME at it instead:
sudo apt-get install openjdk-7-jdk
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
java -version
- Install Hadoop 2.7.1
sudo tar zxvf ~/Downloads/hadoop-2.7.1.tar.gz -C /usr/local
cd /usr/local/
sudo mv ./hadoop-2.7.1/ ./hadoop   # rename the directory to hadoop
sudo chown -R hadoop ./hadoop      # change ownership to the current user (here: hadoop)
sudo gedit ~/.bashrc
In the editor, add the following after the JAVA_HOME entry configured earlier:
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
Apply it immediately: source ~/.bashrc
- Configure pseudo-distributed mode
Change to the configuration directory: cd /usr/local/hadoop/etc/hadoop
sudo gedit core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
sudo gedit hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
sudo gedit yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
mv mapred-site.xml.template mapred-site.xml   # rename the template
sudo gedit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Start/stop Hadoop
Format the NameNode (needed once, before the first start):
hdfs namenode -format
start-all.sh    start all Hadoop daemons: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager
stop-all.sh     stop all Hadoop daemons listed above
start-dfs.sh    start the HDFS daemons: NameNode, SecondaryNameNode, DataNode
stop-dfs.sh     stop the HDFS daemons: NameNode, SecondaryNameNode, DataNode
start-yarn.sh   start the YARN daemons: ResourceManager, NodeManager
stop-yarn.sh    stop the YARN daemons: ResourceManager, NodeManager
hadoop-daemon.sh start namenode            start only the NameNode daemon
hadoop-daemon.sh stop namenode             stop only the NameNode daemon
hadoop-daemon.sh start datanode            start only the DataNode daemon
hadoop-daemon.sh stop datanode             stop only the DataNode daemon
hadoop-daemon.sh start secondarynamenode   start only the SecondaryNameNode daemon
hadoop-daemon.sh stop secondarynamenode    stop only the SecondaryNameNode daemon
yarn-daemon.sh start resourcemanager       start only the ResourceManager daemon
yarn-daemon.sh stop resourcemanager        stop only the ResourceManager daemon
yarn-daemon.sh start nodemanager           start only the NodeManager daemon
yarn-daemon.sh stop nodemanager            stop only the NodeManager daemon
Note: JobTracker/TaskTracker and start-mapred.sh belong to Hadoop 1.x; in Hadoop 2.x they are replaced by YARN's ResourceManager and NodeManager.
jps    list the running Java processes
With everything started, the full set of processes looks like:
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
If running jps instead prints:
The program 'jps' can be found in the following packages:
 * default-jdk
 * ecj
 * gcj-4.6-jdk
 * openjdk-6-jdk
 * gcj-4.5-jdk
 * openjdk-7-jdk
Try: sudo apt-get install <selected package>
then run the following commands to register the JDK manually as the system default:
sudo update-alternatives --install /usr/bin/jps jps /usr/lib/jvm/jdk1.7.0_79/bin/jps 1
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.7.0_79/bin/javac 300
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.7.0_79/bin/java 300
After that, running jps again should work without the prompt.
Task 1-2
Start Hadoop, then:
hdfs dfs -mkdir -p /user/hadoop   # use the current user's username
hdfs dfs -mkdir -p /input         # create the input directory in HDFS
hdfs dfs -put ~/Downloads/dat0102.dat /input/   # upload the local file dat0102.dat into the HDFS /input directory
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /input/dat0102.dat /output "HDFS"
Run the Grep example from the Hadoop examples jar to count how often "HDFS" occurs in dat0102.dat, writing the result under the /output directory.
hdfs dfs -cat /output/part-r-00000   # print the count of "HDFS" occurrences
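As a side note, the counting that the Grep example performs can be sketched locally in Python. This is only an illustration of the logic on made-up sample lines, not part of the Hadoop job itself:

```python
import re

def grep_count(lines, pattern):
    """Count total regex matches across all lines, like Hadoop's Grep example."""
    regex = re.compile(pattern)
    return sum(len(regex.findall(line)) for line in lines)

# hypothetical sample data standing in for dat0102.dat
sample = [
    "HDFS is the Hadoop Distributed File System",
    "data is stored in HDFS blocks",
    "YARN schedules jobs",
]
print(grep_count(sample, "HDFS"))  # 2
```

The Hadoop job does the same counting, but distributes the per-line matching across map tasks and sums the counts in the reduce phase.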
Task 1-3
Performance tuning on the Hadoop platform
sudo gedit yarn-site.xml
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
sudo gedit mapred-site.xml
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768m</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1536m</value>
</property>
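The heap sizes above follow a common rule of thumb: set -Xmx to roughly 75% of the container size, leaving headroom for non-heap JVM memory. A small Python sketch of that relationship (the 0.75 ratio is a convention, not a Hadoop requirement):

```python
def heap_for_container(container_mb, ratio=0.75):
    """Suggested -Xmx heap (MB) for a YARN container of the given size,
    leaving headroom for JVM overhead (stacks, metaspace, native buffers)."""
    return int(container_mb * ratio)

print(heap_for_container(1024))  # 768  -> matches mapreduce.map.java.opts
print(heap_for_container(2048))  # 1536 -> matches mapreduce.reduce.java.opts
```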
Task 2-4
- Install Hive 2.1.1
sudo tar -zxvf ~/Downloads/apache-hive-2.1.1-bin.tar.gz -C /usr/local
cd /usr/local/
sudo mv apache-hive-2.1.1-bin hive   # rename the directory to hive
sudo chown -R hadoop:hadoop ./hive   # change ownership
sudo chmod -R 774 ./hive             # adjust permissions
- Configure the Hive environment
sudo apt-get install vim   # install vim
vim ~/.bashrc
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bashrc
- Configure Hive
cd /usr/local/hive/conf
mv hive-env.sh.template hive-env.sh
mv hive-default.xml.template hive-site.xml
mv hive-log4j2.properties.template hive-log4j2.properties
mv hive-exec-log4j2.properties.template hive-exec-log4j2.properties
- Edit hive-env.sh
Use whichever JAVA_HOME matches the JDK you installed earlier:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_79          ## Java path (Oracle JDK)
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64 ## Java path (OpenJDK)
export HADOOP_HOME=/usr/local/hadoop               ## Hadoop install path
export HIVE_HOME=/usr/local/hive                   ## Hive install path
export HIVE_CONF_DIR=/usr/local/hive/conf          ## Hive config file path
- Edit hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
</configuration>
- Install and configure MySQL
sudo apt-get install mysql-server   # install MySQL
service mysql start                 # start MySQL
service mysql stop                  # stop MySQL
sudo netstat -tap | grep mysql      # check whether it started successfully
mysql -u root -p                    # enter the MySQL shell
- Create a hive database to store the Hive metastore, with both the username and the password for database access set to hive:
mysql> CREATE DATABASE hive;
mysql> USE hive;
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
mysql> FLUSH PRIVILEGES;
mysql> quit;
- Install the MySQL JDBC connector
tar -zxvf ~/Downloads/mysql-connector-java-5.1.39.tar.gz -C /usr/local/hive   # extract
cp /usr/local/hive/mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar /usr/local/hive/lib   # copy mysql-connector-java-5.1.39-bin.jar into /usr/local/hive/lib
- Initialize the metastore schema before running Hive for the first time
schematool -initSchema -dbType mysql
- Start Hadoop
start-all.sh
- Start Hive
1. Create a result directory on HDFS
hdfs dfs -mkdir -p /result
2. Create a Hive table (named movie)
create table movie(name string,time string,score string)
row format delimited fields terminated by ',';
3. Load the data
load data local inpath '/home/hadoop/Downloads/dat0204.log' into table movie;
4. Query the data
select * from movie where time>='2014.1.1' and time<='2014.12.31' order by time;
5. Write the query result into the /result directory on HDFS
insert overwrite directory "/result"
row format delimited fields terminated by ',' select * from movie;
6. Run the streaming job
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar \
-file ~/Downloads/ans0203map.py \
-mapper 'python ans0203map.py' \
-file ~/Downloads/ans0203reduce.py \
-reducer 'python ans0203reduce.py' \
-input /input/dat0203.log \
-output /output
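The contents of ans0203map.py and ans0203reduce.py are not shown in these notes. As a hypothetical illustration of how a streaming mapper and reducer fit together, here is a minimal word-count pair sketched in Python, with the shuffle/sort phase that Hadoop normally performs simulated locally:

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # map phase: emit a (word, 1) pair for every token in the input
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # reduce phase: pairs arrive grouped by key, as Hadoop's sort guarantees
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

# local simulation of map -> shuffle/sort -> reduce on made-up input
data = ["HDFS stores data", "HDFS replicates data"]
print(dict(reducer(mapper(data))))  # {'HDFS': 2, 'data': 2, 'replicates': 1, 'stores': 1}
```

In a real streaming job each script would instead read lines from stdin and write tab-separated key/value pairs to stdout; Hadoop handles the sorting between the two phases.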