
My Own Hadoop Platform (Part 3): MySQL + Hive Remote Mode + Spark on Yarn

Spark and Hive are fairly simple to configure. To make it convenient for Spark to use and test data, this post sets up Spark on YARN and, at the same time, MySQL + Hive, and configures Hive support in Spark so that Spark can work with the data just like Hive does.

Prerequisites

scala-2.11.11.tgz
spark-2.1.1-bin-hadoop2.7.tar.gz
hive-1.2.1.tar.gz
mysql-connector-java-5.1.43-bin.jar


Installing MySQL

Install MySQL via yum.
Since MySQL is only used to store Hive's metadata, it only needs to be installed on a single node.
1. Download the MySQL repo package

wget http://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm

2. Install the MySQL repo

yum localinstall mysql57-community-release-el7-11.noarch.rpm

3. Check that the repo was installed successfully

yum repolist enabled | grep "mysql.*-community.*"

4. Install MySQL

yum install mysql-community-server

5. Start MySQL

systemctl start mysqld

6. Check MySQL's status

systemctl status mysqld

If the output shows active (running), the service started successfully.

7. Enable MySQL at boot

systemctl enable mysqld
systemctl daemon-reload

8. Change the local root password

# Find the auto-generated temporary password, then log in and change it
grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p

-- Adjust the global password-policy settings so the new password is accepted
-- Check whether the validate_password plugin is installed
SHOW VARIABLES LIKE 'validate_password%';
-- Lower the validate_password_policy level
set global validate_password_policy=0;
-- Set the root account password
set password for 'root'@'localhost'=password('rootroot');

9. Add a remote login user

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'rootroot' WITH GRANT OPTION;
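To confirm the grant took effect, you can try connecting from another node; the hostname master is assumed here, matching the rest of this guide:

# Connect to the MySQL server on master from a different node
mysql -h master -uroot -p -e "SELECT 1;"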

10. Set the default character encoding to utf-8

# Edit /etc/my.cnf and add the encoding settings under the [mysqld] section
character_set_server=utf8
init_connect='SET NAMES utf8'
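The new encoding only takes effect after a restart; a quick sanity check could look like this:

# Restart MySQL and confirm the character-set settings were picked up
systemctl restart mysqld
mysql -uroot -p -e "SHOW VARIABLES LIKE 'character_set_server'; SHOW VARIABLES LIKE 'init_connect';"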

Installing Hive

On the master1 node

1. Create the HDFS directories and grant permissions
These steps are required; otherwise you will get errors later when designating the Hive metastore database.

hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir -p /user/hive/tmp
hdfs dfs -mkdir -p /user/hive/log
hdfs dfs -chmod 777 /user/hive/warehouse
hdfs dfs -chmod 777 /user/hive/tmp
hdfs dfs -chmod 777 /user/hive/log

Add the environment variables

export HIVE_HOME=/usr/local/hive-1.2.1
export HIVE_CONF_DIR=/usr/local/hive-1.2.1/conf
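A minimal sketch of where these lines could go, assuming ~/.bashrc is used the same way the Spark section uses it later:

# Append the Hive variables to ~/.bashrc and reload the environment
echo 'export HIVE_HOME=/usr/local/hive-1.2.1' >> ~/.bashrc
echo 'export HIVE_CONF_DIR=$HIVE_HOME/conf' >> ~/.bashrc
echo 'export PATH=$PATH:$HIVE_HOME/bin' >> ~/.bashrc
source ~/.bashrc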

2. Create the MySQL database and designate it as the metastore

-- Log in to MySQL and create a database named hive
create database hive;

-- Create the hive user and grant it all privileges
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'rootroot';
GRANT ALL PRIVILEGES ON *.* TO hive IDENTIFIED BY 'rootroot' WITH GRANT OPTION;

Then copy the MySQL JDBC driver jar into the lib directory of the Hive installation.
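Assuming the connector jar from the prerequisites was downloaded to the current directory, the copy might look like this:

# Put the MySQL JDBC driver on Hive's classpath
cp mysql-connector-java-5.1.43-bin.jar /usr/local/hive-1.2.1/lib/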

3. Server-side configuration for remote mode (master node)
Edit the hive-site.xml configuration:

vim /usr/local/hive-1.2.1/conf/hive-site.xml
# The full configuration is as follows

<configuration>
 <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>rootroot</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/usr/local/hive-1.2.1/iotmp/operation_logs</value>
    <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
  </property>

  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive-1.2.1/iotmp</value>
    <description>Local scratch space for Hive jobs</description>
  </property>

  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/usr/local/hive-1.2.1/iotmp</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>

  <property>
    <name>hive.querylog.location</name>
    <value>/usr/local/hive-1.2.1/iotmp</value>
    <description>Location of Hive run time structured log file</description>
  </property>
</configuration>
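Several of the paths above point at a local iotmp directory that a fresh unpack does not contain; creating it up front avoids startup errors (the path is taken from the configuration above):

# Create the local scratch and operation-log directories referenced in hive-site.xml
mkdir -p /usr/local/hive-1.2.1/iotmp/operation_logs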

4. The other nodes act as clients (master1/slave1/slave2/slave3)

Edit their hive-site.xml configuration:

<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://master:9083</value>
    </property>
</configuration>

At this point the Hive remote-mode configuration is complete.

Test whether Hive starts correctly

# Start the Hive metastore service on the master node
hive --service metastore &

# Start the Hive CLI on the master1 node
hive
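If the client on master1 cannot connect, it helps to first confirm that the metastore on master is actually listening on port 9083, the port set in hive.metastore.uris above; for example:

# On master: the thrift metastore service should be listening on port 9083
ss -nltp | grep 9083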

Hive can display data, MySQL stores Hive's metadata, and the data itself lives on HDFS.

The corresponding data can also be seen on HDFS.

Hive is working correctly.
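For a quick smoke test of your own, something along these lines should work (the table name smoke_test is just an example):

# Create a throwaway table and list tables from the master1 client
hive -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING); SHOW TABLES;"
# The new table's directory should appear under the warehouse path on HDFS
hdfs dfs -ls /user/hive/warehouse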

Spark on Yarn Configuration

1. Unpack the Spark tarball

# Extract and move to /usr/local/spark
tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz
mv spark-2.1.1-bin-hadoop2.7 /usr/local/spark

2. Add the environment variables

vim ~/.bashrc
# Add
export SPARK_HOME=/usr/local/spark
# Append to PATH
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

3. Edit the spark-env.sh configuration file

# Add the following settings
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export JAVA_HOME=/usr/local/jdk1.8.0_144
export SPARK_HOME=/usr/local/spark
export SPARK_EXECUTOR_MEMORY=1G
export SPARK_EXECUTOR_CORES=1
export SPARK_WORKER_CORES=1
export SCALA_HOME=/usr/local/scala
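SCALA_HOME points at /usr/local/scala, but the Scala tarball from the prerequisites is never unpacked explicitly in this guide; a minimal sketch, assuming the tarball sits in the current directory:

# Unpack Scala and move it to the location SCALA_HOME points to
tar -zxvf scala-2.11.11.tgz
mv scala-2.11.11 /usr/local/scala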

Test Spark on YARN by running the SparkPi example that ships with Spark, specifying yarn as the master:

/usr/local/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --num-executors 2 /usr/local/spark/examples/jars/spark-examples_2.11-2.1.1.jar 5

The application that YARN allocated for Spark can also be seen in the YARN web UI.

Spark SQL Access to Hive Data

1. Copy the Hive configuration file hive-site.xml from the master node into the spark/conf directory.
The hive-site.xml content is as follows:

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>

<property>    
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>rootroot</value>
</property>

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>

2. Edit the spark-defaults.conf file

# Add the following setting to the configuration file
spark.sql.warehouse.dir /user/spark/warehouse
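Depending on your cluster's permissions you may need to create that warehouse directory on HDFS yourself; a minimal sketch, using the path from the setting above:

# Create the Spark SQL warehouse directory on HDFS
hdfs dfs -mkdir -p /user/spark/warehouse
hdfs dfs -chmod 777 /user/spark/warehouse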

3. Send the hive-site.xml and spark-defaults.conf configuration files to the other nodes

scp hive-site.xml hadoop@master1:/usr/local/spark/conf
scp hive-site.xml hadoop@slave1:/usr/local/spark/conf
scp hive-site.xml hadoop@slave2:/usr/local/spark/conf
scp hive-site.xml hadoop@slave3:/usr/local/spark/conf
scp spark-defaults.conf hadoop@master1:/usr/local/spark/conf
scp spark-defaults.conf hadoop@slave1:/usr/local/spark/conf
scp spark-defaults.conf hadoop@slave2:/usr/local/spark/conf
scp spark-defaults.conf hadoop@slave3:/usr/local/spark/conf

4. Put the MySQL driver jar into spark/jars, as sketched below.
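Again assuming the connector jar from the prerequisites is in the current directory, the copy might look like this (repeat the scp for the remaining nodes):

# Put the MySQL JDBC driver on Spark's classpath on this node
cp mysql-connector-java-5.1.43-bin.jar /usr/local/spark/jars/
# And copy it to the other nodes as well, for example:
scp /usr/local/spark/jars/mysql-connector-java-5.1.43-bin.jar hadoop@slave1:/usr/local/spark/jars/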

With this configuration in place, Spark SQL can now work with the Hive database.

Test Spark SQL against Hive.
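As a command-line check, the spark-sql shell that ships with the Spark distribution can run Hive queries directly (which tables show up depends on what was created in Hive earlier):

# Run a Hive query through Spark SQL on YARN
/usr/local/spark/bin/spark-sql --master yarn -e "SHOW TABLES;"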

Spark can access the data through SQL statements; everything works!

If you have any comments or suggestions, please contact me. Thank you.