
Integrating Spark SQL with Hive: Running SQL and Calling Hive from the spark-sql and spark-shell Commands


1. Install Hive
If you want to create a database user and grant it privileges on a database, see: http://blog.csdn.net/tototuzuoquan/article/details/52785504

2. Copy the configured hive-site.xml, core-site.xml, and hdfs-site.xml into the $SPARK_HOME/conf directory

[root@hadoop1 conf]# cd /home/tuzq/software/hive/apache-hive-1.2.1-bin
[root@hadoop1 conf]# cp hive-site.xml $SPARK_HOME/conf
[root@hadoop1 spark-1.6.2-bin-hadoop2.6]# cd $HADOOP_HOME
[root@hadoop1 hadoop]# cp core-site.xml $SPARK_HOME/conf
[root@hadoop1 hadoop]# cp hdfs-site.xml $SPARK_HOME/conf

Sync the configuration to the conf directories of the other Spark cluster nodes:

[root@hadoop1 conf]# scp -r * root@hadoop2:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop3:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop4:$PWD
[root@hadoop1 conf]# scp -r * root@hadoop5:$PWD
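Once a shell is running (step 3 below), you can sanity-check that Spark actually picked up the copied Hadoop configuration; a minimal sketch:

// In spark-shell: confirm the copied core-site.xml/hdfs-site.xml are in effect.
// With this cluster's configs it should print the HDFS nameservice,
// e.g. hdfs://mycluster (the exact value depends on your core-site.xml).
sc.hadoopConfiguration.get("fs.defaultFS")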

After the files are in place, remember to restart the Spark cluster. For starting and stopping the cluster, see:

http://blog.csdn.net/tototuzuoquan/article/details/74481570

Change the log level that Spark's log4j configuration prints at to ERROR: in $SPARK_HOME/conf/log4j.properties, set log4j.rootCategory=ERROR, console. (The original post showed this change as a screenshot.)
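As an alternative to editing the file, the level can also be raised per session from inside spark-shell; a small sketch using SparkContext.setLogLevel (available since Spark 1.4):

// Run inside spark-shell: raise the log threshold for this session only,
// without touching log4j.properties
sc.setLogLevel("ERROR")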

3. Start spark-shell, specifying the location of the MySQL JDBC driver

bin/spark-shell \
  --master spark://hadoop1:7077,hadoop2:7077 \
  --executor-memory 1g \
  --total-executor-cores 2 \
  --driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar
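A quick way to confirm the driver jar really made it onto the driver classpath once the shell is up; a minimal sketch:

// In spark-shell: verify the MySQL JDBC driver passed via --driver-class-path
// is visible on the driver; this throws ClassNotFoundException if the jar is missing.
Class.forName("com.mysql.jdbc.Driver")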

If an error like ConnectionRefused is reported during startup (shown as a screenshot in the original post), you can troubleshoot it by following the URL highlighted in the error message:
https://wiki.apache.org/hadoop/ConnectionRefused
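If that wiki page points at a refused connection, a bare socket test from the Scala shell tells you whether the NameNode port is reachable at all; a hypothetical sketch (hadoop1 and port 9000 are placeholders for your actual NameNode host and RPC port):

// Plain JVM socket test; throws java.net.ConnectException if the port is refused
import java.net.Socket
new Socket("hadoop1", 9000).close()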

4. Calling HQL with sqlContext.sql
Before doing this, start Hive and create the person table:

hive> create table person(id bigint, name string, age int) row format delimited fields terminated by " ";
OK
Time taken: 2.152 seconds
hive> show tables;
OK
func
person
wyp
Time taken: 0.269 seconds, Fetched: 3 row(s)
hive>

Check the contents of person.txt in HDFS:

[root@hadoop3 ~]# hdfs dfs -cat /person.txt
1 zhangsan 19
2 lisi 20
3 wangwu 28
4 zhaoliu 26
5 tianqi 24
6 chengnong 55
7 zhouxingchi 58
8 mayun 50
9 yangliying 30
10 lilianjie 51
11 zhanghuimei 35
12 lian 53
13 zhangyimou 54
[root@hadoop3 ~]# hdfs dfs -cat hdfs://mycluster/person.txt
(same thirteen rows as above)
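For comparison, the same file can be read directly with Spark's RDD API; a minimal sketch:

// Read the same space-separated file through the RDD API
val lines = sc.textFile("hdfs://mycluster/person.txt")
lines.take(3).foreach(println)   // first three lines, e.g. "1 zhangsan 19"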

Load the data into the person table:

hive> load data inpath '/person.txt' into table person;
Loading data to table default.person
Table default.person stats: [numFiles=1, totalSize=193]
OK
Time taken: 1.634 seconds
hive> select * from person;
OK
1	zhangsan	19
2	lisi	20
3	wangwu	28
4	zhaoliu	26
5	tianqi	24
6	chengnong	55
7	zhouxingchi	58
8	mayun	50
9	yangliying	30
10	lilianjie	51
11	zhanghuimei	35
12	lian	53
13	zhangyimou	54
Time taken: 0.164 seconds, Fetched: 13 row(s)
hive>
With spark-2.1.1-bin-hadoop2.7 there is no predefined sqlContext, so you first need to run val sqlContext = new org.apache.spark.sql.SQLContext(sc); with spark-1.6.2-bin-hadoop2.6 the shell already provides sqlContext and this step is unnecessary.

scala> sqlContext.sql("select * from person limit 2").show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  1|zhangsan| 19|
|  2|    lisi| 20|
+---+--------+---+

scala>
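The same query can also be expressed with the DataFrame API instead of an HQL string; a minimal sketch against the person table created above (the age filter is just an illustrative condition):

// Spark 1.6 DataFrame API equivalent of an HQL query on the Hive table "person"
val personDF = sqlContext.table("person")
personDF.filter(personDF("age") > 50).orderBy(personDF("age").desc).show()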

Or use org.apache.spark.sql.hive.HiveContext (still inside the same spark-shell session):

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val hiveContext = new HiveContext(sc)
Wed Jul 12 12:43:36 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
(the same warning is printed a second time)
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@6d9a46d7

scala> hiveContext.sql("select * from person")
res2: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> hiveContext.sql("select * from person").show
+---+-----------+---+
| id|       name|age|
+---+-----------+---+
|  1|   zhangsan| 19|
|  2|       lisi| 20|
|  3|     wangwu| 28|
|  4|    zhaoliu| 26|
|  5|     tianqi| 24|
|  6|  chengnong| 55|
|  7|zhouxingchi| 58|
|  8|      mayun| 50|
|  9| yangliying| 30|
| 10|  lilianjie| 51|
| 11|zhanghuimei| 35|
| 12|       lian| 53|
| 13| zhangyimou| 54|
+---+-----------+---+

scala>
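Since the HiveContext is wired to the Hive metastore, query results can also be written back as new Hive tables; a minimal sketch using the Spark 1.6 DataFrameWriter API (the target table name person_over_50 is made up for this example):

// Persist a query result back into Hive as a new table
import org.apache.spark.sql.SaveMode
hiveContext.sql("select * from person where age >= 50")
  .write
  .mode(SaveMode.Overwrite)
  .saveAsTable("person_over_50")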

You can also run HQL straight from the spark-sql shell, started with the same driver settings (inside it you type SQL statements directly, with no sqlContext needed):

bin/spark-sql \
  --master spark://hadoop1:7077,hadoop2:7077 \
  --executor-memory 1g \
  --total-executor-cores 2 \
  --driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar

5. Start spark-shell, specifying the location of the MySQL JDBC driver

bin/spark-shell \
  --master spark://hadoop1:7077,hadoop2:7077 \
  --executor-memory 1g \
  --total-executor-cores 2 \
  --driver-class-path /home/tuzq/software/spark-1.6.2-bin-hadoop2.6/lib/mysql-connector-java-5.1.38.jar

5.1. Calling HQL with sqlContext.sql (the commands here are run in spark-shell)

scala> sqlContext.sql("select * from person limit 2")
res0: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> sqlContext.sql("select * from person limit 2").show
+---+--------+---+
| id|    name|age|
+---+--------+---+
|  1|zhangsan| 19|
|  2|    lisi| 20|
+---+--------+---+

scala>

Or use org.apache.spark.sql.hive.HiveContext:

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val hiveContext = new HiveContext(sc)
(log output omitted here)

scala> hiveContext.sql("select * from person")
res2: org.apache.spark.sql.DataFrame = [id: bigint, name: string, age: int]

scala> hiveContext.sql("select * from person").show
+---+-----------+---+
| id|       name|age|
+---+-----------+---+
|  1|   zhangsan| 19|
|  2|       lisi| 20|
|  3|     wangwu| 28|
|  4|    zhaoliu| 26|
|  5|     tianqi| 24|
|  6|  chengnong| 55|
|  7|zhouxingchi| 58|
|  8|      mayun| 50|
|  9| yangliying| 30|
| 10|  lilianjie| 51|
| 11|zhanghuimei| 35|
| 12|       lian| 53|
| 13| zhangyimou| 54|
+---+-----------+---+

scala>
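Finally, both contexts let you register plain Scala functions as UDFs callable from HQL; a minimal sketch (the function name ageBucket is hypothetical):

// Register a Scala function as a SQL UDF, then call it in a query
sqlContext.udf.register("ageBucket", (age: Int) => if (age >= 50) "senior" else "junior")
sqlContext.sql("select name, ageBucket(age) as bucket from person limit 3").show()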
