Hands-on: reading Hive table data with Spark
阿新 • Published: 2019-01-24
Environment: Spark 1.6, Hive 1.2.1, Hadoop 2.6.4
1. Add the following dependencies
spark-hive_2.10 is added so that a HiveContext object can be created
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
MySQL driver, used to connect to the metastore
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.38</version>
    <scope>compile</scope>
</dependency>
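For sbt users, the equivalent of the two Maven dependencies above would be the following sketch (coordinates and versions copied from the Maven snippets; the `%%` form assumes a Scala 2.10 build so that the `_2.10` suffix is appended automatically):

```scala
// build.sbt -- equivalent of the Maven dependencies above (assumes Scala 2.10)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-hive"           % "1.6.1",
  "mysql"            %  "mysql-connector-java" % "5.1.38"
)
```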
2. Add a hive-site.xml file with the following content
Here the `hive` database in MySQL is Hive's metastore database
<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>
</configuration>
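For the HiveContext to pick up this configuration, hive-site.xml must be on the application classpath; with a standard Maven project layout, that means placing it under the resources directory (the paths below assume such a layout and a hive-site.xml in the current directory):

```shell
# copy hive-site.xml into the Maven resources directory so it lands on the classpath
mkdir -p src/main/resources
cp hive-site.xml src/main/resources/
```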
3. Now read the Hive table's data; the code is as follows
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object App {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("test").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    sqlContext.table("test.person")   // "database.table" format
      .registerTempTable("person")    // register as a temp table
    sqlContext.sql(
      """
        | select *
        | from person
        | limit 10
      """.stripMargin).show()
    sc.stop()
  }
}
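Registering a temp table is optional here: in Spark 1.6 the SQL statement can reference the Hive table with its `database.table` name directly. A minimal sketch, assuming the same hive-site.xml is on the classpath and the same `test.person` table exists:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// A sketch of the same query without the temp-table step (assumed setup as above)
object DirectQueryApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-query").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    // query the Hive table directly using its database-qualified name
    sqlContext.sql("select * from test.person limit 10").show()
    sc.stop()
  }
}
```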