
Hands-on: Reading Hive Table Data with Spark

Environment: Spark 1.6, Hive 1.2.1, Hadoop 2.6.4
1. Add the following dependencies.
spark-hive_2.10 is required so that a HiveContext object can be created:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.10</artifactId>
      <version>1.6.1</version>
    </dependency>

The MySQL driver is needed to connect to the metastore:

    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.38</version>
      <scope>compile</scope>
    </dependency>
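
For an sbt build, the same two dependencies would look like this (a sketch using the versions above):

    libraryDependencies ++= Seq(
      "org.apache.spark" % "spark-hive_2.10" % "1.6.1",
      "mysql" % "mysql-connector-java" % "5.1.38"
    )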

2. Add a hive-site.xml file with the content below.
Here the hive database in MySQL is Hive's metastore database.

<?xml version="1.0" encoding="UTF-8"?>
<!--Autogenerated by Cloudera Manager-->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
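
Note that hive-site.xml must be on the classpath (for a Maven project, typically src/main/resources) so that HiveContext can find the metastore settings. A quick sanity check is to list the databases; a minimal sketch, assuming the configuration above:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MetastoreCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("metastore-check").setMaster("local[2]"))
    val sqlContext = new HiveContext(sc)
    // If hive-site.xml is picked up, this lists the databases from the MySQL
    // metastore; otherwise Spark falls back to an embedded Derby metastore
    // that only contains the default database.
    sqlContext.sql("show databases").show()
    sc.stop()
  }
}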

3. Read data from the Hive table; the code is as follows.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object App {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("test").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    sqlContext.table("test.person")           // "database.table" format
              .registerTempTable("person")    // register as a temporary table
    sqlContext.sql(
      """
        | select *
        |   from person
        |  limit 10
      """.stripMargin).show()
    sc.stop()
  }
}
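
For a one-off query, the temp-table step can be skipped, since HiveContext accepts the database.table form directly in SQL. The DataFrame can also be written back to Hive. A short sketch, reusing the sqlContext from the example above (test.person_copy is a hypothetical target table name):

// Query the Hive table directly, without registerTempTable.
sqlContext.sql("select * from test.person limit 10").show()

// Write the DataFrame back to Hive as a new managed table.
// test.person_copy is a hypothetical table name.
sqlContext.table("test.person")
          .write
          .saveAsTable("test.person_copy")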