Spark Study Notes (11: Spark on Hive Configuration)
阿新 • Published: 2019-01-13
Add the dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>2.4.0</version>
    <scope>provided</scope>
</dependency>
1.1 Start the Hive metastore service on hadoop1
In the hive/bin directory:
hive --service metastore &
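Before pointing any client at the metastore, it can help to verify that the Thrift port is actually listening. Below is a minimal stdlib-only sketch; the host and port mirror the thrift://hadoop1:9083 address used throughout this tutorial, and `isListening` is a hypothetical helper, not part of any Hive or Spark API:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class MetastoreCheck {
    // Hypothetical helper: returns true if a TCP connection to host:port
    // succeeds within timeoutMs milliseconds.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // hadoop1:9083 is the metastore address assumed in this tutorial
        System.out.println(isListening("hadoop1", 9083, 2000)
                ? "metastore port is open" : "metastore port is not reachable");
    }
}
```

If the port is closed, check the metastore log before continuing; every later step depends on this service.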
1.2 On the Hive client host, edit hive-site.xml under hive/conf and add:
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop1:9083</value>
    </property>
</configuration>
1.3 Copy hive-site.xml from the Hive client host's hive/conf directory to the Spark client's spark/conf directory
1.4 Edit hive-site.xml on the Spark client, keeping only:
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop1:9083</value>
    </property>
</configuration>
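Since the Spark-side hive-site.xml is stripped down to this one property, a quick sanity check is to parse the file and confirm the metastore URI. Here is a stdlib-only sketch; the `metastoreUri` helper is illustrative and not part of Hive or Spark:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HiveSiteReader {
    // Illustrative helper: returns the <value> of the property named
    // hive.metastore.uris, or null if it is absent or the XML is malformed.
    static String metastoreUri(String hiveSiteXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(hiveSiteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = prop.getElementsByTagName("name").item(0).getTextContent();
                if ("hive.metastore.uris".equals(name)) {
                    return prop.getElementsByTagName("value").item(0).getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Inline copy of the configuration from this step
        String xml = "<configuration><property><name>hive.metastore.uris</name>"
                + "<value>thrift://hadoop1:9083</value></property></configuration>";
        System.out.println(metastoreUri(xml)); // prints thrift://hadoop1:9083
    }
}
```

In practice you would read the real file from spark/conf instead of an inline string.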
1.5 Start Spark
In the spark/bin directory on the Spark client:
./spark-shell --master spark://hadoop1:7077,hadoop2:7077
1.6 Write the Java code
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

public class JavaExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("hive");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // HiveContext is a subclass of SQLContext
        HiveContext hiveContext = new HiveContext(sc);
        hiveContext.sql("USE spark");
        hiveContext.sql("DROP TABLE IF EXISTS student_infos");
        // Create the student_infos table in Hive
        hiveContext.sql("CREATE TABLE IF NOT EXISTS student_infos (name STRING, age INT) " +
                "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'");
        hiveContext.sql("LOAD DATA LOCAL INPATH '/usr/local/student_infos' INTO TABLE student_infos");
        // Query the table into a DataFrame
        Dataset<Row> siDf = hiveContext.sql("SELECT * FROM student_infos");
        // registerTempTable is deprecated in Spark 2.x
        siDf.createOrReplaceTempView("students");
        siDf.show();
        // Save the result to the Hive table good_student_infos
        hiveContext.sql("DROP TABLE IF EXISTS good_student_infos");
        siDf.write().mode(SaveMode.Overwrite).saveAsTable("good_student_infos");
        Dataset<Row> table = hiveContext.table("good_student_infos");
        // collectAsList() is the Java-friendly way to pull rows to the driver
        List<Row> rows = table.collectAsList();
        for (Row row : rows) {
            System.out.println(row);
        }
        sc.stop();
    }
}
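The LOAD DATA statement above expects a tab-delimited local file at /usr/local/student_infos. A small stdlib-only sketch for generating such a file; the sample names, ages, and the temp-dir output path here are made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class StudentInfosFile {
    // Writes one "name<TAB>age" line per entry, matching the table's
    // ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'.
    static void writeSample(Path target, List<String[]> rows) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(row[0]).append('\t').append(row[1]).append('\n');
        }
        Files.write(target, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample rows; in the tutorial the file lives at /usr/local/student_infos
        List<String[]> rows = Arrays.asList(
                new String[]{"zhangsan", "18"},
                new String[]{"lisi", "20"});
        Path target = Paths.get(System.getProperty("java.io.tmpdir"), "student_infos");
        writeSample(target, rows);
        System.out.println(Files.readAllLines(target).size() + " lines written");
    }
}
```

Because the Hive table's delimiter is '\t', each line must use a literal tab between the two fields, or every row will load as NULLs.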
1.7 Run the command in the spark/bin directory (--class is required unless the jar's manifest names a Main-Class)
./spark-submit --master spark://hadoop1:7077,hadoop2:7077 --class JavaExample /usr/local/JavaExample.jar