Spark Learning Notes (11): Spark on Hive Configuration

Add the Maven dependency to the project:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.12</artifactId>
            <version>2.4.0</version>
            <scope>provided</scope>
        </dependency>

1.1 Start the Hive metastore service on hadoop1

In the hive/bin directory:

hive --service metastore &
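If you want to confirm the metastore is actually listening on port 9083 before wiring up Spark, a small standalone client can talk to it directly. This is a minimal sketch, assuming a hive-metastore artifact matching your Hive version is on the classpath; the class name MetastoreCheck is made up for this illustration:

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;

    public class MetastoreCheck {
        public static void main(String[] args) throws Exception {
            HiveConf conf = new HiveConf();
            // Point the client at the metastore service started above
            conf.set("hive.metastore.uris", "thrift://hadoop1:9083");
            HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
            // Listing the databases proves the Thrift connection works
            System.out.println(client.getAllDatabases());
            client.close();
        }
    }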

1.2 On the Hive client host, edit hive-site.xml under hive/conf and add the metastore URI (9083 is the metastore's default Thrift port):

    <configuration>
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://hadoop1:9083</value>
        </property>
    </configuration>

1.3 Copy hive-site.xml from hive/conf on the Hive client host into spark/conf on the Spark client

1.4 Edit the Spark client's copy of hive-site.xml, keeping only:

    <configuration>
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://hadoop1:9083</value>
        </property>
    </configuration>
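As an alternative to shipping hive-site.xml, the same property can be set in code when the session is built. A minimal sketch against the Spark 2.x SparkSession API (SessionExample is an illustrative name, not from the original post):

    import org.apache.spark.sql.SparkSession;

    public class SessionExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("hive")
                    // Same property as hive-site.xml, supplied in code instead
                    .config("hive.metastore.uris", "thrift://hadoop1:9083")
                    .enableHiveSupport()
                    .getOrCreate();
            // Quick smoke test against the remote metastore
            spark.sql("SHOW DATABASES").show();
            spark.stop();
        }
    }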

1.5 Start Spark

In the spark/bin directory on the Spark client (both master URLs are listed so the shell can fail over in standalone HA mode):

./spark-shell --master spark://hadoop1:7077,hadoop2:7077

1.6 Write the Java code

    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.hive.HiveContext;

    public class JavaExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf();
            conf.setAppName("hive");
            JavaSparkContext sc = new JavaSparkContext(conf);
            // HiveContext is a subclass of SQLContext (deprecated since Spark 2.0
            // in favour of SparkSession.enableHiveSupport(), but still available)
            HiveContext hiveContext = new HiveContext(sc);
            hiveContext.sql("USE spark");
            hiveContext.sql("DROP TABLE IF EXISTS student_infos");
            // Create the student_infos table in Hive
            hiveContext.sql("CREATE TABLE IF NOT EXISTS student_infos (name STRING, age INT) " +
                    "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'");
            hiveContext.sql("LOAD DATA LOCAL INPATH '/usr/local/student_infos' INTO TABLE student_infos");

            // Query the table; the result comes back as a DataFrame (Dataset<Row>)
            Dataset<Row> siDf = hiveContext.sql("SELECT * FROM student_infos");

            siDf.createOrReplaceTempView("students");
            siDf.show();

            // Save the result to the Hive table good_student_infos
            hiveContext.sql("DROP TABLE IF EXISTS good_student_infos");
            siDf.write().mode(SaveMode.Overwrite).saveAsTable("good_student_infos");

            // Read the saved table back and print each row
            Dataset<Row> table = hiveContext.table("good_student_infos");
            List<Row> rows = table.collectAsList();
            for (Row row : rows) {
                System.out.println(row);
            }
            sc.stop();
        }
    }
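Note that LOAD DATA LOCAL INPATH expects /usr/local/student_infos to be a tab-separated text file on the machine the job runs on. Hypothetical sample contents (names and ages made up for illustration):

    zhangsan	18
    lisi	20
    wangwu	19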

1.7 Submit the jar from the spark/bin directory (pass --class unless the jar's manifest already names the main class)

./spark-submit --master spark://hadoop1:7077,hadoop2:7077 --class JavaExample /usr/local/JavaExample.jar