Creating a Hive partitioned table with Spark SQL

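Today I needed to import data from a text file into a Hive database. Because the table is expected to hold a fairly large amount of data, I went with a partitioned table design, partitioning the table by area and date. I ran into a few bugs along the way and am recording them here for later reference.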

    spark.sql("use oracledb")
    spark.sql("CREATE TABLE IF NOT EXISTS " + tablename + " (OBUID STRING, BUS_ID STRING,REVTIME STRING,OBUTIME STRING,LONGITUDE STRING,LATITUDE STRING,\
        GPSKEY STRING,DIRECTION STRING,SPEED STRING,RUNNING_NO STRING,DATA_SERIAL STRING,GPS_MILEAGE STRING,SATELLITE_COUNT STRING,ROUTE_CODE STRING,SERVICE STRING)\
        PARTITIONED BY(AREA STRING,OBUDATE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ")
    spark.sql("set hive.exec.dynamic.partition.mode = nonstrict")
    spark.sql("set hive.exec.dynamic.partition = true")

    # print("Finished creating the database")
    if addoroverwrite:
        # append
        spark.sql("INSERT INTO TABLE " + tablename + " PARTITION(AREA,OBUDATE) SELECT OBUID,BUS_ID, REVTIME, OBUTIME,LONGITUDE ,LATITUDE,GPSKEY,DIRECTION,SPEED,\
            RUNNING_NO,DATA_SERIAL,GPS_MILEAGE, SATELLITE_COUNT ,ROUTE_CODE,SERVICE,'gz' AS AREA,SUBSTR(OBUTIME,1,10) AS OBUDATE FROM " + tablename + "_tmp")

Running the script produced the following error:

Partition spec {area=, obudate=, AREA=gz, OBUDATE=2017-01-} contains non-partition columns;
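
The message lists the same two names twice, once in lowercase and once in uppercase, which hints at a case mismatch. The following is only a toy model in plain Python, not Spark's actual code: it assumes the partition column names are recorded in lowercase (as the Hive metastore does) and that the spec keys from the INSERT are compared case-sensitively. The function and variable names are made up for illustration.

```python
# Toy model of a case-sensitive partition-spec check (NOT Spark internals).
# Assumption: the metastore records partition column names in lowercase.
table_partition_cols = ["area", "obudate"]
# Keys as written in the failing statement: PARTITION(AREA,OBUDATE)
insert_spec_keys = ["AREA", "OBUDATE"]


def non_partition_keys(spec_keys, partition_cols, case_sensitive):
    """Return the spec keys that match no partition column."""
    if case_sensitive:
        known = set(partition_cols)
        return [k for k in spec_keys if k not in known]
    known = {c.lower() for c in partition_cols}
    return [k for k in spec_keys if k.lower() not in known]


strict = non_partition_keys(insert_spec_keys, table_partition_cols, True)
relaxed = non_partition_keys(insert_spec_keys, table_partition_cols, False)
print(strict)   # ['AREA', 'OBUDATE']
print(relaxed)  # []
```

Under a case-sensitive comparison both uppercase keys look like unknown columns, which is consistent with the wording "contains non-partition columns".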

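After some searching on Baidu, I found mentions of a case-sensitivity bug with partitioned tables, so I changed the script to use lowercase partition column names, and it ran successfully. The modified script: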

    spark.sql("use oracledb")
    spark.sql("CREATE TABLE IF NOT EXISTS " + tablename + " (OBUID STRING, BUS_ID STRING,REVTIME STRING,OBUTIME STRING,LONGITUDE STRING,LATITUDE STRING,\
        GPSKEY STRING,DIRECTION STRING,SPEED STRING,RUNNING_NO STRING,DATA_SERIAL STRING,GPS_MILEAGE STRING,SATELLITE_COUNT STRING,ROUTE_CODE STRING,SERVICE STRING)\
        PARTITIONED BY(area STRING,obudate STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ")
    # Set parameters
    # hive > set  hive.exec.dynamic.partition.mode = nonstrict;
    # hive > set  hive.exec.dynamic.partition = true;
    spark.sql("set  hive.exec.dynamic.partition.mode = nonstrict")
    spark.sql("set  hive.exec.dynamic.partition = true")

    # print("Finished creating the database")
    if addoroverwrite:
        # append
        spark.sql("INSERT INTO TABLE " + tablename + " PARTITION(area,obudate) SELECT OBUID,BUS_ID, REVTIME, OBUTIME,LONGITUDE ,LATITUDE,GPSKEY,DIRECTION,SPEED,\
            RUNNING_NO,DATA_SERIAL,GPS_MILEAGE, SATELLITE_COUNT ,ROUTE_CODE,SERVICE,'gz' AS area ,SUBSTR(OBUTIME,1,10) AS obudate FROM " + tablename + "_tmp")
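
To avoid retyping the partition names by hand (and reintroducing uppercase by accident), the INSERT string could be assembled by a small helper that lowercases them. This is a minimal sketch, not part of the original script: the function `build_dynamic_insert`, the table name `bus_gps`, and the shortened column list are all hypothetical, and the `OVERWRITE` branch is only a guess at what the other side of `addoroverwrite` might do.

```python
def build_dynamic_insert(table, select_cols, partition_exprs, source, append=True):
    """Build an INSERT statement that uses dynamic partitions.

    partition_exprs maps a partition column name to the SQL expression
    that produces its value. Names are lowercased before use.
    """
    parts = {name.lower(): expr for name, expr in partition_exprs.items()}
    mode = "INTO" if append else "OVERWRITE"
    spec = ",".join(parts)
    # Partition expressions go last in the SELECT list, in spec order.
    select_list = ", ".join(
        list(select_cols) + [f"{expr} AS {name}" for name, expr in parts.items()]
    )
    return (
        f"INSERT {mode} TABLE {table} PARTITION({spec}) "
        f"SELECT {select_list} FROM {source}"
    )


sql = build_dynamic_insert(
    "bus_gps",                      # hypothetical table name
    ["OBUID", "BUS_ID"],            # shortened column list for illustration
    {"AREA": "'gz'", "OBUDATE": "SUBSTR(OBUTIME,1,10)"},
    "bus_gps_tmp",
)
print(sql)
```

The resulting string can then be passed to `spark.sql(sql)` in place of the hand-concatenated statement.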