
Flume hive sink: pitfalls and notes

1. Overview of the hive sink

Compared with the hdfs sink, the hive sink can deliver data into a Hive table in near real time. With the hdfs sink you have to create a Hive external table over the HDFS path to query the data, and the latency is noticeably higher.

2. Caveats

1. The Hive table must be bucketed (`clustered by`) and stored as ORC.

2. The Hive column names configured in Flume must all be lowercase, i.e. every entry in `fieldnames` must be lowercase.

3. Partitions must be created manually, i.e. set `autoCreatePartitions = false`.
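In addition to the notes above, the Hive streaming API that the hive sink relies on generally requires Hive's ACID/transaction support to be enabled on the metastore side (not just `'transactional'='true'` on the table). A minimal hive-site.xml sketch of the commonly required properties; exact requirements vary by Hive version, so check the docs for your deployment:

```xml
<!-- Sketch: typical hive-site.xml settings for Hive ACID / streaming ingest -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```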

3. Configuring the hive sink

```
a1.sinks.k2.type = hive
a1.sinks.k2.channel = c2
# URL of the Hive metastore
a1.sinks.k2.hive.metastore = thrift://192.168.3.150:9083
# Hive database name
a1.sinks.k2.hive.database = test
# Hive table name
a1.sinks.k2.hive.table = ods_table
# Hive table partition, comma-separated; %Y expands to 2018, %y to 18
a1.sinks.k2.hive.partition = %Y-%m-%d
# Automatic partition creation must be disabled here, otherwise the sink
# throws errors; create the partitions manually instead
a1.sinks.k2.autoCreatePartitions = false
# false = use the timestamp from the event header; set true to use local time
a1.sinks.k2.useLocalTimeStamp = false
#a1.sinks.k2.round = true
#a1.sinks.k2.roundValue = 1
#a1.sinks.k2.roundUnit = minute
a1.sinks.k2.serializer = DELIMITED
# Remember: the delimiter MUST be escaped
a1.sinks.k2.serializer.delimiter = "\\001"
#a1.sinks.k2.serializer.serdeSeparator = "\\001"
# The Hive column names configured in Flume must all be lowercase.
# The Hive table must be bucketed and stored as ORC.
a1.sinks.k2.serializer.fieldnames = dstype,id,type,lastuploadtime
```
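The "remember to escape" warning matters because the `DELIMITED` serializer splits each event body on the literal `\001` (Ctrl-A) control character, which is what `"\\001"` in the config unescapes to. A small Python sketch of what a record on the wire must look like for the `fieldnames` above (the field values are made up for illustration):

```python
# Fields in the same order as serializer.fieldnames:
# dstype,id,type,lastuploadtime (example values are made up)
fields = ["sensor", "42", "upload", "2018-05-18 12:00:00"]

# The serializer splits the event body on the \001 (Ctrl-A) byte,
# so the producer must join fields with exactly that character.
record = "\x01".join(fields)

# 4 fields -> 3 delimiter bytes, none of them printable
assert record.count("\x01") == len(fields) - 1
print(record.encode())
```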

4. Hive table DDL

```
create table test.ods_table
(
  dsType string,
  id string,
  type string,
  lastUploadTime string
)
partitioned by (dt string)
clustered by (id) into 2 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');

alter table test.ods_table add if not exists partition (dt='2018-05-18');
```
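Since `autoCreatePartitions = false`, each day's partition must exist before events for that day arrive. A small Python sketch (with a hypothetical helper name) that builds the daily `ALTER TABLE` statement matching the `%Y-%m-%d` pattern in the Flume config; the output could be fed to `hive -e` or beeline from a daily cron job:

```python
from datetime import date

def partition_ddl(day: date, table: str = "test.ods_table") -> str:
    """Build the ALTER TABLE statement that pre-creates the partition
    for one day, matching the %Y-%m-%d pattern used by the sink.
    (Hypothetical helper for illustration.)"""
    dt = day.strftime("%Y-%m-%d")
    return (f"alter table {table} add if not exists "
            f"partition (dt='{dt}');")

print(partition_ddl(date(2018, 5, 18)))
# alter table test.ods_table add if not exists partition (dt='2018-05-18');
```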