Importing Kafka topic data into Hive with Flume
Published: 2018-12-07
1. First, create the table in Hive according to the schema of the source data.
create table AREA1 (
  unid string,
  area_punid string,
  area_no string,
  area_name string,
  area_dept_unid string,
  area_longitude string,
  area_latitude string,
  area_sortid string,
  create_time string
)
clustered by (unid) into 2 buckets
stored as orc;
Note: the clustered by (unid) into 2 buckets clause and stored as orc must both be present, otherwise the Hive sink fails with an error. I left them out the first time and hit exactly that; this is the fix I found online.
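The bucketing and ORC requirements exist because the Flume Hive sink writes through Hive's streaming ingest API, which also requires ACID transactions to be enabled on the Hive side. If the sink still errors after fixing the DDL, check hive-site.xml for settings along these lines (a minimal sketch of the commonly required transaction properties; on many Hive versions the target table must additionally be declared with TBLPROPERTIES ('transactional'='true')):

```
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```

Restart the metastore after changing these so the transaction manager takes effect.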
2. With the table created, write the Flume configuration file; this is the key step.
In Flume's conf directory, create a configuration file named kafkatohive.conf with the following contents:
flumeagent1.sources = source_from_kafka
flumeagent1.channels = mem_channel
flumeagent1.sinks = hive_sink

# Define / Configure source
flumeagent1.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flumeagent1.sources.source_from_kafka.zookeeperConnect = 192.168.72.129:2181,192.168.72.130:2181,192.168.72.131:2181
flumeagent1.sources.source_from_kafka.topic = oracle-kafka
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sources.source_from_kafka.interceptors = i1
flumeagent1.sources.source_from_kafka.interceptors.i1.type = timestamp
flumeagent1.sources.source_from_kafka.consumer.timeout.ms = 1000

# Hive Sink
flumeagent1.sinks.hive_sink.type = hive
flumeagent1.sinks.hive_sink.hive.metastore = thrift://192.168.72.129:9083
flumeagent1.sinks.hive_sink.hive.database = test
flumeagent1.sinks.hive_sink.hive.table = AREA1
flumeagent1.sinks.hive_sink.hive.txnsPerBatchAsk = 2
flumeagent1.sinks.hive_sink.batchSize = 10
flumeagent1.sinks.hive_sink.serializer = DELIMITED
flumeagent1.sinks.hive_sink.serializer.delimiter = ,
flumeagent1.sinks.hive_sink.serializer.fieldnames = unid,area_punid,area_no,area_name,area_dept_unid,area_longitude,area_latitude,area_sortid,create_time

# Use a channel which buffers events in memory
flumeagent1.channels.mem_channel.type = memory
flumeagent1.channels.mem_channel.capacity = 10000
flumeagent1.channels.mem_channel.transactionCapacity = 100

# Bind the source and sink to the channel
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sinks.hive_sink.channel = mem_channel
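The DELIMITED serializer splits each Kafka message on the configured delimiter and maps the pieces onto fieldnames in order, so every message must be one comma-separated record matching the table columns. A quick way to test the pipeline end to end is to push a hand-written record into the topic with the console producer (the broker address and the sample values below are assumptions for illustration; the original post only lists the ZooKeeper addresses):

```
echo '1001,2001,A01,EastGate,D01,116.40,39.90,1,2018-12-07 10:00:00' | \
  kafka-console-producer.sh --broker-list 192.168.72.129:9092 --topic oracle-kafka
```

If a field itself contains a comma, the split will misalign the columns, so either sanitize the data or choose a delimiter that cannot appear in the values.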
3. Start the Flume agent with the following command:
Run from Flume's bin directory (note the -f path must point at the kafkatohive.conf created above):

flume-ng agent -n flumeagent1 -f ../conf/kafkatohive.conf
At this point data flows into Hive. For near-real-time ingestion, set up another agent that feeds records from the upstream data source into this Kafka topic; with a suitable delay configured on that agent, this should give an approximately real-time data flow.
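To confirm that events are actually landing in the table, a couple of quick checks from the Hive CLI or Beeline are enough (plain verification queries, nothing specific to this setup):

```
select count(*) from test.AREA1;
select * from test.AREA1 limit 5;
```

The count should grow as the agent commits batches; with batchSize = 10 and txnsPerBatchAsk = 2 as configured above, rows appear in small increments rather than one at a time.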