Exporting HBase data to text, CSV, HTML, and other files

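Requirement:

For a query filtered by time range, area, and similar conditions, export from HBase the log records with the latest and the earliest collection time for each terminal.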

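Approach:

1. Use HBase's built-in import/export tool to export the terminal MAC data to a given directory

hbase org.apache.hadoop.hbase.mapreduce.Driver export  <table name>  <output directory>

For example: hbase org.apache.hadoop.hbase.mapreduce.Driver export  LogTerminal  /home/hbase

Used this way, the tool exports the entire table with no filtering on column values, and it writes the data as multiple files that are awkward to process, so I set this approach aside for now.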
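As a hedged aside, the Export tool's usage string also lists optional versions, starttime, and endtime arguments after the output directory. They restrict the export by cell write timestamp (epoch milliseconds), which may not line up with the log's own Time column, and they do not filter on AreaCode. A minimal sketch, assuming GNU date and UTC day boundaries; the helper names to_millis and build_export_cmd are hypothetical, and the command is only printed so it can be reviewed before running:

```shell
#!/bin/bash
# Convert a date string to epoch milliseconds (GNU date assumed, UTC).
to_millis() {
    echo "$(date -u -d "$1" +%s)000"
}

# Print (do not run) an Export command limited to cells written in [start, end).
# Arguments: table, output directory, start date, end date.
# The literal 1 asks for one version per cell.
build_export_cmd() {
    local table=$1 outdir=$2 start end
    start=$(to_millis "$3")
    end=$(to_millis "$4")
    echo "hbase org.apache.hadoop.hbase.mapreduce.Driver export $table $outdir 1 $start $end"
}

build_export_cmd LOG20180108 /home/hbase/LOG20180108 2018-01-08 2018-01-09
```

The output is still a set of SequenceFile parts, so this narrows what is exported but does not make the result easier to read.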

2. Export through a filtered query

scan 'LOG20180108',{COLUMNS => 'INFO',LIMIT=>1,FILTER=>"(PrefixFilter('T')) AND (SingleColumnValueFilter('INFO','AreaCode',=,'binary:610103'))"}

Export to a file:

echo "scan 'LOG20180108',{COLUMNS => ['INFO'],LIMIT=>1,FILTER=>\"(PrefixFilter('T')) AND (SingleColumnValueFilter('INFO','AreaCode',=,'binary:610103'))\"}" | hbase shell > myText.csv
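The redirected file holds the HBase shell's own text layout (one line per cell, plus banner and summary lines) rather than comma-separated columns. A minimal sketch of turning that output into rowkey,column,timestamp,value lines, assuming each cell line has the shape `ROWKEY column=FAMILY:QUALIFIER, timestamp=..., value=...`. The function name scan_to_csv is hypothetical, and values that contain commas are not quoted:

```shell
#!/bin/bash
# Keep only the cell lines of `hbase shell` scan output and rewrite them as
# rowkey,column,timestamp,value. Banner, header, and summary lines are dropped
# because they do not match the pattern.
scan_to_csv() {
    sed -n -E 's/^ *([^ ]+) +column=([^,]+), timestamp=([^,]+), value=(.*)$/\1,\2,\3,\4/p'
}

# Sample input in the assumed layout; real output would come from `hbase shell`.
printf '%s\n' \
    'ROW  COLUMN+CELL' \
    ' T0001 column=INFO:AreaCode, timestamp=1515400000000, value=610103' \
    '1 row(s)' | scan_to_csv
```

In practice the function would sit between the shell and the redirect, for example `... | hbase shell | scan_to_csv > myText.csv`.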


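3. Export through Hive

1. Map the HBase table to a temporary Hive table.

2. Load the data from the temporary Hive table into the real table.

3. Import the real table's data into the database.

The script is as follows: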

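#!/bin/bash

# Read the input dates; with no arguments, default to yesterday
get_time(){
    if [ 3 -eq $# ]
    then
        date=`date -d "+0 day $1" +%Y%m%d`
        enddate=`date -d "+1 day $2" +%Y%m%d`
    elif [ 0 -eq $# ]
    then
        echo "No arguments given, building yesterday's data by default."
        echo "For a batch build, pass a date range and an area code in the form [$0 yyyy-mm-dd yyyy-mm-dd areacode]."
        #read -p "With no arguments, yesterday's data is built by default. Enter [y] to continue or [n] to quit: " isBuild
        #case $isBuild in
        #y | Y)
        date=`date -d "+0 day yesterday" +%Y%m%d`
        enddate=`date +%Y%m%d`
        # ;;
        #n | N)
        # exit
        # ;;
        #*)
        # echo "Invalid input, exiting"
        # exit
        # ;;
        #esac
    else
        echo "Invalid arguments."
        echo "For a batch build, pass a date range and an area code in the form [$0 yyyy-mm-dd yyyy-mm-dd areacode]."
        echo "To build yesterday's data by default, run [$0] with no arguments."
        # Stop here: without a date range there is nothing to process
        exit 1
    fi
}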

# Create the Hive tables that store the result data

hive_table(){

echo "create hive table start......."

hive -S -e "DROP TABLE IF EXISTS LogTerminal;

CREATE TABLE LogTerminal(rowkey string,TerminalID string,TerminalMac string,DeviceType string, Power string,Channel string,MaxPower string,TimeNear string,LonNear string,LatNear string,RouteMac string,SSID string, SSIDs string,SecurityType string,RealType string, RealCode string,TerminalType int,PcBrand string, Phone string,IMSI string,IMEI string,OS string,CustomerStartTime string,OffLineTime string, Model string,CoordinateX string,CoordinateY string, OffLineLon string,OffLineLat string,PcIP string, RouteType string,SessionID string,ProtoType string,CyberCode string,IsMove string,IsElectric string, SafeState string,GPIOState string,Serial string, GuildID string,Time string,ManufacturerCode string,AreaCode string,UnitCode string,MachineCode string, SystemType string,DATASOURCEID string,Lon string, Lat string,LatLon string,RESOURCETYPE string, AccessSystemID string,InterfaceID string,InterfaceGroupID string,WriterTime string )COMMENT 'LogTerminal Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET; "

hive -S -e "DROP TABLE IF EXISTS LogTerminal_min;

CREATE TABLE LogTerminal_min(rowkey string,TerminalID string,TerminalMac string,DeviceType string, Power string,Channel string,MaxPower string,TimeNear string,LonNear string,LatNear string,RouteMac string,SSID string, SSIDs string,SecurityType string,RealType string, RealCode string,TerminalType int,PcBrand string, Phone string,IMSI string,IMEI string,OS string,CustomerStartTime string,OffLineTime string, Model string,CoordinateX string,CoordinateY string, OffLineLon string,OffLineLat string,PcIP string, RouteType string,SessionID string,ProtoType string,CyberCode string,IsMove string,IsElectric string, SafeState string,GPIOState string,Serial string, GuildID string,Time string,ManufacturerCode string,AreaCode string,UnitCode string,MachineCode string, SystemType string,DATASOURCEID string,Lon string, Lat string,LatLon string,RESOURCETYPE string, AccessSystemID string,InterfaceID string,InterfaceGroupID string,WriterTime string )COMMENT 'LogTerminal Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET; "

hive -S -e "DROP TABLE IF EXISTS LogTerminal_max;

CREATE TABLE LogTerminal_max(rowkey string,TerminalID string,TerminalMac string,DeviceType string, Power string,Channel string,MaxPower string,TimeNear string,LonNear string,LatNear string,RouteMac string,SSID string, SSIDs string,SecurityType string,RealType string, RealCode string,TerminalType int,PcBrand string, Phone string,IMSI string,IMEI string,OS string,CustomerStartTime string,OffLineTime string, Model string,CoordinateX string,CoordinateY string, OffLineLon string,OffLineLat string,PcIP string, RouteType string,SessionID string,ProtoType string,CyberCode string,IsMove string,IsElectric string, SafeState string,GPIOState string,Serial string, GuildID string,Time string,ManufacturerCode string,AreaCode string,UnitCode string,MachineCode string, SystemType string,DATASOURCEID string,Lon string, Lat string,LatLon string,RESOURCETYPE string, AccessSystemID string,InterfaceID string,InterfaceGroupID string,WriterTime string )COMMENT 'LogTerminal Table' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS PARQUET; "

echo "create hive table end......."

}

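# Create the temporary terminal-log table, map it to the HBase table, then query the temporary table by condition and insert the results into the real table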

hive_task(){

echo "hive task $1 $2 $3  ..."

DATA_FORMAT=`date -d "$1" +%Y-%m-%d`

TABLE_NAME=LOG$1

AREA_CODE=$3

echo $DATA_FORMAT

echo $TABLE_NAME

echo $AREA_CODE

hive -S -e "DROP TABLE  IF EXISTS TempLogTerminal;

CREATE EXTERNAL TABLE TempLogTerminal(key string,TerminalID string,TerminalMac string,DeviceType string, Power string,Channel string,MaxPower string,TimeNear string,LonNear string,LatNear string,RouteMac string,SSID string, SSIDs string,SecurityType string,RealType string, RealCode string,TerminalType int,PcBrand string, Phone string,IMSI string,IMEI string,OS string,CustomerStartTime string,OffLineTime string, Model string,CoordinateX string,CoordinateY string, OffLineLon string,OffLineLat string,PcIP string, RouteType string,SessionID string,ProtoType string,CyberCode string,IsMove string,IsElectric string, SafeState string,GPIOState string,Serial string, GuildID string,Time string,ManufacturerCode string,AreaCode string,UnitCode string,MachineCode string,SystemType string,DATASOURCEID string,Lon string, Lat string,LatLon string,RESOURCETYPE string, AccessSystemID string,InterfaceID string,InterfaceGroupID string,WriterTime string)  ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES('hbase.columns.mapping'=':key,INFO:TerminalID,INFO:TerminalMac,INFO:DeviceType,INFO:Power,INFO:Channel,INFO:MaxPower,INFO:TimeNear,INFO:LonNear,INFO:LatNear,INFO:RouteMac,INFO:SSID,INFO:SSIDs,INFO:SecurityType,INFO:RealType,INFO:RealCode,INFO:TerminalType,INFO:PcBrand,INFO:Phone,INFO:IMSI,INFO:IMEI,INFO:OS,INFO:CustomerStartTime,INFO:OffLineTime,INFO:Model,INFO:CoordinateX,INFO:CoordinateY,INFO:OffLineLon,INFO:OffLineLat,INFO:PcIP,INFO:RouteType,INFO:SessionID,INFO:ProtoType,INFO:CyberCode,INFO:IsMove,INFO:IsElectric,INFO:SafeState,INFO:GPIOState,INFO:Serial,INFO:GuildID,INFO:Time,INFO:ManufacturerCode,INFO:AreaCode,INFO:UnitCode,INFO:MachineCode,INFO:SystemType,INFO:DATASOURCEID,INFO:Lon,INFO:Lat,INFO:LatLon,INFO:RESOURCETYPE,INFO:AccessSystemID,INFO:InterfaceID,INFO:InterfaceGroupID,INFO:WriterTime') TBLPROPERTIES('hbase.table.name'='$TABLE_NAME') ;

INSERT $2 TABLE logterminal SELECT key as rowkey,TerminalID,TerminalMac,DeviceType,Power,Channel,MaxPower,TimeNear,LonNear,LatNear,RouteMac,SSID,SSIDs,SecurityType,RealType,RealCode,TerminalType,PcBrand, Phone,IMSI,IMEI,OS,CustomerStartTime,OffLineTime,Model,CoordinateX,CoordinateY,OffLineLon,OffLineLat,PcIP,RouteType,SessionID,ProtoType,CyberCode,IsMove,IsElectric, SafeState,GPIOState,Serial,GuildID,Time,ManufacturerCode,AreaCode,UnitCode,MachineCode,SystemType,DATASOURCEID,Lon,Lat,LatLon,RESOURCETYPE,AccessSystemID,InterfaceID,InterfaceGroupID,WriterTime

FROM TempLogTerminal WHERE RESOURCETYPE=32 AND  AreaCode='$AREA_CODE';"

echo "hive task end......."

}

# Print the SQL Server DDL for the target tables.
# main does not call this function; run its output with a SQL Server client before the first import.
sqlserver_table(){
    local table
    for table in LogTerminal_min LogTerminal_max
    do
        cat <<EOF
if exists (select * from sysobjects where name='$table')
drop table $table
CREATE  TABLE $table(
rowkey nvarchar(200),TerminalID bigint,TerminalMac nvarchar(200),DeviceType bigint,
Power bigint,Channel bigint,MaxPower bigint,
TimeNear nvarchar(200),LonNear float,LatNear float,
RouteMac nvarchar(200),SSID nvarchar(200),
SSIDs nvarchar(200),SecurityType nvarchar(200),RealType bigint,
RealCode nvarchar(200),TerminalType bigint,PcBrand nvarchar(200),
Phone nvarchar(200),IMSI nvarchar(200),IMEI nvarchar(200),
OS nvarchar(200),CustomerStartTime nvarchar(200),OffLineTime nvarchar(200),
Model nvarchar(200),CoordinateX nvarchar(200),CoordinateY nvarchar(200),
OffLineLon float,OffLineLat float,PcIP nvarchar(200),
RouteType bigint,SessionID nvarchar(200),ProtoType bigint,
CyberCode nvarchar(200),IsMove bigint,IsElectric bigint,
SafeState bigint,GPIOState bigint,Serial nvarchar(200),
GuildID nvarchar(200),Time nvarchar(200),ManufacturerCode nvarchar(200),
AreaCode nvarchar(200),UnitCode nvarchar(200),MachineCode nvarchar(200),
SystemType nvarchar(200),DATASOURCEID bigint,Lon float,
Lat float,LatLon nvarchar(200),RESOURCETYPE bigint,
AccessSystemID bigint,InterfaceID bigint,InterfaceGroupID bigint,
WriterTime nvarchar(200)
);
EOF
    done
}

# Import the data into SQL Server
import_sqlserver(){
    echo " import sqlserver start......."
    sqoop export --connect 'jdbc:sqlserver://192.168.2.219; username=nsmc53; password=123456;database=WFBDCMain'  --table LogTerminal_min --hcatalog-database default --hcatalog-table LogTerminal_min --num-mappers 5;
    sqoop export --connect 'jdbc:sqlserver://192.168.2.219; username=nsmc53; password=123456;database=WFBDCMain'  --table LogTerminal_max --hcatalog-database default --hcatalog-table LogTerminal_max --num-mappers 5;
    echo " import sqlserver end......."
}

# Export the Hive data to local directories

export_hive_local(){

echo " export hive to local start......."

mkdir -p /home/hive/min

mkdir -p /home/hive/max

hive -S -e "\

insert overwrite local directory '/home/hive/min'   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select a.* from LogTerminal a inner join(select TerminalMac,min(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;\

insert overwrite local directory '/home/hive/max'   ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select a.* from LogTerminal a inner join(select TerminalMac,max(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;"

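# Export the Hive data to a CSV file (the Hive CLI separates columns with tabs, and >> appends to any existing file)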

hive -e " set hive.cli.print.header=true;select a.* from LogTerminal a inner join(select TerminalMac,min(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;" >> /home/hive/logterminal-min.csv

echo "export hive to local end......."

}

# Filter the Hive data by query and load the results into the other tables

export_hive(){

echo " export hive to sqlserver start......."

# Select the earliest record for each MAC

hive -S -e "INSERT OVERWRITE TABLE LogTerminal_min select a.* from LogTerminal a inner join(select TerminalMac,min(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;";

# Select the latest record for each MAC

hive -S -e "INSERT OVERWRITE TABLE LogTerminal_max select a.* from LogTerminal a inner join(select TerminalMac,max(time) time from LogTerminal group by TerminalMac) b on a.TerminalMac = b.TerminalMac and a.time = b.time order by a.TerminalMac ;";

echo "export hive to sqlserver end......."

}

main(){
    # Resolve the date range before touching any tables
    get_time "$@"
    # Create the Hive table LogTerminal
    echo " create hive table LogTerminal......."
    hive_table
    Style="OVERWRITE"
    date1=$date
    while [[ $date1 < $enddate ]]
    do
        echo "$date1"
        # Create the temporary terminal-log table TempLogTerminal mapped to the HBase table
        # (dropping any table left by a previous run) and load the queried data into LogTerminal
        echo " exec  hive_task....... $date1 $Style $3 "
        hive_task $date1 $Style $3
        date1=`date -d "+1 day $date1" +%Y%m%d`
        Style="INTO"
    done
    # Filter the LogTerminal Hive table into LogTerminal_min and LogTerminal_max
    echo " export hive to sqlserver ......."
    export_hive
    echo " import sqlserver ......."
    import_sqlserver
    echo " query logterminal end......."
}

main "$@"
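The date handling in get_time and main can be checked in isolation before any Hive job is run. A minimal sketch, assuming GNU date; date_range is a hypothetical helper that prints the yyyymmdd suffixes the loop would visit, from which the LOG<yyyymmdd> table names are built:

```shell
#!/bin/bash
# Print every day from start to end (inclusive) as yyyymmdd, one per line.
date_range() {
    local current end
    current=$(date -d "+0 day $1" +%Y%m%d)
    end=$(date -d "+1 day $2" +%Y%m%d)
    # yyyymmdd strings sort chronologically, so a string comparison is enough
    while [[ $current < $end ]]
    do
        echo "$current"
        current=$(date -d "+1 day $current" +%Y%m%d)
    done
}

# List the HBase tables a run for 2018-01-08 through 2018-01-10 would read.
for day in $(date_range 2018-01-08 2018-01-10)
do
    echo "LOG$day"
done
```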

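Running the script above loads the HBase data into SQL Server. Connect with a SQL Server client tool to query the data, then use the client's export feature to produce text, CSV, HTML, and other files.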

select  * from WFBDCMain.dbo.LogTerminal_min

select  * from WFBDCMain.dbo.LogTerminal_max
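The joins that fill LogTerminal_min and LogTerminal_max keep, for each TerminalMac, the rows whose time equals that MAC's minimum or maximum. A minimal sketch of the same grouping on a small comma-separated sample, assuming two columns (MAC, time) and time values that order correctly when compared. The name first_last_per_mac is hypothetical, and unlike the joins it reports only the two timestamps rather than the full rows:

```shell
#!/bin/bash
# Read "mac,time" lines and print "mac,earliest,latest" for each MAC, sorted by MAC.
first_last_per_mac() {
    awk -F, '
        !($1 in min) || $2 < min[$1] { min[$1] = $2 }
        !($1 in max) || $2 > max[$1] { max[$1] = $2 }
        END { for (mac in min) print mac "," min[mac] "," max[mac] }
    ' | sort
}

printf '%s\n' \
    'AA-01,2018-01-08 10:00:00' \
    'BB-02,2018-01-08 09:30:00' \
    'AA-01,2018-01-08 08:15:00' \
    'AA-01,2018-01-09 12:00:00' | first_last_per_mac
```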


The script for exporting Hive data to a CSV file comes from an online search; I modified parts of it as follows:

#!/bin/bash

mkdir -p  /tmp/project_1010/t_test

hive -e "use default;
insert overwrite local directory '/tmp/project_1010/t_test/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
select t.* from LogTerminal t"

# Generate the header row, strip the 'logterminal.' table prefix from the column names, and write it to the final CSV file
hive  -e 'use default; SET hive.cli.print.header=true; SELECT * FROM LogTerminal LIMIT 0' | sed -e 's/\t/|/g;s/logterminal\.//g'  > /home/hive/logterminal-min.csv

# Append the part files (000000_0, 000001_0, and so on) to the final CSV file
cat /tmp/project_1010/t_test/* >> /home/hive/logterminal-min.csv

# Post-process the final CSV file with sed, replacing values as needed
sed -i 's/\\N/NULL/g' /home/hive/logterminal-min.csv
sed -i "s/|'|/|NULL|/g" /home/hive/logterminal-min.csv
sed -i 's/|"|/|NULL|/g' /home/hive/logterminal-min.csv

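When I tested exporting directly from Hive, problems such as the exported CSV file not splitting into columns remained unresolved, so I used the database route instead. If you know a better way, please leave a comment so we can all learn from each other.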
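One likely reason the direct export does not split into columns is that the Hive CLI writes query results with tab separators, whatever extension the output file has. A minimal sketch of converting that output to quoted CSV with awk, assuming no field contains a tab or a newline; tsv_to_csv is a hypothetical name:

```shell
#!/bin/bash
# Convert tab-separated lines on stdin to CSV on stdout.
# Fields containing a comma or a double quote are wrapped in double quotes,
# with embedded double quotes doubled.
tsv_to_csv() {
    awk -F'\t' '{
        line = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            if (field ~ /[",]/) {
                gsub(/"/, "\"\"", field)
                field = "\"" field "\""
            }
            line = line (i > 1 ? "," : "") field
        }
        print line
    }'
}

# Sample row in the assumed tab-separated layout.
printf 'AA-01\tCafe, 2F\t610103\n' | tsv_to_csv
```

With this in place, the Hive query output could be piped through the function, for example `hive -e "..." | tsv_to_csv > /home/hive/logterminal-min.csv`.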

Script download:

http://download.csdn.net/download/seashouwang/10264445

References:

http://blog.csdn.net/javajxz008/article/details/61173213

https://my.oschina.net/wangjiankui/blog/497658