Exporting Impala table data and bulk-inserting it into Elasticsearch
阿新 • Published: 2018-12-17
1. Export the Impala table to a file in CSV format
impala-shell -q "select * from qscar.kcar " -B --output_delimiter="," --print_header -o kcar.csv
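2. Convert the CSV file into the required format (an index header line followed by a JSON object) so that it can be inserted into Elasticsearch. The code is shown below.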
import org.json.JSONObject;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Change {

    // Read every line of the CSV file into a list; the first line is the header.
    public List<String> readCsv(String path) {
        List<String> list = new ArrayList<String>();
        try (BufferedReader br = new BufferedReader(new FileReader(path))) {
            String stemp;
            while ((stemp = br.readLine()) != null) {
                list.add(stemp);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return list;
    }

    // Write each data row as an index header line followed by its JSON object.
    public void ToJson(List<String> list) {
        if (list.isEmpty()) {
            return;
        }
        int rowNum = list.size();                  // number of rows
        String[] tittle = list.get(0).split(",");  // column names from the header row
        int colNum = tittle.length;                // number of columns
        String head = "{ \"index\" : { \"_index\" : \"zjx\", \"_type\" : \"type2\" } }";
        // Open the output file once, in append mode, instead of once per row
        try (FileWriter fileWriter = new FileWriter("D:/test2.json", true)) {
            for (int i = 1; i < rowNum; i++) {
                // A plain split assumes that no field value contains a comma;
                // the -1 limit keeps trailing empty fields
                String[] tmp = list.get(i).split(",", -1);
                JSONObject jsonObject = new JSONObject();
                // Use the column names as the JSON keys
                for (int j = 0; j < colNum && j < tmp.length; j++) {
                    jsonObject.put(tittle[j], tmp[j]);
                }
                fileWriter.write(head + "\n" + jsonObject.toString() + "\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        String path = "D:/kcar.csv";
        Change change = new Change();
        List<String> list = change.readCsv(path);
        change.ToJson(list);
    }
}
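When converting, limit the number of rows written to each output file. The file must stay under 100 MB, otherwise the insert into ES fails: Elasticsearch rejects HTTP request bodies larger than http.max_content_length, which defaults to 100mb. It is therefore best to split the data and insert it in batches.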
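The batching described above can be sketched as follows. This is only an illustration under stated assumptions, not the author's code: the class name BulkSplitter, the method split, and the maxBytes parameter are hypothetical, and the input is assumed to alternate strictly between an index header line and a JSON object line, as the converter produces. Sizes are measured in UTF-8 bytes, and a header/object pair is never split across two batches.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BulkSplitter {

    // Groups (header line, JSON object line) pairs into batches whose UTF-8
    // size stays at or below maxBytes. A single pair larger than maxBytes
    // still gets a batch of its own, so no data is dropped.
    static List<List<String>> split(List<String> lines, long maxBytes) {
        if (lines.size() % 2 != 0) {
            throw new IllegalArgumentException("expected header/object line pairs");
        }
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < lines.size(); i += 2) {
            String header = lines.get(i);
            String doc = lines.get(i + 1);
            // +2 accounts for the newline that terminates each of the two lines
            long pairBytes = header.getBytes(StandardCharsets.UTF_8).length
                    + doc.getBytes(StandardCharsets.UTF_8).length + 2;
            // Start a new batch when adding this pair would exceed the limit
            if (!current.isEmpty() && currentBytes + pairBytes > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(header);
            current.add(doc);
            currentBytes += pairBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            lines.add("{ \"index\" : { \"_index\" : \"zjx\", \"_type\" : \"type2\" } }");
            lines.add("{\"id\":\"" + i + "\"}");
        }
        // A tiny limit so the split is visible; a real run would pass a value below 100 MB
        List<List<String>> batches = split(lines, 150);
        System.out.println(batches.size() + " batches");
    }
}
```

Each returned batch can then be written to its own file (one line per entry, each terminated by a newline) and sent in a separate bulk request.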
3. Insert the data into Elasticsearch
First, the format of the bulk command:
curl -XPOST -H "Content-Type: application/x-ndjson" localhost:9200/_bulk --data-binary @test2.json
test2.json is the JSON file to be read. Its format is one JSON value per line:
{ "index" : { "_index" : "test", "_type" : "type2" } }
{"id":"60","areaname":"重慶市","tid":"157","mid":"158","ctime":"20180114160801","ctime1":"2018-01-14 16:08:01"}
{ "index" : { "_index" : "test", "_type" : "type2" } }
{"id":"60","areaname":"重慶市","tid":"157","mid":"158","ctime":"20180114160801","ctime1":"2018-01-14 16:08:01"}
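Every data line must be preceded by its own header line carrying the index and type information, and the file must end with a newline.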
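Before sending a file, it can help to check that it really follows this layout. The sketch below is a rough structural check only, not a JSON parser and not part of Elasticsearch: the class name BulkFileCheck and the method firstProblem are hypothetical, and it assumes that every action in the file is an "index" action, as in the example above. It catches a missing final newline, an unpaired line, and a document line that was cut off before its closing brace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class BulkFileCheck {

    // Returns null when the content looks like a well-formed bulk body,
    // otherwise a short description of the first problem found.
    static String firstProblem(String content) {
        if (!content.endsWith("\n")) {
            return "body must end with a newline";
        }
        // With limit -1, split leaves one empty element after the final newline
        String[] lines = content.split("\n", -1);
        int n = lines.length - 1;
        if (n % 2 != 0) {
            return "odd number of lines: every document needs its own header line";
        }
        for (int i = 0; i < n; i += 2) {
            String header = lines[i].trim();
            String doc = lines[i + 1].trim();
            if (!header.startsWith("{") || !header.contains("\"index\"")) {
                return "line " + (i + 1) + ": expected an index header line";
            }
            if (!doc.startsWith("{") || !doc.endsWith("}")) {
                return "line " + (i + 2) + ": expected a complete JSON object on one line";
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java BulkFileCheck <bulk-file>");
            return;
        }
        String content = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String problem = firstProblem(content);
        System.out.println(problem == null ? "looks OK" : problem);
    }
}
```

The check reads the whole file into memory, which is acceptable here because each batch is kept below 100 MB anyway.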