Hive Learning Series, Part 5 (Hive and Elasticsearch interaction; very detailed, and yes, I'm bragging again)
阿新 • Published: 2018-07-24
Operating on Elasticsearch from Hive
I. Importing data from a Hive table into Elasticsearch
1. First, create the Elasticsearch index:
curl -XPUT '10.81.179.209:9200/zebra_info_demo?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    "zebra_info": {
      "properties": {
        "name": {"type": "text"},
        "type": {"type": "text"},
        "province": {"type": "text"},
        "city": {"type": "text"},
        "citycode": {"type": "text", "index": false},
        "district": {"type": "text"},
        "adcode": {"type": "text", "index": false},
        "township": {"type": "text"},
        "business_circle": {"type": "text"},
        "formatted_address": {"type": "text"},
        "location": {"type": "geo_point"},
        "extensions": {
          "type": "nested",
          "properties": {
            "map_lat": {"type": "double", "index": false},
            "map_lng": {"type": "double", "index": false},
            "avg_price": {"type": "double", "index": false},
            "shops": {"type": "short", "index": false},
            "good_comments": {"type": "short", "index": false},
            "lvl": {"type": "short", "index": false},
            "leisure_type": {"type": "text", "index": false},
            "fun_type": {"type": "text", "index": false},
            "numbers": {"type": "short", "index": false}
          }
        }
      }
    }
  }
}
'
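Setting es.index.auto.create to false only stops the index from being created. Dynamic mapping still applies, so if a property name in the mapping differs from the field name Hive writes (for example bausiness_circle vs business_circle), nothing fails: Elasticsearch just adds business_circle as a new, dynamically mapped field. The Python sketch below is one way to catch this. It is not part of the original workflow. It cross-checks the ES field names listed in the Hive table's es.mapping.names against the top-level properties of the mapping body before you send it. The helper names and the abbreviated sample body are illustrative assumptions.

```python
import json
from typing import List


def es_fields_from_mapping_names(mapping_names: str) -> List[str]:
    """Parse an es.mapping.names value ('hive_col:es_field, ...') into ES field names."""
    pairs = (pair.split(":", 1) for pair in mapping_names.split(","))
    return [es_field.strip() for _hive_col, es_field in pairs]


def missing_fields(index_body: dict, doc_type: str, es_fields: List[str]) -> List[str]:
    """Return the ES fields that are not declared as top-level properties of doc_type."""
    properties = index_body["mappings"][doc_type]["properties"]
    return sorted(field for field in es_fields if field not in properties)


# The es.mapping.names value used by the Hive table in step 4.
MAPPING_NAMES = (
    "name:name, type:type, province:province, city:city, citycode:citycode, "
    "district:district, adcode:adcode, township:township, "
    "business_circle:business_circle, formatted_address:formatted_address, "
    "location:location, extensions:extensions"
)

# Abbreviated mapping body (nested sub-properties omitted) containing a misspelled field.
BODY_WITH_TYPO = json.loads("""
{"mappings": {"zebra_info": {"properties": {
  "name": {"type": "text"}, "type": {"type": "text"},
  "province": {"type": "text"}, "city": {"type": "text"},
  "citycode": {"type": "text"}, "district": {"type": "text"},
  "adcode": {"type": "text"}, "township": {"type": "text"},
  "bausiness_circle": {"type": "text"}, "formatted_address": {"type": "text"},
  "location": {"type": "geo_point"}, "extensions": {"type": "nested"}
}}}}
""")

if __name__ == "__main__":
    fields = es_fields_from_mapping_names(MAPPING_NAMES)
    print(missing_fields(BODY_WITH_TYPO, "zebra_info", fields))  # ['business_circle']
```

An empty list means every field Hive writes is declared in the mapping.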
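2. Check the Elasticsearch version and download the matching elasticsearch-hadoop-hive jar
You can check the Elasticsearch version by querying a node's root endpoint, e.g. curl -XGET '10.81.179.209:9200/?pretty'. The version.number field of the response holds the version.
The version used in this article is 5.6.9.
Download the jar from the official Maven repository: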
https://repo.maven.apache.org/maven2/org/elasticsearch/elasticsearch-hadoop-hive/
Pick the version that matches your Elasticsearch version.
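Both steps can be scripted. The Python sketch below assumes you have saved the JSON returned by GET / on one node. It reads version.number and builds the jar URL from the standard Maven repository layout (artifact directory / version / artifact-version.jar). The sample response is trimmed, and its name and cluster_name values are placeholders.

```python
import json

# Maven repository directory for the artifact, as given above.
MAVEN_DIR = "https://repo.maven.apache.org/maven2/org/elasticsearch/elasticsearch-hadoop-hive"

# Trimmed example of a GET / response; "node-1" and "my-cluster" are placeholders.
SAMPLE_ROOT_RESPONSE = """
{"name": "node-1", "cluster_name": "my-cluster",
 "version": {"number": "5.6.9"}, "tagline": "You Know, for Search"}
"""


def es_version(root_response: str) -> str:
    """Return version.number from the JSON body of GET / on an Elasticsearch node."""
    return json.loads(root_response)["version"]["number"]


def hive_jar_url(version: str) -> str:
    """Build <dir>/<version>/elasticsearch-hadoop-hive-<version>.jar (standard Maven layout)."""
    return f"{MAVEN_DIR}/{version}/elasticsearch-hadoop-hive-{version}.jar"


if __name__ == "__main__":
    print(hive_jar_url(es_version(SAMPLE_ROOT_RESPONSE)))
```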
3. Upload the downloaded jar to HDFS.
In this article the jar is stored at hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar
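The upload itself is a single hdfs dfs -put call. The Python sketch below assumes the Hadoop client (hdfs) is on PATH and that the target directory /udf already exists. If it does not, create it first with hdfs dfs -mkdir -p /udf. The -f flag overwrites a jar that is already there. The function names are hypothetical.

```python
import subprocess
from typing import List


def hdfs_put_argv(local_jar: str, hdfs_dir: str = "hdfs:///udf/") -> List[str]:
    """Build the argument list for `hdfs dfs -put -f <local_jar> <hdfs_dir>`."""
    return ["hdfs", "dfs", "-put", "-f", local_jar, hdfs_dir]


def upload_jar(local_jar: str, hdfs_dir: str = "hdfs:///udf/") -> None:
    """Run the upload; raises CalledProcessError if the hdfs command exits non-zero."""
    subprocess.run(hdfs_put_argv(local_jar, hdfs_dir), check=True)
```

Usage: upload_jar("elasticsearch-hadoop-hive-5.6.9.jar").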
4. OK, create the table and put it to use:
DELETE jars;
add jar hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar;
drop table if exists zebra_info_demo;
CREATE EXTERNAL TABLE zebra_info_demo(
  name string,
  `type` string,
  province string,
  city string,
  citycode string,
  district string,
  adcode string,
  township string,
  business_circle string,
  formatted_address string,
  location string,
  extensions STRUCT<map_lat:double, map_lng:double, avg_price:double, shops:smallint, good_comments:smallint, lvl:smallint, leisure_type:STRING, fun_type:STRING, numbers:smallint>
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
  'es.index.auto.create' = 'false',
  'es.resource' = 'zebra_info_demo/zebra_info',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'name:name, type:type, province:province, city:city, citycode:citycode, district:district, adcode:adcode, township:township, business_circle:business_circle, formatted_address:formatted_address, location:location, extensions:extensions');
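A Hive column whose type does not fit its Elasticsearch field is easy to miss. Declaring province as double, for instance, clashes with the index mapping it as text. The Python sketch below flags such mismatches. EXPECTED_HIVE_TYPE is a hand-written assumption that covers only the top-level ES types used in this post. It is not es-hadoop's full type-conversion table.

```python
from typing import Dict, List, Tuple

# Hand-written, partial correspondence for the ES types used here (an assumption,
# not es-hadoop's complete conversion table): text fields and the "lat,lon"
# geo_point string come from Hive strings, the nested object from a Hive STRUCT.
EXPECTED_HIVE_TYPE = {"text": "string", "geo_point": "string", "nested": "struct"}


def type_mismatches(hive_columns: Dict[str, str],
                    es_properties: Dict[str, dict]) -> List[Tuple[str, str, str]]:
    """Return (column, hive_type, es_type) for columns whose Hive type looks wrong."""
    problems = []
    for column, hive_type in hive_columns.items():
        es_type = es_properties.get(column, {}).get("type")
        expected = EXPECTED_HIVE_TYPE.get(es_type)
        # Compare only the base type name, e.g. "STRUCT<map_lat:double>" -> "struct".
        base = hive_type.split("<", 1)[0].strip().lower()
        if expected is not None and base != expected:
            problems.append((column, hive_type, es_type))
    return problems


# A subset of the columns, with province declared as double to show the check firing.
ORIGINAL_COLUMNS = {
    "name": "string", "type": "string", "province": "double", "city": "string",
    "location": "string", "extensions": "STRUCT<map_lat:double, map_lng:double>",
}
ES_PROPERTIES = {
    "name": {"type": "text"}, "type": {"type": "text"}, "province": {"type": "text"},
    "city": {"type": "text"}, "location": {"type": "geo_point"},
    "extensions": {"type": "nested"},
}

if __name__ == "__main__":
    print(type_mismatches(ORIGINAL_COLUMNS, ES_PROPERTIES))
    # [('province', 'double', 'text')]
```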
5. Fill it with data and you're done:
INSERT INTO TABLE zebra_info_demo
SELECT
  a.name,
  a.brands,
  a.province,
  a.city,
  null as citycode,
  null as district,
  null as adcode,
  null as township,
  a.business_circle,
  null as formatted_address,
  concat(a.map_lat, ', ', a.map_lng) as `location`,
  named_struct('map_lat', cast(a.map_lat as double),
               'map_lng', cast(a.map_lng as double),
               'avg_price', cast(0 as DOUBLE),
               'shops', 0S,
               'good_comments', 0S,
               'lvl', cast(a.lv1 as SMALLINT),
               'leisure_type', '',
               'fun_type', '',
               'numbers', 0S) as extensions
from medicalsite_childclinic a;
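The location value is built by concatenating the latitude, a comma and the longitude. For a geo_point, Elasticsearch reads a "lat,lon" string with latitude first. The array form is the reverse, [lon, lat], so swapping map_lat and map_lng is an easy mistake to make. Hive's concat also returns NULL if any argument is NULL, so a row missing either coordinate ends up with a null location. The Python sketch below reproduces the formatting and adds a range check. The function names are hypothetical.

```python
def to_geo_point_string(lat: float, lng: float) -> str:
    """Format a coordinate the way the INSERT does: latitude first, then longitude."""
    return f"{lat},{lng}"


def is_valid_geo_point_string(value: str) -> bool:
    """True if value is a "lat,lon" string with both parts in range."""
    parts = value.split(",")
    if len(parts) != 2:
        return False
    try:
        # float() tolerates surrounding whitespace, so "39.9, 116.4" parses too.
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


if __name__ == "__main__":
    print(is_valid_geo_point_string(to_geo_point_string(39.9, 116.4)))  # True
    print(is_valid_geo_point_string("116.4,39.9"))  # False: latitude 116.4 is out of range
```

A swapped pair is only caught when the longitude's absolute value exceeds 90, as it does for most of China. Elsewhere a swap can still pass the range check.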
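Result:
II. Given existing Elasticsearch indices, create Hive tables that interact with them. Joins work too. In a word: awesome.
1. First, take a look at the indices and the data
The existing indices are: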
curl -XPUT '10.81.179.209:9200/join_tests?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "cities": {
      "properties": {
        "province": {
          "type": "text"
        },
        "city": {
          "type": "text"
        }
      }
    }
  }
}
'
curl -XPUT '10.81.179.209:9200/join_tests1?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "shop": {
      "properties": {
        "name": {
          "type": "text"
        },
        "city": {
          "type": "text"
        }
      }
    }
  }
}
'
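Hand-written request bodies like these are easy to unbalance, for example by leaving in one closing brace too many. Parsing the body locally before sending it catches that. The minimal Python sketch below relies on json.loads rejecting anything after the top-level object with an "Extra data" error. From a shell, piping the body through python -m json.tool performs the same check.

```python
import json
from typing import Optional

# The join_tests mapping body, compacted onto one line.
GOOD_BODY = ('{"mappings": {"cities": {"properties": '
             '{"province": {"type": "text"}, "city": {"type": "text"}}}}}')
# The same body with one stray closing brace appended.
BAD_BODY = GOOD_BODY + "\n}"


def body_error(body: str) -> Optional[str]:
    """Return a short parse-error description, or None if body is valid JSON."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return f"{exc.msg} at line {exc.lineno}, column {exc.colno}"
    return None


if __name__ == "__main__":
    print(body_error(GOOD_BODY))  # None
    print(body_error(BAD_BODY))   # Extra data at line 2, column 1
```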
The data is as follows:
2. Create the tables and write a pile of wicked SQL:
DELETE jars;
add jar hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar;
CREATE EXTERNAL TABLE join_tests(
  province string,
  city string
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
  'es.index.auto.create' = 'false',
  'es.resource' = 'join_tests/cities',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'province:province, city:city');
CREATE EXTERNAL TABLE join_tests1(
  name string,
  city string
) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
  'es.index.auto.create' = 'false',
  'es.resource' = 'join_tests1/shop',
  'es.read.metadata' = 'true',
  'es.mapping.names' = 'name:name, city:city');
SELECT
a.province,
b.city,
b.name
from join_tests a LEFT JOIN join_tests1 b on a.city = b.city;
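es-hadoop reads documents from the two indices, and Hive executes the join itself. ON a.city = b.city is therefore an exact, case-sensitive string comparison, not an Elasticsearch full-text match. The Python sketch below mimics the query on small hypothetical rows (not the data from this post) to show LEFT JOIN semantics. Every join_tests row is kept. Because the query selects b.city rather than a.city, an unmatched row comes back with both city and name NULL.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def left_join_on_city(cities: List[Row], shops: List[Row]) -> List[Row]:
    """Mimic: SELECT a.province, b.city, b.name
    FROM join_tests a LEFT JOIN join_tests1 b ON a.city = b.city"""
    shops_by_city: Dict[str, List[Row]] = {}
    for shop in shops:
        # As in SQL, a NULL key never matches anything.
        if shop.get("city") is not None:
            shops_by_city.setdefault(shop["city"], []).append(shop)
    rows: List[Row] = []
    for a in cities:
        matches = shops_by_city.get(a["city"], []) if a.get("city") is not None else []
        if not matches:
            # Unmatched left row: every column taken from b is NULL (None).
            rows.append({"province": a["province"], "city": None, "name": None})
        for b in matches:
            rows.append({"province": a["province"], "city": b["city"], "name": b["name"]})
    return rows


# Hypothetical sample rows, for illustration only.
CITIES = [{"province": "Zhejiang", "city": "Hangzhou"},
          {"province": "Jiangsu", "city": "Nanjing"}]
SHOPS = [{"name": "shop_a", "city": "Hangzhou"},
         {"name": "shop_b", "city": "Hangzhou"}]

if __name__ == "__main__":
    for row in left_join_on_city(CITIES, SHOPS):
        print(row)
```

Hive does not guarantee row order in the output, so compare results as sets rather than by position.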
3. Result
Closing remarks
I recommend a useful tool, Apache Hue. You can use it to manage HDFS files, run Hive queries, work with MySQL, and more.