Hive Learning Series, Part 5 (Hive and Elasticsearch interaction, very detailed; here I go showing off again)

阿新 • Published: 2018-07-24

Working with Elasticsearch from Hive

I. Importing data from a Hive table into Elasticsearch

1. First, create the Elasticsearch index as follows:

curl -XPUT '10.81.179.209:9200/zebra_info_demo?pretty' -H 'Content-Type: application/json' -d '
{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 2
    },
    "mappings": {
        "zebra_info": {
            "properties": {
                "name": {"type": "text"},
                "type": {"type": "text"},
                "province": {"type": "text"},
                "city": {"type": "text"},
                "citycode": {"type": "text", "index": false},
                "district": {"type": "text"},
                "adcode": {"type": "text", "index": false},
                "township": {"type": "text"},
                "business_circle": {"type": "text"},
                "formatted_address": {"type": "text"},
                "location": {"type": "geo_point"},
                "extensions": {
                    "type": "nested",
                    "properties": {
                        "map_lat": {"type": "double", "index": false},
                        "map_lng": {"type": "double", "index": false},
                        "avg_price": {"type": "double", "index": false},
                        "shops": {"type": "short", "index": false},
                        "good_comments": {"type": "short", "index": false},
                        "lvl": {"type": "short", "index": false},
                        "leisure_type": {"type": "text", "index": false},
                        "fun_type": {"type": "text", "index": false},
                        "numbers": {"type": "short", "index": false}
                    }
                }
            }
        }
    }
}
'

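2. Check the Elasticsearch version and download the matching elasticsearch-hadoop-hive jar

The Elasticsearch version is reported by the cluster's root endpoint (for example, a GET request to 10.81.179.209:9200).
This article uses version 5.6.9.
[Screenshot: Elasticsearch version output]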
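
The version check can be sketched as follows. This is a minimal sketch, not the author's exact command (which was only shown in the screenshot): it assumes the root endpoint returns JSON containing a "number" field under "version", and the helper name es_version is hypothetical.

```shell
# Assumption: the cluster from this article listens on 10.81.179.209:9200.
ES_HOST="${ES_HOST:-10.81.179.209:9200}"

# Extract the "number" field from the JSON that the root endpoint returns.
# A sed one-liner keeps the sketch dependency-free; jq would be more robust.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (requires network access to the cluster):
#   curl -s "http://$ES_HOST" | es_version
```

The printed version is the one to look for when choosing the jar in the next step.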

Download the jar from the Maven repository below.
https://repo.maven.apache.org/maven2/org/elasticsearch/elasticsearch-hadoop-hive/
Pick the version that matches your Elasticsearch cluster.

3. Upload the downloaded jar to an HDFS path.

The jar path used in this article is hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar
[Screenshot: the jar file in HDFS]
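
Steps 2 and 3 can be sketched as a small script. It assumes the standard Maven repository layout under the URL given above, a working `hdfs` client on PATH, and the /udf directory used in this article; the function names jar_url and fetch_and_upload are hypothetical helpers, not part of any tool.

```shell
# Assumption: standard Maven repository layout,
#   <base>/<version>/<artifact>-<version>.jar
MAVEN_BASE="https://repo.maven.apache.org/maven2/org/elasticsearch/elasticsearch-hadoop-hive"

# Print the download URL of the elasticsearch-hadoop-hive jar for a version.
jar_url() {
  printf '%s/%s/elasticsearch-hadoop-hive-%s.jar\n' "$MAVEN_BASE" "$1" "$1"
}

# Download the jar, copy it into HDFS under /udf, then list the directory.
fetch_and_upload() {
  version="$1"
  jar="elasticsearch-hadoop-hive-$version.jar"
  curl -fSL -o "$jar" "$(jar_url "$version")" &&
    hdfs dfs -mkdir -p /udf &&
    hdfs dfs -put -f "$jar" /udf/ &&
    hdfs dfs -ls /udf
}

# Usage (requires network access and an HDFS client):
#   fetch_and_upload 5.6.9
```

Nothing runs until fetch_and_upload is called, so the script can be sourced first and the URL inspected with jar_url before downloading.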

4. OK, create the table and start using it

-- Clear previously added jars, then register the ES-Hadoop storage handler.
DELETE jars;
ADD JAR hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar;
DROP TABLE IF EXISTS zebra_info_demo;
-- Column types follow the index mapping; province is text there, so string here.
CREATE EXTERNAL TABLE zebra_info_demo(
    name string,
    `type` string,
    province string,
    city string,
    citycode string,
    district string,
    adcode string,
    township string,
    business_circle string,
    formatted_address string,
    `location` string,
    extensions STRUCT<map_lat:double, map_lng:double, avg_price:double, shops:smallint, good_comments:smallint, lvl:smallint, leisure_type:STRING, fun_type:STRING, numbers:smallint>
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
'es.index.auto.create' = 'false',
'es.resource' = 'zebra_info_demo/zebra_info',
'es.read.metadata' = 'true',
'es.mapping.names' = 'name:name, type:type, province:province, city:city, citycode:citycode, district:district, adcode:adcode, township:township, business_circle:business_circle, formatted_address:formatted_address, location:location, extensions:extensions');
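
The es.mapping.names property takes comma-separated pairs of the form hive_column:es_field. Every pair in this table maps a name to itself, so the string can be generated from the column list instead of typed by hand. The helper below is a hypothetical convenience for that, not something ES-Hadoop provides.

```shell
# Build an es.mapping.names value in which each Hive column maps to an
# Elasticsearch field of the same name (hypothetical helper).
mapping_names() {
  out=""
  for col in "$@"; do
    # Prepend ", " only when out already holds at least one pair.
    out="${out:+$out, }$col:$col"
  done
  printf '%s\n' "$out"
}

# Usage:
#   mapping_names name type province city
```

When a Hive column and its Elasticsearch field have different names, that pair has to be written by hand as hive_column:es_field.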

5. Fill it with data, and you're done.

-- Columns are matched to the target table by position, so a.brands fills the type column.
INSERT INTO TABLE zebra_info_demo
SELECT
    a.name,
    a.brands,
    a.province,
    a.city,
    null as citycode,
    null as district,
    null as adcode,
    null as township,
    a.business_circle,
    null as formatted_address,
    concat(a.map_lat, ', ', a.map_lng) as `location`,
    named_struct('map_lat', cast(a.map_lat as double), 'map_lng', cast(a.map_lng as double), 'avg_price', cast(0 as DOUBLE), 'shops', 0S, 'good_comments', 0S, 'lvl', cast(a.lv1 as SMALLINT), 'leisure_type', '', 'fun_type', '', 'numbers', 0S) as extensions
from medicalsite_childclinic a;

Result:
[Screenshot: run result]
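
One way to double-check the import from the Elasticsearch side is to compare the document count of the index with the row count of the source table. The sketch below assumes the _count endpoint returns JSON with a numeric "count" field; the helper name es_count is hypothetical.

```shell
# Assumption: same cluster address as in the article.
ES_HOST="${ES_HOST:-10.81.179.209:9200}"

# Pull the numeric "count" value out of a _count API response on stdin.
es_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (requires network access to the cluster):
#   curl -s "http://$ES_HOST/zebra_info_demo/_count" | es_count
# After a single INSERT into an empty index, the number should equal:
#   SELECT count(*) FROM medicalsite_childclinic;
```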

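II. Starting from existing Elasticsearch indices, create Hive tables that interact with them. You can even JOIN them. In a word: awesome.

1. First, look at the indices and the data

The existing indices are as follows: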

curl -XPUT '10.81.179.209:9200/join_tests?pretty' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "cities": {
      "properties": {
        "province": {
          "type": "string"
        },
        "city": {
          "type": "string"
        }
      }
    }
  }
}
'

curl -XPUT '10.81.179.209:9200/join_tests1?pretty' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "shop": {
      "properties": {
        "name": {
          "type": "string"
        },
        "city": {
          "type": "string"
        }
      }
    }
  }
}
'

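The data looks like this:
[Screenshot: documents in the join_tests index]

[Screenshot: documents in the join_tests1 index]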

2. Create the tables and write a bunch of wicked SQL statements.

DELETE jars;
ADD JAR hdfs:///udf/elasticsearch-hadoop-hive-5.6.9.jar;
-- EXTERNAL, as in part I: Hive keeps only the metadata, the data stays in Elasticsearch.
CREATE EXTERNAL TABLE join_tests(
    province string,
    city string
)STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
'es.index.auto.create' = 'false',
'es.resource' = 'join_tests/cities',
'es.read.metadata' = 'true',
'es.mapping.names' = 'province:province, city:city');

CREATE EXTERNAL TABLE join_tests1(
    name string,
    city string
)STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '10.81.179.209:9200',
'es.index.auto.create' = 'false',
'es.resource' = 'join_tests1/shop',
'es.read.metadata' = 'true',
'es.mapping.names' = 'name:name, city:city');

-- LEFT JOIN keeps every city row; b.city and b.name are NULL where no shop matches.
SELECT
    a.province,
    b.city,
    b.name
from join_tests a LEFT JOIN join_tests1 b on a.city = b.city;

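3. Result

[Screenshot: join query result]

Closing remarks

One useful tool I recommend is Apache Hue, which can be used to manage HDFS files, run Hive queries, work with MySQL, and more.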
