Big Data Primer (15): Hive Introduction and Configuration

阿新 • Published: 2018-11-10

1. Upload the archive and extract it into app
tar -zxvf <archive> -C app
2. Running without any configuration
Start: ./hive (directory: /home/admin/app/hive/bin)
Create a table: create table t_1(id int, name string);
List tables: show tables;
Exit: exit;

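This generates a metastore_db directory in the current directory.

After exiting, if you start hive from the parent directory and list the tables, the table is not visible, because metastore_db exists only under the bin directory. By default Hive uses the embedded Derby database, whose drawback is that only one session can be open at a time.

3. Configuration: using MySQL to manage the metastore
   3.1 Set the environment variables (/etc/profile)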

       JAVA_HOME=/home/admin/app/java/jdk1.7.0_71
       HIVE_HOME=/home/admin/app/hive
       HADOOP_HOME=/home/admin/app/hadoop-2.4.1
       PATH=$HIVE_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH
       CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
       export JAVA_HOME HIVE_HOME HADOOP_HOME PATH CLASSPATH

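   3.2 Edit the configuration file hive-env.sh
       cp hive-env.sh.template hive-env.sh
   Edit the file:
       export JAVA_HOME=/home/admin/app/java/jdk1.7.0_71
       export HIVE_HOME=/home/admin/app/hive
       export HADOOP_HOME=/home/admin/app/hadoop-2.4.1
   3.3 Create the configuration file hive-site.xml
       vi hive-site.xml

   3.4 hive-site.xml (MySQL must be installed first, or you can use a MySQL instance running on Windows, and it must allow remote connections)

   Add the following content: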

<configuration>
   <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://192.168.1.2:3306/hive?createDatabaseIfNotExist=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
   </property>
   <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>com.mysql.jdbc.Driver</value>
   <description>Driver class name for a JDBC metastore</description>
   </property>

   <property>
   <name>javax.jdo.option.ConnectionUserName</name>
   <value>root</value>
   <description>username to use against metastore database</description>
   </property>

   <property>
   <name>javax.jdo.option.ConnectionPassword</name>
   <value>root</value>
   <description>password to use against metastore database</description>
   </property>
</configuration>
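
As a cross-check on the structure above, here is a minimal Python sketch that generates the same four properties from a dictionary. The helper name build_hive_site is hypothetical and not part of Hive; the property names and values are copied from the file above, and the description elements are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_hive_site(props):
    """Build a <configuration> element from a dict of property name -> value."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


# Values copied from the hive-site.xml shown above.
metastore_props = {
    "javax.jdo.option.ConnectionURL":
        "jdbc:mysql://192.168.1.2:3306/hive?createDatabaseIfNotExist=true",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "root",
    "javax.jdo.option.ConnectionPassword": "root",
}

# Serialize to text; in practice this would be written to hive-site.xml.
xml_text = ET.tostring(build_hive_site(metastore_props), encoding="unicode")
print(xml_text)
```

Generating the file this way is only a convenience when several machines need the same settings; editing hive-site.xml by hand as described above works just as well.
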
4. Copy the MySQL connector JAR into the $HIVE_HOME/lib directory

sftp> cd /home/admin/app/hive/lib
sftp> put e:\soft\mysql-connector-java-5.1.28.jar

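Note: when using an external MySQL server, change the character set of the MySQL database to latin1, otherwise table creation fails.

5. Start Hive with bin/hive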
show tables;
load dat

show tables;
create table z_1(id int,name string); (requires the MySQL database character set to be latin1)

6. Creating an internal (managed) table (tables are internal by default)
create table t_order(id int,name string,velocity string,price double) row format delimited fields terminated by '\t';

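(1) How it works
   Every table in Hive has a corresponding directory that stores its data. For example, a table named test has the HDFS path /warehouse/test.
   warehouse is the data warehouse directory specified by ${hive.metastore.warehouse.dir} in hive-site.xml. The data of all tables (excluding external tables) is stored under this directory.

(2) Paths in HDFS

Default HDFS path: hdfs://ns1/user/hive/warehouse
Tables in the default database sit directly under warehouse

http://192.168.1.113:50070/explorer.html#/user/hive/warehouse

   Create a new database with create database wek01; then create the table t_order_01;
   HDFS path: warehouse/wek01.db/t_order_01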
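
The path rule described above can be sketched in a few lines of Python. This is an illustration of the layout only, not Hive code; the function name table_location is hypothetical, and the rule assumes the default layout in which tables of the default database sit directly under the warehouse directory and other databases get a <name>.db subdirectory.

```python
def table_location(warehouse_dir, table, database="default"):
    """Return the default HDFS directory for a managed table."""
    base = warehouse_dir.rstrip("/")
    if database == "default":
        # Tables in the default database sit directly under the warehouse.
        return f"{base}/{table}"
    # Other databases get their own "<database>.db" directory.
    return f"{base}/{database}.db/{table}"


print(table_location("/user/hive/warehouse", "t_order"))
print(table_location("/user/hive/warehouse", "t_order_01", database="wek01"))
```
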

(3) Loading data

   # file on the local virtual machine
   load data local inpath '/home/admin/oder.txt' into table t_order_01;

# file in HDFS
load data inpath '/oder3.txt' into table t_order_01;

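(4) How Hive queries HDFS: a query reads the files under the table's directory

HDFS path of the file after load: warehouse/wek01.db/t_order_01/order.txt

select * from t_order_01;
select count(*) from t_order_01; (runs a MapReduce job)

If a file is copied directly into warehouse/wek01.db/t_order_01/,
select * from t_order_01; // fields whose format does not match the table definition are shown as NULL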
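
To illustrate why mismatched files show up as NULL, here is a simplified Python model of schema-on-read. It is not Hive's actual parser; the function parse_row is hypothetical, and the schema assumes t_order_01 has the same columns as t_order above (the source does not give its definition).

```python
def parse_row(line, schema, delimiter="\t"):
    """Split one text line and cast each field; bad or missing fields become None (NULL)."""
    casters = {"int": int, "double": float, "string": str}
    fields = line.rstrip("\n").split(delimiter)
    row = []
    for i, (_, col_type) in enumerate(schema):
        if i >= len(fields):
            row.append(None)  # the line has fewer fields than the table has columns
            continue
        try:
            row.append(casters[col_type](fields[i]))
        except ValueError:
            row.append(None)  # the value cannot be read as the declared type
    return row


# Assumed schema, copied from the t_order definition.
schema = [("id", "int"), ("name", "string"), ("velocity", "string"), ("price", "double")]

good = parse_row("1\tphone\tfast\t2999.5", schema)
bad = parse_row("1,phone,fast,2999.5", schema)  # comma-separated, but the table expects tabs
print(good)
print(bad)
```

The second line is not rejected at load time. It is simply read as a single field that cannot be cast to int, and the remaining columns are missing, so every column comes back as NULL.
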
(5) Dropping the table deletes both the metadata and the data

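7. Creating an external table
(1) How it works
   An external table organizes its metadata the same way as an internal table, but the actual data is stored quite differently:
   An internal table involves two steps, table creation and data loading (both can be done in one statement). During loading, the actual data is moved into the data warehouse directory, and all later access to the data happens directly in that directory. When the table is dropped, both its data and its metadata are deleted.
   An external table involves only one step: loading the data and creating the table happen together. The data is not moved into the data warehouse directory; Hive only establishes a link to the external data. When an external table is dropped, only that link is removed.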
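
The difference in drop behavior can be shown with a toy model that uses local temporary directories in place of HDFS and a dictionary in place of the metastore. All names here (drop_table, the metastore dict) are hypothetical and only mimic the semantics described above.

```python
import os
import shutil
import tempfile


def drop_table(metastore, name):
    """Remove the table's metadata; delete its data directory only for internal tables."""
    table = metastore.pop(name)
    if not table["external"]:
        shutil.rmtree(table["location"])


# Local directories standing in for HDFS paths.
root = tempfile.mkdtemp()
managed_dir = os.path.join(root, "warehouse", "t_order")
external_dir = os.path.join(root, "hive_test")
os.makedirs(managed_dir)
os.makedirs(external_dir)

# A dict standing in for the metastore.
metastore = {
    "t_order": {"location": managed_dir, "external": False},
    "t_order_ex": {"location": external_dir, "external": True},
}

drop_table(metastore, "t_order")
drop_table(metastore, "t_order_ex")

managed_exists = os.path.exists(managed_dir)    # data removed together with the table
external_exists = os.path.exists(external_dir)  # data left in place
print(managed_exists, external_exists)
```
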
(2) Create the table
   create external table t_order_ex(id int,name string,velocity string,price double) row format delimited fields terminated by '\t' location '/hive_test';

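8. Creating a partitioned table
(1) How it works
       A partition corresponds to a dense index on the partition column in a database;
       In Hive, each partition of a table corresponds to a directory under the table's directory, and all of a partition's data is stored in that directory.
       For example, if the table test has two partition columns, date and city,
       then date=20130201, city=bj corresponds to the HDFS subdirectory /warehouse/test/date=20130201/city=bj
       and date=20130202, city=sh corresponds to the HDFS subdirectory /warehouse/test/date=20130202/city=sh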
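
The mapping from partition values to directories can be sketched as follows. This is an illustration only; partition_location is a hypothetical helper, and the partition spec is passed as an ordered list because the directory nesting follows the order of the partition columns.

```python
def partition_location(table_dir, partition_spec):
    """Append one key=value directory per partition column, in column order."""
    segments = [f"{key}={value}" for key, value in partition_spec]
    return "/".join([table_dir.rstrip("/")] + segments)


print(partition_location("/warehouse/test", [("date", "20130201"), ("city", "bj")]))
print(partition_location("/warehouse/test", [("date", "20130202"), ("city", "sh")]))
```

Because each partition is a separate directory, a query that filters on a partition column only needs to read the matching directories.
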
(2) Create the table
   create table t_order_pt(id int,name string,velocity string,price double)
   partitioned by (month string)
   row format delimited fields terminated by '\t';
(3) Load data
   load data local inpath '/home/admin/order.txt' into table t_order_pt partition (month='201810');
   load data local inpath '/home/admin/order2.txt' into table t_order_pt partition (month='201811');
(4) Query
   select count(*) from t_order_pt;
   select count(*) from t_order_pt where month='201810';
(5) Modifying partitions
   alter table partition_table add partition (daytime='2013-02-04',city='bj');
   Then load the data with load data

alter table partition_table drop partition (daytime='2013-02-04',city='bj');
The metadata and the data files are deleted, but the directory daytime=2013-02-04 remains

9. Importing data from MySQL directly into Hive
sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by '\t'
sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table user_info --hive-import --hive-overwrite --hive-table user_info --fields-terminated-by '\t'

10. Hive execution modes

Hive's execution mode is the environment in which jobs run. There are two modes, local and cluster, selected through mapred.job.tracker
To set it:
hive > SET mapred.job.tracker=local;

11. Ways to start Hive
   1. Hive command-line mode: run the executable directly with #/hive/bin/hive, or run #hive --service cli
   2. Hive web interface (port 9999)
   #hive --service hwi &
   This lets you access Hive from a browser;
   http://hadoop0:9999/hwi/
   3. Hive remote service (port 10000)
   #hive --service hiveserver &

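Creating a partitioned table
Difference between an ordinary table and a partitioned table: when large amounts of data keep being added, a partitioned table is more convenient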
create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by '\t';

   Loading data into a partitioned table
   load data local inpath './book.txt' overwrite into table book partition (pubdate='2010-08-22');
   load data local inpath '/root/data.am' into table beauty partition (nation="USA");
   select nation, avg(size) from beauties group by nation order by avg(size);

*************************************Common Problems*********************************************************************

An error is reported while creating a table:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes

Solution: change the database encoding from utf-8 to latin1
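
The arithmetic behind this fix can be sketched in Python. The 767-byte limit comes from the error message; the bytes-per-character figures are MySQL's maximum sizes for each character set. The 256-character column length used in the example is a hypothetical value chosen for illustration, not taken from the metastore schema.

```python
MAX_KEY_BYTES = 767  # limit quoted in the error message

# Maximum bytes MySQL reserves per character for each character set.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset):
    """Largest column length (in characters) whose index key fits in the limit."""
    return MAX_KEY_BYTES // BYTES_PER_CHAR[charset]


def key_fits(charset, column_chars):
    """True if an index on a column of this length stays within the limit."""
    return column_chars * BYTES_PER_CHAR[charset] <= MAX_KEY_BYTES


for charset in BYTES_PER_CHAR:
    print(charset, max_indexable_chars(charset))

# Hypothetical 256-character indexed column.
print(key_fits("utf8", 256), key_fits("latin1", 256))
```

With utf8 an indexed column can hold at most 255 characters before the key exceeds 767 bytes, while latin1 allows 767, which is why switching the database to latin1 makes the metastore tables creatable.
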
