1. 程式人生 > >大數據學習(8)Hive基礎

大數據學習(8)Hive基礎

fall nat value onf change expected role blog tab

什麽是Hive

Hive是一個基於HDFS的查詢引擎。我們日常中的需求如果都自己去寫MapReduce來實現的話會很費勁的,Hive把日常用到的MapReduce功能,比如排序、分組等功能進行了抽象,對外提供類似於普通數據庫的查詢服務。

它只是封裝MapReduce計算,但它本質並不是數據庫服務,不適合作為聯機服務。通常用於數據倉庫的離線計算中。
在Hive中已經明確說明,不建議使用MapReduce了,而推薦使用Spark。

安裝

tar -zxvf apache-hive-1.2.2-bin.tar.gz -C /usr/local/

hive-site.xml:

<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://
localhost/hive_metastore?createDatabaseIfNotExist=true</value> <description>metadata is stored in a MySQL server</description> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> <description>MySQL JDBC driver class</description> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>root</value> <description>user name for
connecting to mysql server</description> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>root</value> <description>password for connecting to mysql server</description> </property> </configuration>

拷貝MySQL驅動jar到hive的lib目錄中。

先啟動HDFS:

start-dfs.sh

再啟動hive

如果報錯:

Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)
at jline.TerminalFactory.get(TerminalFactory.java:158)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.console.ConsoleReader.<init>(ConsoleReader.java:230)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.setupConsoleReader(CliDriver.java:787)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:721)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

需要把hive下lib目錄中的jline jar文件替換到hadoop中yarn目錄中:

rm /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar 
cp /usr/local/apache-hive-1.2.2-bin/lib/jline-2.12.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/

啟動之後會自動創建數據庫,登錄數據庫中可以查看到一些元信息:

mysql> use hive_metastore;
mysql> use hive_metastore ;
mysql> select * from dbs;
+-------+-----------------------+------------------------------------------+---------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+------------------------------------------+---------+------------+------------+
| 1 | Default Hive database | hdfs://centos01:9000/user/hive/warehouse | default | public | ROLE |
+-------+-----------------------+------------------------------------------+---------+------------+------------+
1 row in set (0.00 sec)

Hive基本操作

DDL
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
參考:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Thrift

啟動hiveserver2服務

./hiveserver2

使用beeline連接到這個服務上:

[root@centos01 bin]# ./beeline 
Beeline version 1.2.2 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: root
Enter password for jdbc:hive2://localhost:10000: ****
Connected to: Apache Hive (version 1.2.2)
Driver: Hive JDBC (version 1.2.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (3.666 seconds)
0: jdbc:hive2://localhost:10000>

大數據學習(8)Hive基礎