presto-0.147 + postgresql-9.5.3 + mysql-5.0.7 + hadoop-2.5.2 + hive-1.2.1: Environment Setup and Testing
阿新 • Published: 2019-01-02
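Background
Every database that supports SQL has a powerful SQL engine behind it.
SQL engines are broadly alike: they parse the SQL grammar, perform semantic analysis, build a query tree, optimize that tree, and finally execute it and return the results to the client.
Presto works the same way.
The architecture is as follows:
[Figure: Presto architecture]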
Preparation
1.postgresql-9.5.3
2.mysql-5.0.7
3.hadoop-2.5.2
4.hive-1.2.1
5.presto-server-0.147
6.presto-cli-0.147-executable.jar
Note the system requirements:
- Mac OS X or Linux
- Java 8 Update 60 or higher (8u60+), 64-bit
- Maven 3.3.9+ (for building)
- Python 2.4+ (for running with the launcher script)
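The Java requirement is the one most often missed. As a minimal sketch (assuming Python 3 and the Java 8 version-string format), the hypothetical helper `meets_presto_java_requirement` below parses `java -version` output and checks for 8u60+ on a 64-bit JVM:

```python
import re

# Hypothetical helper: decides whether `java -version` output satisfies
# Presto 0.147's requirement of Java 8 Update 60 or later on a 64-bit JVM.
# Assumes the Java 8 version-string format, e.g. 'java version "1.8.0_92"'
# (OpenJDK prints 'openjdk version "1.8.0_92"', which also matches).
JAVA8_VERSION = re.compile(r'version "1\.(\d+)\.\d+_(\d+)')


def meets_presto_java_requirement(version_output: str) -> bool:
    match = JAVA8_VERSION.search(version_output)
    if match is None:
        # Unrecognized format (e.g. a non-Java-8 version string): check by hand.
        return False
    major, update = int(match.group(1)), int(match.group(2))
    # HotSpot reports "64-Bit Server VM" on 64-bit JVMs.
    is_64_bit = "64-Bit" in version_output
    return major == 8 and update >= 60 and is_64_bit
```

`java -version` writes to stderr, so the text can be captured with `subprocess.run(["java", "-version"], stderr=subprocess.PIPE, universal_newlines=True).stderr` and passed to the function.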
Environment Setup_1
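MySQL and PostgreSQL were both set up on Windows and can be used as they are.
The setup steps for hadoop-2.5.2 were covered in an earlier blog post, so they are not repeated here.
Hive works as soon as it is unpacked.
Here I mainly want to cover Hive's two CLI tools: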
1.hive shell
2.beeline
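The official site currently labels beeline as new and the hive shell as older. I recommend beeline: its output is more intuitive and easier to read, much like the mysql client.
If Hive uses the default embedded Derby database, start beeline as follows (the bare jdbc:hive2:// URL runs HiveServer2 embedded in the beeline process):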
./bin/beeline -u jdbc:hive2://
Also, embedded Derby can be used by only one session at a time; otherwise you get an error like this:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/myProject/apache-hive-1.2.1-bin/metastore_db.
    at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
    at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
    at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
    at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
    at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
    at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
    at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
    ... 83 more
Error applying authorization policy on hive configuration: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
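The workaround is simple: delete the Derby log and database in the directory where Hive was started. Note that metastore_db is the metastore itself, so this also discards the table definitions stored in it: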
rm -rf derby.log metastore_db
Environment Setup_2
presto
- Unpack presto-server-0.147.tar.gz
- mkdir presto-server-0.147/etc
- mkdir presto-server-0.147/etc/catalog
- vim etc/node.properties
node.environment=production
node.id=1
node.data-dir=/home/myProject/presto-server-0.147/data
- etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
- etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://your-coordinator-IP:8080
- etc/log.properties
com.facebook.presto=INFO
- Download presto-cli-0.147-executable.jar
- Put it in the bin directory
- Make it executable
chmod +x presto-cli-0.147-executable.jar
- etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://your-mysql-location-IP:3306
connection-user=your-mysql-username
connection-password=your-mysql-password
- etc/catalog/postgresql.properties
connector.name=postgresql
connection-url=jdbc:postgresql://your-postgresql-location-ip/postgres
connection-user=your-postgres-username
connection-password=your-postgresql-password
- etc/catalog/hive.properties
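connector.name=hive-hadoop2
hive.metastore.uri=thrift://your-hive-ip:9083
hive.config.resources=/etc/hadoop/core-site.xml,/etc/hadoop/hdfs-site.xml
Choose connector.name according to your Hadoop distribution (hive.metastore.uri expects a running Hive metastore service, which listens on port 9083 by default and can be started with hive --service metastore):
hive-hadoop1: Apache Hadoop 1.x
hive-hadoop2: Apache Hadoop 2.x
hive-cdh4: Cloudera CDH 4
hive-cdh5: Cloudera CDH 5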
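The files above can also be generated by a short script, which avoids the most common mistake in this step: several properties or JVM flags ending up on one line. This is a sketch assuming the directory layout and placeholder values used in this post; `render_presto_etc` and `write_presto_etc` are hypothetical helpers, and every your-* placeholder still has to be replaced with a real value.

```python
from pathlib import Path

# Values taken from the steps above; every your-* placeholder must be replaced.
NODE_PROPERTIES = {
    "node.environment": "production",
    "node.id": "1",
    "node.data-dir": "/home/myProject/presto-server-0.147/data",
}
# jvm.config takes exactly one JVM option per line.
JVM_OPTIONS = [
    "-server",
    "-Xmx16G",
    "-XX:+UseG1GC",
    "-XX:G1HeapRegionSize=32M",
    "-XX:+UseGCOverheadLimit",
    "-XX:+ExplicitGCInvokesConcurrent",
    "-XX:+HeapDumpOnOutOfMemoryError",
    "-XX:OnOutOfMemoryError=kill -9 %p",  # stays on one line, spaces included
]
CONFIG_PROPERTIES = {
    "coordinator": "true",
    "node-scheduler.include-coordinator": "true",
    "http-server.http.port": "8080",
    "query.max-memory": "5GB",
    "query.max-memory-per-node": "1GB",
    "discovery-server.enabled": "true",
    # Points at this coordinator (the same machine as Hive in this post).
    "discovery.uri": "http://your-coordinator-IP:8080",
}
LOG_PROPERTIES = {"com.facebook.presto": "INFO"}
CATALOGS = {
    "mysql": {
        "connector.name": "mysql",
        "connection-url": "jdbc:mysql://your-mysql-location-IP:3306",
        "connection-user": "your-mysql-username",
        "connection-password": "your-mysql-password",
    },
    "postgresql": {
        "connector.name": "postgresql",
        "connection-url": "jdbc:postgresql://your-postgresql-location-ip/postgres",
        "connection-user": "your-postgres-username",
        "connection-password": "your-postgresql-password",
    },
    "hive": {
        "connector.name": "hive-hadoop2",
        "hive.metastore.uri": "thrift://your-hive-ip:9083",
        "hive.config.resources": "/etc/hadoop/core-site.xml,/etc/hadoop/hdfs-site.xml",
    },
}


def _properties(props: dict) -> str:
    # One key=value pair per line.
    return "".join(f"{key}={value}\n" for key, value in props.items())


def render_presto_etc() -> dict:
    """Map each path relative to etc/ to the text it should contain."""
    files = {
        "node.properties": _properties(NODE_PROPERTIES),
        "jvm.config": "".join(f"{option}\n" for option in JVM_OPTIONS),
        "config.properties": _properties(CONFIG_PROPERTIES),
        "log.properties": _properties(LOG_PROPERTIES),
    }
    for name, props in CATALOGS.items():
        files[f"catalog/{name}.properties"] = _properties(props)
    return files


def write_presto_etc(install_dir: str) -> None:
    """Write the rendered files under <install_dir>/etc, creating etc/catalog."""
    etc = Path(install_dir) / "etc"
    for rel_path, text in render_presto_etc().items():
        target = etc / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)
```

Calling `write_presto_etc("/home/myProject/presto-server-0.147")` would lay out the same etc tree as the manual steps. Rendering to strings first keeps the file contents easy to inspect before anything is written.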
Startup
- bin/launcher start
- ./presto --server localhost:8080 --catalog hive --schema default
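When the CLI is driven from a script, passing each option and its value as separate arguments avoids spacing mistakes like `8080--catalog` entirely. A minimal sketch, assuming the CLI is available as `./presto` as in the command above; `presto_cli_args`, `presto_cli_command` and `run_presto` are hypothetical helpers, and `--execute` is the CLI option that runs a single statement and exits.

```python
import shlex
import subprocess
from typing import List, Optional


def presto_cli_args(server: str = "localhost:8080", catalog: str = "hive",
                    schema: str = "default", execute: Optional[str] = None,
                    cli: str = "./presto") -> List[str]:
    # Each option and its value are separate list items, so they can never run together.
    args = [cli, "--server", server, "--catalog", catalog, "--schema", schema]
    if execute is not None:
        args += ["--execute", execute]
    return args


def presto_cli_command(**kwargs) -> str:
    """The same invocation as a shell-quoted string, e.g. for logging."""
    return " ".join(shlex.quote(arg) for arg in presto_cli_args(**kwargs))


def run_presto(statement: str, **kwargs) -> str:
    """Run one statement through the CLI (needs a running server) and return its stdout."""
    result = subprocess.run(presto_cli_args(execute=statement, **kwargs),
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout
```

With the server started by bin/launcher start, `run_presto("select * from mysql.sqoop.t1")` returns the CLI's output for that query.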
Results
MySQL
presto:test_hive> select * from mysql.sqoop.t1;
id | int_col | char_col
----+---------+----------
1 | 1 | a
2 | 2 | b
4 | 4 | d
3 | 3 | c
5 | 5 | e
(5 rows)
Query 20160520_101400_00009_k46dt, FINISHED, 1 node
http://localhost:8080/query.html?20160520_101400_00009_k46dt
Splits: 2 total, 0 done (0.00%)
CPU Time: 0.0s total, 0 rows/s, 0B/s, 100% active
Per Node: 0.0 parallelism, 0 rows/s, 0B/s
Parallelism: 0.0
0:29 [0 rows, 0B] [0 rows/s, 0B/s]
postgresql
presto:test_hive> select * from postgresql.public.test;
id | name
----+------
1 | lily
2 | Tom
3 | Jim
(3 rows)
Query 20160520_101503_00010_k46dt, FINISHED, 1 node
http://localhost:8080/query.html?20160520_101503_00010_k46dt
Splits: 2 total, 0 done (0.00%)
CPU Time: 0.0s total, 0 rows/s, 0B/s, 0% active
Per Node: 0.0 parallelism, 0 rows/s, 0B/s
Parallelism: 0.0
0:02 [0 rows, 0B] [0 rows/s, 0B/s]
mysql&postgresql
presto:test_hive> select id,char_col from mysql.sqoop.t1 union select id,name from postgresql.public.test;
id | char_col
----+----------
1 | lily
2 | Tom
3 | Jim
1 | a
2 | b
4 | d
3 | c
5 | e
(8 rows)
Query 20160520_101532_00011_k46dt, FINISHED, 1 node
http://localhost:8080/query.html?20160520_101532_00011_k46dt
Splits: 6 total, 2 done (33.33%)
CPU Time: 0.0s total, 107 rows/s, 0B/s, 17% active
Per Node: 0.0 parallelism, 0 rows/s, 0B/s
Parallelism: 0.0
0:28 [3 rows, 0B] [0 rows/s, 0B/s]
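In Presto, as in standard SQL, a plain `UNION` means `UNION DISTINCT`: duplicate rows are removed and the output order carries no meaning, so the PostgreSQL rows happening to come first here is not significant. The result has 8 rows because none of the 5 MySQL rows equals any of the 3 PostgreSQL rows. The small Python model below, built from the sample rows shown above, illustrates the two semantics; the function names are illustrative only.

```python
# Sample rows from the queries above: (id, char_col) and (id, name).
MYSQL_T1 = [(1, "a"), (2, "b"), (4, "d"), (3, "c"), (5, "e")]
POSTGRES_TEST = [(1, "lily"), (2, "Tom"), (3, "Jim")]


def union_all(*inputs):
    """UNION ALL: concatenate every input, keeping duplicates."""
    return [row for rows in inputs for row in rows]


def union_distinct(*inputs):
    """UNION (DISTINCT): keep one copy of each row; the order is not meaningful."""
    seen = set()
    result = []
    for row in union_all(*inputs):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result
```

Deduplication compares whole rows, so rows that share an id but differ in the second column, such as (1, "a") and (1, "lily"), are both kept. When duplicates are acceptable, `UNION ALL` skips the deduplication step.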
hive
presto:test_hive> select count(*) from stream;
_col0
----------
10353632
(1 row)
Query 20160524_054416_00010_ceya5, FINISHED, 1 node
http://localhost:8080/query.html?20160524_054416_00010_ceya5
Splits: 42 total, 40 done (95.24%)
CPU Time: 6.0s total, 1.67M rows/s, 210MB/s, 18% active
Per Node: 0.7 parallelism, 1.16M rows/s, 146MB/s
Parallelism: 0.7
0:09 [10.1M rows, 1.24GB] [1.16M rows/s, 146MB/s]
mysql+postgresql+hive: cross-database combined query
presto:test_hive> select char_col from mysql.mysqldb.test union select name from postgresql.public.test union select userid from stream limit 10;
char_col
-------------
lily
Tom
Jim
user_000087
user_000031
user_000062
user_000063
user_000088
user_000089
user_000064
(10 rows)
Query 20160524_054314_00009_ceya5, FINISHED, 1 node
http://localhost:8080/query.html?20160524_054314_00009_ceya5
Splits: 44 total, 1 done (2.27%)
CPU Time: 0.5s total, 651K rows/s, 80.4MB/s, 16% active
Per Node: 0.0 parallelism, 16.6K rows/s, 2.05MB/s
Parallelism: 0.0
0:21 [352K rows, 43.5MB] [16.6K rows/s, 2.05MB/s]
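Querying across databases directly is one of Presto's distinguishing features. In production, massive amounts of data rarely come from a single source, so being able to combine them in one query for real-time analysis is especially convenient.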
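However, according to the official documentation, Presto currently supports only the connectors below. The documentation lists 13 entries, although Kafka Connector Tutorial is a walkthrough for the Kafka connector rather than a separate data source: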
1. Black Hole Connector
2. Cassandra Connector
3. Hive Connector
4. JMX Connector
5. Kafka Connector
6. Kafka Connector Tutorial
7. MongoDB Connector
8. MySQL Connector
9. PostgreSQL Connector
10. Redis Connector
11. System Connector
12. TPCH Connector
13. Local File Connector
In addition, presto-jdbc-0.147.jar is a standard JDBC driver, and it supports JDBC URLs of the following forms:
jdbc:presto://host:port
jdbc:presto://host:port/catalog
jdbc:presto://host:port/catalog/schema
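The three forms differ only in how many path segments follow host:port. The sketch below is a hypothetical helper, not part of presto-jdbc, that splits such a URL into its parts, which can be useful for validating configuration before the URL is handed to the driver (from Java, the URL itself is what gets passed to `java.sql.DriverManager.getConnection`).

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

PREFIX = "jdbc:presto://"


class PrestoJdbcUrl(NamedTuple):
    host: str
    port: int
    catalog: Optional[str]
    schema: Optional[str]


def parse_presto_jdbc_url(url: str) -> PrestoJdbcUrl:
    """Parse jdbc:presto://host:port[/catalog[/schema]] (hypothetical helper)."""
    if not url.startswith(PREFIX):
        raise ValueError(f"not a Presto JDBC URL: {url}")
    # Drop "jdbc:" so urlsplit sees an ordinary presto://host:port/... URL.
    parts = urlsplit(url[len("jdbc:"):])
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"host and port are required: {url}")
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) > 2:
        raise ValueError(f"expected at most /catalog/schema: {url}")
    catalog = segments[0] if segments else None
    schema = segments[1] if len(segments) == 2 else None
    return PrestoJdbcUrl(parts.hostname, parts.port, catalog, schema)
```

A schema can only be given together with a catalog, which matches the three supported forms: the parser never produces a schema without a catalog.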
----over----