Installing the PXF plugin on HAWQ and accessing HDFS file data
阿新 • Published: 2018-12-20
1、Check base software versions
Before installing the pxf plugin, check the versions of the required base software, which are listed in the pxf/gradle.properties file under the hawq source directory.
Because I had already installed hadoop and hawq before installing pxf, and a lower-version HDFS was needed later, I had to re-point pxf at the lower-version paths (mainly the jar paths).
Versions used here: hadoop 2.9.0, hawq 2.4, hbase 1.4.3.
2、Download the source code
git clone https://github.com/apache/hawq.git
3、Build PXF
cd /hawq/pxf #enter the pxf source directory
make #build
If an error message appears during the build, delete the comment on the line it points to and rebuild.
4、Install PXF
mkdir -p /opt/pxf #create the pxf install directory
export PXF_HOME=/opt/pxf #set the environment variable
make install #install pxf into the chosen directory, /opt/pxf
5、Edit the configuration files
1) Edit pxf-env.sh
export LD_LIBRARY_PATH=/usr/local/hadoop-2.7.1/lib/native:${LD_LIBRARY_PATH}  # hadoop's lib/native directory
export PXF_LOGDIR=/opt/pxf/logs               # pxf log directory
export PXF_RUNDIR=/opt/pxf                    # pxf install/run directory
export PXF_USER=${PXF_USER:-pxf}              # user that pxf runs as
export PXF_PORT=${PXF_PORT:-51200}            # pxf port
export PXF_JVM_OPTS="-Xmx512M -Xss256K"       # JVM options
export HADOOP_DISTRO=CUSTOM
export HADOOP_ROOT=/usr/local/hadoop-2.7.1    # path to the required hadoop version
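Unbalanced `${...}` expansions are an easy typo to make in pxf-env.sh, and `sh -n` parses a shell file without executing it, so it catches them before PXF ever starts. The sketch below demonstrates the check on a throwaway sample file; point `sh -n` at your real /opt/pxf/conf/pxf-env.sh instead:

```shell
# Write a small sample in pxf-env.sh style, then parse it without executing.
# sh -n only checks syntax, so it is safe to run on any shell config file.
cat > /tmp/pxf-env-check.sh <<'EOF'
export PXF_USER=${PXF_USER:-pxf}
export PXF_PORT=${PXF_PORT:-51200}
EOF
sh -n /tmp/pxf-env-check.sh && echo "parses OK"
```

If the file contains a syntax error (such as a missing closing brace), `sh -n` reports the offending line and nothing is printed.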
2) Edit pxf-log4j.properties
log4j.appender.ROLLINGFILE.File=/opt/pxf/logs/pxf-service.log  # log file location
Recommendation: use an absolute path here rather than an environment variable.
3) Edit pxf-private.classpath
# PXF Configuration
/opt/pxf/conf

# PXF Libraries
/opt/pxf/lib/pxf-hbase.jar
/opt/pxf/lib/pxf-hdfs.jar
/opt/pxf/lib/pxf-hive.jar
/opt/pxf/lib/pxf-json.jar
/opt/pxf/lib/pxf-jdbc.jar
/opt/pxf/lib/pxf-ignite.jar

# Hadoop/Hive/HBase configurations
/usr/local/hadoop-2.7.1/etc/hadoop
#/usr/local/hadoop/hive/conf
/usr/local/hbase/conf

/usr/local/hadoop-2.7.1/share/hadoop/common/hadoop-common-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/hadoop-auth-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/asm-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/avro-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-cli-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-codec-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-collections-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-configuration-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-io-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-lang-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-logging-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-compress-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/guava-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/htrace-core*.jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jetty-*.jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jersey-core-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jersey-server-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/log4j-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/protobuf-java-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-api-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/snappy-java-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/gson-*[0-9].jar

# Pick Jackson 1.9 jars from hdfs dir for HDP tar and from mapreduce1 for CDH tar
/usr/local/hadoop-2.7.1/share/hadoop/[hdfs|mapreduce1]/lib/jackson-core-asl-1.9*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/[hdfs|mapreduce1]/lib/jackson-mapper-asl-1.9*[0-9].jar

# Hive Libraries

# HBase Libraries
/usr/local/hbase/lib/hbase-client*.jar
/usr/local/hbase/lib/hbase-common*.jar
/usr/local/hbase/lib/hbase-protocol*.jar
/usr/local/hbase/lib/htrace-core*.jar
/usr/local/hbase/lib/netty*.jar
/usr/local/hbase/lib/zookeeper*.jar
/usr/local/hbase/lib/metrics-core*.jar
Note: use the absolute paths that match your hadoop/hbase installs. Since I did not install hive, I commented out all hive-related entries; otherwise you will later get missing-jar or wrong-path errors.
4) Update the paths in the following configuration files so they match the paths used in pxf-private.classpath above:
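Missing-jar errors are easiest to catch before starting PXF. The `check_classpath` helper below is my own sketch (not part of PXF); it reads a classpath file, skips comments and blank lines, and prints every entry whose glob pattern matches no file on disk. It assumes the file lives at /opt/pxf/conf/pxf-private.classpath:

```shell
# check_classpath: print every non-comment classpath entry whose glob
# pattern matches no file on disk. (Helper sketch, not part of PXF.)
check_classpath() {
  while IFS= read -r pattern; do
    case "$pattern" in ''|\#*) continue ;; esac
    # Leaving $pattern unquoted is intentional: entries contain wildcards.
    if ! ls $pattern >/dev/null 2>&1; then
      echo "missing: $pattern"
    fi
  done < "$1"
}

if [ -f /opt/pxf/conf/pxf-private.classpath ]; then
  check_classpath /opt/pxf/conf/pxf-private.classpath
fi
```

No output means every entry resolved; each `missing:` line points at a jar or directory you still need to fix.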
pxf-privatebigtop.classpath
pxf-privatehdp.classpath
pxf-privatephd.classpath
pxf-public.classpath
6、Initialize PXF
cd /opt/pxf/bin #enter the pxf install directory
Run the command: pxf init
7、Start PXF
pxf start
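To confirm the service actually came up, you can hit PXF's REST endpoint on the port configured in pxf-env.sh (51200 above). The ProtocolVersion check below is a common smoke test, assuming you run it on the PXF host:

```shell
# Ask the running PXF service for its protocol version; any response
# means the PXF agent is up and listening on port 51200.
curl -s http://localhost:51200/pxf/ProtocolVersion
```

If curl gets no response, check the log file configured in pxf-log4j.properties (/opt/pxf/logs/pxf-service.log above).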
8、Access HDFS file data
1) hadoop dfs -mkdir -p /data/pxf_examples #create the HDFS directory
2) Create the text data file pxf_hdfs_simple.txt
echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt
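Before copying the file into HDFS, it is worth confirming that every row really has four comma-separated fields, since the external table created below declares four columns. A small awk check (recreating the same sample file) does this:

```shell
# Recreate the sample file with the same contents as above, then verify
# that every line splits into exactly 4 comma-separated fields.
printf '%s\n' \
  'Prague,Jan,101,4875.33' \
  'Rome,Mar,87,1557.39' \
  'Bangalore,May,317,8936.99' \
  'Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt
awk -F',' 'NF != 4 { bad++ } END { print (bad ? "BAD" : "OK"), NR, "lines" }' /tmp/pxf_hdfs_simple.txt
# → OK 4 lines
```

Rows with the wrong field count would otherwise surface later as errors when querying the external table.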
3) Put the data file into HDFS
hadoop dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
4) View the file stored in HDFS:
hadoop dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
5) Use the HdfsTextSimple profile to create a queryable HAWQ external table over pxf_hdfs_simple.txt:
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
6) Query the table
gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
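Given the four rows loaded above, the query should return all of them; with default psql formatting the result looks roughly like this (exact column widths may differ):

```text
 location  | month | num_orders | total_sales
-----------+-------+------------+-------------
 Prague    | Jan   |        101 |     4875.33
 Rome      | Mar   |         87 |     1557.39
 Bangalore | May   |        317 |     8936.99
 Beijing   | Jul   |        411 |    11600.67
(4 rows)
```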