Installing the PXF Plugin on HAWQ and Accessing HDFS File Data

1. Check version requirements

Before installing the PXF plugin, check the versions of the base software it was built against; they are listed in the pxf/gradle.properties file under the HAWQ source directory.

I had already installed Hadoop and HAWQ before installing PXF. Because PXF needs a lower HDFS version, the paths for that lower version (mainly the jar paths) have to be re-specified later on.

The cluster runs Hadoop 2.9.0, HAWQ 2.4, and HBase 1.4.3; the lower-version Hadoop that PXF links against in the paths below is 2.7.1.
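A quick way to inspect the pinned versions once the source is cloned (step 2); the exact property keys may differ between HAWQ releases, so grep loosely:

grep -iE 'hadoop|hive|hbase' hawq/pxf/gradle.properties   # list the dependency versions PXF was built against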

2. Download the source code

git clone https://github.com/apache/hawq.git
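Optionally, check out the tag matching your installed HAWQ version; the tag name below is an assumption, so list the tags first to confirm:

cd hawq
git tag | grep 2.4          # confirm what the 2.4 release tag is actually called
git checkout rel/v2.4.0.0   # assumed tag name; use the one shown by the previous command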

3. Build PXF

cd /hawq/pxf  # enter the PXF source directory
make          # build

If error messages appear during the build, deleting the comment text on the lines they point to is enough to let the build continue.

4. Install PXF

mkdir -p /opt/pxf         # create the PXF installation directory
export PXF_HOME=/opt/pxf  # set the environment variable
make install              # install PXF into /opt/pxf
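To keep PXF_HOME set across shell sessions, you may also want to persist it in your profile (~/.bashrc is an assumption; use whichever profile file your shell reads):

echo 'export PXF_HOME=/opt/pxf' >> ~/.bashrc
source ~/.bashrc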

5. Edit the configuration files

1) Edit pxf-env.sh

export LD_LIBRARY_PATH=/usr/local/hadoop-2.7.1/lib/native:${LD_LIBRARY_PATH}  # Hadoop's lib/native directory
export PXF_LOGDIR=/opt/pxf/logs             # PXF log directory
export PXF_RUNDIR=/opt/pxf                  # PXF installation directory
export PXF_USER=${PXF_USER:-pxf}            # user PXF runs as
export PXF_PORT=${PXF_PORT:-51200}          # PXF port
export PXF_JVM_OPTS="-Xmx512M -Xss256K"     # JVM options
export HADOOP_DISTRO=CUSTOM
export HADOOP_ROOT=/usr/local/hadoop-2.7.1  # root of the Hadoop version PXF should use

2) Edit pxf-log4j.properties

log4j.appender.ROLLINGFILE.File=/opt/pxf/logs/pxf-service.log   # log file path
Use an absolute path here rather than an environment variable.
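Once PXF is running (step 7 below), this log is the first place to check when something fails:

tail -f /opt/pxf/logs/pxf-service.log   # follow the PXF service log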

3) Edit pxf-private.classpath

# PXF Configuration
/opt/pxf/conf

# PXF Libraries
/opt/pxf/lib/pxf-hbase.jar
/opt/pxf/lib/pxf-hdfs.jar
/opt/pxf/lib/pxf-hive.jar
/opt/pxf/lib/pxf-json.jar
/opt/pxf/lib/pxf-jdbc.jar
/opt/pxf/lib/pxf-ignite.jar

# Hadoop/Hive/HBase configurations
/usr/local/hadoop-2.7.1/etc/hadoop
#/usr/local/hadoop/hive/conf
/usr/local/hbase/conf
/usr/local/hadoop-2.7.1/share/hadoop/common/hadoop-common-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/hadoop-auth-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/asm-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/avro-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-cli-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-codec-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-collections-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-configuration-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-io-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-lang-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-logging-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-compress-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/guava-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/htrace-core*.jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jetty-*.jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jersey-core-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jersey-server-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/log4j-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/protobuf-java-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-api-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/snappy-java-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/gson-*[0-9].jar

# Pick Jackson 1.9 jars from hdfs dir for HDP tar and from mapreduce1 for CDH tar
/usr/local/hadoop-2.7.1/share/hadoop/[hdfs|mapreduce1]/lib/jackson-core-asl-1.9*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/[hdfs|mapreduce1]/lib/jackson-mapper-asl-1.9*[0-9].jar

# Hive Libraries

# HBase Libraries
/usr/local/hbase/lib/hbase-client*.jar
/usr/local/hbase/lib/hbase-common*.jar
/usr/local/hbase/lib/hbase-protocol*.jar
/usr/local/hbase/lib/htrace-core*.jar
/usr/local/hbase/lib/netty*.jar
/usr/local/hbase/lib/zookeeper*.jar
/usr/local/hbase/lib/metrics-core*.jar

Note: every entry must use the absolute path of your actual Hadoop/HBase installation. Since I did not install Hive, I commented out all the Hive-related paths; otherwise PXF later complains about missing jars or wrong paths.
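A quick sanity check that every non-comment entry in the edited file resolves to something on disk (a sketch, assuming the /opt/pxf layout above; the [hdfs|mapreduce1] Jackson entries use a pattern that plain shell globbing may not expand, so ignore false alarms on those two lines):

grep -v '^#' /opt/pxf/conf/pxf-private.classpath | while read -r entry; do
  [ -z "$entry" ] && continue                           # skip blank lines
  ls $entry >/dev/null 2>&1 || echo "missing: $entry"   # unquoted so globs expand
done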

4) Make the paths in the following configuration files consistent with the ones set in pxf-private.classpath above:

pxf-privatebigtop.classpath
pxf-privatehdp.classpath
pxf-privatephd.classpath
pxf-public.classpath
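This can be done in one pass with sed; OLD_HADOOP_PREFIX below is a placeholder for whatever path prefix your copies of these files actually contain, so inspect them before substituting:

cd /opt/pxf/conf
grep -h '^/' pxf-privatebigtop.classpath pxf-privatehdp.classpath pxf-privatephd.classpath pxf-public.classpath | sort -u   # see the current prefixes
sed -i 's#OLD_HADOOP_PREFIX#/usr/local/hadoop-2.7.1#g' pxf-privatebigtop.classpath pxf-privatehdp.classpath pxf-privatephd.classpath pxf-public.classpath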

6. Initialize PXF

cd /opt/pxf/bin  # enter the PXF installation's bin directory
./pxf init       # initialize PXF

7. Start PXF

./pxf start
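To verify the service came up, probe the PXF REST endpoint on the port configured earlier (51200); ProtocolVersion is the usual PXF liveness check, though the exact response text varies by version:

curl "http://localhost:51200/pxf/ProtocolVersion"   # expect something like: PXF protocol version v14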

8. Access HDFS file data

1) Create the HDFS directory (hdfs dfs replaces the deprecated hadoop dfs form):
hdfs dfs -mkdir -p /data/pxf_examples
2) Create the text data file pxf_hdfs_simple.txt:
echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt
3) Copy the data file into HDFS:
hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
4) Verify the file stored in HDFS:
hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
5) Use the HdfsTextSimple profile to create a queryable HAWQ external table over pxf_hdfs_simple.txt (replace namenode with the host PXF is running on):
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
6) Query the external table:
gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
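With the sample data above, the query should return the four loaded rows:

 location  | month | num_orders | total_sales
-----------+-------+------------+-------------
 Prague    | Jan   |        101 |     4875.33
 Rome      | Mar   |         87 |     1557.39
 Bangalore | May   |        317 |     8936.99
 Beijing   | Jul   |        411 |    11600.67
(4 rows)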