CDH 5.15.0 + Spark 1.6.0 + Hive 1.1 cluster integration with Zeppelin 0.8.1 and spark-notebook: pitfalls and workarounds
Most prebuilt "all" binary packages target Spark 2 / Scala 2.11, so a build from source was needed to match the cluster's local Spark, Scala, Hadoop, Hive, and YARN versions. See the previous article for fetching the source from git and resolving build errors; below is the installation process after building a compatible version:
1.zeppelin081/conf/zeppelin-env.sh:
export MASTER=local[2] #yarn-client
#export SCALA_HOME=/usr/share/scala
export SCALA_HOME=/opt/soft/scala-2.10.5
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
#export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
if [ -n "$HADOOP_HOME" ]; then
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi
#export SPARK_CONF_DIR=/etc/spark2/conf
export SPARK_CONF_DIR=/etc/spark/conf
export HIVE_CONF_DIR=/etc/hive/conf
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
if [ -d "$HIVE_CONF_DIR" ]; then
HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR
export ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/conf
#export ZEPPELIN_INTP_CLASSPATH_OVERRIDES=:/etc/hive/conf:/usr/share/java/mysql-connector-java.jar:/opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly-1.6.0-cdh5.15.0-hadoop2.6.0-cdh5.15.0.jar:/opt/cloudera/parcels/CDH/jars/*:/opt/cloudera/parcels/CDH/lib/hive/lib/*:/opt/soft/zeppelin081/interpreter/spark/spark-interpreter-0.8.1.jar
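Before restarting the daemon, it can save a round of log-digging to verify that every path wired into zeppelin-env.sh above actually exists. A minimal sketch; check_dirs is a hypothetical helper, not a Zeppelin tool:

```shell
# check_dirs: print every path from the argument list that is missing and
# return non-zero if any is absent. Hypothetical helper, not part of Zeppelin.
check_dirs() {
  rc=0
  for d in "$@"; do
    if [ ! -e "$d" ]; then
      echo "missing: $d"
      rc=1
    fi
  done
  return $rc
}

# Usage against the paths configured above:
#   check_dirs /opt/soft/scala-2.10.5 /opt/cloudera/parcels/CDH/lib/spark \
#     /opt/cloudera/parcels/CDH/lib/hadoop /etc/spark/conf /etc/hive/conf
```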
2. Link the Hive config into Zeppelin's conf directory: ln -s /etc/hive/conf/hive-site.xml conf/
3. Edit conf/zeppelin-site.xml to change the server port.
4. Start it with bin/zeppelin-daemon.sh restart; the logs, run, and webapps directories are created automatically.
5. Check the logs for errors:
vi logs/zeppelin-root-master.log:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Queues
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Queues
Fix: replace the bundled guava jar with the matching version from the CDH lib directory:
cp /opt/cloudera/parcels/CDH/lib/hive/lib/guava-14.0.1.jar lib/
It then still threw an error asking for guava-21.
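The same swap pattern recurs for every conflicting jar below: move the bundled version aside as *.bak (so it can be restored) and copy in the CDH one. A sketch, where swap_jar is a hypothetical helper:

```shell
# swap_jar: rename any jar in libdir matching the pattern to *.bak, then
# copy in the replacement jar. Hypothetical helper illustrating the pattern.
swap_jar() {
  libdir=$1
  pattern=$2
  newjar=$3
  for old in "$libdir"/$pattern; do
    if [ -f "$old" ]; then
      mv "$old" "$old.bak"   # keep the original recoverable
    fi
  done
  cp "$newjar" "$libdir"/
}

# E.g. for the guava conflict above:
#   swap_jar lib 'guava-*.jar' /opt/cloudera/parcels/CDH/lib/hive/lib/guava-14.0.1.jar
```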
vi logs/zeppelin-root-master.out:
MultiException[java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/Versioned, java.lang.NoClassDefFoundError: org/glassfish/jersey/jackson/internal/jackson/jaxrs/json/JacksonJaxbJsonProvider]
Fix: replace the bundled jackson jars with the matching versions from the CDH lib directory:
ls lib/|grep jackson
google-http-client-jackson-1.23.0.jar
google-http-client-jackson2-1.23.0.jar
jackson-annotations-2.8.0.jar.bak
jackson-core-2.8.10.jar.bak
jackson-core-asl-1.9.13.jar
jackson-databind-2.8.11.1.jar.bak
jackson-jaxrs-1.8.8.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.8.10.jar.bak
jackson-xc-1.8.8.jar
jersey-media-json-jackson-2.27.jar
[root@master zeppelin081]# cp /opt/cloudera/parcels/CDH/jars/jackson-annotations-2.1.0.jar lib/
[root@master zeppelin081]# cp /opt/cloudera/parcels/CDH/jars/jackson-core-2.1.0.jar lib/
[root@master zeppelin081]# cp /opt/cloudera/parcels/CDH/jars/jackson-databind-2.1.0.jar lib/
[root@master zeppelin081]# cp /opt/cloudera/parcels/CDH/jars/jackson-module-jaxb-annotations-2.1.0.jar lib/
After trying several versions without success, it turned out the matching Scala module (jackson-module-scala) of the same version was also required:
cp /opt/cloudera/parcels/CDH/jars/jackson*2.2.3*.jar lib/
[root@master zeppelin081]# ls lib/ | grep jackson
jackson-annotations-2.1.0.jar.bak
jackson-annotations-2.2.2.jar.bak
jackson-annotations-2.2.3.jar
jackson-annotations-2.3.1.jar.bak
jackson-annotations-2.8.0.jar.bak
jackson-core-2.1.0.jar.bak
jackson-core-2.2.2.jar.bak
jackson-core-2.2.3.jar
jackson-core-2.8.10.jar.bak
jackson-core-asl-1.9.13.jar
jackson-databind-2.1.0.jar.bak
jackson-databind-2.2.2.jar.bak
jackson-databind-2.2.3.jar
jackson-databind-2.8.11.1.jar.bak
jackson-jaxrs-1.8.8.jar
jackson-mapper-asl-1.9.13.jar
jackson-module-jaxb-annotations-2.1.0.jar.bak
jackson-module-jaxb-annotations-2.8.10.jar.bak
jackson-module-scala_2.10-2.2.3.jar
jackson-xc-1.8.8.jar
Finally working!!!
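Mixed jackson versions on the classpath are exactly what produced the MultiException above, so it is worth confirming that only one active (non-.bak) version of each artifact remains. A sketch; dup_jars is a hypothetical helper:

```shell
# dup_jars: print the base name of any jar present in more than one active
# version in a directory, ignoring the *.bak copies. Hypothetical helper.
dup_jars() {
  ls "$1" | grep -v '\.bak$' | sed 's/-[0-9][0-9.]*\.jar$//' | sort | uniq -d
}

# No output from `dup_jars lib` means no duplicated artifacts remain.
```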
=============
spark-notebook is comparatively simple: download and extract the build for Scala [2.10.5] Spark [1.6.0] Hadoop [2.6.0] {Hive ✓} {Parquet ✓}.
Link hive-site.xml the same way: ln -s /etc/hive/conf/hive-site.xml conf/
Change the port: vi conf/application.ini
It can then be started directly without further changes and configured afterwards, but to make restarts convenient I wrote a script, bin/start.sh:
#!/bin/bash
export MASTER=local[2]
#yarn-client
#export SCALA_HOME=/usr/share/scala
export SCALA_HOME=/opt/soft/scala-2.10.5
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
if [ -n "$HADOOP_HOME" ]; then
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi
export SPARK_CONF_DIR=/etc/spark/conf
export HIVE_CONF_DIR=/etc/hive/conf
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
if [ -d "$HIVE_CONF_DIR" ]; then
HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR
workdir=/opt/soft/spark-notebook
if [ -f ${workdir}/RUNNING_PID ]; then
kill -9 `cat ${workdir}/RUNNING_PID`
fi
rm -rf ${workdir}/derby.log ${workdir}/metastore_db ${workdir}/RUNNING_PID
${workdir}/bin/spark-notebook > snb.log 2>&1 &
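Since the script backgrounds the process and returns immediately, a failed startup is easy to miss. Play writes RUNNING_PID on a successful launch, so one way to confirm the notebook is actually up is to check that file; alive_pidfile is a hypothetical helper:

```shell
# alive_pidfile: succeed only if the pidfile exists and its process is alive.
# Hypothetical helper, not part of spark-notebook.
alive_pidfile() {
  [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}

# After running start.sh:
#   sleep 5
#   alive_pidfile /opt/soft/spark-notebook/RUNNING_PID || echo "startup failed - check snb.log"
```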
At first it could not connect to Hive; it worked after setting the notebook metadata as follows:
{
  "name": "test",
  "user_save_timestamp": "1970-01-01T08:00:00.000Z",
  "auto_save_timestamp": "1970-01-01T08:00:00.000Z",
  "language_info": {
    "name": "scala",
    "file_extension": "scala",
    "codemirror_mode": "text/x-scala"
  },
  "trusted": true,
  "customLocalRepo": null,
  "customRepos": null,
  "customDeps": null,
  "customImports": [
    "import scala.util._",
    "import org.apache.spark.SparkContext._"
  ],
  "customArgs": null,
  "customSparkConf": {
    "spark.master": "local[2]",
    "hive.metastore.warehouse.dir": "/user/hive/warehouse",
    "hive.metastore.uris": "thrift://master:9083",
    "spark.sql.hive.metastore.version": "1.1.0",
    "spark.sql.hive.metastore.jars": "/opt/cloudera/parcels/CDH/lib/hadoop/../hive/lib/*",
    "hive.metastore.schema.verification": "false",
    "spark.jars": "/usr/share/java/mysql-connector-java.jar,/opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly-1.6.0-cdh5.15.0-hadoop2.6.0-cdh5.15.0.jar",
    "spark.driver.extraClassPath": "/etc/spark/conf:/etc/spark/conf/yarn-conf:/etc/hadoop/conf:/etc/hive/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hive/lib/*:/opt/cloudera/parcels/CDH/jars/*:/opt/soft/spark-notebook/lib/*",
    "spark.executor.extraClassPath": "/etc/spark/conf:/etc/spark/conf/yarn-conf:/etc/hadoop/conf:/etc/hive/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-hdfs/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-mapreduce/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hadoop-yarn/*:/opt/cloudera/parcels/CDH/lib/hadoop/../hive/lib/*:/opt/cloudera/parcels/CDH/jars/*:/opt/soft/spark-notebook/lib/*"
  },
  "kernelspec": {
    "name": "spark",
    "display_name": "Scala [2.10.5] Spark [1.6.0] Hadoop [2.6.0] {Hive ✓} {Parquet ✓}"
  }
}
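When the notebook "cannot connect to Hive", the first thing to rule out is basic TCP reachability of the metastore thrift endpoint from hive.metastore.uris (thrift://master:9083 here). A sketch using bash's /dev/tcp redirection; port_check is a hypothetical helper:

```shell
# port_check: succeed if a TCP connection to host ($1) and port ($2) can be
# opened within 3 seconds. Hypothetical helper for a quick reachability test.
port_check() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# From the notebook host:
#   port_check master 9083 || echo "metastore not reachable"
```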
Error:
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/AlreadyExistsException
It turned out that spark-notebook is compiled against the Hive 1.2 metastore client by default, while CDH ships Hive 1.1; this is why the spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars settings above point Spark at the CDH Hive 1.1 libraries.