Spark 1.6.3 Configuration
阿新 · Published 2018-12-31
1. I originally planned to write up a Spark 2.3 install, but Hadoop was configured with JDK 1.7, and Spark 2.3 only supports JDK 1.8. If Spark and Hadoop are built against different JDK versions, running Spark on YARN throws errors, so this post records a Spark 1.x install instead.
2. Note: don't use the CDH build of Spark here; some jars are missing from it. Just use the Apache release.
3. The extracted Spark directory looks like this:
[zuowei.zhang@master.cn spark-1.6.3]$ ll
total 1380
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 bin
-rw-r--r-- 1 zuowei.zhang zuowei.zhang 1343562 Nov  3  2016 CHANGES.txt
drwxr-xr-x 2 zuowei.zhang zuowei.zhang     212 Dec 24 08:35 conf
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      19 Nov  3  2016 data
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      79 Nov  3  2016 ec2
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      17 Nov  3  2016 examples
drwxr-xr-x 2 zuowei.zhang zuowei.zhang     237 Nov  3  2016 lib
-rw-r--r-- 1 zuowei.zhang zuowei.zhang   17352 Nov  3  2016 LICENSE
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 licenses
-rw-r--r-- 1 zuowei.zhang zuowei.zhang   23529 Nov  3  2016 NOTICE
drwxr-xr-x 6 zuowei.zhang zuowei.zhang     119 Nov  3  2016 python
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      17 Nov  3  2016 R
-rw-r--r-- 1 zuowei.zhang zuowei.zhang    3359 Nov  3  2016 README.md
-rw-r--r-- 1 zuowei.zhang zuowei.zhang     120 Nov  3  2016 RELEASE
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 sbin
4. Configure the slaves file and spark-env.sh in the conf directory.
The slaves file lists the worker nodes:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# A Spark Worker will be started on each of the machines listed below.
master.cn
slave1.cn
slave2.cn
spark-env.sh is configured as follows:
export JAVA_HOME=/opt/java/jdk1.7.0_67
export SPARK_MASTER_IP=master.cn
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
# spark on yarn
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/cdh5.15.0/spark-1.6.3
export SPARK_JAR=/opt/cdh5.15.0/spark-1.6.3/lib/spark-assembly-1.6.3-hadoop2.6.0.jar
export PATH=$SPARK_HOME/bin:$PATH
5. Distribute the Spark directory to each node:
scp -r /opt/cdh5.15.0/spark-1.6.3/ slave1.cn:/opt/cdh5.15.0/
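With more than one worker, the copy can be wrapped in a small loop (hostnames taken from the slaves file above). This is a dry-run sketch: `echo` prints each command for review, and removing it performs the actual copy (assuming passwordless SSH is already set up between the nodes):

```shell
# Print the scp command for each worker listed in the slaves file.
# Remove `echo` to actually copy the directory.
SPARK_DIR=/opt/cdh5.15.0/spark-1.6.3
for node in slave1.cn slave2.cn; do
  echo scp -r "${SPARK_DIR}/" "${node}:/opt/cdh5.15.0/"
done
```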
6. Examples of running Spark on YARN:
Client mode: the result is printed in the local terminal (e.g. in Xshell):
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --executor-memory 1G \
  --num-executors 1 \
  lib/spark-examples-1.6.3-hadoop2.6.0.jar 100
Cluster mode: the result is visible through the YARN web UI on port 8088:
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --executor-memory 1G \
  --num-executors 1 \
  lib/spark-examples-1.6.3-hadoop2.6.0.jar 100
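In cluster mode the driver runs inside the YARN ApplicationMaster, so the Pi result never reaches the local shell. It can be pulled from the aggregated container logs instead. A sketch, assuming YARN log aggregation is enabled; the application ID below is a placeholder for the one spark-submit prints:

```shell
# Fetch the driver's logs from YARN and look for the SparkPi result line.
# application_1546230000000_0001 is a placeholder application ID.
yarn logs -applicationId application_1546230000000_0001 | grep "Pi is roughly"
```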
7. Configure the Spark HistoryServer, then distribute the configuration below to every node.
Edit spark-defaults.conf in the conf directory:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master spark://master:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master.cn:8020/sparklog
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
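One step the configuration above assumes: the HDFS directory named in spark.eventLog.dir must already exist, or applications will fail to start once event logging is enabled. A minimal sketch (the chmod is optional and assumes you want every user's jobs to be able to write event logs there):

```shell
# Create the event-log directory referenced by spark.eventLog.dir.
hdfs dfs -mkdir -p /sparklog
# Optionally open it up so all users' applications can write logs (optional).
hdfs dfs -chmod -R 777 /sparklog
```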
Add the following to spark-env.sh:
#HistoryServer
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://master.cn:8020/sparklog"
8. Run start-history-server.sh on any node, then open that node's UI at http://master.cn:18080/ to browse completed applications.
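The start command plus a quick liveness check might look like this (a sketch: it assumes curl is installed on the node, and the URL matches the spark.history.ui.port value set above):

```shell
# Start the history server from the Spark home directory.
/opt/cdh5.15.0/spark-1.6.3/sbin/start-history-server.sh
# Verify the UI answers before opening it in a browser.
curl -sf http://master.cn:18080/ > /dev/null && echo "history server is up"
```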