
Spark 1.6.3 Configuration

1. I originally planned to write up a Spark 2.3 installation, but the JDK used when Hadoop was configured is 1.7, and Spark 2.3 only supports JDK 1.8. If Spark and Hadoop are installed with different JDK versions, running Spark on YARN will fail, so these notes cover installing Spark 1.x instead.
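A quick way to confirm the versions line up before installing; a minimal check, assuming the JDK path used later in this post:

java -version                                          # should report 1.7.x, the same JDK Hadoop uses
echo $JAVA_HOME                                        # e.g. /opt/java/jdk1.7.0_67
grep JAVA_HOME $HADOOP_HOME/etc/hadoop/hadoop-env.sh   # should point at the same JDK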

2. A special note: do not use the CDH build of Spark here, since some jar packages cannot be found with it. The plain Apache release works fine.

3. The extracted Spark directory looks like this:

[[email protected] spark-1.6.3]$ ll
total 1380
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 bin
-rw-r--r-- 1 zuowei.zhang zuowei.zhang 1343562 Nov  3  2016 CHANGES.txt
drwxr-xr-x 2 zuowei.zhang zuowei.zhang     212 Dec 24 08:35 conf
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      19 Nov  3  2016 data
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      79 Nov  3  2016 ec2
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      17 Nov  3  2016 examples
drwxr-xr-x 2 zuowei.zhang zuowei.zhang     237 Nov  3  2016 lib
-rw-r--r-- 1 zuowei.zhang zuowei.zhang   17352 Nov  3  2016 LICENSE
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 licenses
-rw-r--r-- 1 zuowei.zhang zuowei.zhang   23529 Nov  3  2016 NOTICE
drwxr-xr-x 6 zuowei.zhang zuowei.zhang     119 Nov  3  2016 python
drwxr-xr-x 3 zuowei.zhang zuowei.zhang      17 Nov  3  2016 R
-rw-r--r-- 1 zuowei.zhang zuowei.zhang    3359 Nov  3  2016 README.md
-rw-r--r-- 1 zuowei.zhang zuowei.zhang     120 Nov  3  2016 RELEASE
drwxr-xr-x 2 zuowei.zhang zuowei.zhang    4096 Nov  3  2016 sbin

4. Configure the slaves file and the spark-env.sh file in the conf directory.

The slaves file lists the worker nodes:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# A Spark Worker will be started on each of the machines listed below.
master.cn
slave1.cn
slave2.cn

The spark-env.sh file is configured as follows:

export JAVA_HOME=/opt/java/jdk1.7.0_67
export SPARK_MASTER_IP=master.cn
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
#spark on yarn
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/cdh5.15.0/spark-1.6.3
export SPARK_JAR=/opt/cdh5.15.0/spark-1.6.3/lib/spark-assembly-1.6.3-hadoop2.6.0.jar
export PATH=$SPARK_HOME/bin:$PATH
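Once the files are distributed (step 5 below), the standalone settings above can be smoke-tested by bringing the cluster up; a minimal sketch using Spark's stock scripts:

sbin/start-all.sh   # starts a Master on this node and a Worker on every host in conf/slaves
jps                 # Master/Worker processes should be listed
# Master web UI: http://master.cn:8080/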

5. Distribute the Spark directory to each node:

scp -r /opt/cdh5.15.0/spark-1.6.3/ slave1.cn:/opt/cdh5.15.0/
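The slaves file above lists slave1.cn and slave2.cn in addition to master.cn, so the copy is needed on both slaves; a small loop covers them:

for host in slave1.cn slave2.cn; do
  scp -r /opt/cdh5.15.0/spark-1.6.3/ ${host}:/opt/cdh5.15.0/
done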

6. Examples of running Spark on YARN:

Client mode: the result is visible in the local shell (e.g. in Xshell):

bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 1G --num-executors 1 lib/spark-examples-1.6.3-hadoop2.6.0.jar 100
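In yarn-client mode the driver runs locally, so when the job finishes a line like "Pi is roughly 3.14..." prints straight to the shell.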

Cluster mode: the result is visible in the YARN web UI on port 8088:

bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-memory 1G --num-executors 1 lib/spark-examples-1.6.3-hadoop2.6.0.jar 100
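In yarn-cluster mode the driver runs inside the YARN ApplicationMaster, so the Pi result lands in the container logs rather than the shell. If log aggregation is enabled in YARN, the logs can also be fetched from the command line (the application ID is shown on the 8088 UI and in the spark-submit output; the one below is a placeholder):

yarn logs -applicationId application_1545612345678_0001 | grep "Pi is roughly"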

7. Configure the Spark HistoryServer, and distribute the following configuration to every node.

Configure spark-defaults.conf in the conf directory:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master                     spark://master:7077
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://master.cn:8020/sparklog
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
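Spark does not create the event log directory on its own; with event logging enabled, applications fail to start if it is missing. Create it on HDFS first (path as configured above):

hdfs dfs -mkdir -p /sparklog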

Add the HistoryServer options to spark-env.sh:

#HistoryServer
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://master.cn:8020/sparklog"  

8. Start start-history-server.sh on any node; the history UI is then available on that node at http://master.cn:18080/.
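A minimal start-and-verify sequence, assuming the HistoryServer is started on master.cn:

sbin/start-history-server.sh
jps   # a HistoryServer process should now be listed
# re-run one of the SparkPi examples above, then refresh http://master.cn:18080/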