
[Original] Big Data Fundamentals: Spark (9) — Spark deployment on YARN/Mesos


1 Download from https://spark.apache.org/downloads.html

$ wget http://mirrors.shu.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

2 Extract

$ tar xvf spark-2.4.0-bin-hadoop2.7.tgz
$ cd spark-2.4.0-bin-hadoop2.7

3 Set the SPARK_HOME environment variable

$ export SPARK_HOME=/path/to/spark-2.4.0-bin-hadoop2.7
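
To make the setting persist across shell sessions, the export can go into the shell profile. A minimal sketch — the /opt install path below is only an assumption; use the actual extraction directory:

```shell
# Assumed install location -- replace with the real extraction path.
export SPARK_HOME=/opt/spark-2.4.0-bin-hadoop2.7
# Putting bin/ on PATH lets spark-sql / spark-submit be invoked directly.
export PATH="$SPARK_HOME/bin:$PATH"
```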

4 Launch

Using spark-sql as an example

4.1 Spark on YARN

Only the HADOOP_CONF_DIR environment variable needs to be set
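
HADOOP_CONF_DIR must point at the directory holding the cluster's client configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml). The path below is a common distribution default and only an assumption:

```shell
# Assumed config directory -- many Hadoop distributions use /etc/hadoop/conf.
export HADOOP_CONF_DIR=/etc/hadoop/conf
```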

$ bin/spark-sql --master yarn

More options

--deploy-mode cluster
--driver-memory 4g
--driver-cores 1
--executor-memory 2g
--executor-cores 1
--num-executors 1
--queue thequeue
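
Put together, the options above might look like the following. Note that interactive shells such as spark-sql only run the driver in client mode, so --deploy-mode cluster is illustrated here with spark-submit and the SparkPi example that ships in the 2.4.0 tarball; the guard makes the snippet a no-op when the distribution is not on hand:

```shell
# Illustration only: submit the bundled SparkPi example to YARN in
# cluster mode with the resource options listed above.
if [ -x bin/spark-submit ]; then
  bin/spark-submit --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --driver-cores 1 \
    --executor-memory 2g \
    --executor-cores 1 \
    --num-executors 1 \
    --queue thequeue \
    --class org.apache.spark.examples.SparkPi \
    examples/jars/spark-examples_2.11-2.4.0.jar 100
fi
```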

4.2 Spark on Mesos

$ bin/spark-sql --master mesos://zk://172.19.28.186:2181,172.19.28.188:2181,172.19.28.190:2181/mesos

More options

--deploy-mode cluster
--supervise
--executor-memory 20G
--executor-cores 1
--total-executor-cores 100
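
A client-mode spark-sql invocation combining the resource limits above might look as follows (the ZooKeeper quorum is the one used earlier in this article; --supervise is omitted since it only applies to cluster-mode submissions, and the guard makes the snippet a no-op without a Spark distribution):

```shell
# Illustration only: spark-sql against Mesos with explicit resource caps.
if [ -x bin/spark-sql ]; then
  bin/spark-sql \
    --master mesos://zk://172.19.28.186:2181,172.19.28.188:2181,172.19.28.190:2181/mesos \
    --executor-memory 20G \
    --executor-cores 1 \
    --total-executor-cores 100
fi
```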

Note that there is no --num-executors option in Mesos mode; the executor count is configured indirectly: --num-executors = --total-executor-cores / --executor-cores

Executor memory: spark.executor.memory
Executor cores: spark.executor.cores
Number of executors: spark.cores.max/spark.executor.cores
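
For instance, with the values listed above (--total-executor-cores 100, --executor-cores 1), Mesos can launch up to 100 executors. A quick check of the arithmetic:

```shell
# Worked example of the indirect executor count on Mesos:
# number of executors = total executor cores / cores per executor.
total_executor_cores=100
executor_cores=1
num_executors=$((total_executor_cores / executor_cores))
echo "$num_executors"
```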

Note: Spark on YARN may fail at startup with

19/02/25 17:54:20 ERROR cluster.YarnClientSchedulerBackend: Yarn application has already exited with state FINISHED!

The NodeManager log reveals the cause

2019-02-25 17:54:19,481 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=48342,containerID=container_1551078668160_0012_02_000001] is running beyond virtual memory limits. Current usage: 380.9 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
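
The 2.1 GB limit in that message is the container's 1 GB physical allocation multiplied by the default yarn.nodemanager.vmem-pmem-ratio of 2.1; the 2.5 GB of virtual memory actually in use exceeds it, so the NodeManager kills the container. The arithmetic:

```shell
# 1 GB physical * default ratio 2.1 = 2.1 GB virtual limit (exceeded);
# raising the ratio to 4 would allow 4 GB, comfortably above the 2.5 GB used.
awk 'BEGIN { printf "%.1f %.1f\n", 1 * 2.1, 1 * 4 }'
```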

This requires adjusting the yarn-site.xml configuration

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

or

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
