
Writing Spark with PyCharm + Python (spark-2.0.1-bin-hadoop2.6)

1. Put pyspark on the Python path:
Copy the pyspark package into the interpreter's site-packages directory; you can check the interpreter's location in PyCharm.

On my machine (macOS) that directory is:
/Library/Python/2.7/site-packages
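A minimal check that the copy worked, assuming Python 3 (the original post used a 2.7 site-packages directory, where `imp.find_module` played this role). The helper name `is_importable` is mine, not from the post:

```python
# Sketch: verify that the copied pyspark package is visible to this interpreter.
import importlib.util

def is_importable(name):
    """Return True if a top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None

# After copying pyspark into site-packages, this should report True:
print(is_importable("pyspark"))
```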

2. Environment variable configuration:
Steps 1-3 (shown as screenshots in the original post): open PyCharm's Run/Debug Configuration and add the following environment variables.

SPARK_CLASSPATH
/Users/Chaves/workspace/spark/hbase-0.98.3/lib/:/Users/Chaves/workspace/spark/spark-2.0.1-bin-hadoop2.6/lib/:

SPARK_HOME
/Users/Chaves/workspace/spark/spark-2.0.1-bin-hadoop2.6
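As a sketch of an alternative to the PyCharm dialog, the same two variables can be set in code before pyspark is imported; the paths below are the author's values from the section above:

```python
# Set the Spark environment variables programmatically, before any
# pyspark import. Paths are the author's macOS layout; adjust to yours.
import os

SPARK_HOME = "/Users/Chaves/workspace/spark/spark-2.0.1-bin-hadoop2.6"
os.environ["SPARK_HOME"] = SPARK_HOME
os.environ["SPARK_CLASSPATH"] = (
    "/Users/Chaves/workspace/spark/hbase-0.98.3/lib/:"
    + SPARK_HOME + "/lib/:"
)
```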

3. Running from the terminal:
1. Spark terminal command setup
The local Spark package's config directory, e.g.:
/Users/<your user dir>/workspace/spark/spark-2.0.1-bin-hadoop2.6/conf
Edit spark-env.sh there.
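A hypothetical spark-env.sh fragment mirroring the variables from Part 2 (the original post does not show the file's contents; adjust the paths to your installation):

```shell
# Lines to append to conf/spark-env.sh (same variables as in Part 2).
export SPARK_HOME=/Users/Chaves/workspace/spark/spark-2.0.1-bin-hadoop2.6
export SPARK_CLASSPATH=/Users/Chaves/workspace/spark/hbase-0.98.3/lib/:$SPARK_HOME/lib/:
```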

2. Modify SPARK_CLASSPATH (in spark-env.sh)
2.1 Run from Spark's bin directory:
/Users/<your user dir>/workspace/spark/spark-2.0.1-bin-hadoop2.6/bin

2.2 Launch command (./spark-submit --jars ...):
./spark-submit --jars <jar paths> --py-files <utility package> <algorithm script path> arg1 arg2 arg3 ...
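The launch command above can be sketched programmatically; this helper (`build_spark_submit_cmd` is my name, and all file names in the test are hypothetical) builds the argument list, joining jar and py-file lists with commas as spark-submit expects:

```python
# Sketch: assemble the spark-submit argument list from Part 3's 2.2.
def build_spark_submit_cmd(jars, py_files, script, *args):
    """Build ['./spark-submit', '--jars', ..., script, arg1, ...]."""
    cmd = ["./spark-submit"]
    if jars:
        # spark-submit takes a comma-separated list of jar paths
        cmd += ["--jars", ",".join(jars)]
    if py_files:
        # likewise for .zip/.egg/.py dependencies
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(script)          # the algorithm script to run
    cmd += list(args)           # its positional arguments
    return cmd
```

The resulting list can be passed to `subprocess.run` from the Spark bin directory.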

Differences between 2.0 and 1.0:

from pyspark.sql import SparkSession, HiveContext

# Spark 2.0: SparkSession is the unified entry point
spark = SparkSession.builder.master("local").appName("pyspark2_0_1_test").getOrCreate()
sc = spark.sparkContext
hc = HiveContext(sc)  # 1.x-style HiveContext, kept in 2.x for compatibility
...
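The snippet above can be sketched as a self-contained function; `enableHiveSupport()` is the 2.0-style replacement for constructing a 1.x `HiveContext(sc)` directly. The function name and defaults below are mine, and the pyspark import is deferred so the module loads without Spark installed:

```python
# Sketch: Spark 2.0 entry point, assuming pyspark >= 2.0 is on the path.
def get_spark(app_name="pyspark2_0_1_test", master="local"):
    """Create (or reuse) a SparkSession with Hive support and return it
    together with its SparkContext."""
    from pyspark.sql import SparkSession  # deferred: needs Spark installed
    spark = (SparkSession.builder
             .master(master)
             .appName(app_name)
             .enableHiveSupport()   # 2.0 replacement for HiveContext(sc)
             .getOrCreate())
    return spark, spark.sparkContext
```

In 1.x you would instead build a `SparkContext` and wrap it in `SQLContext`/`HiveContext`; in 2.0 those are folded into the single `SparkSession` object.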