Running Spark Programs in Hadoop HA Mode

  (1) Copy Hadoop's hdfs-site.xml and core-site.xml into the spark/conf directory. (A driver-side alternative is sketched right below.)
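If copying is inconvenient, the same settings can also be loaded into the running driver's Hadoop configuration; a minimal sketch, assuming the XML files sit under /home/hadoop/spark/conf (this affects only the driver, so executors still rely on step (2)):

import org.apache.hadoop.fs.Path

// Assumed paths; point these at wherever your Hadoop client configs actually live.
sc.hadoopConfiguration.addResource(new Path("file:///home/hadoop/spark/conf/core-site.xml"))
sc.hadoopConfiguration.addResource(new Path("file:///home/hadoop/spark/conf/hdfs-site.xml"))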

  (2) Append the following line to spark-defaults.conf:

spark.files file:///home/hadoop/spark/conf/hdfs-site.xml,file:///home/hadoop/spark/conf/core-site.xml
Without this setting, the following error occurs:

java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:231)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
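The exception means the HDFS client sees the logical nameservice name "mycluster" but has none of the HA settings needed to resolve it to real NameNodes. For debugging, those keys can also be set directly on the context; a hedged sketch, with nn1.example.com and nn2.example.com standing in for your actual NameNode hosts:

// Hypothetical hosts and port; substitute your real NameNode addresses.
val conf = sc.hadoopConfiguration
conf.set("fs.defaultFS", "hdfs://mycluster")
conf.set("dfs.nameservices", "mycluster")
conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2")
conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020")
conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020")
conf.set("dfs.client.failover.proxy.provider.mycluster",
  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")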

  (3) Read an LZO file from HDFS, splitting it across tasks:

import org.apache.hadoop.io._
import com.hadoop.mapreduce._  // LzoTextInputFormat comes from the hadoop-lzo library

// Read one .lzo file through the HA nameservice "mycluster"; the file is
// split across tasks only if a matching .index file exists alongside it.
val data = sc.newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat]("hdfs://mycluster/user/hive/warehouse/logs_app_nginx/logdate=20160322/loghost=70/var.log.nginx.access_20160322.log.70.lzo")
data.count()
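Hadoop input formats reuse the same Writable instances from record to record, so copy the values out as Strings before caching or collecting; a short follow-up on the data RDD above:

// Materialize each Text value as an immutable String.
val lines = data.map { case (_, text) => text.toString }
lines.take(5).foreach(println)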

  (4) Read the files under wildcard-matched directories and their subdirectories in HDFS, again with splitting:

import org.apache.hadoop.io._
import com.hadoop.mapreduce._

// loghost=* expands to every matching partition directory under logdate=20160322.
val dirdata = sc.newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat]("hdfs://mycluster/user/hive/warehouse/logs_app_nginx/logdate=20160322/loghost=*/")
dirdata.count()
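If the partitions nest one level deeper than the wildcard reaches, FileInputFormat-based readers can be told to recurse; a minimal sketch, assuming Hadoop 2's recursive-listing switch:

// Recurse into subdirectories instead of failing on them; the path is illustrative.
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
val alldata = sc.newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat]("hdfs://mycluster/user/hive/warehouse/logs_app_nginx/logdate=20160322/")
alldata.count()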