Running MR Programs on a Hadoop 2 Cluster from Eclipse
Environment: Hadoop 2.2; Windows with MyEclipse.
Submitting an MR program from Eclipse is really just an ordinary Java program submitting MR jobs to the cluster. In Hadoop 1 you only needed to point the client at the JobTracker (jt) and the NameNode (fs), typically like this:
Configuration conf = new Configuration();
conf.set("mapred.job.tracker", "192.168.128.138:9001");
conf.set("fs.default.name","192.168.128.138:9000");
The code above works fine in Hadoop 1: you can submit jobs to the cluster from plain Java. Hadoop 2, however, dropped the JobTracker and introduced YARN. So how do we submit there? The simplest idea is to specify the corresponding configuration and try it:

Configuration conf = new YarnConfiguration();
conf.set("fs.defaultFS", "hdfs://node31:9000");
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.address", "node31:8032");

With this configuration the program does run, but first you hit the following error:
2014-04-03 21:20:21,568 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(303)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
    at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:345)
    at org.fansy.hadoop.mr.WordCount.getConf(WordCount.java:104)
    at org.fansy.hadoop.mr.WordCount.runJob(WordCount.java:84)
    at org.fansy.hadoop.mr.WordCount.main(WordCount.java:47)
This error can be ignored; it seems to show up whenever Hadoop is invoked from Windows.
Next come permission problems of one kind or another, so you need to loosen some permissions; at least that is what I did. The directories involved are /tmp plus the wordcount input and output directories. For example: $HADOOP_HOME/bin/hadoop fs -chmod -R 777 /tmp
Then, once you run into the error below, you can consider yourself halfway there.
2014-04-03 20:32:36,596 ERROR [main] security.UserGroupInformation (UserGroupInformation.java:doAs(1494)) - PriviledgedActionException as:Administrator (auth:SIMPLE) cause:java.io.IOException: Failed to run job : Application application_1396459813671_0001 failed 2 times due to AM Container for appattempt_1396459813671_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
.Failing this attempt.. Failing the application.
Googling this error leads to https://issues.apache.org/jira/browse/MAPREDUCE-5655 . Yes, that page is exactly our solution.
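The "/bin/bash: line 0: fg: no job control" line is the telltale symptom: the Windows client builds the ApplicationMaster launch command using its own platform conventions (%VAR% environment references and ';' classpath separators), and the Linux NodeManager's bash cannot interpret the result. The MRApps/YARNRunner patch essentially makes command generation honor the remote OS instead of the local one. Here is a minimal self-contained sketch of the difference (an illustrative stand-in, not Hadoop's actual code):

```java
// Sketch: how an environment-variable reference and a classpath are rendered
// per target platform. Hadoop's real code paths live in MRApps/YARNRunner;
// this hypothetical helper only illustrates the idea.
public class LaunchCommandDemo {

    // Render $VAR for Unix shells, %VAR% for cmd.exe.
    static String envVar(String name, boolean windows) {
        return windows ? "%" + name + "%" : "$" + name;
    }

    // Join classpath entries with the target platform's separator.
    static String classpath(boolean windows, String... entries) {
        return String.join(windows ? ";" : ":", entries);
    }

    // A simplified ApplicationMaster launch command.
    static String amCommand(boolean windows) {
        return envVar("JAVA_HOME", windows) + "/bin/java -cp "
                + classpath(windows, envVar("HADOOP_CONF_DIR", windows), "job.jar")
                + " org.apache.hadoop.mapreduce.v2.app.MRAppMaster";
    }

    public static void main(String[] args) {
        // A Windows client that renders for itself produces a command
        // that a Linux NodeManager's bash cannot run:
        System.out.println(amCommand(true));
        // What the Linux cluster actually needs:
        System.out.println(amCommand(false));
    }
}
```

If the first form reaches bash on the cluster, the %VAR% tokens and ';' separators are meaningless to it, which is how the container launch ends up failing.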
Let's go through it in steps 1, 2, 3.
1. Modify the source of MRApps.java and YARNRunner.java, then repackage and replace the corresponding class files in the original jars. I have already packaged these two jars; they can be downloaded from http://download.csdn.net/detail/fansy1990/7143547 . Replace the corresponding jars on the cluster, and remember to also replace the jars imported into MyEclipse. Speaking of the MyEclipse jars, here is a screenshot of them first:
2. Modify mapred-default.xml by adding the property below. (This only needs to be changed in the jar imported into Eclipse; the modified jar does not need to be uploaded to the cluster.)
<property>
  <name>mapred.remote.os</name>
  <value>Linux</value>
  <description>
    Remote MapReduce framework's OS, can be either Linux or Windows
  </description>
</property>
(As an aside: after adding this property I expected conf.get("mapred.remote.os") on a freshly created Configuration to return Linux, but it returned null. A plausible explanation is that mapred-default.xml is only registered as a default resource once a MapReduce class such as JobConf has been loaded, so a Configuration queried before that happens cannot see it.) The file is located in:
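That registration behavior can be modeled in a few lines. In Hadoop, Configuration.addDefaultResource is static and reloads every live Configuration instance, so a late registration still becomes visible. The sketch below is a simplified, hypothetical mini-model of that mechanism, not Hadoop's real classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mini-model of Hadoop Configuration's default-resource
// registry; names and behavior are simplified for illustration only.
class MiniConfiguration {
    private static final List<Map<String, String>> DEFAULTS = new ArrayList<>();
    private static final List<MiniConfiguration> REGISTRY = new ArrayList<>();
    private final Map<String, String> props = new HashMap<>();

    MiniConfiguration() {
        REGISTRY.add(this);
        reload();
    }

    // Models Configuration.addDefaultResource: every live instance is
    // reloaded, so a late registration still becomes visible everywhere.
    static void addDefaultResource(Map<String, String> resource) {
        DEFAULTS.add(resource);
        for (MiniConfiguration c : REGISTRY) {
            c.reload();
        }
    }

    private void reload() {
        for (Map<String, String> d : DEFAULTS) {
            props.putAll(d);
        }
    }

    String get(String key) {
        return props.get(key);
    }
}

public class DefaultResourceDemo {
    public static void main(String[] args) {
        MiniConfiguration conf = new MiniConfiguration();
        // No MapReduce class has been "loaded" yet: the key is absent.
        System.out.println(conf.get("mapred.remote.os")); // prints: null

        // Loading e.g. JobConf would run a static block that registers
        // mapred-default.xml as a default resource; we simulate that here.
        addMapredDefaults();
        System.out.println(conf.get("mapred.remote.os")); // prints: Linux
    }

    static void addMapredDefaults() {
        Map<String, String> mapredDefault = new HashMap<>();
        mapredDefault.put("mapred.remote.os", "Linux");
        MiniConfiguration.addDefaultResource(mapredDefault);
    }
}
```

So whether you see null or Linux depends purely on whether something has already touched a MapReduce class by the time you call get().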
At this point, if you run the program again, it can basically be submitted, but it still fails. Checking the logs shows the following error:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Well, having talked this long, let me paste my wordcount program:

package org.fansy.hadoop.mr;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class WordCount {

    private static Logger log = LoggerFactory.getLogger(WordCount.class);

    public static class WCMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        public void map(LongWritable key, Text value, Context cxt) throws IOException, InterruptedException {
            // String[] values = value.toString().split("[,| ]");
            cxt.write(key, value);
        }
    }

    public static class WCReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
        public void reduce(LongWritable key, Iterable<Text> values, Context cxt) throws IOException, InterruptedException {
            StringBuffer buff = new StringBuffer();
            for (Text v : values) {
                buff.append(v.toString() + "\t");
            }
            cxt.write(key, new Text(buff.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        // checkFS();
        String input = "hdfs://node31:9000/input/test.dat";
        String output = "hdfs://node31:9000/output/wc003";
        runJob(input, output);
        // runJob(args[0], args[1]);
        // upload();
    }

    /**
     * test operate the hdfs
     * @throws IOException
     */
    public static void checkFS() throws IOException {
        Configuration conf = getConf();
        Path f = new Path("/user");
        FileSystem fs = FileSystem.get(f.toUri(), conf);
        RemoteIterator<LocatedFileStatus> paths = fs.listFiles(f, true);
        while (paths.hasNext()) {
            System.out.println(paths.next());
        }
    }

    public static void upload() throws IOException {
        Configuration conf = getConf();
        Path f = new Path("d:\\wordcount.jar");
        FileSystem fs = FileSystem.get(f.toUri(), conf);
        fs.copyFromLocalFile(true, f, new Path("/input/wordcount.jar"));
        System.out.println("done ...");
    }

    /**
     * test the job submit
     * @throws IOException
     * @throws InterruptedException
     * @throws ClassNotFoundException
     */
    public static void runJob(String input, String output) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = getConf();
        Job job = new Job(conf, "word count");
        // job.setJar("hdfs://node31:9000/input/wordcount.jar");
        job.setJobName("wordcount");
        job.setJarByClass(WordCount.class);
        // job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(WCMapper.class);
        job.setCombinerClass(WCReducer.class);
        job.setReducerClass(WCReducer.class);
        FileInputFormat.addInputPath(job, new Path(input));
        // SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileOutputFormat.setOutputPath(job, new Path(output));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    private static Configuration getConf() throws IOException {
        Configuration conf = new YarnConfiguration();
        conf.set("fs.defaultFS", "hdfs://node31:9000");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "node31:8032");
        // conf.set("mapred.remote.os", "Linux");
        System.out.println(conf.get("mapred.remote.os"));
        // JobClient client = new JobClient(conf);
        // ClusterStatus cluster = client.getClusterStatus();
        return conf;
    }
}
3. How do we fix the error above? Following the solution from that JIRA link, modify mapred-default.xml and yarn-default.xml. We already modified mapred-default.xml a moment ago; modify it once more, adding:
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
    $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
  </value>
</property>
Make the analogous change to yarn-default.xml, which is inside hadoop-yarn-common-2.2.0.jar. Note that in yarn-default.xml the property is named yarn.application.classpath, not mapreduce.application.classpath:

<property>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
    $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
  </value>
</property>
Again, these two jars only need to be replaced in MyEclipse; they do not need to be replaced on the cluster.
4. After the replacements above, running again produces the following error:
Caused by: java.lang.ClassNotFoundException: Class org.fansy.hadoop.mr.WordCount$WCMapper not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 8 more
Well, I probably don't need to say much more: an error like this means MyEclipse can now submit jobs to Hadoop 2 and they actually start running. So, the last step: upload the jar containing our wordcount classes to $HADOOP_HOME/share/hadoop/mapreduce/lib on the cluster (that directory is covered by the mapreduce.application.classpath configured above), then run again; no cluster restart is needed after the upload. And finally we get the result below:

2014-04-03 21:17:34,289 ERROR [main] util.Shell (Shell.java:getWinUtilsPath(303)) - Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:278)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:300)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:293)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:76)
at org.apache.hadoop.yarn.conf.YarnConfiguration.<clinit>(YarnConfiguration.java:345)
at org.fansy.hadoop.mr.WordCount.getConf(WordCount.java:104)
at org.fansy.hadoop.mr.WordCount.runJob(WordCount.java:84)
at org.fansy.hadoop.mr.WordCount.main(WordCount.java:47)
Linux
2014-04-03 21:18:19,853 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-04-03 21:18:20,499 INFO [main] client.RMProxy (RMProxy.java:createRMProxy(56)) - Connecting to ResourceManager at node31/192.168.0.31:8032
2014-04-03 21:18:20,973 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(149)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2014-04-03 21:18:21,020 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2014-04-03 21:18:21,313 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(394)) - number of splits:1
2014-04-03 21:18:21,336 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - user.name is deprecated. Instead, use mapreduce.job.user.name
2014-04-03 21:18:21,337 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-04-03 21:18:21,337 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-03 21:18:21,338 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2014-04-03 21:18:21,338 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
2014-04-03 21:18:21,339 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
2014-04-03 21:18:21,339 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.job.name is deprecated. Instead, use mapreduce.job.name
2014-04-03 21:18:21,339 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
2014-04-03 21:18:21,340 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2014-04-03 21:18:21,340 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2014-04-03 21:18:21,342 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2014-04-03 21:18:21,343 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2014-04-03 21:18:21,343 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
2014-04-03 21:18:21,513 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(477)) - Submitting tokens for job: job_1396463733942_0003
2014-04-03 21:18:21,817 INFO [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(174)) - Submitted application application_1396463733942_0003 to ResourceManager at node31/192.168.0.31:8032
2014-04-03 21:18:21,859 INFO [main] mapreduce.Job (Job.java:submit(1272)) - The url to track the job: http://node31:8088/proxy/application_1396463733942_0003/
2014-04-03 21:18:21,860 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1317)) - Running job: job_1396463733942_0003
2014-04-03 21:18:31,307 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1338)) - Job job_1396463733942_0003 running in uber mode : false
2014-04-03 21:18:31,311 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - map 0% reduce 0%
2014-04-03 21:19:02,346 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - map 100% reduce 0%
2014-04-03 21:19:11,416 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) - map 100% reduce 100%
2014-04-03 21:19:11,425 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1356)) - Job job_1396463733942_0003 completed successfully
2014-04-03 21:19:11,552 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1363)) - Counters: 43
    File System Counters
        FILE: Number of bytes read=11139
        FILE: Number of bytes written=182249
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=8646
        HDFS: Number of bytes written=10161
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=29330
        Total time spent by all reduces in occupied slots (ms)=5825
    Map-Reduce Framework
        Map input records=235
        Map output records=235
        Map output bytes=10428
        Map output materialized bytes=11139
        Input split bytes=98
        Combine input records=235
        Combine output records=235
        Reduce input groups=235
        Reduce shuffle bytes=11139
        Reduce input records=235
        Reduce output records=235
        Spilled Records=470
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=124
        CPU time spent (ms)=21920
        Physical memory (bytes) snapshot=299376640
        Virtual memory (bytes) snapshot=1671372800
        Total committed heap usage (bytes)=152834048
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=8548
    File Output Format Counters
        Bytes Written=10161
The "Linux" you see above was printed because I had used conf.set("mapred.remote.os", "Linux"); in the actual run, however, it does not need to be set.
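For what it's worth, later Hadoop releases (2.4.0 and up, via MAPREDUCE-4052) made this Windows-client-to-Linux-cluster scenario officially supported through a single switch, so the jar patching above should no longer be necessary there. This does not apply to 2.2, but it is worth checking if you upgrade; the property name below is taken from the later releases' configuration:

```xml
<property>
  <name>mapreduce.app-submission.cross-platform</name>
  <value>true</value>
</property>
```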
Also, if a Tomcat deployed on Linux calls the Hadoop 2 cluster to run MR programs, the jars presumably would not need to be replaced, but this remains to be verified.
Ha, finally solved. This problem bothered me for a long time; several times I tried to break through and came back empty-handed, which was quite frustrating. Well, this isn't really original work either: someone abroad had already solved it back at 02/Dec/13 18:35. Still, I searched for a long time and found no Chinese write-up of it. (If there is one, then it's my searching skills that failed me.)
Share, grow, enjoy.