Big Data Tutorial (9.4): Running an MR Program with java -jar

阿新 • Published: 2018-12-08

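The previous post walked through how MapReduce executes on YARN. This post covers two things: (1) how to run an MR program with java -jar, and (2) how to submit a MapReduce program from a local machine to the cluster.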

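I. Running an MR program with java -jar (a jar exported this way bundles every dependency and therefore takes up a lot of space, so this approach is not recommended)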

(1) To run the program with java -jar, export the project as a runnable jar.

(2) Modify the main class that submits the job, and upload the jar to the jar path configured in that class.
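
The modified main class is not reproduced in this post. As a rough sketch of what such a change usually involves, the snippet below collects the Hadoop properties a driver would typically set before submitting. The property names are standard Hadoop keys; the host name, HDFS address, and jar path are hypothetical placeholders, and the helper class SubmitSettings is made up for illustration. It uses only the JDK so it runs without Hadoop on the classpath; in a real driver each entry would be passed to Configuration.set(key, value).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: gathers the settings a job-submitting main class needs. */
public class SubmitSettings {

    /**
     * @param rmHost  ResourceManager host name (placeholder in the example below)
     * @param fsUri   default file system URI, e.g. the HDFS NameNode address
     * @param jarPath path of the job jar on the machine that runs the driver
     */
    public static Map<String, String> forYarn(String rmHost, String fsUri, String jarPath) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Run on YARN instead of the local job runner.
        settings.put("mapreduce.framework.name", "yarn");
        // Where to find the ResourceManager.
        settings.put("yarn.resourcemanager.hostname", rmHost);
        // Default file system used for input, output, and the staging directory.
        settings.put("fs.defaultFS", fsUri);
        // Same key that Job.setJar(String) writes.
        settings.put("mapreduce.job.jar", jarPath);
        return settings;
    }

    public static void main(String[] args) {
        // All three argument values are assumptions for illustration only.
        forYarn("rm-host", "hdfs://nn-host:9000", "/home/hadoop/wc.jar")
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The jar path is the one value that differs between the two approaches in this post: in part I it points at the location the jar was uploaded to, while in part II it points at a path on the local machine.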

II. Submitting a MapReduce program from a local machine to the cluster

(1) Download the four files core-site.xml, mapred-site.xml, yarn-site.xml, and hdfs-site.xml from the cluster and add them to the project so that they are on its classpath.
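
Hadoop's client classes look these files up as classpath resources, so a missing file tends to show up only as a confusing failure at submit time. A small pre-flight check along the following lines can catch that earlier. This is only a sketch using the JDK; the class name ClusterConfigCheck is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check: are the four cluster config files on the classpath? */
public class ClusterConfigCheck {

    static final String[] REQUIRED = {
        "core-site.xml", "mapred-site.xml", "yarn-site.xml", "hdfs-site.xml"
    };

    /** Returns the names of required files that the given class loader cannot find. */
    public static List<String> missing(ClassLoader loader) {
        List<String> notFound = new ArrayList<>();
        for (String name : REQUIRED) {
            if (loader.getResource(name) == null) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        List<String> notFound = missing(Thread.currentThread().getContextClassLoader());
        System.out.println(notFound.isEmpty()
                ? "All cluster config files found on the classpath."
                : "Missing from classpath: " + notFound);
    }
}
```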

(2) Set the location of the job jar to a path on the local machine.

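(3) Override the code in the YARNRunner submission class that generates the environment variables. When the job is submitted from Windows, the generated environment variables are in Windows format, so they have to be converted to Linux format. The modified class must be placed in the project under the same package name as the original, and it must be included when the project is packaged.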
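
To make the intent of the change concrete before the full listing, here is a small standalone sketch of the conversion, written against the JDK only. It mirrors the character-level rules of the getLinux helper in the modified class (%VAR% becomes $VAR, ';' becomes ':', '\' becomes '/'); the class name EnvFormat and the sample value are made up for illustration.

```java
/** Hypothetical standalone version of the Windows-to-Linux environment rewrite. */
public class EnvFormat {

    public static String toLinux(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        boolean insideVar = false; // true between the opening and closing '%'
        for (int i = 0; i < value.length(); i++) {
            char ch = value.charAt(i);
            switch (ch) {
                case '%':
                    // The opening '%' becomes '$'; the closing '%' is dropped.
                    if (!insideVar) {
                        sb.append('$');
                    }
                    insideVar = !insideVar;
                    break;
                case ';':
                    sb.append(':'); // Windows path-list separator -> Linux
                    break;
                case '\\':
                    sb.append('/'); // Windows directory separator -> Linux
                    break;
                default:
                    sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample value is an assumption, shaped like a Windows-style classpath entry.
        System.out.println(toLinux("%PWD%;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%\\share\\hadoop\\common\\*"));
    }
}
```

Because the rewrite works character by character, a value that contains a literal Windows drive letter (for example C:\tmp) would come out as C:/tmp, which is not a meaningful Linux path; the approach only suits values built from variable references and relative segments.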

/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements.  See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership.  The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License.  You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.hadoop.mapred;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.classification.InterfaceAudience.Private;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.UnsupportedFileSystemException;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.ipc.ProtocolSignature;
import org.apache.hadoop.mapreduce.Cluster.JobTrackerStatus;
import org.apache.hadoop.mapreduce.ClusterMetrics;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.hadoop.mapreduce.QueueAclsInfo;
import org.apache.hadoop.mapreduce.QueueInfo;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskCompletionEvent;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskTrackerInfo;
import org.apache.hadoop.mapreduce.TaskType;
import org.apache.hadoop.mapreduce.TypeConverter;
import org.apache.hadoop.mapreduce.protocol.ClientProtocol;
import org.apache.hadoop.mapreduce.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.mapreduce.v2.LogParams;
import org.apache.hadoop.mapreduce.v2.api.MRClientProtocol;
import org.apache.hadoop.mapreduce.v2.api.protocolrecords.GetDelegationTokenRequest;
import org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils;
import org.apache.hadoop.mapreduce.v2.util.MRApps;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.ReservationId;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.api.records.URL;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.factories.RecordFactory;
import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
import org.apache.hadoop.yarn.security.client.RMDelegationTokenSelector;
import org.apache.hadoop.yarn.util.ConverterUtils;

import com.google.common.annotations.VisibleForTesting;

/**
 * This class enables the current JobClient (0.22 hadoop) to run on YARN.
 */
@SuppressWarnings("unchecked")
public class YARNRunner implements ClientProtocol {

    private static final Log           LOG                   = LogFactory.getLog(YARNRunner.class);

    private static final String        RACK_GROUP            = "rack";
    private static final String        NODE_IF_RACK_GROUP    = "node1";
    private static final String        NODE_IF_NO_RACK_GROUP = "node2";

    /**
     * Matches any of the following patterns with capturing groups:
     * <ul>
     * <li>/rack</li>
     * <li>/rack/node</li>
     * <li>node (assumes /default-rack)</li>
     * </ul>
     * The groups can be retrieved using the RACK_GROUP, NODE_IF_RACK_GROUP,
     * and/or NODE_IF_NO_RACK_GROUP group keys.
     */
    private static final Pattern       RACK_NODE_PATTERN     = Pattern.compile(String.format(
            "(?<%s>[^/]+?)|(?<%s>/[^/]+?)(?:/(?<%s>[^/]+?))?", NODE_IF_NO_RACK_GROUP, RACK_GROUP, NODE_IF_RACK_GROUP));

    private final static RecordFactory recordFactory         = RecordFactoryProvider.getRecordFactory(null);

    public final static Priority       AM_CONTAINER_PRIORITY = recordFactory.newRecordInstance(Priority.class);
    static {
        AM_CONTAINER_PRIORITY.setPriority(0);
    }

    private ResourceMgrDelegate resMgrDelegate;
    private ClientCache         clientCache;
    private Configuration       conf;
    private final FileContext   defaultFileContext;

    /**
     * Yarn runner encapsulates the client interface of yarn
     * 
     * @param conf the configuration object for the client
     */
    public YARNRunner(Configuration conf) {
        this(conf, new ResourceMgrDelegate(new YarnConfiguration(conf)));
    }

    /**
     * Similar to {@link #YARNRunner(Configuration)} but allowing injecting
     * {@link ResourceMgrDelegate}. Enables mocking and testing.
     * 
     * @param conf the configuration object for the client
     * @param resMgrDelegate the resourcemanager client handle.
     */
    public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate) {
        this(conf, resMgrDelegate, new ClientCache(conf, resMgrDelegate));
    }

    /**
     * Similar to
     * {@link YARNRunner#YARNRunner(Configuration, ResourceMgrDelegate)} but
     * allowing injecting {@link ClientCache}. Enable mocking and testing.
     * 
     * @param conf the configuration object
     * @param resMgrDelegate the resource manager delegate
     * @param clientCache the client cache object.
     */
    public YARNRunner(Configuration conf, ResourceMgrDelegate resMgrDelegate, ClientCache clientCache) {
        this.conf = conf;
        try {
            this.resMgrDelegate = resMgrDelegate;
            this.clientCache = clientCache;
            this.defaultFileContext = FileContext.getFileContext(this.conf);
        } catch (UnsupportedFileSystemException ufe) {
            throw new RuntimeException("Error in instantiating YarnClient", ufe);
        }
    }

    /**
     * Used for testing mostly.
     * 
     * @param resMgrDelegate the resource manager delegate to set to.
     */
    @Private
    public void setResourceMgrDelegate(ResourceMgrDelegate resMgrDelegate) {
        this.resMgrDelegate = resMgrDelegate;
    }

    @Override
    public void cancelDelegationToken(Token<DelegationTokenIdentifier> arg0) throws IOException, InterruptedException {
        throw new UnsupportedOperationException("Use Token.renew instead");
    }

    @Override
    public TaskTrackerInfo[] getActiveTrackers() throws IOException, InterruptedException {
        return resMgrDelegate.getActiveTrackers();
    }

    @Override
    public JobStatus[] getAllJobs() throws IOException, InterruptedException {
        return resMgrDelegate.getAllJobs();
    }

    @Override
    public TaskTrackerInfo[] getBlacklistedTrackers() throws IOException, InterruptedException {
        return resMgrDelegate.getBlacklistedTrackers();
    }

    @Override
    public ClusterMetrics getClusterMetrics() throws IOException, InterruptedException {
        return resMgrDelegate.getClusterMetrics();
    }

    @VisibleForTesting
    void addHistoryToken(Credentials ts) throws IOException, InterruptedException {
        /* check if we have a hsproxy, if not, no need */
        MRClientProtocol hsProxy = clientCache.getInitializedHSProxy();
        if (UserGroupInformation.isSecurityEnabled() && (hsProxy != null)) {
            /*
             * note that get delegation token was called. Again this is hack for
             * oozie to make sure we add history server delegation tokens to the
             * credentials
             */
            RMDelegationTokenSelector tokenSelector = new RMDelegationTokenSelector();
            Text service = resMgrDelegate.getRMDelegationTokenService();
            if (tokenSelector.selectToken(service, ts.getAllTokens()) != null) {
                Text hsService = SecurityUtil.buildTokenService(hsProxy.getConnectAddress());
                if (ts.getToken(hsService) == null) {
                    ts.addToken(hsService, getDelegationTokenFromHS(hsProxy));
                }
            }
        }
    }

    @VisibleForTesting
    Token<?> getDelegationTokenFromHS(MRClientProtocol hsProxy) throws IOException, InterruptedException {
        GetDelegationTokenRequest request = recordFactory.newRecordInstance(GetDelegationTokenRequest.class);
        request.setRenewer(Master.getMasterPrincipal(conf));
        org.apache.hadoop.yarn.api.records.Token mrDelegationToken;
        mrDelegationToken = hsProxy.getDelegationToken(request).getDelegationToken();
        return ConverterUtils.convertFromYarn(mrDelegationToken, hsProxy.getConnectAddress());
    }

    @Override
    public Token<DelegationTokenIdentifier> getDelegationToken(Text renewer) throws IOException, InterruptedException {
        // The token is only used for serialization. So the type information
        // mismatch should be fine.
        return resMgrDelegate.getDelegationToken(renewer);
    }

    @Override
    public String getFilesystemName() throws IOException, InterruptedException {
        return resMgrDelegate.getFilesystemName();
    }

    @Override
    public JobID getNewJobID() throws IOException, InterruptedException {
        return resMgrDelegate.getNewJobID();
    }

    @Override
    public QueueInfo getQueue(String queueName) throws IOException, InterruptedException {
        return resMgrDelegate.getQueue(queueName);
    }

    @Override
    public QueueAclsInfo[] getQueueAclsForCurrentUser() throws IOException, InterruptedException {
        return resMgrDelegate.getQueueAclsForCurrentUser();
    }

    @Override
    public QueueInfo[] getQueues() throws IOException, InterruptedException {
        return resMgrDelegate.getQueues();
    }

    @Override
    public QueueInfo[] getRootQueues() throws IOException, InterruptedException {
        return resMgrDelegate.getRootQueues();
    }

    @Override
    public QueueInfo[] getChildQueues(String parent) throws IOException, InterruptedException {
        return resMgrDelegate.getChildQueues(parent);
    }

    @Override
    public String getStagingAreaDir() throws IOException, InterruptedException {
        return resMgrDelegate.getStagingAreaDir();
    }

    @Override
    public String getSystemDir() throws IOException, InterruptedException {
        return resMgrDelegate.getSystemDir();
    }

    @Override
    public long getTaskTrackerExpiryInterval() throws IOException, InterruptedException {
        return resMgrDelegate.getTaskTrackerExpiryInterval();
    }

    @Override
    public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
            throws IOException, InterruptedException {

        addHistoryToken(ts);

        ApplicationSubmissionContext appContext = createApplicationSubmissionContext(conf, jobSubmitDir, ts);

        // Submit to ResourceManager
        try {
            ApplicationId applicationId = resMgrDelegate.submitApplication(appContext);

            ApplicationReport appMaster = resMgrDelegate.getApplicationReport(applicationId);
            String diagnostics = (appMaster == null ? "application report is null" : appMaster.getDiagnostics());
            if (appMaster == null || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
                    || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
                throw new IOException("Failed to run job : " + diagnostics);
            }
            return clientCache.getClient(jobId).getJobStatus(jobId);
        } catch (YarnException e) {
            throw new IOException(e);
        }
    }

    private LocalResource createApplicationResource(FileContext fs, Path p, LocalResourceType type) throws IOException {
        return createApplicationResource(fs, p, null, type, LocalResourceVisibility.APPLICATION, false);
    }

    private LocalResource createApplicationResource(FileContext fs, Path p, String fileSymlink, LocalResourceType type,
                                                    LocalResourceVisibility viz, Boolean uploadToSharedCache)
            throws IOException {
        LocalResource rsrc = recordFactory.newRecordInstance(LocalResource.class);
        FileStatus rsrcStat = fs.getFileStatus(p);
        // We need to be careful when converting from path to URL to add a fragment
        // so that the symlink name when localized will be correct.
        Path qualifiedPath = fs.getDefaultFileSystem().resolvePath(rsrcStat.getPath());
        URI uriWithFragment = null;
        boolean useFragment = fileSymlink != null && !fileSymlink.equals("");
        try {
            if (useFragment) {
                uriWithFragment = new URI(qualifiedPath.toUri() + "#" + fileSymlink);
            } else {
                uriWithFragment = qualifiedPath.toUri();
            }
        } catch (URISyntaxException e) {
            throw new IOException("Error parsing local resource path." + " Path was not able to be converted to a URI: "
                    + qualifiedPath, e);
        }
        rsrc.setResource(URL.fromURI(uriWithFragment));
        rsrc.setSize(rsrcStat.getLen());
        rsrc.setTimestamp(rsrcStat.getModificationTime());
        rsrc.setType(type);
        rsrc.setVisibility(viz);
        rsrc.setShouldBeUploadedToSharedCache(uploadToSharedCache);
        return rsrc;
    }

    private Map<String, LocalResource> setupLocalResources(Configuration jobConf, String jobSubmitDir)
            throws IOException {
        Map<String, LocalResource> localResources = new HashMap<>();

        Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE);

        URL yarnUrlForJobSubmitDir = URL.fromPath(defaultFileContext.getDefaultFileSystem()
                .resolvePath(defaultFileContext.makeQualified(new Path(jobSubmitDir))));
        LOG.debug("Creating setup context, jobSubmitDir url is " + yarnUrlForJobSubmitDir);

        localResources.put(MRJobConfig.JOB_CONF_FILE,
                createApplicationResource(defaultFileContext, jobConfPath, LocalResourceType.FILE));
        if (jobConf.get(MRJobConfig.JAR) != null) {
            Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR));
            // We hard code the job.jar symlink because mapreduce code expects the
            // job.jar to be named that way.
            FileContext fccc = FileContext.getFileContext(jobJarPath.toUri(), jobConf);
            LocalResourceVisibility jobJarViz = jobConf.getBoolean(MRJobConfig.JOBJAR_VISIBILITY,
                    MRJobConfig.JOBJAR_VISIBILITY_DEFAULT) ? LocalResourceVisibility.PUBLIC
                            : LocalResourceVisibility.APPLICATION;
            LocalResource rc = createApplicationResource(FileContext.getFileContext(jobJarPath.toUri(), jobConf),
                    jobJarPath, MRJobConfig.JOB_JAR, LocalResourceType.PATTERN, jobJarViz,
                    jobConf.getBoolean(MRJobConfig.JOBJAR_SHARED_CACHE_UPLOAD_POLICY,
                            MRJobConfig.JOBJAR_SHARED_CACHE_UPLOAD_POLICY_DEFAULT));
            String pattern = conf.getPattern(JobContext.JAR_UNPACK_PATTERN, JobConf.UNPACK_JAR_PATTERN_DEFAULT)
                    .pattern();
            rc.setPattern(pattern);
            localResources.put(MRJobConfig.JOB_JAR, rc);
        } else {
            // Job jar may be null. For e.g, for pipes, the job jar is the hadoop
            // mapreduce jar itself which is already on the classpath.
            LOG.info("Job jar is not present. " + "Not adding any jar to the list of resources.");
        }

        // TODO gross hack
        for (String s : new String[] { MRJobConfig.JOB_SPLIT, MRJobConfig.JOB_SPLIT_METAINFO }) {
            localResources.put(MRJobConfig.JOB_SUBMIT_DIR + "/" + s,
                    createApplicationResource(defaultFileContext, new Path(jobSubmitDir, s), LocalResourceType.FILE));
        }

        return localResources;
    }

    private List<String> setupAMCommand(Configuration jobConf) {
        List<String> vargs = new ArrayList<>(8);
        // Modified from the stock YARNRunner: always launch the AM with a
        // Linux-style JAVA_HOME reference. On a Windows client,
        // MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME) can
        // produce "%JAVA_HOME%", which the shell on a Linux node cannot expand.
        vargs.add("$JAVA_HOME/bin/java");
        Path amTmpDir = new Path(MRApps.crossPlatformifyMREnv(conf, Environment.PWD),
                YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR);
        vargs.add("-Djava.io.tmpdir=" + amTmpDir);
        MRApps.addLog4jSystemProperties(null, vargs, conf);

        // Check for Java Lib Path usage in MAP and REDUCE configs
        warnForJavaLibPath(conf.get(MRJobConfig.MAP_JAVA_OPTS, ""), "map", MRJobConfig.MAP_JAVA_OPTS,
                MRJobConfig.MAP_ENV);
        warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, ""), "map",
                MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, MRJobConfig.MAPRED_ADMIN_USER_ENV);
        warnForJavaLibPath(conf.get(MRJobConfig.REDUCE_JAVA_OPTS, ""), "reduce", MRJobConfig.REDUCE_JAVA_OPTS,
                MRJobConfig.REDUCE_ENV);
        warnForJavaLibPath(conf.get(MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, ""), "reduce",
                MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, MRJobConfig.MAPRED_ADMIN_USER_ENV);

        // Add AM admin command opts before user command opts
        // so that it can be overridden by user
        String mrAppMasterAdminOptions = conf.get(MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS,
                MRJobConfig.DEFAULT_MR_AM_ADMIN_COMMAND_OPTS);
        warnForJavaLibPath(mrAppMasterAdminOptions, "app master", MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS,
                MRJobConfig.MR_AM_ADMIN_USER_ENV);
        vargs.add(mrAppMasterAdminOptions);

        // Add AM user command opts
        String mrAppMasterUserOptions = conf.get(MRJobConfig.MR_AM_COMMAND_OPTS,
                MRJobConfig.DEFAULT_MR_AM_COMMAND_OPTS);
        warnForJavaLibPath(mrAppMasterUserOptions, "app master", MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.MR_AM_ENV);
        vargs.add(mrAppMasterUserOptions);

        if (jobConf.getBoolean(MRJobConfig.MR_AM_PROFILE, MRJobConfig.DEFAULT_MR_AM_PROFILE)) {
            final String profileParams = jobConf.get(MRJobConfig.MR_AM_PROFILE_PARAMS,
                    MRJobConfig.DEFAULT_TASK_PROFILE_PARAMS);
            if (profileParams != null) {
                vargs.add(String.format(profileParams,
                        ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + TaskLog.LogName.PROFILE));
            }
        }

        vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
        vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDOUT);
        vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + ApplicationConstants.STDERR);
        return vargs;
    }

    private ContainerLaunchContext setupContainerLaunchContextForAM(Configuration jobConf,
                                                                    Map<String, LocalResource> localResources,
                                                                    ByteBuffer securityTokens, List<String> vargs)
            throws IOException {

        Vector<String> vargsFinal = new Vector<>(8);
        // Final command
        StringBuilder mergedCommand = new StringBuilder();
        for (CharSequence str : vargs) {
            mergedCommand.append(str).append(" ");
        }
        vargsFinal.add(mergedCommand.toString());

        LOG.debug("Command to launch container for ApplicationMaster is : " + mergedCommand);

        // Setup the CLASSPATH in environment
        // i.e. add { Hadoop jars, job jar, CWD } to classpath.
        Map<String, String> environment = new HashMap<>();
        MRApps.setClasspath(environment, conf);

        // Shell
        environment.put(Environment.SHELL.name(),
                conf.get(MRJobConfig.MAPRED_ADMIN_USER_SHELL, MRJobConfig.DEFAULT_SHELL));

        // Add the container working directory in front of LD_LIBRARY_PATH
        MRApps.addToEnvironment(environment, Environment.LD_LIBRARY_PATH.name(),
                MRApps.crossPlatformifyMREnv(conf, Environment.PWD), conf);

        // Setup the environment variables for Admin first
        MRApps.setEnvFromInputString(environment,
                conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV, MRJobConfig.DEFAULT_MR_AM_ADMIN_USER_ENV), conf);
        // Setup the environment variables (LD_LIBRARY_PATH, etc)
        MRApps.setEnvFromInputString(environment, conf.get(MRJobConfig.MR_AM_ENV), conf);

        // Parse distributed cache
        MRApps.setupDistributedCache(jobConf, localResources);

        Map<ApplicationAccessType, String> acls = new HashMap<>(2);
        acls.put(ApplicationAccessType.VIEW_APP,
                jobConf.get(MRJobConfig.JOB_ACL_VIEW_JOB, MRJobConfig.DEFAULT_JOB_ACL_VIEW_JOB));
        acls.put(ApplicationAccessType.MODIFY_APP,
                jobConf.get(MRJobConfig.JOB_ACL_MODIFY_JOB, MRJobConfig.DEFAULT_JOB_ACL_MODIFY_JOB));
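        // Added for Windows clients: rewrite every environment value from
        // Windows syntax (%VAR%, ';', '\') to Linux syntax ($VAR, ':', '/')
        // so that the shell on the cluster node can expand it.
        for (Map.Entry<String, String> entry : environment.entrySet()) {
            entry.setValue(getLinux(entry.getValue()));
        }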
        return ContainerLaunchContext.newInstance(localResources, environment, vargsFinal, null, securityTokens, acls);
    }

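    /**
     * Converts a Windows-style environment value to Linux style: the opening
     * '%' of each %VAR% pair becomes '$' and the closing '%' is dropped, ';'
     * becomes ':', and '\' becomes '/'.
     */
    private String getLinux(String org) {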
        StringBuilder sb = new StringBuilder();
        int c = 0;
        for (int i = 0; i < org.length(); i++) {
            if (org.charAt(i) == '%') {
                c++;
                if (c % 2 == 1) {
                    sb.append("$");
                }
            } else {
                switch (org.charAt(i)) {
                    case ';':
                        sb.append(":");
                        break;

                    case '\\':
                        sb.append("/");
                        break;
                    default:
                        sb.append(org.charAt(i));
                        break;
                }
            }
        }
        return (sb.toString());
    }

    /**
     * Constructs all the necessary information to start the MR AM.
     * 
     * @param jobConf the configuration for the MR job
     * @param jobSubmitDir the directory path for the job
     * @param ts the security credentials for the job
     * @return ApplicationSubmissionContext
     * @throws IOException on IO error (e.g. path resolution)
     */
    public ApplicationSubmissionContext createApplicationSubmissionContext(Configuration jobConf, String jobSubmitDir,
                                                                           Credentials ts)
            throws IOException {
        ApplicationId applicationId = resMgrDelegate.getApplicationId();

        // Setup LocalResources
        Map<String, LocalResource> localResources = setupLocalResources(jobConf, jobSubmitDir);

        // Setup security tokens
        DataOutputBuffer dob = new DataOutputBuffer();
        ts.writeTokenStorageToStream(dob);
        ByteBuffer securityTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());

        // Setup ContainerLaunchContext for AM container
        List<String> vargs = setupAMCommand(jobConf);
        ContainerLaunchContext amContainer = setupContainerLaunchContextForAM(jobConf, localResources, securityTokens,
                vargs);

        String regex = conf.get(MRJobConfig.MR_JOB_SEND_TOKEN_CONF);
        if (regex != null && !regex.isEmpty()) {
            setTokenRenewerConf(amContainer, conf, regex);
        }

        Collection<String> tagsFromConf = jobConf.getTrimmedStringCollection(MRJobConfig.JOB_TAGS);

        // Set up the ApplicationSubmissionContext
        ApplicationSubmissionContext appContext = recordFactory.newRecordInstance(ApplicationSubmissionContext.class);
        appContext.setApplicationId(applicationId); // ApplicationId
        appContext.setQueue( // Queue name
                jobConf.get(JobContext.QUEUE_NAME, YarnConfiguration.DEFAULT_QUEUE_NAME));
        // add reservationID if present
        ReservationId reservationID = null;
        try {
            reservationID = ReservationId.parseReservationId(jobConf.get(JobContext.RESERVATION_ID));
        } catch (NumberFormatException e) {
            // throw an exception because the reservation ID is invalid
            String errMsg = "Invalid reservationId: " + jobConf.get(JobContext.RESERVATION_ID)
                    + " specified for the app: " + applicationId;
            LOG.warn(errMsg);
            throw new IOException(errMsg);
        }
        if (reservationID != null) {
            appContext.setReservationID(reservationID);
            LOG.info("SUBMITTING ApplicationSubmissionContext app:" + applicationId + " to queue:"
                    + appContext.getQueue() + " with reservationId:" + appContext.getReservationID());
        }
        appContext.setApplicationName( // Job name
                jobConf.get(JobContext.JOB_NAME, YarnConfiguration.DEFAULT_APPLICATION_NAME));
        appContext.setCancelTokensWhenComplete(conf.getBoolean(MRJobConfig.JOB_CANCEL_DELEGATION_TOKEN, true));
        appContext.setAMContainerSpec(amContainer); // AM Container
        appContext
                .setMaxAppAttempts(conf.getInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, MRJobConfig.DEFAULT_MR_AM_MAX_ATTEMPTS));

        // Setup the AM ResourceRequests
        List<ResourceRequest> amResourceRequests = generateResourceRequests();
        appContext.setAMContainerResourceRequests(amResourceRequests);

        // set labels for the AM container requests if present
        String amNodelabelExpression = conf.get(MRJobConfig.AM_NODE_LABEL_EXP);
        if (null != amNodelabelExpression && amNodelabelExpression.trim().length() != 0) {
            for (ResourceRequest amResourceRequest : amResourceRequests) {
                amResourceRequest.setNodeLabelExpression(amNodelabelExpression.trim());
            }
        }
        // set labels for the Job containers
        appContext.setNodeLabelExpression(jobConf.get(JobContext.JOB_NODE_LABEL_EXP));

        appContext.setApplicationType(MRJobConfig.MR_APPLICATION_TYPE);
        if (tagsFromConf != null && !tagsFromConf.isEmpty()) {
            appContext.setApplicationTags(new HashSet<>(tagsFromConf));
        }

        String jobPriority = jobConf.get(MRJobConfig.PRIORITY);
        if (jobPriority != null) {
            int iPriority;
            try {
                iPriority = TypeConverter.toYarnApplicationPriority(jobPriority);
            } catch (IllegalArgumentException e) {
                iPriority = Integer.parseInt(jobPriority);
            }
            appContext.setPriority(Priority.newInstance(iPriority));
        }

        return appContext;
    }

    private List<ResourceRequest> generateResourceRequests() throws IOException {
        Resource capability = recordFactory.newRecordInstance(Resource.class);
        capability.setMemorySize(conf.getInt(MRJobConfig.MR_AM_VMEM_MB, MRJobConfig.DEFAULT_MR_AM_VMEM_MB));
        capability.setVirtualCores(conf.getInt(MRJobConfig.MR_AM_CPU_VCORES, MRJobConfig.DEFAULT_MR_AM_CPU_VCORES));
        if (LOG.isDebugEnabled()) {
            LOG.debug("AppMaster capability = " + capability);
        }

        List<ResourceRequest> amResourceRequests = new ArrayList<>();
        // Always have an ANY request
        ResourceRequest amAnyResourceRequest = createAMResourceRequest(ResourceRequest.ANY, capability);
        Map<String, ResourceRequest> rackRequests = new HashMap<>();
        amResourceRequests.add(amAnyResourceRequest);
        Collection<String> amStrictResources = conf.getStringCollection(MRJobConfig.AM_STRICT_LOCALITY);
        for (String amStrictResource : amStrictResources) {
            amAnyResourceRequest.setRelaxLocality(false);
            Matcher matcher = RACK_NODE_PATTERN.matcher(amStrictResource);
            if (matcher.matches()) {
                String nodeName;
                String rackName = matcher.group(RACK_GROUP);
                if (rackName == null) {
                    rackName = "/default-rack";
                    nodeName = matcher.group(NODE_IF_NO_RACK_GROUP);
                } else {
                    nodeName = matcher.group(NODE_IF_RACK_GROUP);
                }
                ResourceRequest amRackResourceRequest = rackRequests.get(rackName);
                if (amRackResourceRequest == null) {
                    amRackResourceRequest = createAMResourceRequest(rackName, capability);
                    amResourceRequests.add(amRackResourceRequest);
                    rackRequests.put(rackName, amRackResourceRequest);
                }
                if (nodeName != null) {
                    amRackResourceRequest.setRelaxLocality(false);
                    ResourceRequest amNodeResourceRequest = createAMResourceRequest(nodeName, capability);
                    amResourceRequests.add(amNodeResourceRequest);
                }
            } else {
                String errMsg = "Invalid resource name: " + amStrictResource + " specified.";
                LOG.warn(errMsg);
                throw new IOException(errMsg);
            }
        }
        if (LOG.isDebugEnabled()) {
            for (ResourceRequest amResourceRequest : amResourceRequests) {
                LOG.debug("ResourceRequest: resource = " + amResourceRequest.getResourceName() + ", locality = "
                        + amResourceRequest.getRelaxLocality());
            }
        }
        return amResourceRequests;
    }

    private ResourceRequest createAMResourceRequest(String resource, Resource capability) {
        ResourceRequest resourceRequest = recordFactory.newRecordInstance(ResourceRequest.class);
        resourceRequest.setPriority(AM_CONTAINER_PRIORITY);
        resourceRequest.setResourceName(resource);
        resourceRequest.setCapability(capability);
        resourceRequest.setNumContainers(1);
        resourceRequest.setRelaxLocality(true);
        return resourceRequest;
    }

    private void setTokenRenewerConf(ContainerLaunchContext context, Configuration conf, String regex)
            throws IOException {
        DataOutputBuffer dob = new DataOutputBuffer();
        Configuration copy = new Configuration(false);
        copy.clear();
        int count = 0;
        for (Map.Entry<String, String> map : conf) {
            String key = map.getKey();
            String val = map.getValue();
            if (key.matches(regex)) {
                copy.set(key, val);
                count++;
            }
        }
        copy.write(dob);
        ByteBuffer appConf = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
        LOG.info("Send configurations that match regex expression: " + regex + " , total number of configs: " + count
                + ", total size : " + dob.getLength() + " bytes.");
        if (LOG.isDebugEnabled()) {
            for (Iterator<Map.Entry<String, String>> itor = copy.iterator(); itor.hasNext();) {
                Map.Entry<String, String> entry = itor.next();
                LOG.debug(entry.getKey() + " ===> " + entry.getValue());
            }
        }
        context.setTokensConf(appConf);
    }

    @Override
    public void setJobPriority(JobID arg0, String arg1) throws IOException, InterruptedException {
        ApplicationId appId = TypeConverter.toYarn(arg0).getAppId();
        try {
            resMgrDelegate.updateApplicationPriority(appId, Priority.newInstance(Integer.parseInt(arg1)));
        } catch (YarnException e) {
            throw new IOException(e);
        }
    }

    @Override
    public long getProtocolVersion(String arg0, long arg1) throws IOException {
        return resMgrDelegate.getProtocolVersion(arg0, arg1);
    }

    @Override
    public long renewDelegationToken(Token<DelegationTokenIdentifier> arg0) throws IOException, InterruptedException {
        throw new UnsupportedOperationException("Use Token.renew instead");
    }

    @Override
    public Counters getJobCounters(JobID arg0) throws IOException, InterruptedException {
        return clientCache.getClient(arg0).getJobCounters(arg0);
    }

    @Override
    public String getJobHistoryDir() throws IOException, InterruptedException {
        return JobHistoryUtils.getConfiguredHistoryServerDoneDirPrefix(conf);
    }

    @Override
    public JobStatus getJobStatus(JobID jobID) throws IOException, InterruptedException {
        JobStatus status = clientCache.getClient(jobID).getJobStatus(jobID);
        return status;
    }

    @Override
    public TaskCompletionEvent[] getTaskCompletionEvents(JobID arg0, int arg1, int arg2)
            throws IOException, InterruptedException {
        return clientCache.getClient(arg0).getTaskCompletionEvents(arg0, arg1, arg2);
    }

    @Override
    public String[] getTaskDiagnostics(TaskAttemptID arg0) throws IOException, InterruptedException {
        return clientCache.getClient(arg0.getJobID()).getTaskDiagnostics(arg0);
    }

    @Override
    public TaskReport[] getTaskReports(JobID jobID, TaskType taskType) throws IOException, InterruptedException {
        return clientCache.getClient(jobID).getTaskReports(jobID, taskType);
    }

    private void killUnFinishedApplication(ApplicationId appId) throws IOException {
        ApplicationReport application = null;
        try {
            application = resMgrDelegate.getApplicationReport(appId);
        } catch (YarnException e) {
            throw new IOException(e);
        }
        if (application.getYarnApplicationState() == YarnApplicationState.FINISHED
                || application.getYarnApplicationState() == YarnApplicationState.FAILED
                || application.getYarnApplicationState() == YarnApplicationState.KILLED) {
            return;
        }
        killApplication(appId);
    }

    private void killApplication(ApplicationId appId) throws IOException {
        try {
            resMgrDelegate.killApplication(appId);
        } catch (YarnException e) {
            throw new IOException(e);
        }
    }

    private boolean isJobInTerminalState(JobStatus status) {
        return status.getState() == JobStatus.State.KILLED || status.getState() == JobStatus.State.FAILED
                || status.getState() == JobStatus.State.SUCCEEDED;
    }

    @Override
    public void killJob(JobID arg0) throws IOException, InterruptedException {
        /* check if the status is not running, if not send kill to RM */
        JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
        ApplicationId appId = TypeConverter.toYarn(arg0).getAppId();

        // get status from RM and return
        if (status == null) {
            killUnFinishedApplication(appId);
            return;
        }

        if (status.getState() != JobStatus.State.RUNNING) {
            killApplication(appId);
            return;
        }

        try {
            /* send a kill to the AM */
            clientCache.getClient(arg0).killJob(arg0);
            long currentTimeMillis = System.currentTimeMillis();
            long timeKillIssued = currentTimeMillis;
            long killTimeOut = conf.getLong(MRJobConfig.MR_AM_HARD_KILL_TIMEOUT_MS,
                    MRJobConfig.DEFAULT_MR_AM_HARD_KILL_TIMEOUT_MS);
            while ((currentTimeMillis < timeKillIssued + killTimeOut) && !isJobInTerminalState(status)) {
                try {
                    Thread.sleep(1000L);
                } catch (InterruptedException ie) {
                    /** interrupted, just break */
                    break;
                }
                currentTimeMillis = System.currentTimeMillis();
                status = clientCache.getClient(arg0).getJobStatus(arg0);
                if (status == null) {
                    killUnFinishedApplication(appId);
                    return;
                }
            }
        } catch (IOException io) {
            LOG.debug("Error when checking for application status", io);
        }
        if (status != null && !isJobInTerminalState(status)) {
            killApplication(appId);
        }
    }

    @Override
    public boolean killTask(TaskAttemptID arg0, boolean arg1) throws IOException, InterruptedException {
        return clientCache.getClient(arg0.getJobID()).killTask(arg0, arg1);
    }

    @Override
    public AccessControlList getQueueAdmins(String arg0) throws IOException {
        return new AccessControlList("*");
    }

    @Override
    public JobTrackerStatus getJobTrackerStatus() throws IOException, InterruptedException {
        return JobTrackerStatus.RUNNING;
    }

    @Override
    public ProtocolSignature getProtocolSignature(String protocol, long clientVersion, int clientMethodsHash)
            throws IOException {
        return ProtocolSignature.getProtocolSignature(this, protocol, clientVersion, clientMethodsHash);
    }

    @Override
    public LogParams getLogFileParams(JobID jobID, TaskAttemptID taskAttemptID) throws IOException {
        return clientCache.getClient(jobID).getLogFilePath(jobID, taskAttemptID);
    }

    private static void warnForJavaLibPath(String opts, String component, String javaConf, String envConf) {
        if (opts != null && opts.contains("-Djava.library.path")) {
            LOG.warn("Usage of -Djava.library.path in " + javaConf + " can cause "
                    + "programs to no longer function if hadoop native libraries "
                    + "are used. These values should be set as part of the " + "LD_LIBRARY_PATH in the " + component
                    + " JVM env using " + envConf + " config settings.");
        }
    }

    public void close() throws IOException {
        if (resMgrDelegate != null) {
            resMgrDelegate.close();
            resMgrDelegate = null;
        }
        if (clientCache != null) {
            clientCache.close();
            clientCache = null;
        }
    }
}
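
The point of overriding YARNRunner in step (3) is that a job submitted from Windows produces environment variables in Windows syntax, which the Linux containers cannot expand. The following is a minimal stand-alone sketch of that kind of conversion, not the code used in the class above: the class name `EnvStyleSketch` and the helper `toLinuxStyle` are hypothetical names introduced only for illustration, and the sketch assumes the values contain only `%VAR%` references, `;` separators and `\` path separators.

```java
import java.util.regex.Pattern;

public class EnvStyleSketch {
    // Matches Windows-style variable references such as %JAVA_HOME%.
    private static final Pattern WINDOWS_VAR =
            Pattern.compile("%([A-Za-z_][A-Za-z0-9_]*)%");

    /**
     * Hypothetical helper (not part of Hadoop): rewrites a Windows-style
     * environment value into the form a Linux shell expects.
     */
    public static String toLinuxStyle(String value) {
        // %VAR% -> $VAR ("\\$" emits a literal dollar sign, "$1" is the variable name)
        String result = WINDOWS_VAR.matcher(value).replaceAll("\\$$1");
        // Windows path-list separator ';' -> Linux ':'
        result = result.replace(';', ':');
        // Windows directory separator '\' -> Linux '/'
        result = result.replace('\\', '/');
        return result;
    }

    public static void main(String[] args) {
        String windowsValue =
                "%PWD%;%HADOOP_CONF_DIR%;%HADOOP_COMMON_HOME%\\share\\hadoop\\common\\*";
        System.out.println(toLinuxStyle(windowsValue));
    }
}
```

Hadoop also has a `mapreduce.app-submission.cross-platform` property (it shows up in the submission log) that is intended for cross-platform submission. Whether setting it is enough on its own, without overriding the class, depends on the Hadoop version and is not verified here.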

(4) Result of submitting and running the job

[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.app-submission.cross-platform
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.local-dirs
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.runtime.linux.allowed-runtimes
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jobhistory.loadedjob.tasks.max
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.leveldb-state-store.path
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jobtracker.system.dir
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.is.minicluster
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.zk-appid-node.split-index
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3a.multipart.purge.age
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3a.threads.max
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapred.reducer.new-api
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for io.compression.codec.bzip2.library
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jvm.system-properties-to-log
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.http.authentication.signature.secret.file
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.reduce.maxattempts
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.localizer.cache.target-size-mb
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for s3native.replication
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.minicluster.fixed.ports
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.log-aggregation-status.time-out.ms
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.opportunistic-containers-use-pause-for-preemption
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.federation.registry.base-dir
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jobhistory.cleaner.interval-ms
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.amrmproxy.address
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.azure.local.sas.key.mode
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for ipc.client.idlethreshold
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jobhistory.address
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.state-store-class
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.container-localizer.java.opts
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3.buffer.dir
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.wasb.impl
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.remote-app-log-dir
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.caller.context.signature.max.size
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.entity-group-fs-store.summary-store
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.map.memory.mb
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.node-ip-cache.expiry-interval-secs
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jobhistory.webapp.rest-csrf.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.AbstractFileSystem.ftp.impl
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.leveldb-state-store.path
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3a.fast.upload.active.blocks
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3a.s3guard.ddb.table.create
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.decommissioning-nodes-watcher.poll-interval-secs
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3a.max.total.tasks
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.security.kms.client.encrypted.key.cache.size
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.task.exit.timeout.check-interval-ms
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jobhistory.http.policy
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.app.mapreduce.am.hard-kill-timeout-ms
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.leveldb-state-store.compaction-interval-secs
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.node-labels.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jobhistory.jhist.format
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.speculative.slowtaskthreshold
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.delete.debug-delay-sec
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.scheduler.configuration.store.class
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3n.multipart.uploads.block.size
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.scheduler.maximum-allocation-mb
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for ipc.client.fallback-to-simple-auth-allowed
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.har.impl.disable.cache
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.hostname
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.log-aggregation.compression-type
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.http.authentication.type
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.runtime.linux.docker.default-container-network
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.submithostaddress
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.zk-max-znode-size.bytes
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.vmem-check-enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.running.reduce.limit
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.reduce.shuffle.input.buffer.percent
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.task.io.sort.mb
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.amrmproxy.interceptor-class.pipeline
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.metrics.runtime.buckets
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.http-cross-origin.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.security.kms.client.authentication.retry-count
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for ipc.maximum.data.length
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.disk-health-checker.enable
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.reduce.skip.proc-count.auto-incr
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.container.liveness-monitor.interval-ms
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.ssl.client.conf
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.client.completion.pollinterval
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.vmem-pmem-ratio
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.log-aggregation.policy.class
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.intermediate-data-encryption.enable
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.AbstractFileSystem.hdfs.impl
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.client.resolve.remote.symlinks
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.hostname
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.input.fileinputformat.split.maxsize
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.reduce.input.buffer.percent
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.sharedcache.cleaner.resource-sleep-ms
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.webapp.ui-actions.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.sharedcache.cleaner.period-mins
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.reservation-system.planfollower.time-step
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for dfs.replication
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.writer.class
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.shuffle.ssl.file.buffer.size
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.shuffle.listen.queue.size
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.task.userlog.limit.kb
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3a.buffer.dir
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.security.kms.client.encrypted.key.cache.low-watermark
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.user.group.static.mapping.overrides
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.map.output.compress
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.token.tracking.ids.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.default-container-executor.log-dirs.permissions
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.auto-update.containers
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.http.staticuser.user
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.container-monitor.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.webapp.cross-origin.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.http.cross-origin.allowed-methods
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.shuffle.port
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.shuffle.connection-keep-alive.timeout
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.nodemanager.minimum.version
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.reduce.shuffle.merge.percent
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.task.skip.start.attempts
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.node-labels.configuration-type
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.AbstractFileSystem.swebhdfs.impl
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.task.io.sort.factor
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for ipc.server.max.connections
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for tfile.io.chunk.size
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3.block.size
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.opportunistic-container-allocation.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.jobhistory.principal
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3a.multiobjectdelete.enable
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for ipc.client.low-latency
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.ha.automatic-failover.zk-base-path
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.speculative.minimum-allowed-tasks
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for seq.io.sort.factor
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for hadoop.security.group.mapping
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for ftp.bytes-per-checksum
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.sharedcache.store.in-memory.check-period-mins
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.s3a.connection.timeout
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.nm-container-queuing.min-queue-length
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.disk-validator
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.linux-container-executor.resources-handler.class
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.ttl-enable
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.nodemanager.amrmproxy.enabled
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.entity-group-fs-store.done-dir
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.classloader
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.hdfs-servers
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.log-aggregation.file-formats
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.ubertask.maxreduces
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for fs.permissions.umask-mode
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.sharedcache.checksum.algo.impl
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.am.max-attempts
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for ha.failover-controller.graceful-fence.connection.retries
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.map.skip.proc-count.auto-incr
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.job.speculative.speculative-cap-running-tasks
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for seq.io.sort.mb
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.sharedcache.cleaner.initial-delay-mins
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.resourcemanager.delegation-token-renewer.thread-count
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for mapreduce.map.output.value.class
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.app.mapreduce.am.resource.cpu-vcores
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for yarn.router.webapp.address
[main] DEBUG org.apache.hadoop.conf.Configuration - Handling deprecation for io.seqfile.local.dir
[main] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient writeChunk allocating new packet seqno=0, src=/tmp/hadoop-yarn/staging/hadoop/.staging/job_1544299229935_0012/job.xml, packetSize=65016, chunksPerPacket=126, bytesCurBlock=0
[main] DEBUG org.apache.hadoop.hdfs.DFSOutputStream - enqueue full packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 64512, src=/tmp/hadoop-yarn/staging/hadoop/.staging/job_1544299229935_0012/job.xml, bytesCurBlock=64512, blockSize=134217728, appendChunk=false, null@null
[main] DEBUG org.apache.hadoop.hdfs.DataStreamer - Queued packet 0
[main] DEBUG org.apache.hadoop.hdfs.DFSClient - computePacketChunkSize: src=/tmp/hadoop-yarn/staging/hadoop/.staging/job_1544299229935_0012/job.xml, chunkSize=516, chunksPerPacket=126, packetSize=65016
[Thread-13] DEBUG org.apache.hadoop.hdfs.DataStreamer - Allocating new block
[IPC Parameter Sending Thread #0] DEBUG org.apache.hadoop.ipc.Client - IPC Client (570918864) connection to centos-aaron-h1/192.168.29.144:9000 from hadoop sending #26 org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock
[main] DEBUG org.apache.hadoop.hdfs.DFSClient - DFSClient writeChunk allocating new packet seqno=1, src=/tmp/hadoop-yarn/staging/hadoop/.staging/job_1544299229935_0012/job.xml, packetSize=65016, chunksPerPacket=126, bytesCurBlock=64512
[IPC Client (570918864) connection to centos-aaron-h1/192.168.29.144:9000 from hadoop] DEBUG org.apache.hadoop.ipc.Client - IPC Client (570918864) connection to centos-aaron-h1/192.168.29.144:9000 from hadoop got value #26
[Thread-13] DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: addBlock took 2ms
[Thread-13] DEBUG org.apache.hadoop.hdfs.DataStreamer - pipeline = [DatanodeInfoWithStorage[192.168.29.146:50010,DS-85cb0c99-6ac4-4a88-a296-37176b2da45d,DISK], DatanodeInfoWithStorage[192.168.29.145:50010,DS-0e82a2ed-35d7-4b9d-a9ed-88a0743fd157,DISK]]
[Thread-13] DEBUG org.apache.hadoop.hdfs.DataStreamer - Connecting to datanode 192.168.29.146:50010
[Thread-13] DEBUG org.apache.hadoop.hdfs.DataStreamer - Send buf size 65536
[Thread-13] DEBUG org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient - SASL client skipping handshake in unsecured configuration for addr = /192.168.29.146, datanodeId = DatanodeInfoWithStorage[192.168.29.146:50010,DS-85cb0c99-6ac4-4a88-a296-37176b2da45d,DISK]
[DataStreamer for file /tmp/hadoop-yarn/staging/hadoop/.staging/job_1544299229935_0012/job.xml block BP-314684760-192.168.29.144-1543969528334:blk_1073741896_1072] DEBUG org.apache.hadoop.hdfs.DataStreamer - nodes [DatanodeInfoWithStorage[192.168.29.146:50010,DS-85cb0c99-6ac4-4a88-a296-37176b2da45d,DISK], DatanodeInfoWithStorage[192.168.29.145:50010,DS-0e82a2ed-35d7-4b9d-a9ed-88a0743fd157,DISK]] storageTypes [DISK, DISK] storageIDs [DS-85cb0c99-6ac4-4a88-a296-37176b2da45d, DS-0e82a2ed-35d7-4b9d-a9ed-88a0743fd157]
[DataStreamer for file /tmp/hadoop-yarn/staging/hadoop/.staging/job_1544299229935_0012/job.xml block BP-314684760-192.168.29.144-1543969528334:blk_1073741896_1072] DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-314684760-192.168.29.144-1543969528334:blk_1073741896_1072 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 64512
[ResponseProcessor for block BP-314684760-192.168.29.144-1543969528334:blk_1073741896_1072] DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 0 reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 1730764 flag: 0 flag: 0
[main] DEBUG org.apache.hadoop.hdfs.DataStreamer - Queued packet 1
[main] DEBUG org.apache.hadoop.hdfs.DataStreamer - Queued packet 2
[main] DEBUG org.apache.hadoop.hdfs.DataStreamer - Waiting for ack for: 2
[DataStreamer for file /tmp/hadoop-yarn/staging/hadoop/.staging/job_1544299229935_0012/job.xml block BP-314684760-192.168.29.144-1543969528334:blk_1073741896_1072] DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-314684760-192.168.29.144-1543969528334:blk_1073741896_1072 sending packet packet seqno:

 
 
              
           
              
              
            