
Adding a Display-Count Limit Parameter to the Hadoop Ls Command


Preface

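Among Hadoop's FsShell commands, the ones most people probably use most often are hadoop fs -ls, -lsr, -cat and similar file-system commands that behave almost identically to their Linux counterparts. On closer inspection, though, there are some differences. The first is scale: a single-machine file system holds relatively few files with modest content, whereas HDFS is a distributed system that can hold an enormous number of files and directories. Consequently, running ls or lsr carelessly can produce a frightening number of output records, and sometimes the only way out is to abort the command with Ctrl+C. So, for directories whose contents are unknown, could we add a parameter to ls that limits how many records are displayed? That question is the starting point of this article.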


How the Ls Command Works

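To add a parameter, we first need to understand how the current Ls command works. Below is a brief source-level analysis. First, note the following class hierarchy: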

Ls-->FsCommand-->Command

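Reading from left to right, each class is the child of the next. Command is therefore the most basic class, and the entry point for running a command-line operation lives there. In Command.java you will find the following method: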

  /**
   * Invokes the command handler.  The default behavior is to process options,
   * expand arguments, and then process each argument.
   * <pre>
   * run
   * |-> {@link #processOptions(LinkedList)}
   * \-> {@link #processRawArguments(LinkedList)}
   *      |-> {@link #expandArguments(LinkedList)}
   *      |   \-> {@link #expandArgument(String)}*
   *      \-> {@link #processArguments(LinkedList)}
   *          |-> {@link #processArgument(PathData)}*
   *          |   |-> {@link #processPathArgument(PathData)}
   *          |   \-> {@link #processPaths(PathData, PathData...)}
   *          |        \-> {@link #processPath(PathData)}*
   *          \-> {@link #processNonexistentPath(PathData)}
   * </pre>
   * Most commands will chose to implement just
   * {@link #processOptions(LinkedList)} and {@link #processPath(PathData)}
   *
   * @param argv the list of command line arguments
   * @return the exit code for the command
   * @throws IllegalArgumentException if called with invalid arguments
   */
  public int run(String...argv) {
    LinkedList<String> args = new LinkedList<String>(Arrays.asList(argv));
    try {
      if (isDeprecated()) {
        displayWarning(
            "DEPRECATED: Please use '"+ getReplacementCommand() + "' instead.");
      }
      processOptions(args);
      processRawArguments(args);
    } catch (IOException e) {
      displayError(e);
    }
    return (numErrors == 0) ? exitCode : exitCodeForError();
  }
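The Ls --> FsCommand --> Command relationship is a template-method arrangement: the base class fixes the order of the steps and subclasses fill in the hooks. The following is only a minimal sketch of that idea; MiniCommand, MiniLs and demo are hypothetical names invented for illustration and are not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical, simplified stand-ins for Hadoop's Command and Ls classes.
public class CommandFlowSketch {

  /** Base class: fixes the order of the steps (template method). */
  static abstract class MiniCommand {
    protected final List<String> output = new ArrayList<>();

    public final int run(String... argv) {
      LinkedList<String> args = new LinkedList<>(Arrays.asList(argv));
      processOptions(args);        // subclasses strip their flags here
      for (String path : args) {   // whatever remains is treated as a path
        processPath(path);
      }
      return 0;
    }

    // Default: no options to process.
    protected void processOptions(LinkedList<String> args) { }

    protected abstract void processPath(String path);
  }

  /** Subclass: only fills in the two hooks. */
  static class MiniLs extends MiniCommand {
    private boolean humanReadable;

    @Override
    protected void processOptions(LinkedList<String> args) {
      // remove(Object) returns true if the flag was present
      humanReadable = args.remove("-h");
    }

    @Override
    protected void processPath(String path) {
      output.add((humanReadable ? "[h] " : "") + path);
    }
  }

  /** Runs the miniature ls and returns the lines it would have printed. */
  static List<String> demo(String... argv) {
    MiniLs ls = new MiniLs();
    ls.run(argv);
    return ls.output;
  }

  public static void main(String[] args) {
    System.out.println(demo("-h", "/tmp", "/user"));
  }
}
```

The real Command.run additionally handles deprecation warnings, glob expansion and error counting, which this sketch leaves out.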

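Options are processed first: this step strips the option flags out of the argument list. Command leaves processOptions for subclasses to override, so the implementation that matters here is in Ls.java: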

  @Override
  protected void processOptions(LinkedList<String> args)
  throws IOException {
    CommandFormat cf = new CommandFormat(0, Integer.MAX_VALUE, "d", "h", "R");
    cf.parse(args);
    dirRecurse = !cf.getOpt("d");
    setRecursive(cf.getOpt("R") && dirRecurse);
    humanReadable = cf.getOpt("h");
    if (args.isEmpty()) args.add(Path.CUR_DIR);
  }
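To make the flag-stripping step concrete, here is a rough sketch of a parser that consumes leading single-letter options from the argument list and leaves the paths behind. It is an assumption-laden illustration only: OptionParseSketch and parseFlags are hypothetical names, and the sketch is not the actual CommandFormat implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

// Hypothetical helper illustrating flag stripping; not Hadoop's CommandFormat.
public class OptionParseSketch {

  /**
   * Removes leading "-x" style flags from args and returns the flags seen.
   * Parsing stops at the first argument that does not look like an option.
   */
  static Set<String> parseFlags(LinkedList<String> args, String... known) {
    Set<String> knownFlags = new HashSet<>(Arrays.asList(known));
    Set<String> seen = new HashSet<>();
    while (!args.isEmpty()) {
      String arg = args.peekFirst();
      if (!arg.startsWith("-") || arg.length() == 1) {
        break;                       // first path argument reached
      }
      String flag = arg.substring(1);
      if (!knownFlags.contains(flag)) {
        throw new IllegalArgumentException("Illegal option " + arg);
      }
      seen.add(flag);
      args.removeFirst();            // the flag is consumed from the list
    }
    return seen;
  }

  public static void main(String[] argv) {
    LinkedList<String> args =
        new LinkedList<>(Arrays.asList("-R", "-l", "/data"));
    Set<String> flags = parseFlags(args, "d", "h", "R", "l");
    System.out.println(flags.contains("l")); // was the limit flag given?
    System.out.println(args);                // only the paths remain
  }
}
```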
Each option is read in turn and removed from the args list, so what remains are the target files or directories to list. Execution then moves on to this method:

  /**
   * Allows commands that don't use paths to handle the raw arguments.
   * Default behavior is to expand the arguments via
   * {@link #expandArguments(LinkedList)} and pass the resulting list to
   * {@link #processArguments(LinkedList)}
   * @param args the list of argument strings
   * @throws IOException
   */
  protected void processRawArguments(LinkedList<String> args)
  throws IOException {
    processArguments(expandArguments(args));
  }
expandArguments then converts each path string into concrete PathData objects:

  /**
   *  Expands a list of arguments into {@link PathData} objects.  The default
   *  behavior is to call {@link #expandArgument(String)} on each element
   *  which by default globs the argument.  The loop catches IOExceptions,
   *  increments the error count, and displays the exception.
   * @param args strings to expand into {@link PathData} objects
   * @return list of all {@link PathData} objects the arguments
   * @throws IOException if anything goes wrong...
   */
  protected LinkedList<PathData> expandArguments(LinkedList<String> args)
  throws IOException {
    LinkedList<PathData> expandedArgs = new LinkedList<PathData>();
    for (String arg : args) {
      try {
        expandedArgs.addAll(expandArgument(arg));
      } catch (IOException e) { // other exceptions are probably nasty
        displayError(e);
      }
    }
    return expandedArgs;
  }

  /**
   * Expand the given argument into a list of {@link PathData} objects.
   * The default behavior is to expand globs.  Commands may override to
   * perform other expansions on an argument.
   * @param arg string pattern to expand
   * @return list of {@link PathData} objects
   * @throws IOException if anything goes wrong...
   */
  protected List<PathData> expandArgument(String arg) throws IOException {
    PathData[] items = PathData.expandAsGlob(arg, getConf());
    if (items.length == 0) {
      // it's a glob that failed to match
      throw new PathNotFoundException(arg);
    }
    return Arrays.asList(items);
  }
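The "expand the glob, fail if nothing matched" behavior can be imitated outside Hadoop with the JDK's own glob matcher. The sketch below is an analogy only: GlobExpandSketch and expand are hypothetical names, it matches against an in-memory list of names rather than a real file system, and it throws an unchecked exception where Hadoop throws PathNotFoundException.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of glob expansion; not PathData.expandAsGlob.
public class GlobExpandSketch {

  /** Returns the candidates matching the glob, or fails if none match. */
  static List<String> expand(String pattern, List<String> candidates) {
    PathMatcher matcher =
        FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    List<String> matches = new ArrayList<>();
    for (String name : candidates) {
      if (matcher.matches(Paths.get(name))) {
        matches.add(name);
      }
    }
    if (matches.isEmpty()) {
      // Mirrors the "glob that failed to match" branch in the Hadoop code
      throw new UncheckedIOException(new FileNotFoundException(pattern));
    }
    return matches;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("part-0000", "part-0001", "_SUCCESS");
    System.out.println(expand("part-*", names));
  }
}
```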
The resulting PathData list is then handed to processArguments:

  /**
   *  Processes the command's list of expanded arguments.
   *  {@link #processArgument(PathData)} will be invoked with each item
   *  in the list.  The loop catches IOExceptions, increments the error
   *  count, and displays the exception.
   *  @param args a list of {@link PathData} to process
   *  @throws IOException if anything goes wrong...
   */
  protected void processArguments(LinkedList<PathData> args)
  throws IOException {
    for (PathData arg : args) {
      try {
        processArgument(arg);
      } catch (IOException e) {
        displayError(e);
      }
    }
  }
Each PathData item is then processed:

  /**
   * Processes a {@link PathData} item, calling
   * {@link #processPathArgument(PathData)} or
   * {@link #processNonexistentPath(PathData)} on each item.
   * @param item {@link PathData} item to process
   * @throws IOException if anything goes wrong...
   */
  protected void processArgument(PathData item) throws IOException {
    if (item.exists) {
      processPathArgument(item);
    } else {
      processNonexistentPath(item);
    }
  }
Next, the processPathArgument method in Ls.java runs:

  @Override
  protected void processPathArgument(PathData item) throws IOException {
    // implicitly recurse once for cmdline directories
    if (dirRecurse && item.stat.isDirectory()) {
      recursePath(item);
    } else {
      super.processPathArgument(item);
    }
  }
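This checks whether the item is a directory; if it is, the command recurses one level to show the directory's children. Let's look at the single-file case directly; the base method is defined in Command.java: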

  /**
   *  This is the last chance to modify an argument before going into the
   *  (possibly) recursive {@link #processPaths(PathData, PathData...)}
   *  -> {@link #processPath(PathData)} loop.  Ex.  ls and du use this to
   *  expand out directories.
   *  @param item a {@link PathData} representing a path which exists
   *  @throws IOException if anything goes wrong...
   */
  protected void processPathArgument(PathData item) throws IOException {
    // null indicates that the call is not via recursion, ie. there is
    // no parent directory that was expanded
    depth = 0;
    processPaths(null, item);
  }
processPaths, in turn, is overridden in the subclass:

  @Override
  protected void processPaths(PathData parent, PathData ... items)
  throws IOException {
    if (parent != null && !isRecursive() && items.length != 0) {
      out.println("Found " + items.length + " items");
    }
    adjustColumnWidths(items);
    super.processPaths(parent, items);
  }
After another similar round trip, the processPaths method of the base class runs:

  /**
   *  Iterates over the given expanded paths and invokes
   *  {@link #processPath(PathData)} on each element.  If "recursive" is true,
   *  will do a post-visit DFS on directories.
   *  @param parent if called via a recurse, will be the parent dir, else null
   *  @param items a list of {@link PathData} objects to process
   *  @throws IOException if anything goes wrong...
   */
  protected void processPaths(PathData parent, PathData ... items)
  throws IOException {
    // TODO: this really should be iterative
    for (PathData item : items) {
      try {
        processPath(item);
        if (recursive && isPathRecursable(item)) {
          recursePath(item);
        }
        postProcessPath(item);
      } catch (IOException e) {
        displayError(e);
      }
    }
  }
The actual output is produced by processPath in Ls.java:

  @Override
  protected void processPath(PathData item) throws IOException {
    FileStatus stat = item.stat;
    String line = String.format(lineFormat,
        (stat.isDirectory() ? "d" : "-"),
        stat.getPermission() + (stat.getPermission().getAclBit() ? "+" : " "),
        (stat.isFile() ? stat.getReplication() : "-"),
        stat.getOwner(),
        stat.getGroup(),
        formatSize(stat.getLen()),
        dateFormat.format(new Date(stat.getModificationTime())),
        item
    );
    out.println(line);
  }
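processPath relies on a lineFormat string that is prepared by adjustColumnWidths before any line is printed. The body of adjustColumnWidths is not shown in this article, so the following is only a guess at the general technique (measure the widest value, then build a padded format string); ColumnFormatSketch, buildFormat and formatLine are hypothetical names.

```java
// Hypothetical illustration of width-adjusted formatting; the real
// adjustColumnWidths in Ls.java is not reproduced here.
public class ColumnFormatSketch {

  /** Builds a format string whose owner column fits the widest owner. */
  static String buildFormat(String[] owners) {
    int maxOwner = 0;
    for (String owner : owners) {
      maxOwner = Math.max(maxOwner, owner.length());
    }
    // "%-Ns" left-justifies in N columns; a width of 0 would be invalid,
    // so fall back to a plain "%s" in that case.
    String ownerColumn = (maxOwner > 0) ? "%-" + maxOwner + "s" : "%s";
    return "%s " + ownerColumn + " %s";
  }

  /** Formats one listing line: type flag, owner, path. */
  static String formatLine(String format, boolean isDir,
                           String owner, String path) {
    return String.format(format, isDir ? "d" : "-", owner, path);
  }

  public static void main(String[] args) {
    String format = buildFormat(new String[] {"hdfs", "alice"});
    System.out.println(formatLine(format, true, "hdfs", "/data"));
    System.out.println(formatLine(format, false, "alice", "/data/a.txt"));
  }
}
```

Computing the widths once per batch of items is what keeps the columns aligned, which is why any change to the set of displayed items should happen before the widths are adjusted.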

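That completes the ls call flow. Some readers may be dizzy from all the back-and-forth between methods, but that's fine: the main thing is to know which method ultimately controls what gets displayed, because a small change there achieves our goal.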


Adding the Display-Limit Parameter to Ls

Now let me show how to add a new parameter to ls. First, define the parameter description:

   public static final String NAME = "ls";
   public static final String USAGE = "[-d] [-h] [-R] [-l] [<path> ...]";
   public static final String DESCRIPTION =
       "List the contents that match the specified file pattern. If " +
       "path is not specified, the contents of /user/<currentUser> " +
       // ... lines between the two patch hunks are omitted here ...
       "-d:  Directories are listed as plain files.\n" +
       "-h:  Formats the sizes of files in a human-readable fashion " +
       "rather than a number of bytes.\n" +
       "-R:  Recursively list the contents of directories.\n" +
       "-l:  The limited number of files records's info which would be " +
       "displayed, the max value is 1024.\n";

Define the related variables:

 
   protected int maxRepl = 3, maxLen = 10, maxOwner = 0, maxGroup = 0;
   protected int limitedDisplayedNum = 1024;
   protected int displayedRecordNum = 0;
   protected String lineFormat;
   protected boolean dirRecurse;
 
   protected boolean limitedDisplay = false;
   protected boolean humanReadable = false;
The default maximum number of displayed records is 1024. Next, parse the new parameter in the option-parsing method:

   @Override
   protected void processOptions(LinkedList<String> args)
   throws IOException {
     CommandFormat cf = new CommandFormat(0, Integer.MAX_VALUE, "d", "h", "R", "l");
     cf.parse(args);
     dirRecurse = !cf.getOpt("d");
     setRecursive(cf.getOpt("R") && dirRecurse);
     humanReadable = cf.getOpt("h");
     limitedDisplay = cf.getOpt("l");
     if (args.isEmpty()) args.add(Path.CUR_DIR);
   }
Then comes the core change, the processPaths method:

   @Override
   protected void processPaths(PathData parent, PathData ... items)
   throws IOException {
     if (parent != null && !isRecursive() && items.length != 0) {
       out.println("Found " + items.length + " items");
     }

     PathData[] newItems;
     if (limitedDisplay) {
       int length = items.length;
       if (length > limitedDisplayedNum) {
         // Too many entries: keep only the first limitedDisplayedNum
         length = limitedDisplayedNum;
         out.println("Found " + items.length + " items"
             + ", more than the limited displayed num " + limitedDisplayedNum);
       }
       newItems = new PathData[length];

       // Copy the entries that will actually be displayed
       for (int i = 0; i < length; i++) {
         newItems[i] = items[i];
       }
       items = null;
     } else {
       newItems = items;
     }

     // Widths and output are based on the (possibly truncated) list
     adjustColumnWidths(newItems);
     super.processPaths(parent, newItems);
   }
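The truncation at the heart of this change can be tried in isolation. The sketch below uses plain strings instead of PathData and Arrays.copyOf instead of the manual loop; LimitSketch and limit are hypothetical names chosen for illustration, not part of the patch.

```java
import java.util.Arrays;

// Hypothetical, standalone illustration of the truncation step.
public class LimitSketch {

  /** Returns at most max leading entries of items. */
  static String[] limit(String[] items, int max) {
    if (items.length <= max) {
      return items;                     // nothing to cut off
    }
    return Arrays.copyOf(items, max);   // keep only the first max entries
  }

  public static void main(String[] args) {
    String[] items = {"/a", "/b", "/c"};
    String[] shown = limit(items, 2);
    if (shown.length < items.length) {
      System.out.println("Found " + items.length + " items"
          + ", more than the limited displayed num " + shown.length);
    }
    System.out.println(Arrays.toString(shown));
  }
}
```

Note that this limits what is printed, not what is fetched: the full items array has already been built by the time processPaths is called.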

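The logic is not difficult. Below is a test example: in the test jar I set the default limit to 1, then ran ls with and without the parameter. The test screenshot is as follows: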

[Screenshot: ls test output with and without the -l parameter]

This code has been submitted to the open-source community as HADOOP-12641. The link is listed at the end of the article.


Related Links

Issue link: https://issues.apache.org/jira/browse/HADOOP-12641

GitHub patch link: https://github.com/linyiqun/open-source-patch/blob/master/hadoop/HADOOP-12641/HADOOP-12641.001.patch


