Hadoop "Walking Alone" (Part 8): A Tour of the Cluster Startup Scripts and Daemon Source Code
In the previous chapter, while building the pseudo-distributed cluster, we used the start-dfs.sh script to bring up the environment, uploaded a file to HDFS, and ran a MapReduce word count over that file. Today we'll take a quick look at what those startup scripts actually do, along with some of HDFS's important default configuration properties.
I. Startup Scripts
Hadoop keeps its scripts and commands in just two directories: bin/ and sbin/. Let's walk through the more important ones.
1. sbin/start-all.sh
# Start all hadoop daemons. Run this on master node.
echo "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh"
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh   # source libexec/hadoop-config.sh to load the environment
# start hdfs daemons if hdfs is present
if [ -f "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh ]; then
  "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh --config $HADOOP_CONF_DIR   # run sbin/start-dfs.sh
fi
# start yarn daemons if yarn is present
if [ -f "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh ]; then
  "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh --config $HADOOP_CONF_DIR   # run sbin/start-yarn.sh
fi
As you can see, there is not much to this script, and it is in fact deprecated. All start-all.sh does is run hadoop-config.sh to load Hadoop's environment variables, then run start-dfs.sh and start-yarn.sh in turn.
It follows that we can skip start-all.sh entirely and run start-dfs.sh directly to start the HDFS side of the cluster (plus start-yarn.sh, if YARN is configured).
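For example, on the pseudo-distributed node from last chapter, the deprecated one-liner and its replacement are equivalent (a sketch; it assumes HADOOP_HOME points at your installation directory, which is my convention here, not something the scripts require):
# deprecated: starts HDFS and YARN in one go
$HADOOP_HOME/sbin/start-all.sh
# preferred: start each subsystem explicitly
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh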
2. libexec/hadoop-config.sh
this="${BASH_SOURCE-$0}"
common_bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
script="$(basename -- "$this")"
this="$common_bin/$script"
[ -f "$common_bin/hadoop-layout.sh" ] && . "$common_bin/hadoop-layout.sh"
HADOOP_COMMON_DIR=${HADOOP_COMMON_DIR:-"share/hadoop/common"}
HADOOP_COMMON_LIB_JARS_DIR=${HADOOP_COMMON_LIB_JARS_DIR:-"share/hadoop/common/lib"}
HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_COMMON_LIB_NATIVE_DIR:-"lib/native"}
HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
HDFS_LIB_JARS_DIR=${HDFS_LIB_JARS_DIR:-"share/hadoop/hdfs/lib"}
YARN_DIR=${YARN_DIR:-"share/hadoop/yarn"}
YARN_LIB_JARS_DIR=${YARN_LIB_JARS_DIR:-"share/hadoop/yarn/lib"}
MAPRED_DIR=${MAPRED_DIR:-"share/hadoop/mapreduce"}
MAPRED_LIB_JARS_DIR=${MAPRED_LIB_JARS_DIR:-"share/hadoop/mapreduce/lib"}
# the root of the Hadoop installation
# See HADOOP-6255 for directory structure layout
HADOOP_DEFAULT_PREFIX=$(cd -P -- "$common_bin"/.. && pwd -P)
HADOOP_PREFIX=${HADOOP_PREFIX:-$HADOOP_DEFAULT_PREFIX}
export HADOOP_PREFIX
# ...... details omitted; here is the key part ......
# source hadoop-env.sh to load the remaining environment variables
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
All this script does is set up the environment variables the Hadoop cluster needs; internally it also sources hadoop-env.sh to load other important settings, such as the JDK location.
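As a concrete illustration, a minimal hadoop-env.sh might contain entries like these (the JDK path is only an example for a typical Linux install; adjust it to your machine):
# etc/hadoop/hadoop-env.sh, sourced by hadoop-config.sh on every startup
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # tell Hadoop where the JDK lives
export HADOOP_HEAPSIZE=1000                          # daemon heap size in MB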
3. sbin/start-dfs.sh
# Start hadoop dfs daemons.
# Optionally upgrade or rollback dfs state.
# Run this on master node.
# usage of start-dfs.sh (extra options such as -clusterId are passed through):
usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]"
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
# load environment variables via hdfs-config.sh
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh
# get arguments
if [[ $# -ge 1 ]]; then
startOpt="$1"
shift
case "$startOpt" in
-upgrade)
nameStartOpt="$startOpt"
;;
-rollback)
dataStartOpt="$startOpt"
;;
*)
echo $usage
exit 1
;;
esac
fi
#Add other possible options
nameStartOpt="$nameStartOpt $@"
#---------------------------------------------------------
# namenodes
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Starting namenodes on [$NAMENODES]"
# run hadoop-daemons.sh, which invokes bin/hdfs, to start the namenode daemon
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$NAMENODES" \
--script "$bin/hdfs" start namenode $nameStartOpt
#---------------------------------------------------------
# datanodes (using default slaves file)
if [ -n "$HADOOP_SECURE_DN_USER" ]; then
echo \
"Attempting to start secure cluster, skipping datanodes. " \
"Run start-secure-dns.sh as root to complete startup."
else
# run hadoop-daemons.sh, which invokes bin/hdfs, to start the datanode daemons
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--script "$bin/hdfs" start datanode $dataStartOpt
fi
#---------------------------------------------------------
# secondary namenodes (if any)
SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)
if [ -n "$SECONDARY_NAMENODES" ]; then
echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"
# run hadoop-daemons.sh, which invokes bin/hdfs, to start the secondarynamenode daemon
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$SECONDARY_NAMENODES" \
--script "$bin/hdfs" start secondarynamenode
fi
...................................
............ details omitted ............
...................................
# eof
In start-dfs.sh, then, hdfs-config.sh is sourced first to load the environment, after which hadoop-daemons.sh (which in turn calls the bin/hdfs command) starts the namenode, the datanodes, and the secondarynamenode.
This also suggests that calling hadoop-daemons.sh directly, following its usage string, should be enough to start the HDFS daemons on its own.
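For instance, going by the usage string and the option parsing above, all of the following should be valid (a sketch; note that -upgrade is forwarded to the namenode and -rollback to the datanodes, exactly as the case statement assigns them):
sbin/start-dfs.sh              # normal startup
sbin/start-dfs.sh -upgrade     # the namenode starts with the -upgrade option
sbin/start-dfs.sh -rollback    # the datanodes start with the -rollback option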
4. sbin/hadoop-daemons.sh
# Run a Hadoop command on all slave hosts.
# usage:
usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."
# if no args specified, show usage
if [ $# -le 1 ]; then
echo $usage
exit 1
fi
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
# source hadoop-config.sh to load environment variables
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
# run sbin/slaves.sh to reach each host over ssh, then invoke hadoop-daemon.sh there with the same arguments
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
Given the usage of hadoop-daemons.sh, it's easy to see that calling the script directly with the right arguments starts the corresponding daemons, for example:
hadoop-daemons.sh start namenode            # start the NameNode
hadoop-daemons.sh start datanode            # start the DataNodes
hadoop-daemons.sh start secondarynamenode   # start the SecondaryNameNode
Inside this script, slaves.sh is executed to load the environment, after which hadoop-daemon.sh is invoked to read the configuration and run the actual hadoop command.
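hadoop-daemons.sh also accepts a --hosts flag, which (if I read hadoop-config.sh correctly) points the slave list at an alternate file resolved relative to ${HADOOP_CONF_DIR}; the file name below is hypothetical:
sbin/hadoop-daemons.sh --hosts dn_group1 start datanode   # only the hosts listed in ${HADOOP_CONF_DIR}/dn_group1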
5. sbin/slaves.sh
# Run a shell command on all slave hosts.
#
# Environment Variables
#
# HADOOP_SLAVES File naming remote hosts.
# Default is ${HADOOP_CONF_DIR}/slaves.
# HADOOP_CONF_DIR Alternate conf dir. Default is ${HADOOP_PREFIX}/conf.
# HADOOP_SLAVE_SLEEP Seconds to sleep between spawning remote commands.
# HADOOP_SSH_OPTS Options passed to ssh when running remote commands.
##
# usage:
usage="Usage: slaves.sh [--config confdir] command..."
# if no args specified, show usage
if [ $# -le 0 ]; then
echo $usage
exit 1
fi
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh   # load environment variables
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
. "${HADOOP_CONF_DIR}/hadoop-env.sh"   # load per-install overrides
fi
# Where to start the script, see hadoop-config.sh
# (it set up the variables based on command line options)
if [ "$HADOOP_SLAVE_NAMES" != '' ] ; then
SLAVE_NAMES=$HADOOP_SLAVE_NAMES
else
SLAVE_FILE=${HADOOP_SLAVES:-${HADOOP_CONF_DIR}/slaves}
SLAVE_NAMES=$(cat "$SLAVE_FILE" | sed 's/#.*$//;/^$/d')
fi
# start the daemons
for slave in $SLAVE_NAMES ; do
ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
2>&1 | sed "s/^/$slave: /" &
if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
sleep $HADOOP_SLAVE_SLEEP
fi
done
So slaves.sh just loads the environment, reads the list of worker hosts, and runs the given command on each host over ssh, putting every ssh session in the background so the hosts are handled in parallel.
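As a concrete illustration (hostnames made up), here is a three-node slaves file and what slaves.sh effectively does with it:
# ${HADOOP_CONF_DIR}/slaves: one worker hostname per line; the sed above strips '#' comments and blank lines
node01
node02
node03
# for each host, roughly:  ssh node01 'cd $HADOOP_PREFIX; sbin/hadoop-daemon.sh --config ... start datanode' &
# with every line of remote output prefixed "node01: " and all three sessions running in parallel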
6. sbin/hadoop-daemon.sh
#!/usr/bin/env bash
# Runs a Hadoop command as a daemon.
.....................
.....................
# usage: <hadoop-command> is the hadoop command to run; it is dispatched in the case statement further down
usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] [--script script] (start|stop) <hadoop-command> <args...>"
.....................
.....................
# load environment variables via hadoop-config.sh
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
# load environment variables via hadoop-env.sh
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
.....................
.....................
case $startStop in
(start)
[ -w "$HADOOP_PID_DIR" ] || mkdir -p "$HADOOP_PID_DIR"
if [ -f $pid ]; then
if kill -0 `cat $pid` > /dev/null 2>&1; then
echo $command running as process `cat $pid`. Stop it first.
exit 1
fi
fi
if [ "$HADOOP_MASTER" != "" ]; then
echo rsync from $HADOOP_MASTER
rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_PREFIX"
fi
hadoop_rotate_log $log
echo starting $command, logging to $log
cd "$HADOOP_PREFIX"
# decide which command this is, then call bin/hdfs to read the configuration and run it
case $command in
namenode|secondarynamenode|datanode|journalnode|dfs|dfsadmin|fsck|balancer|zkfc)
if [ -z "$HADOOP_HDFS_HOME" ]; then
hdfsScript="$HADOOP_PREFIX"/bin/hdfs
else
hdfsScript="$HADOOP_HDFS_HOME"/bin/hdfs
fi
nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
;;
(*)
nohup nice -n $HADOOP_NICENESS $hadoopScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
;;
esac
........................
........................
esac
hadoop-daemon.sh loads the environment in the same way, uses the arguments handed down from the previous script ("$@") to decide which Hadoop daemon to start ($command), and finally calls the bin/hdfs command, which reads the configuration and launches the daemon via nohup.
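So, to manage a single daemon on the local node only, with no ssh fan-out, we can call hadoop-daemon.sh (note the singular) directly:
sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode   # start the local DataNode only
sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR stop datanode    # and stop it again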
7. bin/hdfs
This one is a user-facing command rather than a startup script (although it, too, is written in shell). Notice that no matter which script we use to start the cluster, everything ultimately funnels into bin/hdfs. So what's inside it? Let's take a look.
bin=`which $0`
bin=`dirname ${bin}`
bin=`cd "$bin" > /dev/null; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh
# Apart from loading more environment settings above, this function simply prints the usage:
# e.g. "namenode -format" formats the DFS filesystem,
# while "namenode" runs a DFS namenode.
# Read on.
function print_usage(){
echo "Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND"
echo " where COMMAND is one of:"
echo " dfs run a filesystem command on the file systems supported in Hadoop."
echo " classpath prints the classpath"
echo " namenode -format format the DFS filesystem"
echo " secondarynamenode run the DFS secondary namenode"
echo " namenode run the DFS namenode"
echo " journalnode run the DFS journalnode"
echo " zkfc run the ZK Failover Controller daemon"
echo " datanode run a DFS datanode"
echo " dfsadmin run a DFS admin client"
echo " haadmin run a DFS HA admin client"
echo " fsck run a DFS filesystem checking utility"
echo " balancer run a cluster balancing utility"
echo " jmxget get JMX exported values from NameNode or DataNode."
echo " mover run a utility to move block replicas across"
echo " storage types"
echo " oiv apply the offline fsimage viewer to an fsimage"
echo " oiv_legacy apply the offline fsimage viewer to an legacy fsimage"
echo " oev apply the offline edits viewer to an edits file"
echo " fetchdt fetch a delegation token from the NameNode"
echo " getconf get config values from configuration"
echo " groups get the groups which users belong to"
echo " snapshotDiff diff two snapshots of a directory or diff the"
echo " current directory contents with a snapshot"
echo " lsSnapshottableDir list all snapshottable dirs owned by the current user"
echo " Use -help to see options"
echo " portmap run a portmap service"
echo " nfs3 run an NFS version 3 gateway"
echo " cacheadmin configure the HDFS cache"
echo " crypto configure HDFS encryption zones"
echo " storagepolicies list/get/set block storage policies"
echo " version print the version"
echo ""
echo "Most commands print help when invoked w/o parameters."
# There are also debug commands, but they don't show up in this listing.
}
if [ $# = 0 ]; then
print_usage
exit
fi
COMMAND=$1
shift
case $COMMAND in
# usage flags
--help|-help|-h)
print_usage
exit
;;
esac
# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$COMMAND" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
if [ -n "$JSVC_HOME" ]; then
if [ -n "$HADOOP_SECURE_DN_PID_DIR" ]; then
HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
fi
if [ -n "$HADOOP_SECURE_DN_LOG_DIR" ]; then
HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
fi
HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
starting_secure_dn="true"
else
echo "It looks like you're trying to start a secure DN, but \$JSVC_HOME"\
"isn't set. Falling back to starting insecure DN."
fi
fi
# Determine if we're starting a privileged NFS daemon, and if so, redefine appropriate variables
if [ "$COMMAND" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then
if [ -n "$JSVC_HOME" ]; then
if [ -n "$HADOOP_PRIVILEGED_NFS_PID_DIR" ]; then
HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR
fi
if [ -n "$HADOOP_PRIVILEGED_NFS_LOG_DIR" ]; then
HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
fi
HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
starting_privileged_nfs="true"
else
echo "It looks like you're trying to start a privileged NFS server, but"\
"\$JSVC_HOME isn't set. Falling back to starting unprivileged NFS server."
fi
fi
# Stop right here: this is the part that matters.
# Each hadoop command is mapped to the corresponding Java class,
# which is then run on the JVM. Remember, Hadoop is written in Java.
if [ "$COMMAND" = "namenode" ] ; then
CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode' # the class behind the namenode daemon
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"
elif [ "$COMMAND" = "zkfc" ] ; then
CLASS='org.apache.hadoop.hdfs.tools.DFSZKFailoverController'
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_ZKFC_OPTS"
elif [ "$COMMAND" = "secondarynamenode" ] ; then
CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode' # the class behind the secondarynamenode daemon
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"
elif [ "$COMMAND" = "datanode" ] ; then
CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode' # the class behind the datanode daemon
if [ "$starting_secure_dn" = "true" ]; then
HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
else
HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
fi
elif [ "$COMMAND" = "journalnode" ] ; then
CLASS='org.apache.hadoop.hdfs.qjournal.server.JournalNode'
HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOURNALNODE_OPTS"
.......................................
............... many more branches omitted ...............
.......................................
# Check to see if we should start a secure datanode
if [ "$starting_secure_dn" = "true" ]; then
if [ "$HADOOP_PID_DIR" = "" ]; then
HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"
else
HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"
fi
JSVC=$JSVC_HOME/jsvc
if [ ! -f $JSVC ]; then
echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run secure datanodes. "
echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ "\
"and set JSVC_HOME to the directory containing the jsvc binary."
exit
fi
if [[ ! $JSVC_OUTFILE ]]; then
JSVC_OUTFILE="$HADOOP_LOG_DIR/jsvc.out"
fi
if [[ ! $JSVC_ERRFILE ]]; then
JSVC_ERRFILE="$HADOOP_LOG_DIR/jsvc.err"
fi
# launch the Java class via jsvc (secure datanode)
exec "$JSVC" \
-Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
-errfile "$JSVC_ERRFILE" \
-pidfile "$HADOOP_SECURE_DN_PID" \
-nodetach \
-user "$HADOOP_SECURE_DN_USER" \
-cp "$CLASSPATH" \
$JAVA_HEAP_MAX $HADOOP_OPTS \
org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
elif [ "$starting_privileged_nfs" = "true" ] ; then
if [ "$HADOOP_PID_DIR" = "" ]; then
HADOOP_PRIVILEGED_NFS_PID="/tmp/hadoop_privileged_nfs3.pid"
else
HADOOP_PRIVILEGED_NFS_PID="$HADOOP_PID_DIR/hadoop_privileged_nfs3.pid"
fi
JSVC=$JSVC_HOME/jsvc
if [ ! -f $JSVC ]; then
echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run privileged NFS gateways. "
echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ "\
"and set JSVC_HOME to the directory containing the jsvc binary."
exit
fi
if [[ ! $JSVC_OUTFILE ]]; then
JSVC_OUTFILE="$HADOOP_LOG_DIR/nfs3_jsvc.out"
fi
if [[ ! $JSVC_ERRFILE ]]; then
JSVC_ERRFILE="$HADOOP_LOG_DIR/nfs3_jsvc.err"
fi
# launch the Java class via jsvc (privileged NFS gateway)
exec "$JSVC" \
-Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
-errfile "$JSVC_ERRFILE" \
-pidfile "$HADOOP_PRIVILEGED_NFS_PID" \
-nodetach \
-user "$HADOOP_PRIVILEGED_NFS_USER" \
-cp "$CLASSPATH" \
$JAVA_HEAP_MAX $HADOOP_OPTS \
org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter "$@"
else
# run the selected class on the JVM
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
fi
See how it works? This command resolves each daemon to its Java class and then launches that class in a JVM.
Hadoop's other command, bin/hadoop, also calls bin/hdfs internally; have a look yourself if you're curious, I won't reproduce it here. The YARN scripts and commands follow exactly the same pattern, so I won't walk through them one by one either.
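A quick way to confirm all of this is jps -l, which lists each JVM with its main class; after start-dfs.sh you should see exactly the classes resolved above (the pids here are illustrative):
$ jps -l
4327 org.apache.hadoop.hdfs.server.namenode.NameNode
4468 org.apache.hadoop.hdfs.server.datanode.DataNode
4654 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode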
[Figure: execution order of the startup scripts]
The same order, in text form:
# one script to start all the daemons
start-all.sh                      # starts everything
1. hadoop-config.sh
   a. hadoop-env.sh
2. start-dfs.sh                   # starts the HDFS daemons
   a. hadoop-config.sh
   b. hadoop-daemons.sh hdfs namenode
      hadoop-daemons.sh hdfs datanode
      hadoop-daemons.sh hdfs secondarynamenode
3. start-yarn.sh                  # starts the YARN daemons

# starting a single daemon
# option 1:
hadoop-daemons.sh --config [start|stop] command
1. hadoop-config.sh
   a. hadoop-env.sh
2. slaves.sh
   a. hadoop-config.sh
   b. hadoop-env.sh
3. hadoop-daemon.sh --config [start|stop] command
   a. hdfs $command
# option 2:
hadoop-daemon.sh --config [start|stop] command
1. hadoop-config.sh
   a. hadoop-env.sh
2. hdfs $command
II. A Look at the Underlying Source Code
Tracing the startup scripts, we found that starting the namenode comes down to the class org.apache.hadoop.hdfs.server.namenode.NameNode, starting a datanode to org.apache.hadoop.hdfs.server.datanode.DataNode, and starting the secondarynamenode to org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.
These sources ship in the jar hadoop-hdfs-2.7.3-sources.jar.
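If you want to follow along, the jar tool can unpack just the classes we care about (in the binary distribution the jar lives under share/hadoop/hdfs/sources/, if memory serves):
jar xf hadoop-hdfs-2.7.3-sources.jar org/apache/hadoop/hdfs/server/namenode/NameNode.java org/apache/hadoop/hdfs/HdfsConfiguration.java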
1. The NameNode source
package org.apache.hadoop.hdfs.server.namenode;
.......................
import org.apache.hadoop.hdfs.HdfsConfiguration;
..........................
@InterfaceAudience.Private
public class NameNode implements NameNodeStatusMXBean {
static {   // static initializer
HdfsConfiguration.init();   // calling init() triggers HdfsConfiguration's static block, which loads the configuration files
}
...................
public static void main(String argv[]) throws Exception {
if (DFSUtil.parseHelpArgument(argv, NameNode.USAGE, System.out, true)) {
System.exit(0);
}
try {
StringUtils.startupShutdownMessage(NameNode.class, argv, LOG);
NameNode namenode = createNameNode(argv, null); // create (and start) the namenode
if (namenode != null) {
namenode.join(); // join() merely blocks until the namenode shuts down; it does not start the daemon
}
} catch (Throwable e) {
LOG.error("Failed to start namenode.", e);
terminate(1, e);
}
}
...........
}
Now look at the HdfsConfiguration class:
package org.apache.hadoop.hdfs;
/**
* Adds deprecated keys into the configuration.
*/
@InterfaceAudience.Private
public class HdfsConfiguration extends Configuration {
static {   // static initializer
addDeprecatedKeys();
// adds the default resources
Configuration.addDefaultResource("hdfs-default.xml");   // the shipped defaults
Configuration.addDefaultResource("hdfs-site.xml");      // the site/user overrides
}
public static void init() {}
private static void addDeprecatedKeys() {}
public static void main(String[] args) {
init();
Configuration.dumpDeprecatedKeys();
}
}
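The practical upshot of that static block is that hdfs-site.xml overrides hdfs-default.xml, since resources added later win. You can check which value took effect with hdfs getconf (dfs.replication is just an example key; our pseudo-distributed setup set it to 1, while hdfs-default.xml ships with 3):
bin/hdfs getconf -confKey dfs.replication   # prints 1 here: the value from hdfs-site.xml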
2. The DataNode source
package org.apache.hadoop.hdfs.server.datanode;
..............
import org.apache.hadoop.hdfs.HdfsConfiguration;
..............
@InterfaceAudience.Private
public class DataNode extends ReconfigurableBase
implements InterDatanodeProtocol, ClientDatanodeProtocol,
TraceAdminProtocol, DataNodeMXBean {
public static final Log LOG = LogFactory.getLog(DataNode.class);
static {
HdfsConfiguration.init();   // again, the static block calls HdfsConfiguration to load the configuration files
}
}
---------------------------------------If you have questions, feel free to leave a comment------------------------------------------