Monitoring High CPU Usage and Analyzing Logs on Linux
阿新 • Published: 2017-09-18
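When a process on a Linux system consumes too much CPU, the problem can be investigated with scripts:
1. Monitor in real time; as soon as a process shows high CPU usage, the script starts.
2. Analyze that process to find the thread responsible.
3. Analyze the log files of the program that the thread belongs to. Middleware such as WebSphere, for example, keeps very detailed log files.
4. Look closely at the error and warning entries in the log files. Log files are sometimes huge and details are easy to overlook, but sed and awk combined with regular expressions can locate the errors effectively without missing any detail.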
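Step 4 can be illustrated with a short sketch. It is not part of the original scripts: the function names `find_problems`, `count_problems` and `show_context` are hypothetical, and the keywords `error`, `warn` and `exception` are only an assumed starting point that should be adapted to the actual log format.

```shell
#!/bin/bash
# Hypothetical helpers for scanning a large log file with awk and sed.

# List every line mentioning an error, warning or exception,
# prefixed with its line number (case-insensitive match).
find_problems() {
    awk 'tolower($0) ~ /error|warn|exception/ { printf "%d: %s\n", NR, $0 }' "$1"
}

# Count error lines and warning lines separately.
count_problems() {
    awk '
        { line = tolower($0) }
        line ~ /error/ { errors++ }
        line ~ /warn/  { warnings++ }
        END { printf "error=%d warning=%d\n", errors, warnings }
    ' "$1"
}

# Print the lines around a given line number (default: 3 lines each side).
show_context() {
    local file=$1 line=$2 ctx=${3:-3}
    local start=$(( line > ctx ? line - ctx : 1 ))
    sed -n "${start},$(( line + ctx ))p" "$file"
}
```

A typical session would run `count_problems SystemOut.log` for an overview, `find_problems SystemOut.log` to get line numbers, and then `show_context SystemOut.log 1234` to read around one hit (the file name and line number here are placeholders).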
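The approach here uses one local script and one remote script. Together they monitor the target host, locate the relevant log files, and analyze them.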
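The "start as soon as CPU usage is high" trigger from step 1 could look roughly like the polling loop below. This is only a sketch under stated assumptions: `busiest_process`, `exceeds_threshold` and `watch_cpu` are hypothetical names, the 80% threshold and 5-second interval are arbitrary defaults, and the `--sort` option and the header-suppressing `pcpu=` syntax assume the procps version of `ps` found on most Linux distributions.

```shell
#!/bin/bash
# Hypothetical polling loop: wait until some process exceeds a CPU threshold.

# Print "PCPU PID COMMAND" for the process with the highest CPU usage.
busiest_process() {
    ps -eo pcpu=,pid=,comm= --sort=-pcpu | head -n 1
}

# Succeed if value ($1) is numerically greater than threshold ($2).
# awk is used because the shell itself cannot compare decimals such as 9.5.
exceeds_threshold() {
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 > t + 0) }'
}

# Poll until the busiest process exceeds the threshold, then print its PID.
watch_cpu() {
    local threshold=${1:-80} interval=${2:-5} line
    while true; do
        line=$(busiest_process)
        set -- $line    # word-split into: pcpu pid command
        if exceeds_threshold "$1" "$threshold"; then
            echo "$2"
            return 0
        fi
        sleep "$interval"
    done
}
```

For example, `pid=$(watch_cpu 90 10)` would block until a process exceeds 90% and then hand its PID to the analysis steps. Note that the `pcpu` column of `ps` is CPU time divided by the process lifetime, so a short spike in a long-running process may not show up; `top` gives a more instantaneous figure.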
highCpuAnalysis_l.sh:
#!/bin/bash
###############################################################################
# The source code was created on 10.19.90.165 and 192.168.86.198
# This script is used to analyze data for performance / high CPU issues on Linux
# Usage: ./highCpuAnalysis_l.sh USER IP
# Author: HuangTao
# Email: [email protected]126.com
###############################################################################

## Usage check: both USER and IP are required
if [ $# -lt 2 ]
then
    echo " Unable to find USER and IP."
    echo " Please rerun the script as follows: ./highCpuAnalysis_l.sh USER IP"
    echo "eg: ./highCpuAnalysis_l.sh root 192.168.86.198 "
    exit 1
fi

##########################
# Define Variables       #
##########################
export USER=$1
export IP=$2

## get the remote server's WAS application server name
## (the last argument of the process using the most CPU; --sort needs procps ps)
export wasappname=$(ssh "$USER@$IP" ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n '2p' | awk '{print $NF}')
## get the remote server's hostname
export remotehostname=$(ssh "$USER@$IP" hostname)
## get the current directory
export dir=$(pwd)

###############################################################################
## Copy the script highCpuAnalysis_r.sh to the target host
echo "*********************************************************************"
echo "Step 1: "
echo "copy the highCpuAnalysis_r.sh to the remote host, and "
scp highCpuAnalysis_r.sh "$USER@$IP:/tmp/"
ssh "$USER@$IP" chmod 755 /tmp/highCpuAnalysis_r.sh
echo "is RUNNING on $remotehostname($IP). "

###############################################################################
## run the script on the target remote host:
ssh "$USER@$IP" /tmp/highCpuAnalysis_r.sh

echo "*************************************************************************"
echo "Step 6:"
echo "Copy the report and javacore to the local analysis host:"
###############################################################################
## Copy the report and javacore to the local host, then delete them remotely:
scp "$USER@$IP:/tmp/HighCpuReport*" .
scp "$USER@$IP:/tmp/javacore*.gz" .
tar -zxvf javacore*.gz
## Remove all related files on the remote server
## (quoted so the wildcards are expanded remotely, not against the local /tmp)
ssh "$USER@$IP" 'rm -f /tmp/HighCpu*Report*'
ssh "$USER@$IP" 'rm -f /tmp/javacore*'
ssh "$USER@$IP" 'rm -f /tmp/highCpuAnalysis_r.sh'
ssh "$USER@$IP" 'rm -f /tmp/topdashH.*'

echo " "
echo "*********************************************************************"
echo "Step 7:"
echo "Show all information:"
echo "Remote hostname: $remotehostname($IP)."
echo "Remote Appserver name: $wasappname."
echo "Report and javacore:"
rm -f javacore*.gz
ls -rlt HighCpu*Report* | tail -1
ls -rtl javacore* | tail -3
echo "*******************************END**********************************"
highCpuAnalysis_r.sh
#!/bin/bash
###############################################################################
# The source code was created on 10.19.90.165 and 192.168.86.198.
# This script is used to analyze data for performance / high CPU issues on Linux
# Usage: ./highCpuAnalysis_r.sh
# Author: HuangTao
# Email: [email protected]126.com
###############################################################################

##########################
# Define Variables       #
##########################
# Interval between top dash H samples (seconds).
TOP_DASH_H_VAL=30
# How many top dash H samples to take.
TOP_DASH_H_VAL_T=3
# Interval between javacores (seconds).
JAVACORE_VAL=60
# How many javacores to take.
JAVACORE_VAL_T=3

## take one snapshot of the process using the most CPU
## (line 1 is the header, line 2 the busiest process; --sort needs procps ps)
top_line=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n '2p')
## get the high CPU pid
export pid=$(echo "$top_line" | awk '{print $3}')
## convert the pid from decimal to hexadecimal (base 10 to base 16)
export pid16=$(echo "obase=16; $pid" | bc)
## get the owner of the process, to check whether it is a WAS process
export was=$(echo "$top_line" | awk '{print $4}')
## get the WAS application name
export wasappname=$(echo "$top_line" | awk '{print $NF}')
## get hostname
export hostname=$(hostname)
## the report and the raw top output both go to /tmp
report=/tmp/HighCpuReport.$pid.$hostname.out
topout=/tmp/topdashH.$pid.$hostname.out

###############################################################################
##########################
# Get High CPU PID       #
##########################
echo "Script execute time:" $(date) > "$report"
echo " "
if [ "$was" = wasuser ] || [ "$was" = wasadmin ]
then
    echo "*********************************************************************"
    echo "Step 2:"
    echo "The highest CPU pid is: $pid, the process is a WAS process. "
else
    echo "The highest CPU pid: $pid is NOT a WAS process." | tee -a "$report"
    echo " "
    exit 1
fi
sleep 1
echo "*********************************************************************"

###############################################################################
#########################
# Start collection of:  #
# * top dash H          #
#########################
echo "Step 3:"
echo "Starting collection of top dash H data ..."
echo "Need $((TOP_DASH_H_VAL * TOP_DASH_H_VAL_T)) seconds to complete this step:"
top -bH -d $TOP_DASH_H_VAL -n $TOP_DASH_H_VAL_T -p $pid > "$topout"
#eg: top -bH -d 30 -n 3 -p 7031
#eg: grep -v Swap toplog.out |grep -v Task |grep -v "Cpu(s)"|grep -v "Mem:" |grep -v top| sort -k 1 -r | head -10 | sed -n '2p' |awk '{print $3}'
#echo "Analyzing the snapshot in $topout can find the high CPU thread"
echo "Collected the top dash H data."
sleep 2

###############################################################################
###########################
# Find out the threads that consume the most CPU and TIME (top 10).
###########################
## delete $topout once the data collection is complete

################################################################################
#########################
# Start collection of:  #
# * javacores           #
#########################
# Javacores are output to the working directory of the JVM; in most cases this is the <profile_root>
echo "*********************************************************************"
echo "Step 4:"
echo "Starting collection of Javacores ..."
echo "Need $((JAVACORE_VAL * JAVACORE_VAL_T)) seconds to complete this step:"
## clear the javacores for this PID first:
rm -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid*
## then generate the javacores (kill -3 asks the JVM for a dump, it does not stop it)
for i in $(seq 1 $JAVACORE_VAL_T)
do
    kill -3 $pid
    echo "Collected javacore number $i for PID $pid."
    sleep $JAVACORE_VAL
done
## move the javacores to /tmp and compress them:
rm -f /tmp/javacore*
mv -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid* /tmp/
cd /tmp
tar -zcvf javacore.$(date +%Y%m%d"."%H%M%S).$pid.gz javacore*$pid*

################################################################################
echo "*********************************************************************"
echo "Step 5:"
echo "Print out the analysis information:"
echo " " | tee -a "$report"
echo "*********The top 10 CPU-consuming PROCESSES:*************************" | tee -a "$report"
ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | head -10 | tee -a "$report"
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*" | tee -a "$report"
echo " "
echo "****The top 10 CPU-consuming *Threads* from process $pid:************" | tee -a "$report"
echo " PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND " | tee -a "$report"
grep -v Cpu "$topout" | sort -k9 -n -r -k1 -u | head -10 | tee -a "$report"
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*" | tee -a "$report"
echo " " | tee -a "$report"
echo "****The top 10 TIME-consuming *Threads* from process $pid:***********" | tee -a "$report"
echo " PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND " | tee -a "$report"
grep -v Cpu "$topout" | sort -k11 -n -r -k1 -u | head -10 | tee -a "$report"
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*" | tee -a "$report"
echo " "
echo "Please check the javacore and HighCpuReport.$pid.$hostname.out under the current directory."
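Once the report names the busiest thread, step 2 of the workflow (mapping it to a Java thread) comes down to a number conversion: `top -H` prints thread IDs in decimal, while IBM javacore files normally record them in hexadecimal as `native thread ID:0x...`. The sketch below is an illustration, not part of the original scripts. `tid_to_hex` and `find_thread` are hypothetical names, and the javacore layout (a `3XMTHREADINFO` line carrying the thread name, followed later by the native ID line) is an assumption that should be checked against the actual files.

```shell
#!/bin/bash
# Hypothetical helpers: map a decimal thread ID from "top -H" to a javacore entry.

# Convert a decimal thread ID to lowercase hexadecimal, e.g. 7031 -> 1b77.
tid_to_hex() {
    printf '%x\n' "$1"
}

# Print the thread name line and the native ID line for a decimal thread ID.
# $1 = decimal thread ID, $2 = javacore file
find_thread() {
    local hex
    hex=$(tid_to_hex "$1")
    awk -v hex="$hex" '
        # remember the most recent thread header line (thread name and state)
        /^3XMTHREADINFO[ \t]/ { name = $0 }
        # match the exact hex ID, case-insensitively, not a longer ID that starts with it
        tolower($0) ~ ("native thread id:0x" hex "([^0-9a-f]|$)") { print name; print $0 }
    ' "$2"
}
```

For example, `find_thread 7031 javacore.20170918.101500.7000.0001.txt` would print the name of the thread whose native ID is 0x1B77 (the file name is a placeholder). The stack trace that follows that entry in the javacore then shows what the thread was executing.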
As for why two scripts are used and how well they work, I hope to have the chance to discuss that with you in person.