
Monitoring and Log Analysis for High CPU Usage on Linux


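When a process on a Linux system consumes too much CPU, scripts can be used to track down the cause:

1. Monitor in real time; as soon as a process with high CPU usage appears, the script starts.

2. Analyze that process to identify the threads responsible.

3. Analyze the log files of the program that owns those threads; WebSphere middleware, for example, keeps very detailed log files.

4. Look closely at the error and warning entries in the log files. Log files are sometimes very large and details are easy to overlook, so combining sed and awk with regular expressions helps locate the errors without missing anything.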
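Step 4 can be sketched with a small awk filter. This is only an illustration: the function name `scan_log` and the keywords matched (`ERROR`, `WARN`) are assumptions, and real middleware logs may use different severity markers.

```shell
#!/bin/bash
# Sketch: summarize error and warning lines in a log file.
# scan_log and the matched keywords are illustrative assumptions,
# not part of the original scripts.

# scan_log FILE: print every matching line prefixed with its line number,
# followed by a one-line count per severity.
scan_log() {
    awk '
        {
            line = toupper($0)            # match case-insensitively
            if (line ~ /ERROR/)     { err++;  printf "%d:%s\n", NR, $0 }
            else if (line ~ /WARN/) { warn++; printf "%d:%s\n", NR, $0 }
        }
        END { printf "errors=%d warnings=%d\n", err, warn }
    ' "$1"
}
```

Once `scan_log app.log` reports a hit at, say, line 130, `sed -n '120,140p' app.log` prints the surrounding context (`app.log` is a placeholder file name).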

The approach here uses one local script and one remote script to monitor accurately, locate the log files, and analyze them.
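The scripts below take a single snapshot of the busiest process. The "start once CPU usage is high" idea from step 1 could be sketched as a polling loop like the following. The names `CPU_THRESHOLD`, `POLL_INTERVAL`, `pick_busy`, and `watch_cpu` are hypothetical and do not appear in the original scripts.

```shell
#!/bin/bash
# Sketch: poll ps and report the busiest process once it crosses a threshold.
# CPU_THRESHOLD, POLL_INTERVAL, pick_busy and watch_cpu are illustrative
# names, not part of the original scripts.

CPU_THRESHOLD=${CPU_THRESHOLD:-80}   # percent
POLL_INTERVAL=${POLL_INTERVAL:-5}    # seconds between samples

# pick_busy THRESHOLD: read "pcpu pid" pairs on stdin and print the PID with
# the highest CPU value, but only if that value is >= THRESHOLD.
pick_busy() {
    awk -v limit="$1" '
        $1 + 0 > max { max = $1 + 0; pid = $2 }
        END { if (pid != "" && max >= limit) print pid }
    '
}

# watch_cpu: loop until some process crosses the threshold, then print its PID.
watch_cpu() {
    local pid
    while :; do
        # "pcpu=" and "pid=" give empty headers, so ps prints no header line
        pid=$(ps -e -o pcpu= -o pid= | pick_busy "$CPU_THRESHOLD")
        if [ -n "$pid" ]; then
            echo "$pid"
            return 0
        fi
        sleep "$POLL_INTERVAL"
    done
}
```

A wrapper could then call `pid=$(watch_cpu)` and hand that PID to the collection steps, instead of always analyzing whichever process happens to be on top.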

highCpuAnalysis_l.sh:

###############################################################################
# The source code was created on 10.19.90.165 and 192.168.86.198.
# This script collects and analyzes data for performance and high-CPU issues on Linux.
# Usage: ./highCpuAnalysis_l.sh USER IP
# Author: HuangTao
# Email: [email protected]126.com
#
###############################################################################
##########################
#  Define Variables      #
##########################
export USER=$1
export IP=$2

## Usage check: both USER and IP are required
if [ $# -lt 2 ]
then
    echo " Unable to find USER and IP."
    echo " Please rerun the script as follows: ./highCpuAnalysis_l.sh USER IP"
    echo " eg: ./highCpuAnalysis_l.sh root 192.168.86.198"
    exit 1
fi

## Get the remote server's WAS application server name
## (line 1 of the ps output is the header, line 2 is the busiest process)
export wasappname=$(ssh "$USER@$IP" "ps -eo pcpu,pmem,pid,user,args --sort=-pcpu" | sed -n 2p | awk '{print $NF}')

## Get the remote server's hostname
export remotehostname=$(ssh "$USER@$IP" hostname)

## Get the current directory
export dir=$(pwd)

###############################################################################
## Copy the script highCpuAnalysis_r.sh to the target host
echo "*********************************************************************"
echo "Step 1: "
echo "Copying highCpuAnalysis_r.sh to the remote host ..."
scp highCpuAnalysis_r.sh "$USER@$IP:/tmp/"
ssh "$USER@$IP" chmod 755 /tmp/highCpuAnalysis_r.sh
echo "highCpuAnalysis_r.sh is RUNNING on $remotehostname($IP). "

###############################################################################
## Run the script on the target remote host:
ssh "$USER@$IP" /tmp/highCpuAnalysis_r.sh

echo "*************************************************************************"
echo "Step 6:"
echo "Copy the report and javacore to the local analysis host:"
###############################################################################
## Copy the report and javacore to the local host, then delete them remotely:
scp "$USER@$IP:/tmp/HighCpuReport*" .
scp "$USER@$IP:/tmp/javacore*.gz" .
tar -zxvf javacore*.gz

## Remove all related files on the remote server
## (quoted so that the globs expand on the remote side, not locally)
ssh "$USER@$IP" 'rm -f /tmp/HighCpu*Report*'
ssh "$USER@$IP" 'rm -f /tmp/javacore*'
ssh "$USER@$IP" 'rm -f /tmp/highCpuAnalysis_r.sh'
ssh "$USER@$IP" 'rm -f /tmp/topdashH.*'

echo " "
echo "*********************************************************************"
echo "Step 7:"
echo "Show all information:"
echo "Remote hostname: $remotehostname($IP)."
echo "Remote Appserver name: $wasappname."
echo "Report and javacore:"
rm -f javacore*.gz
ls -rlt HighCpu*Report* | tail -1
ls -rtl javacore* | tail -3
echo "*******************************END**********************************"

highCpuAnalysis_r.sh:

###############################################################################
# The source code was created on 10.19.90.165 and 192.168.86.198.
# This script collects and analyzes data for performance and high-CPU issues on Linux.
# Usage: ./highCpuAnalysis_r.sh
# Author: HuangTao
# Email: [email protected]126.com
#
###############################################################################
##########################
#  Define Variables      #
########################## 
# Interval between top dash H samples, in seconds.
TOP_DASH_H_VAL=30
# Number of top dash H samples to take.
TOP_DASH_H_VAL_T=3

# Interval between javacores, in seconds.
JAVACORE_VAL=60
# Number of javacores to take.
JAVACORE_VAL_T=3


## Get the PID of the process using the most CPU
## (line 1 of the ps output is the header, line 2 is the busiest process)
export pid=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n 2p | awk '{print $3}')

## Convert the PID from decimal to hexadecimal (base 10 to base 16)
export pid16=$(echo "obase=16; $pid" | bc)

## Get the owner of that process, used below to check whether it is a WAS process
export was=$(ps -o user= -p $pid | awk '{print $1}')

## Get the WAS application name (the last argument of the command line)
export wasappname=$(ps -o args= -p $pid | awk '{print $NF}')

## Get the hostname
export hostname=$(hostname)

###############################################################################
##########################
# Get High CPU PID       #
########################## 
## Write the report to /tmp/HighCpuReport.$pid.$hostname.out
echo "Script execution time:" $(date)  > /tmp/HighCpuReport.$pid.$hostname.out
echo "   " 
if [ "$was" = wasuser ]  || [ "$was" = wasadmin ] 
then 
echo "*********************************************************************"
echo "Step 2:"
echo "The highest CPU pid is: $pid, the process is a WAS process. "
else
echo "The highest CPU pid: $pid is NOT a WAS process."  | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   " 
exit 1
fi
sleep 1;
echo "*********************************************************************" 
###############################################################################
#########################
#                       #
# Start collection of:  #
#  * top dash H         #
#                       #
#########################
# Start the collection of top dash H data.
echo  "Step 3:" 
echo  "Starting collection of top dash H data ..." 
echo  "Need $((TOP_DASH_H_VAL * TOP_DASH_H_VAL_T)) seconds to complete this step:"
      top -bH -d $TOP_DASH_H_VAL -n $TOP_DASH_H_VAL_T -p $pid > /tmp/topdashH.$pid.$hostname.out 
    #eg:   top -bH -d 30 -n 3 -p 7031
    #eg:  grep -v Swap toplog.out |grep -v Task |grep -v "Cpu(s)"|grep -v "Mem:" |grep -v top| sort -k 1 -r | head -10 | sed -n 2p |awk '{print $3}'
#echo "Analyzing the snapshot in /tmp/topdashH.$pid.$hostname.out reveals the high-CPU threads" ;
echo  "Collected the top dash H data." 
sleep 2;
###############################################################################
###########################
#  Find the top 10 threads by CPU usage
#  and by CPU time.
###########################


## Delete /tmp/topdashH.$pid.$hostname.out once data collection is complete
 
################################################################################
# Start collection of:  #
#  * javacores          #
#########################
# Javacores are written to the working directory of the JVM; in most cases this is the <profile_root>
echo "*********************************************************************"    
echo  "Step 4:" 
echo  "Starting collection of javacores ..." 
echo  "Need $((JAVACORE_VAL * JAVACORE_VAL_T)) seconds to complete this step:"
## Clear any existing javacores for this PID first:
rm -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid* 
## Then generate the javacores: kill -3 sends SIGQUIT, which requests a dump without stopping the JVM
for i in $(seq 1 $JAVACORE_VAL_T)
do
        kill -3 $pid
        echo "Collected javacore $i of $JAVACORE_VAL_T for PID $pid."
        sleep $JAVACORE_VAL
done

## Move the javacores to /tmp and compress them:
rm -f /tmp/javacore*
mv -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid*     /tmp/
cd /tmp
tar -zcvf javacore.$(date +%Y%m%d"."%H%M%S).$pid.gz javacore*$pid*
################################################################################

echo "*********************************************************************"  
echo  "Step 5:" 
echo  "Print out the analysis information:" 
echo "   "                                                                      | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*********The top 10 CPU-consuming PROCESSES:**************************"   | tee -a /tmp/HighCpuReport.$pid.$hostname.out
ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | head -10                           | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "
 
echo "****The top 10 CPU-consuming *Threads* from process $pid:*************"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND "    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
## Sort by %CPU (column 9), highest first, then by thread ID (column 1)
grep -v Cpu /tmp/topdashH.$pid.$hostname.out | sort -u -k9,9nr -k1,1nr | head -10   | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "                                                                      | tee -a /tmp/HighCpuReport.$pid.$hostname.out

echo "****The top 10 TIME-consuming *Threads* from process $pid:************"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND "    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
## Sort by TIME+ (column 11), highest first, then by thread ID (column 1)
grep -v Cpu /tmp/topdashH.$pid.$hostname.out | sort -u -k11,11nr -k1,1nr | head -10 | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "

echo "Please check the javacore and HighCpuReport.$pid.$hostname.out under the current directory."
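
The report lists thread IDs in decimal, as printed by top, while a javacore is assumed here to record native thread IDs in hexadecimal (for example `native thread ID:0x1B77`). A minimal sketch for linking the two follows; the helper names `to_hex` and `find_thread` are hypothetical, and the exact javacore line format should be checked against a real dump.

```shell
#!/bin/bash
# Sketch: map a decimal thread ID from "top -H" to its javacore entry.
# Assumption: the javacore writes native thread IDs as "native thread ID:0x...".
# to_hex and find_thread are illustrative helpers, not part of the original scripts.

# to_hex DECIMAL: print the value as 0x-prefixed uppercase hexadecimal.
to_hex() {
    printf '0x%X\n' "$1"
}

# find_thread TID JAVACORE: print the javacore line that mentions the thread,
# plus the line before it (assumed to carry the thread name).
# -i ignores the case of the hex digits, -w rejects longer IDs that merely
# start with the same digits, -B 1 adds one line of leading context.
find_thread() {
    grep -i -w -B 1 "native thread ID:$(to_hex "$1")" "$2"
}
```

For instance, `find_thread 7031 javacore.txt` would look for `native thread ID:0x1B77` in a file named `javacore.txt` (a placeholder name).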
 

As for why two scripts are used and how well they work, I hope to have the chance to discuss that with you in person.
