jstack是java虛擬機器自帶的一種執行緒堆疊跟蹤工具。

/opt/java8/bin/jstack

Usage:
jstack [-l] <pid>
(to connect to running process) #連線活動執行緒
jstack -F [-m] [-l] <pid>
(to connect to a hung process) #連線阻塞執行緒
jstack [-m] [-l] <executable> <core>
(to connect to a core file) #連線dump的檔案
jstack [-m] [-l] [server_id@]<remote server IP or hostname>
(to connect to a remote debug server) #連線遠端伺服器 Options:
-F to force a thread dump. Use when jstack <pid> does not respond (process is hung)
-m to print both java and native frames (mixed mode)
-l long listing. Prints additional information about locks
-h or -help to print this help message
$ jstack 5611
2021-06-06 22:05:43
Full thread dump Java HotSpot(TM) Client VM (25.271-b09 mixed mode): "RemoteInvocationHandler [#2]" #63304 daemon prio=5 os_prio=0 tid=0xe3f4bc00 nid=0x240 in Object.wait() [0xe3b3e000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0xea7480c8> (a java.lang.ref.ReferenceQueue$Lock)
at hudson.remoting.RemoteInvocationHandler$Unexporter.run(RemoteInvocationHandler.java:603)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at hudson.remoting.AtmostOneThreadExecutor$Worker.run(AtmostOneThreadExecutor.java:118)
at java.lang.Thread.run(Thread.java:748) "Attach Listener" #63199 daemon prio=9 os_prio=0 tid=0xe2a52800 nid=0x7938 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE "Ping thread for channel hudson.remoting.Channel@4afd8d:channel" #24 daemon prio=5 os_prio=0 tid=0xe4637000 nid=0x1617 waiting on condition [0xe34b9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at hudson.remoting.PingThread.run(PingThread.java:95) "Channel reader thread: channel" #9 prio=5 os_prio=0 tid=0xe3f50800 nid=0x15fa runnable [0xe3aed000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:255)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
- locked <0xea4b13b0> (a java.io.BufferedInputStream)
at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:92)
at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63) "Service Thread" #6 daemon prio=9 os_prio=0 tid=0xf7085400 nid=0x15f2 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE "C1 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0xf7082000 nid=0x15f1 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE "Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0xf7080800 nid=0x15f0 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE "Finalizer" #3 daemon prio=8 os_prio=0 tid=0xf7066400 nid=0x15ef in Object.wait() [0xe45ad000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:144)
- locked <0xea486e40> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:165)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:216) "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0xf7063800 nid=0x15ee in Object.wait() [0xe45fe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0xea486fe0> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153) "main" #1 prio=5 os_prio=0 tid=0xf7007400 nid=0x15ec in Object.wait() [0xf717d000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at hudson.remoting.Channel.join(Channel.java:1187)
- locked <0xea747e30> (a hudson.remoting.Channel)
at hudson.remoting.Launcher.main(Launcher.java:796)
at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:718)
at hudson.remoting.Launcher.run(Launcher.java:398)
at hudson.remoting.Launcher.main(Launcher.java:298) "VM Thread" os_prio=0 tid=0xf705e000 nid=0x15ed runnable "VM Periodic Task Thread" os_prio=0 tid=0xf7087800 nid=0x15f3 waiting on condition JNI global references: 268

執行緒狀態圖

jstack統計執行緒數:

jstack 5611 | grep 'java.lang.Thread.State' | wc -l

舉例說明CPU高獲取其執行緒ID然後分析

以我們最近出現的一個實際故障為例,介紹怎麼定位和解決這類問題。



根據top命令,發現PID為28555的Java程序佔用CPU高達200%,出現故障。

通過ps aux | grep PID命令,可以進一步確定是tomcat程序出現了問題。但是,怎麼定位到具體執行緒或者程式碼呢?

首先顯示執行緒列表:

ps -mp pid -o THREAD,tid,time

找到了耗時最高的執行緒28802,佔用CPU時間快兩個小時了!

其次將需要的執行緒ID轉換為16進位制格式:

printf "%x\n" tid



最後列印執行緒的堆疊資訊:

jstack pid |grep tid -A 30



找到出現問題的程式碼了!

現在來分析下具體的程式碼:ShortSocketIO.readBytes(ShortSocketIO.java:106)

ShortSocketIO是應用封裝的一個用短連線Socket通訊的工具類。readBytes函式的程式碼如下:

public byte[] readBytes(int length) throws IOException {
if ((this.socket == null) || (!this.socket.isConnected())) {
throw new IOException("++++ attempting to read from closed socket");
}
byte[] result = null;
ByteArrayOutputStream bos = new ByteArrayOutputStream();
if (this.recIndex >= length) {
bos.write(this.recBuf, 0, length);
byte[] newBuf = new byte[this.recBufSize];
if (this.recIndex > length) {
System.arraycopy(this.recBuf, length, newBuf, 0, this.recIndex - length);
}
this.recBuf = newBuf;
this.recIndex -= length;
} else {
int totalread = length;
if (this.recIndex > 0) {
totalread -= this.recIndex;
bos.write(this.recBuf, 0, this.recIndex);
this.recBuf = new byte[this.recBufSize];
this.recIndex = 0;
}
int readCount = 0;
** while (totalread > 0) {
if ((readCount = this.in.read(this.recBuf)) > 0) **{
if (totalread > readCount) {
bos.write(this.recBuf, 0, readCount);
this.recBuf = new byte[this.recBufSize];
this.recIndex = 0;
} else {
bos.write(this.recBuf, 0, totalread);
byte[] newBuf = new byte[this.recBufSize];
System.arraycopy(this.recBuf, totalread, newBuf, 0, readCount - totalread);
this.recBuf = newBuf;
this.recIndex = (readCount - totalread);
}
totalread -= readCount;
}
}
}

問題就出在加粗的程式碼部分。如果this.in.read()返回的資料小於等於0時,迴圈就一直進行下去了。而這種情況在網路擁塞的時候是可能發生的。

至於具體怎麼修改就看業務邏輯應該怎麼對待這種特殊情況了。

最後,總結下排查CPU故障的方法和技巧有哪些:

1、top命令:Linux命令。可以檢視實時的CPU使用情況。也可以檢視最近一段時間的CPU使用情況。

2、PS命令:Linux命令。強大的程序狀態監控命令。可以檢視程序以及程序中執行緒的當前CPU使用情況。屬於當前狀態的取樣資料。

3、jstack:Java提供的命令。可以檢視某個程序的當前執行緒棧執行情況。根據這個命令的輸出可以定位某個程序的所有執行緒的當前執行狀態、執行程式碼,以及是否死鎖等等。

4、pstack:Linux命令。可以檢視某個程序的當前執行緒棧執行情況。

Java 系統性能分析 命令

  1. cpu分析

    top , pidstat(sysstat)

    pid -p PID -t 1 10

    vmstat 1 CPU上下文切換、執行佇列、利用率

    ps Hh -eo tid

    pcpu 檢視具體執行緒的CPU消耗

    sar 來檢視一定世界範圍內以及歷史的cpu消耗情況資訊

檢視java執行緒資訊

jstack pid | grep 'nid=0x9999'

  1. cs sy消耗比較高

    上下文切換效能偏高, jstack -l pid, 檢視on object monitor

  2. io消耗

    pidstat -d -t -p pid 1 100

    iostat

  3. 網路io消耗

    cat /proc/interruptes

    sar -n FULL 1 2