1. 程式人生 > >Java執行狀態分析3: 執行緒堆疊資訊分析

Java執行狀態分析3: 執行緒堆疊資訊分析

背景

當CPU飆升的時候,我們需要知道CPU此時在幹嘛,具體什麼程序、什麼執行緒讓CPU飆升

執行緒是作業系統能夠進行運算排程的最小單位,跟蹤佔用CPU高的執行緒 可以瞭解執行緒的執行情況,定位可能或者問題程式碼

堆疊結構

Thread Summary

The first line of each thread represents the thread summary, which contains the following items:

SECTION EXAMPLE DESCRIPTION
Name "Reference Handler" Human-readable name of the thread. This name can be set by calling the setNamemethod on a Threadobject and be obtained by calling getName on the object.
ID #2 A unique ID associated with each Thread object. This number is generated, starting at 1, for all threads in the system. Each time a Thread
object is created, the sequence number is incremented and then assigned to the newly created Thread. This ID is read-only and can be obtained by calling getId on a Thread object.
Daemon status daemon A tag denoting if the thread is a daemon thread. If the thread is a daemon, this tag will be present; if the thread is a non-daemon thread, no tag will be present. For example, Thread-0
is not a daemon thread and therefore has no associated daemon tag in its summary: Thread-0" #12 prio=5....
Priority prio=10 The numeric priority of the Java thread. Note that this does not necessarily correspond to the priority of the OS thread to with the Java thread is dispatched. The priority of a Thread object can be set using the setPriority method and obtained using the getPriority method.
OS Thread Priority os_prio=2 The OS thread priority. This priority can differ from the Java thread priority and corresponds to the OS thread on which the Java thread is dispatched.
Address tid=0x00000250e4979000 The address of the Java thread. This address represents the pointer address of the Java Native Interface (JNI) native Thread object (the C++ Thread object that backs the Java thread through the JNI). This value is obtained by converting the pointer to this (of the C++ object that backs the Java Thread object) to an integer on line 879 of hotspot/share/runtime/thread.cpp: st->print("tid=" INTPTR_FORMAT " ", p2i(this));Although the key for this item (tid) may appear to be the thread ID, it is actually the address of the underlying JNI C++ Thread object and thus is not the ID returned when calling getId on a Java Thread object.
OS Thread ID nid=0x3c28 The unique ID of the OS thread to which the Java Thread is mapped. This value is printed on line 42 of hotspot/share/runtime/osThread.cpp: st->print("nid=0x%x ", thread_id());
Status waiting on condition A human-readable string depicting the current status of the thread. This string provides supplementary information beyond the basic thread state (see below) and can be useful in discovering the intended actions of a thread (i.e. was the thread trying to acquire a lock or waiting on a condition when it blocked).
Last Known Java Stack Pointer [0x000000b82a9ff000] The last known Stack Pointer (SP) for the stack associated with the thread. This value is supplied using native C++ code and is interlaced with the Java Thread class using the JNI. This value is obtained using the last_Java_sp() native method and is formatted into the thread dump on line 2886 of hotspot/share/runtime/thread.cpp: st->print_cr("[" INTPTR_FORMAT "]", (intptr_t)last_Java_sp() & ~right_n_bits(12));For simple thread dumps, this information may not be useful, but for more complex diagnostics, this SP value can be used to trace lock acquisition through a program.

Thread State

The second line represents the current state of the thread. The possible states for a thread are captured in the Thread.State enumeration:

  • NEW

  • RUNNABLE

  • BLOCKED

  • WAITING

  • TIMED_WAITING

  • TERMINATED

以上內容來源:https://dzone.com/articles/how-to-read-a-thread-dump

自帶分析工具

定位高cpu程序、執行緒參考上文 Java執行狀態分析2:執行緒狀態及堆疊資訊

ps -mp {pid} -o THREAD,tid,time|sort -k 3 -n
root      0.4  19    - futex_    -      -  8063 00:12:37
root      0.4  19    - futex_    -      -  8072 00:11:25
root      0.4  19    - futex_    -      -  8074 00:11:28
root      0.4  19    - futex_    -      -  8083 00:11:50
root      0.4  19    - futex_    -      -  8086 00:10:44
root      0.4  19    - futex_    -      -  8087 00:10:38
root      0.4  19    - futex_    -      -  8091 00:11:04
root      0.4  19    - futex_    -      -  8093 00:11:35
root      0.4  19    - futex_    -      -  8094 00:11:57
root      0.4  19    - futex_    -      -  8095 00:11:01
root      0.4  19    - futex_    -      -  8096 00:10:17
root      0.4  19    - futex_    -      -  8099 00:12:08
root      0.4  19    - futex_    -      -  8100 00:12:11
root      0.4  19    - futex_    -      -  8101 00:12:26
root      0.5  19    - ep_pol    -      -  7071 00:12:42
root      0.5  19    - ep_pol    -      -  7073 00:12:53
root      0.5  19    - ep_pol    -      -  7078 00:14:19
root      0.5  19    - futex_    -      -  8000 00:12:42
root      0.5  19    - futex_    -      -  8021 00:13:38
root      0.5  19    - futex_    -      -  8032 00:14:43
root      0.5  19    - futex_    -      -  8098 00:13:17
root      0.5  19    - futex_    -      -  8103 00:12:54
root      0.8  19    - ep_pol    -      -  7069 00:22:14
root      0.9  19    - ep_pol    -      -  7149 00:24:44
root      1.3  19    - futex_    -      -  7100 00:35:06
root      2.2  19    - futex_    -      - 20595 00:01:03

找到佔用CPU高的執行緒id( 執行緒多,正序sort方便查佔用CPU最大的執行緒)

將tid轉16進位制

printf %x {tid}

printf %x 18849 49a1

查對應高load執行緒堆疊資訊

jstack {程序id}|grep -A 30 {執行緒id}
"AsyncAppender-Worker-ASYNC-IS-FILE" #9635 daemon prio=5 os_prio=0 tid=0x00007fcee0f55800 nid=0x36c9 waiting on condition [0x00007fd09f9fc000]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0000000576dd0688> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:403)
  at ch.qos.logback.core.AsyncAppenderBase$Worker.run(AsyncAppenderBase.java:264)

看了下 是logback日誌列印佔用,跟蹤了下日誌列印,這塊日誌列印的量相對較多,正常穩定高峰期間每秒大概產生10000條日誌(7個節點),是其他服務的4倍左右

"RxIoScheduler-172" #9646 daemon prio=5 os_prio=0 tid=0x00007fcd98b5c000 nid=0x36dc waiting on condition [0x00007fcd2fe93000]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0000000588b75208> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
"RxIoScheduler-174" #9864 daemon prio=5 os_prio=0 tid=0x00007fcd99158000 nid=0x40a9 waiting on condition [0x00007fce0f1f0000]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x00000006cb53ddf0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
"hystrix-commonability-core-297" #9960 daemon prio=5 os_prio=0 tid=0x00007fd07c016000 nid=0x41f5 waiting on condition [0x00007fcd2a940000]
   java.lang.Thread.State: TIMED_WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0000000243e8fff8> (a java.util.concurrent.SynchronousQueue$TransferStack)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
  at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
  at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
"RxIoScheduler-175" #10015 daemon prio=5 os_prio=0 tid=0x00007fcd991b5800 nid=0x45e8 waiting on condition [0x00007fcd55cdb000]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x0000000732e899a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
​
"http-nio-18002-exec-123" #2505 daemon prio=5 os_prio=0 tid=0x00007fce90042800 nid=0xafa waiting on condition [0x00007fcd2932a000]
--
"RxComputationScheduler-13" #277 daemon prio=5 os_prio=0 tid=0x00007fcec4214000 nid=0x128 runnable [0x00007fce0d9d8000]
   java.lang.Thread.State: TIMED_WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x000000023c4df068> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
​
"RxComputationScheduler-30" #350 daemon prio=5 os_prio=0 tid=0x00007fcf34093000 nid=0x171 runnable [0x00007fcd54bcc000]
   java.lang.Thread.State: TIMED_WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  <0x000000023cd13530> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
  at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)

hystrix rxjava CPU較高,這塊後續分析

 

線上分析工具

推薦使用:

https://fastthread.io

可本地安裝,區分收費免費,簡單除錯跟蹤可以直接線上使用

上傳dump檔案,進入分析介面,介面比較直觀的統計了執行緒情況,通過執行緒狀態做了分組

執行緒概覽

主要需要關心下執行緒的總數 是否符合我們的預期,blocked執行緒可能會導致死鎖情況

目前我們系統執行緒數相對較多,200 Tomcat worker執行緒、+ 10個系統hystrix執行緒 + 非同步任務執行緒 + 其他執行緒,這邊800多個在預期內,一些等待執行緒也是正常狀態

關於blocked執行緒,需要看下量是否多,是否是正常範圍內的,同線程狀態角度,涉及同步操作,是會產生blocked的,工程這邊kafka在poll期間程式碼實現通過synchronized實現,這塊是正常場景

waiting : nio Tomcat在壓力小的時候,worker等待新請求到來

Timed_waiting: nio tomcat 壓力大場景 worker通過執行緒池實現,請求進入佇列,新請求場景會等待釋放執行緒,同時處理預設200,出現這種場景,可以跟蹤下當時的流量和rt情況,檢視釋放因為請求量和響應時間問題導致這塊阻塞,主機效能好可以適當提升下併發處理量,並從介面效能層面跟蹤排查

非同步任務場景,進入條件等待也將產生timed_waiting

執行緒分組統計

這塊可以比較方便的看各個執行緒組的執行緒數量情況,通過這邊看是否符合預期場景

執行緒數量原則上是儘可能的保持小,多執行緒可以更多的佔用CPU,一定程度減少響應時間,但並非是提示系統性能的手段,過多的執行緒將代理CPU上下文切換時間,執行緒上下文拷貝也帶來系統資源的開銷

 

參考文件

https://dzone.com/articles/how-to-r