1. 程式人生 > >linux適當的時候用ps、top檢視執行緒狀態

linux適當的時候用ps、top檢視執行緒狀態

最近定位了一個執行緒沒有正常退出的bug,導致一直建立執行緒,然後排程超時掛死的bug,花了兩天的時間,要是儘早用sp看一下,這個問題就結束了,所以別看命令簡單,關鍵時候還是好用的。

1. pstree

pstree以樹結構顯示程序
$ pstree -p work | grep ad
sshd(22669)---bash(22670)---ad_preprocess(4551)-+-{ad_preprocess}(4552)
                                                |-{ad_preprocess}(4553)
                                                |-{ad_preprocess}(4554)
                                                |-{ad_preprocess}(4555)
                                                |-{ad_preprocess}(4556)
                                                `-{ad_preprocess}(4557)

work為工作使用者,-p為顯示程序識別碼,ad_preprocess共啟動了6個子執行緒,加上主執行緒共7個執行緒

2. ps -Lf

$ ps -Lf 4551
UID        PID  PPID   LWP  C NLWP STIME TTY      STAT   TIME CMD
work      4551 22670  4551  2    7 16:30 pts/2    Sl+    0:02 ./ad_preprocess
work      4551 22670  4552  0    7 16:30 pts/2    Sl+    0:00 ./ad_preprocess
work      4551 22670  4553  0    7 16:30 pts/2    Sl+    0:00 ./ad_preprocess
work      4551 22670  4554  0    7 16:30 pts/2    Sl+    0:00 ./ad_preprocess
work      4551 22670  4555  0    7 16:30 pts/2    Sl+    0:00 ./ad_preprocess
work      4551 22670  4556  0    7 16:30 pts/2    Sl+    0:00 ./ad_preprocess
work      4551 22670  4557  0    7 16:30 pts/2    Sl+    0:00 ./ad_preprocess

程序共啟動了7個執行緒

linux上程序有5種狀態:
1. 執行(正在執行或在執行佇列中等待)
2. 中斷(休眠中, 受阻, 在等待某個條件的形成或接受到訊號)
3. 不可中斷(收到訊號不喚醒和不可執行, 程序必須等待直到有中斷髮生)
4. 僵死(程序已終止, 但程序描述符存在, 直到父程序呼叫wait4()系統呼叫後釋放)
5. 停止(程序收到SIGSTOP, SIGSTP, SIGTIN, SIGTOU訊號後停止執行執行)

ps工具標識程序的5種狀態碼:
D 不可中斷 uninterruptible sleep (usually IO)
R 執行 runnable (on run queue)
S 中斷 sleeping
T 停止 traced or stopped
Z 僵死 a defunct (”zombie”) process

3. pstack

pstack顯示每個程序的棧跟蹤

$ pstack 4551
Thread 7 (Thread 1084229984 (LWP 4552)):
#0  0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1  0x00000000006f0730 in ub::EPollEx::poll ()
#2  0x00000000006f172a in ub::NetReactor::callback ()
#3  0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 6 (Thread 1094719840 (LWP 4553)):
#0  0x000000302afc63dc in epoll_wait () from /lib64/tls/libc.so.6
#1  0x00000000006f0730 in ub::EPollEx::poll ()
#2  0x00000000006f172a in ub::NetReactor::callback ()
#3  0x00000000006fbbbb in ub::UBTask::CALLBACK ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 5 (Thread 1105209696 (LWP 4554)):
#0  0x000000302b80baa5 in __nanosleep_nocancel ()
#1  0x000000000079e758 in comcm::ms_sleep ()
#2  0x00000000006c8581 in ub::UbClientManager::healthyCheck ()
#3  0x00000000006c8471 in ub::UbClientManager::start_healthy_check ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 4 (Thread 1115699552 (LWP 4555)):
#0  0x000000302b80baa5 in __nanosleep_nocancel ()
#1  0x0000000000482b0e in armor::armor_check_thread ()
#2  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#3  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#4  0x0000000000000000 in ?? ()
Thread 3 (Thread 1126189408 (LWP 4556)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x000000000044c972 in Business_config_manager::run ()
#3  0x0000000000457b83 in Thread::run_thread ()
#4  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#5  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
Thread 2 (Thread 1136679264 (LWP 4557)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x00000000004524bb in Process_thread::sleep_period ()
#3  0x0000000000452641 in Process_thread::run ()
#4  0x0000000000457b83 in Thread::run_thread ()
#5  0x000000302b80610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x000000302afc6003 in clone () from /lib64/tls/libc.so.6
#7  0x0000000000000000 in ?? ()
Thread 1 (Thread 182894129792 (LWP 4551)):
#0  0x000000302af8f1a5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x000000302af8f010 in sleep () from /lib64/tls/libc.so.6
#2  0x0000000000420d79 in Ad_preprocess::run ()
#3  0x0000000000450ad0 in main ()

4、kill 終止程序
有十幾種控制程序的方法,下面是一些常用的方法:
kill -STOP [pid]
傳送SIGSTOP (17,19,23)停止一個程序,而並不消滅這個程序。
kill -CONT [pid]
傳送SIGCONT (19,18,25)重新開始一個停止的程序。
kill -KILL [pid]
傳送SIGKILL (9)強迫程序立即停止,並且不實施清理操作。
kill -9 -1
終止你擁有的全部程序。
SIGKILL 和 SIGSTOP 訊號不能被捕捉、封鎖或者忽略,但是,其它的訊號可以。所以這是你的終極武器。