
DB2 Tuning (Part 2): Resource Monitoring


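Because this tuning project touches many components, ideally everything in the production environment would be monitored, while keeping overhead to a minimum. We therefore monitored two kinds of servers, the application servers and the database servers, using nmon as the baseline data source and additionally collecting JVM metrics plus database alert logs and snapshots.
Monitoring is only a means to an end: performance optimization rests on extracting regular, meaningful data from the mass of monitoring logs. Once the data is available, the next step is analyzing it. This article first describes which data to collect, based on my experience from the project.
Base environment:

Two database servers, configured as a database cluster.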

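Application server - JVM threads

The project mainly runs on tongweb (the legacy system uses a very old version). The monitoring output looks like this:

Monitored data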

...
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnCreated","10",
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnAcquired","111292",
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnNotSuccessfullyMatched","0",
"2018-01-11T02:26:25.670+0800","com.tongtech.tongweb:type=jvm,category=monitor,server=server","UpTime","222520621",
"2018-01-11T02:26:25.670+0800","com.tongtech.tongweb:type=jvm,category=monitor,server=server","HeapSize","2143485952",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnUsed","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnSuccessfullyMatched","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","WaitQueueLength","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnDestroyed","0",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","ConnRequestWaitTime","4",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnFailedValidation","0",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnReleased","111292",
"2018-01-11T02:26:25.672+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnFree","10",
...

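Points of interest

The tongweb monitoring data provides connection pool status and related information. Our approach was to use an Excel macro to convert the log content into readable data and then chart it. The details will be covered separately.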
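
The project itself used an Excel macro for this conversion. As an alternative, here is a minimal Python sketch of the same idea, assuming each log line is a quoted CSV record of timestamp, MBean name, metric name and value, as in the excerpt above. The function name `parse_monitor_log` is hypothetical, and the sketch keys the result by metric name only, so it assumes a single connection pool per log.

```python
import csv
from collections import defaultdict
from io import StringIO


def parse_monitor_log(text):
    """Return {metric_name: [(timestamp, value), ...]} from monitor log text."""
    series = defaultdict(list)
    for row in csv.reader(StringIO(text)):
        # Expected fields: timestamp, MBean name, metric, value, then an
        # empty trailing field caused by the comma at the end of each line.
        if len(row) < 4:
            continue  # skips "..." separators and blank lines
        timestamp, _mbean, metric, value = row[:4]
        try:
            series[metric].append((timestamp, int(value)))
        except ValueError:
            continue  # non-numeric value: ignored in this sketch
    return dict(series)


# Two lines copied from the excerpt above.
MONITOR_SAMPLE = '''...
"2018-01-11T02:25:55.663+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnAcquired","111292",
"2018-01-11T02:26:25.671+0800","com.tongtech.tongweb:name=***,type=jdbc-connection-pool,category=monitor,server=server","NumConnFree","10",
'''

if __name__ == "__main__":
    for metric, points in parse_monitor_log(MONITOR_SAMPLE).items():
        print(metric, points)
```

Each per-metric list can then be charted over time with whatever plotting tool is at hand.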
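JVM thread monitoring notes

Why monitor this

Monitoring the tongweb JVM gives a first indication of when the performance peaks occur and whether the connection pool is exhausted. It also helps determine whether the bottleneck during connection peaks lies in the application. This matters for later performance analysis: the main performance problems can be categorized, which avoids unnecessary work.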
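
One way to spot the moments when the pool is full is sketched below. It rests on an assumption about metric semantics that the log does not document: that `NumConnFree` is the number of idle connections and `WaitQueueLength` is the number of requests waiting for one. The function name and the sample values are hypothetical and are not project data.

```python
def find_saturated_samples(samples):
    """Return timestamps at which the pool looks exhausted.

    samples: iterable of dicts with the keys 'timestamp',
    'NumConnFree' and 'WaitQueueLength'.
    Assumed rule: no free connection, or at least one waiting request.
    """
    return [
        s["timestamp"]
        for s in samples
        if s["NumConnFree"] == 0 or s["WaitQueueLength"] > 0
    ]


# Made-up values for illustration only.
POOL_SAMPLES = [
    {"timestamp": "T1", "NumConnFree": 10, "WaitQueueLength": 0},
    {"timestamp": "T2", "NumConnFree": 0, "WaitQueueLength": 0},
    {"timestamp": "T3", "NumConnFree": 0, "WaitQueueLength": 7},
]

if __name__ == "__main__":
    print(find_saturated_samples(POOL_SAMPLES))
```

If the flagged timestamps line up with slow response times while database time stays low, the application side is the more likely bottleneck.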

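Application server - netstat

In the Internet RFC documents, netstat is described as a program that accesses network connection state and related information in the kernel. It can report on TCP connections, TCP and UDP listeners, and protocol memory management.

Monitored data

The following is an excerpt from logs collected in the project: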

...
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 0.0.0.0:2049            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:427             0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:427           0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:58862           0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:2544            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:21              0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:631             0.0.0.0:*               LISTEN      
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:445             0.0.0.0:*               LISTEN      
tcp        0      0 0.0.0.0:669             0.0.0.0:*               LISTEN  
...
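
To turn captures like this into numbers that can be tracked over time, the rows can be counted per connection state. The sketch below assumes the Linux column layout shown above (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name `count_tcp_states` is hypothetical.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP rows per connection state in netstat text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have six columns; header and "..." lines are skipped
        # because their first field does not start with "tcp".
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# A few lines copied from the excerpt above.
NETSTAT_SAMPLE = """Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:2049            0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN
"""

if __name__ == "__main__":
    print(count_tcp_states(NETSTAT_SAMPLE))
```

Comparing these counts between captures shows how the number of connections in each state changes during peak periods.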

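Application server - nmon

nmon was the main analysis tool in this optimization effort and played an especially important role. The description below comes from the wiki and is worth reading when you have time: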

nmon collects the following operating system statistics:
CPU and CPU threads Utilisation
CPU frequency for servers or virtual machines that can alter their clock rate
GPU stats including utilisation, MHz and temperatures
Physical and Virtual Memory use
Disk read & write and transfers
Disk Groups decided by the user
Swap and Paging
Network read & write and transfers
Local File-systems
Network File-system (NFS)
Top Processes by CPU use, Memory size and I/O rates
Kernel stats including Run Queue, context-switch, fork, Load Average & Uptime
Large and Huge memory pages
Virtual Machine stats (depending on the hardware) - useful for Linux running KVM to host virtual machines
Resources in the Server and virtual machine

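In short, nmon acts more like a snapshot of system resource usage. Combined with an nmon analysis tool (available as a separate download), it gives a clear view of each system metric.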
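
For quick checks without a separate analysis tool, an nmon capture file can also be read directly. The sketch below assumes the usual comma-separated capture layout: `ZZZZ` rows map a snapshot tag such as `T0001` to a time and date, and the first `CPU_ALL` row names the columns for the data rows that follow. The function name and the sample values are hypothetical and are not project data.

```python
def cpu_busy_by_time(nmon_text):
    """Return [(timestamp, user% + sys%), ...] from nmon capture text."""
    timestamps = {}  # snapshot tag (e.g. T0001) -> "date time"
    header = None
    busy = []
    for line in nmon_text.splitlines():
        fields = line.strip().split(",")
        if fields[0] == "ZZZZ" and len(fields) >= 4:
            timestamps[fields[1]] = f"{fields[3]} {fields[2]}"
        elif fields[0] == "CPU_ALL":
            if header is None:
                header = fields  # first CPU_ALL row names the columns
                continue
            row = dict(zip(header[2:], fields[2:]))
            user_sys = float(row["User%"]) + float(row["Sys%"])
            when = timestamps.get(fields[1], fields[1])
            busy.append((when, round(user_sys, 1)))
    return busy


# Made-up values in the assumed layout, for illustration only.
NMON_SAMPLE = """CPU_ALL,CPU Total apphost,User%,Sys%,Wait%,Idle%,Busy,CPUs
ZZZZ,T0001,02:25:55,11-JAN-2018
CPU_ALL,T0001,12.5,3.0,0.5,84.0,,4
ZZZZ,T0002,02:26:25,11-JAN-2018
CPU_ALL,T0002,40.0,10.0,5.0,45.0,,4
"""

if __name__ == "__main__":
    for when, pct in cpu_busy_by_time(NMON_SAMPLE):
        print(when, pct)
```

The same pattern extends to other sections of the file, since each section begins with its own header row.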

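Database server - alert log

Understanding the database alert log is also key to knowing the current performance state.

A sample log entry follows. When an Error-level entry appears, analyze and resolve it according to the specific situation.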

2018-01-11-00.36.36.090562+480 I13363168A459      LEVEL: Error
PID     : 2228842              TID  : 142490      PROC : db2sysc
INSTANCE: db2             NODE : 000         DB   : TRADE
EDUID   : 142490               EDUNAME: db2agent (**) 0
FUNCTION: DB2 UDB, Query Gateway, sqlqg_fedstp_hook, probe:40
MESSAGE : Unexpected error returned from outer RC=
DATA #1 : Hexdump, 4 bytes
0x07000007053F28D0 : 8126 0012                                  .&..
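
When the log is large, the Error-level entries can be pulled out with a small script. The sketch below assumes every record starts with a header line shaped like the one above (timestamp, time zone offset, record ID, `LEVEL:`) and may contain a `FUNCTION:` line. The regular expression and the function name `error_records` are illustrative assumptions, not a DB2 API.

```python
import re

# Matches a record header such as:
# 2018-01-11-00.36.36.090562+480 I13363168A459      LEVEL: Error
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})[+-]\d+\s+"
    r"(?P<record_id>\S+)\s+LEVEL:\s*(?P<level>\w+)"
)


def error_records(diag_text, levels=("Error", "Severe")):
    """Return a list of dicts for records whose level is in `levels`."""
    records, current = [], None
    for line in diag_text.splitlines():
        m = HEADER.match(line)
        if m:
            current = {"timestamp": m["ts"], "level": m["level"], "function": None}
            if current["level"] in levels:
                records.append(current)
        elif current is not None and line.startswith("FUNCTION:"):
            current["function"] = line.split(":", 1)[1].strip()
    return records


# Lines copied from the entry above.
DIAG_SAMPLE = """2018-01-11-00.36.36.090562+480 I13363168A459      LEVEL: Error
PID     : 2228842              TID  : 142490      PROC : db2sysc
FUNCTION: DB2 UDB, Query Gateway, sqlqg_fedstp_hook, probe:40
"""

if __name__ == "__main__":
    for rec in error_records(DIAG_SAMPLE):
        print(rec)
```

Grouping the resulting records by function shows which component reports errors most often.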

Database server - snapshots

Database snapshots serve as the main basis for analysis. They show where database time is spent. For example:

...

Number of automatic storage paths          = 1
Automatic storage path                     = /db2data
      Node number                          = 0
      State                                = In Use
      File system ID                       = 9223372079804448776
      Storage path free space (bytes)      = 69730709504
      File system used space (bytes)       = 139648946176
      File system total space (bytes)      = 209379655680

...
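
Snapshot output is mostly `name = value` lines, so individual figures are easy to extract. The sketch below parses the storage path fields shown above and computes how full the file system is. The function names are hypothetical, and because fields are keyed by name, it assumes a single storage path.

```python
def parse_snapshot_fields(text):
    """Return {field_name: value_string} for every 'name = value' line."""
    fields = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


def used_percent(fields):
    """File system usage as a percentage, rounded to one decimal place."""
    used = int(fields["File system used space (bytes)"])
    total = int(fields["File system total space (bytes)"])
    return round(100 * used / total, 1)


# Lines copied from the snapshot excerpt above.
SNAPSHOT_SAMPLE = """Automatic storage path                     = /db2data
      Storage path free space (bytes)      = 69730709504
      File system used space (bytes)       = 139648946176
      File system total space (bytes)      = 209379655680
"""

if __name__ == "__main__":
    snapshot = parse_snapshot_fields(SNAPSHOT_SAMPLE)
    print(snapshot["Automatic storage path"], used_percent(snapshot))
```

For the values above, free space plus used space equals the total, and usage works out to roughly two thirds of the file system.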

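This article only lists the analysis methods. I will write up the concrete procedures gradually as time permits.
Making good use of tools matters, but performance tuning is more than that: it takes steady, step-by-step work and preparation for a long campaign.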
