1. 程式人生 > >一次Full GC 過程的日誌分析

一次Full GC 過程的日誌分析


基礎概念:
Full GC,新生代,舊生代和持久代都發生GC,說明這次GC是發生了Stop-The-World的
ParNew GC發生區域,和收集器有關,這裡是ParNew收集器,如果為Serial收集器,則顯示DefNew。如果Parallel Scavenge收集器,那它配套的新生代稱為“PSYoungGen”
GC 332674.627 後面的數字為JVM啟動後到現在的時間
4194304K->2097152K(4194304K),GC前大小->GC後大小(總量)後面是時間
12708916K->11511721K(18874368K), 方括號外的為JAVA堆的大小
CMS Perm : 62737K->62170K(262144K),持久帶GC前後大小(總量)-XX:PermSize -XX:MaxPermSize
[Times: user=16.08 sys=0.67, real=2.01 secs] 這裡面的user、sys和real與Linux的time命令所輸出的時間含義一致,分別代表使用者態消耗的CPU時間、核心態消耗的CPU事件和操作從開始到結束所經過的牆鍾時間(Wall Clock Time),由於多核等情況前者時間會大雨real時間
[1 CMS-initial-mark: 9414569K(14680064K) 舊生代的使用量(總量),由JVM引數 -Xmx 減去 -Xmn的值決定

當時的log:
2016-03-07T16:19:41.876+0800: 332665.291: [GC 332665.291: [ParNew: 4194304K->2097152K(4194304K), 2.0051080 secs] 12708916K->11511721K(18874368K), 2.0059970 secs] [Times: user=16.08 sys=0.67, real=2.01 secs]
2016-03-07T16:19:43.886+0800: 332667.301: [GC [1 CMS-initial-mark: 9414569K(14680064K)] 11550068K(18874368K), 3.3386230 secs] [Times: user=3.33 sys=0.00, real=3.34 secs]
2016-03-07T16:19:47.226+0800: 332670.640: [CMS-concurrent-mark-start]
2016-03-07T16:19:51.212+0800: 332674.626: [GC 332674.627: [ParNew: 4194304K->2097152K(4194304K), 2.1854100 secs] 13608873K->12438922K(18874368K), 2.1863460 secs] [Times: user=17.75 sys=0.73, real=2.18 secs]
2016-03-07T16:19:57.435+0800: 332680.849: [GC 332680.850: [ParNew: 4194304K->2097152K(4194304K), 2.2184010 secs] 14536074K->13394507K(18874368K), 2.2193090 secs] [Times: user=17.90 sys=0.73, real=2.22 secs]
2016-03-07T16:20:03.112+0800: 332686.526: [GC 332686.527: [ParNew: 4194304K->1828266K(4194304K), 1.9129620 secs] 15491659K->14026818K(18874368K), 1.9138510 secs] [Times: user=15.57 sys=0.64, real=1.91 secs]
2016-03-07T16:20:09.003+0800: 332692.418: [GC 332692.418: [ParNew: 3925418K->2097152K(4194304K), 2.0599170 secs] 16123970K->15235066K(18874368K), 2.0608010 secs] [Times: user=17.08 sys=0.60, real=2.06 secs]
2016-03-07T16:20:14.993+0800: 332698.407: [GC 332698.408: [ParNew: 4194304K->2097152K(4194304K), 1.9636240 secs] 17332218K->16063831K(18874368K), 1.9645380 secs] [Times: user=16.40 sys=0.56, real=1.96 secs]
2016-03-07T16:20:20.981+0800: 332704.396: [GC 332704.396: [ParNew: 4194304K->4194304K(4194304K), 0.0000300 secs]332704.396: [CMS2016-03-07T16:20:26.292+0800: 332709.706: [CMS-concurrent-mark: 28.673/39.066 secs] [Times: user=198.02 sys=3.86, real=39.06 secs]
 (concurrent mode failure): 13966679K->14680064K(14680064K), 83.3313810 secs] 18160983K->15733347K(18874368K), [CMS Perm : 62737K->62170K(262144K)], 83.3323760 secs] [Times: user=93.59 sys=0.21, real=83.32 secs]
2016-03-07T16:21:44.355+0800: 332787.769: [GC [1 CMS-initial-mark: 14680064K(14680064K)] 15794750K(18874368K), 1.9568130 secs] [Times: user=1.96 sys=0.00, real=1.96 secs]
2016-03-07T16:21:46.312+0800: 332789.727: [CMS-concurrent-mark-start]
2016-03-07T16:22:06.161+0800: 332809.575: [CMS-concurrent-mark: 19.485/19.849 secs] [Times: user=80.27 sys=0.34, real=19.85 secs]
2016-03-07T16:22:06.161+0800: 332809.576: [CMS-concurrent-preclean-start]
2016-03-07T16:22:14.802+0800: 332818.216: [Full GC 332818.217: [CMS2016-03-07T16:22:39.945+0800: 332843.359: [CMS-concurrent-preclean: 33.775/33.784 secs] [Times: user=43.01 sys=0.00, real=33.78 secs]
 (concurrent mode failure): 14680064K->14680064K(14680064K), 107.8068800 secs] 18874368K->16636083K(18874368K), [CMS Perm : 62208K->62189K(262144K)], 107.8080400 secs] [Times: user=107.84 sys=0.00, real=107.79 secs]
2016-03-07T16:24:03.175+0800: 332926.590: [GC [1 CMS-initial-mark: 14680064K(14680064K)] 16799809K(18874368K), 3.6179980 secs] [Times: user=3.62 sys=0.00, real=3.62 secs]
2016-03-07T16:24:06.794+0800: 332930.208: [CMS-concurrent-mark-start]
2016-03-07T16:24:22.329+0800: 332945.744: [CMS-concurrent-mark: 15.519/15.535 secs] [Times: user=62.76 sys=0.00, real=15.54 secs]
2016-03-07T16:24:22.330+0800: 332945.744: [CMS-concurrent-preclean-start]
2016-03-07T16:24:33.575+0800: 332956.989: [Full GC 332956.990: [CMS2016-03-07T16:24:56.657+0800: 332980.071: [CMS-concurrent-preclean: 34.312/34.327 secs] [Times: user=46.40 sys=0.00, real=34.32 secs]
 (concurrent mode failure): 14680064K->14680063K(14680064K), 109.5823190 secs] 18874367K->17654739K(18874368K), [CMS Perm : 62193K->62189K(262144K)], 109.5831290 secs] [Times: user=109.65 sys=0.00, real=109.57 secs]
2016-03-07T16:26:23.920+0800: 333067.335: [GC [1 CMS-initial-mark: 14680063K(14680064K)] 17712480K(18874368K), 5.1742070 secs] [Times: user=5.18 sys=0.00, real=5.17 secs]
2016-03-07T16:26:29.095+0800: 333072.510: [CMS-concurrent-mark-start]
2016-03-07T16:26:44.112+0800: 333087.526: [Full GC 333087.526: [CMS2016-03-07T16:26:44.585+0800: 333087.999: [CMS-concurrent-mark: 15.467/15.490 secs] [Times: user=61.61 sys=0.00, real=15.49 secs]
 (concurrent mode failure): 14680063K->14680064K(14680064K), 77.7754890 secs] 18874367K->15185441K(18874368K), [CMS Perm : 62193K->62179K(262144K)], 77.7762560 secs] [Times: user=78.75 sys=0.00, real=77.76 secs]
2016-03-07T16:28:01.996+0800: 333165.410: [GC [1 CMS-initial-mark: 14680064K(14680064K)] 15279065K(18874368K), 1.2033830 secs] [Times: user=1.21 sys=0.00, real=1.20 secs]
2016-03-07T16:28:03.200+0800: 333166.614: [CMS-concurrent-mark-start]
2016-03-07T16:28:22.552+0800: 333185.966: [CMS-concurrent-mark: 19.330/19.352 secs] [Times: user=81.59 sys=0.00, real=19.35 secs]
2016-03-07T16:28:22.552+0800: 333185.966: [CMS-concurrent-preclean-start]
2016-03-07T16:28:35.178+0800: 333198.592: [Full GC 333198.593: [CMS2016-03-07T16:28:57.794+0800: 333221.208: [CMS-concurrent-preclean: 35.234/35.242 secs] [Times: user=50.21 sys=0.00, real=35.24 secs]
 (concurrent mode failure): 14680064K->14680063K(14680064K), 103.5000360 secs] 18874367K->15402236K(18874368K), [CMS Perm : 62184K->62180K(262144K)], 103.5009710 secs] [Times: user=103.52 sys=0.00, real=103.48 secs]
2016-03-07T16:30:18.751+0800: 333302.165: [GC [1 CMS-initial-mark: 14680063K(14680064K)] 15517749K(18874368K), 1.5607860 secs] [Times: user=1.57 sys=0.00, real=1.56 secs]
2016-03-07T16:30:20.312+0800: 333303.727: [CMS-concurrent-mark-start]
2016-03-07T16:30:42.220+0800: 333325.634: [CMS-concurrent-mark: 21.881/21.908 secs] [Times: user=88.56 sys=0.00, real=21.91 secs]
2016-03-07T16:30:42.220+0800: 333325.635: [CMS-concurrent-preclean-start]
2016-03-07T16:30:50.347+0800: 333333.761: [Full GC 333333.762: [CMS2016-03-07T16:31:16.589+0800: 333360.004: [CMS-concurrent-preclean: 34.361/34.369 secs] [Times: user=42.84 sys=0.00, real=34.36 secs]
 (concurrent mode failure): 14680063K->14680063K(14680064K), 106.9292860 secs] 18874367K->15439329K(18874368K), [CMS Perm : 62189K->62180K(262144K)], 106.9302400 secs] [Times: user=106.94 sys=0.00, real=106.92 secs]
2016-03-07T16:32:37.752+0800: 333441.166: [GC [1 CMS-initial-mark: 14680063K(14680064K)] 15681799K(18874368K), 1.9164530 secs]
2016-03-07T16:42:31.839+0800: 10.199: [GC 10.199: [ParNew: 2097152K->15901K(4194304K), 0.0188800 secs] 2097152K->15901K(18874368K), 0.0190050 secs] [Times: user=0.13 sys=0.01, real=0.02 secs]
2016-03-07T16:52:30.715+0800: 609.075: [GC 609.076: [ParNew: 2099051K->291937K(4194304K), 0.7579600 secs] 2099051K->291937K(18874368K), 0.7583510 secs] [Times: user=0.85 sys=0.37, real=0.76 secs]
2016-03-07T17:02:02.482+0800: 1180.842: [GC 1180.843: [ParNew: 2389089K->374558K(4194304K), 0.7744590 secs] 2389089K->374558K(18874368K), 0.7748880 secs] [Times: user=0.84 sys=0.36, real=0.78 secs]
Heap
 par new generation   total 4194304K, used 1872694K [0x00000002f0000000, 0x0000000470000000, 0x0000000470000000)
  eden space 2097152K,  71% used [0x00000002f0000000, 0x000000034b705f88, 0x0000000370000000)
  from space 2097152K,  17% used [0x00000003f0000000, 0x0000000406dc7908, 0x0000000470000000)
  to   space 2097152K,   0% used [0x0000000370000000, 0x0000000370000000, 0x00000003f0000000)
 concurrent mark-sweep generation total 14680064K, used 0K [0x0000000470000000, 0x00000007f0000000, 0x00000007f0000000)
 concurrent-mark-sweep perm gen total 262144K, used 50238K [0x00000007f0000000, 0x0000000800000000, 0x0000000800000000)




當時的JVM引數:
-server -Xms20g -Xmx20g -Xmn6g -XX:PermSize=256m -XX:MaxPermSize=256m -Xss256K -XX:+DisableExplicitGC -XX:SurvivorRatio=1 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60 -Dconf.type=confcenter -Djava.awt.headless=true -Dfile.encoding=gbk


GC過程:
ParNew收集器是年輕代的並行GC,在對新生代GC時需要配合舊生代CMS GC,CMS(concurrent mark sweep)是一種併發GC,對應用的暫停時間短,GC要進行三次mark,時間長。
CMS GC觸發,開始進行三次標記:
第一次CMS-initial-mark,該步驟暫停整個應用,並對舊生代物件掃描著色
第二次CMS-concurrent-mark,併發標記,恢復應用執行緒,對著色的物件進行輪詢,以標記這些物件的可訪問物件
第三次final mark(remark),重新著色,需要暫停整個應用,因為步驟二時,程式可能會改變飲用關係,所以需要重新著色
最後,sweep,恢復應用所有執行緒,將沒有標記的物件回收。這是一次正常的過程,而該日誌中出現了錯誤,稍後會進行分析。
由於CMS會產生碎片,所以通過-XX:+UseCMSCompactAtFullCollection 來使得每次Full GC時重新整理記憶體,這個整理過程也會暫停整個應用
CMS預設會開啟多個執行緒(數目為(並行GC執行緒數+3)/4),通過-XX:CMSInitiatingOccupancyFraction=60,設定舊生代使用比例達到該比例後,觸發CMS GC


進一步分析日誌,日誌中的重疊(一行中多條日誌)是由於併發操作導致的
CMS-concurrent-preclean 過程重新掃描concurrent mark階段heap中被新建立的物件,或從新生代晉升到舊生代物件的引用關係,減少remark步驟的時間
 concurrent mode failure 觸發了Full Gc,該錯誤是由於在執行CMS GC的時候,同時有物件要放入舊生代,而此時舊生代的空間不足。
 應對措施:增大舊生代空間或者調低併發GC的比率,但在jdk5,jdk6中存在jdk的bug,導致CMS的remark完畢後,很久才出發sweep動作,這種情況可以通過設定-XX:CMSMaxAbortablePrecleanTime=5(單位ms)來避免


總體上,該日誌描述了,在進行CMS GC的時候,年輕代GC也在執行,並且向舊生代放入物件,此時由於CMS GC還沒有執行完,因此舊生代空間不足,所以觸發了concurrent mode failure,導致Full GC。從數值分析,CMS GC時舊生代正好剩餘5G,按理說可以分配下新生代的物件,但是可能因為碎片的存在導致無法分配,導致Full GC失敗,整個程式Crash掉,之後應用自動重啟,恢復正常。
程式恢復後列印了堆的狀態,新生代一共6G,eden2G,兩個survivor每個2G,舊生代14G,持久代256Mb