1. 程式人生 > >HBase GC日誌

HBase GC日誌

cup min 能夠 app jdk 幫助 mas 狀態 有一個

HBase依靠ZooKeeper來感知集群成員及其存活性。假設一個server暫停了非常長時間,它將無法給ZooKeeper quorum發送心跳信息,其他server會覺得這臺server已死亡。這將導致master為其啟動恢復進程。當該server脫離停頓時,它會發現它的全部租約都已失效(hbase client端每次和regionserver交互的時候,都會在服務器端生成一個租約(Lease)。租約的有效期由參數hbase.regionserver.lease.period確定)。然後自殺。

HBase開發團隊親切地稱這個場景為Juliet Pause。

在HBase中使用默認GC,能夠看到全部線程中的長時間停頓,包含Juliet Pause

, 又名"GC of Death"。能夠在JAVA虛擬機中開啟GC日誌,來幫助調試這樣的問題。

hbase-env.sh中,取消下面任一凝視就可以開啟GC日誌:

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# If <FILE-PATH> is not replaced, the log file(.gc) would be generated in the HBASE_LOG_DIR.       


技術分享

點擊"Local logs",查看gc-xxx.log日誌,格式例如以下:

15578637.583: [GC [1 CMS-initial-mark: 12076002K(12582912K)] 18788432K(29360128K), 1.0113310 secs] [Times: user=1.01 sys=0.00, real=1.01 secs]
15578638.595: [CMS-concurrent-mark-start]
15578639.125: [CMS-concurrent-mark: 0.530/0.530 secs] [Times: user=1.61 sys=0.01, real=0.53 secs]
15578639.125: [CMS-concurrent-preclean-start]
15578639.160: [CMS-concurrent-preclean: 0.035/0.035 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
15578639.160: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 15578644.606: [CMS-concurrent-abortable-preclean: 5.446/5.446 secs] [Times: user=5.82 sys=0.07, real=5.45 secs]
15578644.607: [GC[YG occupancy: 6769031 K (16777216 K)]15578644.607: [Rescan (parallel) , 0.9756010 secs]15578645.583: [weak refs processing, 0.0000280 secs]15578645.583: [scrub string table, 0.0006370 secs] [1 CMS-remark: 12076002K(12582912K)] 18845034K(29360128K), 0.9764070 secs] [Times: user=9.52 sys=0.03, real=0.97 secs]
15578645.584: [CMS-concurrent-sweep-start]
15578645.829: [CMS-concurrent-sweep: 0.245/0.245 secs] [Times: user=0.30 sys=0.00, real=0.24 secs]
15578645.829: [CMS-concurrent-reset-start]
15578645.855: [CMS-concurrent-reset: 0.026/0.026 secs] [Times: user=0.05 sys=0.01, real=0.03 secs] 

15578647.856: [GC [1 CMS-initial-mark: 12075999K(12582912K)] 20737398K(29360128K), 1.0872730 secs] [Times: user=1.09 sys=0.00, real=1.08 secs]
15578648.943: [CMS-concurrent-mark-start]
15578649.472: [CMS-concurrent-mark: 0.528/0.528 secs] [Times: user=2.06 sys=0.02, real=0.53 secs]
15578649.472: [CMS-concurrent-preclean-start]
15578649.507: [CMS-concurrent-preclean: 0.035/0.035 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]
15578649.507: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 15578654.961: [CMS-concurrent-abortable-preclean: 5.454/5.454 secs] [Times: user=8.15 sys=0.17, real=5.45 secs]
15578654.962: [GC[YG occupancy: 11315403 K (16777216 K)]15578654.963: [Rescan (parallel) , 1.1274320 secs]15578656.090: [weak refs processing, 0.0000240 secs]15578656.090: [scrub string table, 0.0005890 secs] [1 CMS-remark: 12075999K(12582912K)] 23391403K(29360128K), 1.1281890 secs] [Times: user=11.00 sys=0.03, real=1.13 secs]
15578656.091: [CMS-concurrent-sweep-start]
15578656.327: [CMS-concurrent-sweep: 0.236/0.236 secs] [Times: user=0.52 sys=0.01, real=0.24 secs]
15578656.327: [CMS-concurrent-reset-start]
15578656.353: [CMS-concurrent-reset: 0.026/0.026 secs] [Times: user=0.05 sys=0.00, real=0.02 secs] </span></span>

CMS垃圾回收的運行過程:
1、CMS-initial-mark :初始標記,從root對象開始標記存活的對象。它花費的時間是real=1.01 secs,因為這一步驟是STW的,所以整個application被暫停了1.01秒。而且,user time和real time相差不大,所以確實是僅僅有一個thread運行這一步;
2、CMS-concurrent-mark-start並發標記。它運行時間是real=0.53 secs,但由於這一步是能夠並發運行的。所以系統在這段時間內並沒有暫停;
3、CMS-concurrent-preclean-startconcurrent precleaning。相同的。這一步也是並發運行;
4、CMS-remark:又一次標記。暫停應用程序同一時候啟動一定數目的垃圾回收線程。並行地標記第2階段遺漏的對象(在並發標記階段結束後對象狀態的更新導致)

它的運行時間是real=0.97 secs。

由於這是STW的步驟,而且它的pause time通常是最長的,所以這一步的運行時間會直接決定這次Full GC對系統的影響。 5、CMS-concurrent-sweep-start:並發清除。 6、CMS-concurrent-reset-start:並發重設狀態等待下一次CMS的觸發。 僅僅有在第1、4階段須要暫停全部的應用程序。其他階段的垃圾回收線程與應用程序線程並發運行。

HBase GC日誌