HBase Region Merge Analysis
1. Overview
The basic unit of a table in HBase is the Region. When you operate on a table through the HBase API, the data you interact with is presented in the form of Regions, and a table can consist of many Regions. In this post I will share some of the problems around merging Regions, and how to solve them.
2. Content
Before analyzing Region merging, let's first look at the architecture of a Region, as shown in the figure below:
From the figure, we can summarize the following points:
- HRegion: a Region can contain multiple Stores;
- Store: each Store contains one MemStore and several StoreFiles;
- StoreFile: where table data is actually stored; HFile is the file format of table data on HDFS.
If you want to inspect an HFile, HBase provides a command for that:
```shell
hbase hfile -p -f /hbase/data/default/ip_login/d0d7d881bb802592c09d305e47ae70a5/_d/7ec738167e9f4d4386316e5e702c8d3d
```
The output of the command is shown in the figure below:
2.1 Why Merge Regions
So why do we need to merge Regions at all? The answer starts with Region splits. As data is continuously written to a Region, once the Region reaches the split threshold (controlled by the property hbase.hregion.max.filesize, 10GB by default), it is split into two new Regions. As business data keeps growing, Regions keep splitting, and the number of Regions grows and grows.
The more Regions a business table has, the more pressure the cluster is under during reads, writes, or Compaction operations on that table. From my own production statistics: when a business table reached 9000+ Regions, every Compaction on that table noticeably increased the cluster load, which in turn affected application reads and writes. When one table has too many Regions, the total number of Regions in the cluster inevitably grows as well, and after load balancing, each RegionServer ends up carrying more Regions.
In such situations it is well worth merging Regions. For example, if the current split threshold is set to 30GB, we can merge all Regions of 10GB or less. This reduces the Region count of each business table, which lowers the total number of Regions in the cluster and eases the Region pressure on each RegionServer.
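To get a feel for the arithmetic, here is a minimal pure-Ruby sketch (the region sizes are made-up illustrative data, not from a real cluster): each merge fuses two Regions into one, so a pass over n merge candidates can remove roughly n / 2 Regions.

```ruby
# Hypothetical Region sizes in MB (illustrative data only).
region_sizes_mb = [512, 9_800, 30_720, 2_048, 128, 15_360, 256]

merge_threshold_mb = 10 * 1024   # merge candidates: Regions <= 10GB

# Pick out the Regions small enough to be merge candidates.
candidates = region_sizes_mb.select { |s| s <= merge_threshold_mb }

# Each merge fuses two Regions into one, so one pass over the
# candidates removes roughly half of them.
merges_per_pass = candidates.length / 2

puts "Merge candidates: #{candidates.length}"           # => 5
puts "Regions removed in one pass: #{merges_per_pass}"  # => 2
```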
2.2 How to Merge Regions
So how do we merge Regions? HBase provides a shell command for merging Regions:
```
# Merge two adjacent Regions
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
# Force-merge two Regions
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true
```
However, this approach has a problem: it can only merge two Regions at a time. If there are thousands of Regions to merge, it is not a workable option.
2.2.1 Batch Merging
There is a way to merge in batches, by writing a script (merge_small_regions.rb). The implementation is as follows:
```ruby
# Test Mode:
#
#   hbase org.jruby.Main merge_small_regions.rb namespace.tablename <skip_size> <batch_regions> <merge?>
#
# Non Test - ie actually do the merge:
#
#   hbase org.jruby.Main merge_small_regions.rb namespace.tablename <skip_size> <batch_regions> merge
#
# Note: Please replace namespace.tablename with your namespace and table, eg NS1.MyTable. This value is case sensitive.

require 'digest'
require 'java'
java_import org.apache.hadoop.hbase.HBaseConfiguration
java_import org.apache.hadoop.hbase.client.HBaseAdmin
java_import org.apache.hadoop.hbase.TableName
java_import org.apache.hadoop.hbase.HRegionInfo
java_import org.apache.hadoop.hbase.client.Connection
java_import org.apache.hadoop.hbase.client.ConnectionFactory
java_import org.apache.hadoop.hbase.client.Table
java_import org.apache.hadoop.hbase.util.Bytes

def list_bigger_regions(admin, table, low_size)
  cluster_status = admin.getClusterStatus()
  master = cluster_status.getMaster()
  biggers = []
  cluster_status.getServers.each do |s|
    cluster_status.getLoad(s).getRegionsLoad.each do |r|
      # getRegionsLoad returns an array of arrays, where each array
      # is 2 elements

      # Filter out any regions that don't match the requested
      # tablename
      next unless r[1].get_name_as_string =~ /#{table}\,/
      if r[1].getStorefileSizeMB() > low_size
        if r[1].get_name_as_string =~ /\.([^\.]+)\.$/
          biggers.push $1
        else
          raise "Failed to get the encoded name for #{r[1].get_name_as_string}"
        end
      end
    end
  end
  biggers
end

# Handle command line parameters
table_name = ARGV[0]
low_size = 1024
if ARGV[1].to_i >= low_size
  low_size = ARGV[1].to_i
end

limit_batch = 1000
if ARGV[2].to_i <= limit_batch
  limit_batch = ARGV[2].to_i
end
do_merge = false
if ARGV[3] == 'merge'
  do_merge = true
end

config = HBaseConfiguration.create()
connection = ConnectionFactory.createConnection(config)
admin = HBaseAdmin.new(connection)

bigger_regions = list_bigger_regions(admin, table_name, low_size)
regions = admin.getTableRegions(Bytes.toBytes(table_name))

puts "Total Table Regions: #{regions.length}"
puts "Total bigger regions: #{bigger_regions.length}"

filtered_regions = regions.reject do |r|
  bigger_regions.include?(r.get_encoded_name)
end

puts "Total regions to consider for Merge: #{filtered_regions.length}"

filtered_regions_limit = filtered_regions

if filtered_regions.length < 2
  puts "There are not enough regions to merge"
  filtered_regions_limit = filtered_regions
end

if filtered_regions.length > limit_batch
  filtered_regions_limit = filtered_regions[0, limit_batch]
  puts "But we will merge: #{filtered_regions_limit.length} regions because of the limit parameter!"
end

r1, r2 = nil
filtered_regions_limit.each do |r|
  if r1.nil?
    r1 = r
    next
  end
  if r2.nil?
    r2 = r
  end
  # Skip any region that is a split region
  if r1.is_split()
    puts "Skip #{r1.get_encoded_name} because it is splitting!"
    r1 = r2
    r2 = nil
    next
  end
  if r2.is_split()
    puts "Skip #{r2.get_encoded_name} because it is splitting!"
    r2 = nil
    next
  end
  if HRegionInfo.are_adjacent(r1, r2)
    # only merge regions that are adjacent
    puts "#{r1.get_encoded_name} is adjacent to #{r2.get_encoded_name}"
    if do_merge
      admin.mergeRegions(r1.getEncodedNameAsBytes, r2.getEncodedNameAsBytes, false)
      puts "Successfully Merged #{r1.get_encoded_name} with #{r2.get_encoded_name}"
      sleep 2
    end
    r1, r2 = nil
  else
    puts "Regions are not adjacent, so drop the first one and iterate again with #{r2.get_encoded_name}"
    r1 = r2
    r2 = nil
  end
end
admin.close
```
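The pairing loop at the heart of the script can be illustrated in plain Ruby, independent of the HBase API. This is a simplified sketch (the Region struct and the adjacent? helper are hypothetical stand-ins; the split-state checks are omitted): two Regions are adjacent when one ends where the other begins, which mirrors what HRegionInfo.areAdjacent checks.

```ruby
Region = Struct.new(:name, :start_key, :end_key)

# Two Regions are adjacent when one ends exactly where the other begins.
def adjacent?(a, b)
  a.end_key == b.start_key || b.end_key == a.start_key
end

# Walk the list in pairs: merge adjacent pairs, otherwise drop the
# first Region and slide the window, just like the JRuby script above.
def plan_merges(regions)
  merges = []
  r1 = nil
  regions.each do |r|
    if r1.nil?
      r1 = r
      next
    end
    if adjacent?(r1, r)
      merges << [r1.name, r.name]
      r1 = nil   # both consumed; start a fresh pair
    else
      r1 = r     # not adjacent; keep scanning from the newer Region
    end
  end
  merges
end

regions = [
  Region.new('a', '',  'b'),
  Region.new('b', 'b', 'c'),
  Region.new('c', 'm', 'n'),  # no neighbor follows; gets dropped
  Region.new('d', 'x', 'y'),
  Region.new('e', 'y', 'z'),
]
p plan_merges(regions)   # => [["a", "b"], ["d", "e"]]
```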
By default the script merges Regions of 1GB or less, up to 1000 at a time. If we want to merge Regions under 10GB, up to 4000 at a time, we can use the following script (merging-region.sh):
```bash
#!/bin/bash

num=$1

echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : RegionServer Start Merging..."
if [ ! -n "$num" ]; then
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Default Merging 10 Times."
    num=10
elif [[ $num == *[!0-9]* ]]; then
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Input [$num] Times Must Be Number."
    exit 1
else
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : User-Defined Merging [$num] Times."
fi

for (( i=1; i<=$num; i++ ))
do
    echo "[`date "+%Y-%m-%d %H:%M:%S"`] INFO : Merging [$i] Times, Total [$num] Times."
    hbase org.jruby.Main merge_small_regions.rb namespace.tablename 10240 4000 merge
    sleep 5
done
```
The merging-region.sh script takes a parameter so that the batch merge can be run in a loop. In practice, after one round of batch merging there may still be many Regions left (new Regions may have been created in the meantime), so we can run merging-region.sh to repeat the batch merge several times. The command is as follows:

```shell
# Loops 10 times by default; here we run 5 iterations
sh merging-region.sh 5
```
2.3 What If a Permanent RIT Appears While Merging Regions
What do we do if a permanent RIT (Region-In-Transition) appears during a Region merge? I have run into this in production: during a batch merge, a Region got stuck permanently in the MERGING_NEW state. This does not affect the cluster's ability to serve requests, but if any node in the cluster restarts, the Regions on that RegionServer cannot be rebalanced, because HBase does not perform Region load balancing while there are Regions in transition; even running the balancer command manually has no effect.
If this RIT is not resolved and more HBase nodes restart afterwards, the Regions across the whole cluster become severely unbalanced, which is fatal for cluster performance. After searching the HBase JIRA, I found that this permanent MERGING_NEW RIT is triggered by the bug HBASE-17682, which needs the corresponding patch to fix. In essence, the HBase source code does not account for the MERGING_NEW state in its branch logic and falls straight into the else branch. The source code is as follows:
```java
for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state +
          " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew()) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}
```
The fixed code is as follows:

```java
for (RegionState state : regionsInTransition.values()) {
  HRegionInfo hri = state.getRegion();
  if (assignedRegions.contains(hri)) {
    // Region is open on this region server, but in transition.
    // This region must be moving away from this server, or splitting/merging.
    // SSH will handle it, either skip assigning, or re-assign.
    LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
  } else if (sn.equals(state.getServerName())) {
    // Region is in transition on this region server, and this
    // region is not open on this server. So the region must be
    // moving to this server from another one (i.e. opening or
    // pending open on this server, was open on another one.
    // Offline state is also kind of pending open if the region is in
    // transition. The region could be in failed_close state too if we have
    // tried several times to open it while this region server is not reachable)
    if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
      LOG.info("Found region in " + state +
          " to be reassigned by ServerCrashProcedure for " + sn);
      rits.add(hri);
    } else if (state.isSplittingNew()) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else if (isOneOfStates(state, State.SPLITTING_NEW, State.MERGING_NEW)) {
      regionsToCleanIfNoMetaEntry.add(state.getRegion());
    } else {
      LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
    }
  }
}
```
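The effect of the patch can be boiled down to a small dispatch table. Here is a plain-Ruby sketch of that logic (the state names are real HBase RegionState values, but the classify function itself is a hypothetical simplification of the Java branches, not HBase code): before the patch, MERGING_NEW matches none of the branches and lands in the "THIS SHOULD NOT HAPPEN" else, leaving the Region stuck.

```ruby
# Simplified stand-in for the Java branch logic shown above.
# Returns which bucket a Region in transition falls into.
def classify(state, patched: false)
  case
  when %w[PENDING_OPEN OPENING FAILED_CLOSE OFFLINE].include?(state)
    :reassign                # picked up by ServerCrashProcedure
  when state == 'SPLITTING_NEW'
    :clean_if_no_meta        # cleaned up if it has no meta entry
  when patched && state == 'MERGING_NEW'
    :clean_if_no_meta        # the branch the HBASE-17682 fix adds
  else
    :unexpected              # "THIS SHOULD NOT HAPPEN": permanent RIT
  end
end

puts classify('MERGING_NEW')                  # the bug: falls into :unexpected
puts classify('MERGING_NEW', patched: true)   # after the fix: :clean_if_no_meta
```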
However, there is a catch. The JIRA ticket only tells us to fix the bug by applying the patch, but in real production, when facing this RIT, we cannot afford to stop the cluster for long and interrupt application reads and writes. So, is there a temporary workaround that resolves the current permanent MERGING_NEW RIT first, allowing the HBase version upgrade to happen later?
There is. After analyzing the merge workflow, I found that when HBase performs a Region merge, it first creates a new Region in the initial MERGING_NEW state. The whole merge flow is as follows:
As the flow chart shows, MERGING_NEW is an initial state held in the active Master's memory, while the standby (backup) Master's memory does not contain the MERGING_NEW state of this new Region. Therefore, we can temporarily clear this permanent RIT by failing the HBase Master over to the standby. Since HBase is a highly available cluster, a Master failover is transparent to user applications. So, when facing a permanent MERGING_NEW RIT, a Master failover serves as a temporary workaround; afterwards, we fix the bug properly by applying the patch and upgrading the HBase version.
3. Summary
RIT problems in HBase are fairly common. When you run into one, analyze the cause calmly first: check the Master logs, read the RIT description on the HBase web UI carefully, inspect the Regions with the hbck command, check the HDFS blocks with fsck, and so on. Once the specific cause is found, apply the right remedy: hypothesize boldly, verify carefully.
4. Closing Words
That's all for this post. If you run into any problems while studying this topic, feel free to join the discussion group or send me an email, and I will do my best to answer. Good luck to us all!
I have published the book 《Hadoop大資料探勘從入門到進階實戰》 (Hadoop Big Data Mining: From Beginner to Advanced Practice). Friends and readers who are interested are welcome to buy it and learn from it. Thank you all for your support.