HBase table-level metadata consistency and how hbck works

阿新 • Published: 2019-01-09

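I recently came back to HBase, a field I know well, and it stirred up a lot of feelings. For one thing, I can finally settle down and focus on technical work again; for another, seeing today's driven, ambitious young engineers is like seeing my former self. Big data has to be handed down from one generation to the next.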

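For easier cluster management and better resource utilization, we recently put the region group patch into production, backporting to 0.98 a patch that was only merged upstream in 2.0. It worked well at first, but we did hit one problem: when moving tables between groups, the metadata shown on the master web page was wrong, even though the actual region assignment was fine.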

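To see why this can happen, we need to understand how HBase manages table and region metadata.

First, the part most people know about: the metadata stored in ZooKeeper.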

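1. ZooKeeper metadata

Top-level node: /hbase

Second-level child nodes:
/meta-region-server ## Location of the regionserver hosting the meta table. The original Bigtable paper has two levels, a root table pointing to the meta table, and HBase used to have both as well. It later turned out that a single meta table can index more than enough regions, so the root table, which only added an extra routing hop, was dropped
/acl ## Child nodes hold table- and namespace-level access control; the next level down records which users hold which permissions
/backup-masters ## Child nodes hold the address, port and start time of each standby master
/table ## Child nodes hold every table in the cluster, whether enabled or not
/draining ## Regionservers in a temporary draining state; typically used when taking several regionservers offline
/region-in-transition ## Regions currently in transition (split/online/offline/compact, etc.)
/running ## Whether the HBase cluster is running normally
/table-lock ## Table locks, used while a table is being changed
/master ## Address of the cluster's master
/balancer ## Whether the load balancer is enabled
/namespace ## All current namespaces
/hbaseid ## Unique ID generated when the cluster is started
/online-snapshot ## Online snapshots
/replication ## HBase replication configuration; holds two pieces of metadata, rs and peers
/groupInfo ## Region group information
/splitWAL ## Used to build a regionserver's splitlog directory
/recovering-regions ## Regions that are being recovered
/rs ## All regionservers currently online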

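2. Meta table metadata

The meta table stores the state of every region.

The row key has the form namespace:tablename,startkey,timestamp.md5. For the first region of a table the start key is empty, which gives namespace:tablename,,timestamp.md5.

The column family is info, with these columns:

server: location of the regionserver that hosts the region

serverstartcode: the start code of that server. After every restart a regionserver is no longer "itself" but a new regionserver identified by its start code, so its regions have to be reassigned

regioninfo: the region's ENCODED name, STARTKEY and ENDKEY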
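The row key layout described above can be taken apart mechanically. The following is a minimal sketch, not HBase code: `MetaRowKeySketch` and its fields are hypothetical names, and it assumes the table name contains no comma while the start key may, so it splits on the first and the last comma. A trailing dot after the md5 part, if present, is dropped; that tolerance is an assumption of this sketch.

```java
import java.util.Objects;

/** Hypothetical helper, not an HBase class: splits a meta row key into its parts. */
final class MetaRowKeySketch {
    final String table;       // "namespace:tablename"
    final String startKey;    // empty for the first region of a table
    final long timestamp;     // numeric part before the dot
    final String encodedName; // md5 part after the dot

    private MetaRowKeySketch(String table, String startKey, long timestamp, String encodedName) {
        this.table = table;
        this.startKey = startKey;
        this.timestamp = timestamp;
        this.encodedName = encodedName;
    }

    /** Parses "table,startkey,timestamp.md5"; throws IllegalArgumentException on other shapes. */
    static MetaRowKeySketch parse(String rowKey) {
        Objects.requireNonNull(rowKey, "rowKey");
        int firstComma = rowKey.indexOf(',');
        int lastComma = rowKey.lastIndexOf(',');
        if (firstComma < 0 || firstComma == lastComma) {
            throw new IllegalArgumentException("expected table,startkey,timestamp.md5: " + rowKey);
        }
        // Everything after the last comma is "timestamp.md5".
        String tail = rowKey.substring(lastComma + 1);
        int dot = tail.indexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("expected timestamp.md5 after last comma: " + rowKey);
        }
        String encoded = tail.substring(dot + 1);
        if (encoded.endsWith(".")) {
            encoded = encoded.substring(0, encoded.length() - 1); // assumption: ignore a trailing dot
        }
        return new MetaRowKeySketch(
                rowKey.substring(0, firstComma),
                rowKey.substring(firstComma + 1, lastComma),
                Long.parseLong(tail.substring(0, dot)),
                encoded);
    }

    public static void main(String[] args) {
        MetaRowKeySketch k = parse("ns1:orders,,1546992000000.0123456789abcdef0123456789abcdef");
        System.out.println(k.table + " startKey='" + k.startKey + "' ts=" + k.timestamp
                + " md5=" + k.encodedName);
    }
}
```

Splitting on the last comma rather than the second one keeps the sketch correct when the start key itself contains commas.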

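3. Metadata in the HDFS directory tree

/hbase/.tmp ## Temporary directory, generally holding temporary files produced during operations such as compact and split
/hbase/WALs ## WAL logs of every regionserver; one subdirectory per regionserver, named after it
/hbase/archive ## HFiles no longer needed after compaction, as well as the data of deleted tables; files are removed once they expire (5 minutes). Can be used to recover a table dropped by mistake; HFiles referenced by snapshots are kept here too
/hbase/corrupt ## Location for corrupt files
/hbase/data ## All table data lives under data
/hbase/hbase.id ## Unique ID of the cluster, written when the cluster is first initialized and kept across restarts
/hbase/hbase.version ## Version number of the HBase file layout
/hbase/oldWALs ## Expired WAL logs waiting to be cleaned up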
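To relate the /hbase/data directory to the row keys in the meta table, here is a small sketch that builds the directory a region would live in. It assumes the layout `<root>/data/<namespace>/<table>/<encoded region name>` and that a table name without a `namespace:` prefix belongs to the `default` namespace; the class and method names are hypothetical and nothing here calls HDFS.

```java
/** Hypothetical path helper; pure string handling, no HDFS access. */
final class HdfsLayoutSketch {
    private HdfsLayoutSketch() {}

    /** Directory of a table, e.g. /hbase/data/ns1/orders (assumed layout). */
    static String tableDir(String rootDir, String qualifiedTable) {
        String namespace = "default"; // assumption: unqualified names are in "default"
        String table = qualifiedTable;
        int colon = qualifiedTable.indexOf(':');
        if (colon >= 0) {
            namespace = qualifiedTable.substring(0, colon);
            table = qualifiedTable.substring(colon + 1);
        }
        String root = rootDir.endsWith("/") ? rootDir.substring(0, rootDir.length() - 1) : rootDir;
        return root + "/data/" + namespace + "/" + table;
    }

    /** Directory of one region below its table directory. */
    static String regionDir(String rootDir, String qualifiedTable, String encodedRegionName) {
        return tableDir(rootDir, qualifiedTable) + "/" + encodedRegionName;
    }

    public static void main(String[] args) {
        System.out.println(regionDir("/hbase", "ns1:orders", "0123456789abcdef0123456789abcdef"));
    }
}
```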

4. Data in HMaster memory

This part is rarely covered anywhere, yet it is very important!

infoServer: holds the information the web UI needs

ZooKeeperWatcher: maintains the connection to ZooKeeper

activeMasterManager: manages and stores the currently active master

regionServerTracker: tracks regionservers

drainingServerTracker: tracks regionservers in the draining state

groupAdminServer: region group metadata

tableNamespaceManager: namespace metadata

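5. The hbck check process

Given the metadata described above, you can see that the same information is stored in four different places in HBase, so the copies can become inconsistent. Let us look, from the point of view of HBase's built-in hbck, at what HBase treats as a metadata anomaly and how it goes about repairing it.

Only the core checks are analyzed here; the preparatory steps before them are skipped.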
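To make the idea of inconsistent copies concrete, here is a minimal sketch that compares three views of the same regions: the meta table, the HDFS directories, and what regionservers actually serve. It is a simplification written for this article, not hbck's implementation; the class, the enum values and the three input sets are all hypothetical.

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration of cross-checking region metadata; not hbck code. */
final class MetadataDriftSketch {
    enum Finding { CONSISTENT, NOT_DEPLOYED, MISSING_ON_HDFS, MISSING_IN_META, ONLY_DEPLOYED }

    /** Classifies one region by where it is known. At least one flag must be true. */
    static Finding classify(boolean inMeta, boolean inHdfs, boolean deployed) {
        if (inMeta && inHdfs) {
            return deployed ? Finding.CONSISTENT : Finding.NOT_DEPLOYED;
        }
        if (inMeta) {
            return Finding.MISSING_ON_HDFS;
        }
        if (inHdfs) {
            return Finding.MISSING_IN_META;
        }
        if (!deployed) {
            throw new IllegalArgumentException("region is unknown to every source");
        }
        return Finding.ONLY_DEPLOYED;
    }

    /** Classifies every region seen in any of the three sources, sorted by name. */
    static SortedMap<String, Finding> compare(Set<String> meta, Set<String> hdfs, Set<String> deployed) {
        Set<String> all = new TreeSet<>(meta);
        all.addAll(hdfs);
        all.addAll(deployed);
        SortedMap<String, Finding> findings = new TreeMap<>();
        for (String region : all) {
            findings.put(region,
                    classify(meta.contains(region), hdfs.contains(region), deployed.contains(region)));
        }
        return findings;
    }
}
```

Only a region that all three sources agree on counts as consistent; every other combination points at a different kind of repair.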

// do the real work of hbck
    connect();

    try {
      // if corrupt file mode is on, first fix them since they may be opened later
      if (checkCorruptHFiles || sidelineCorruptHFiles) {
        LOG.info("Checking all hfiles for corruption");
        HFileCorruptionChecker hfcc = createHFileCorruptionChecker(sidelineCorruptHFiles);
        setHFileCorruptionChecker(hfcc); // so we can get result
        Collection<TableName> tables = getIncludedTables();
        Collection<Path> tableDirs = new ArrayList<Path>();
        Path rootdir = FSUtils.getRootDir(getConf());
        if (tables.size() > 0) {
          for (TableName t : tables) {
            tableDirs.add(FSUtils.getTableDir(rootdir, t));
          }
        } else {
          tableDirs = FSUtils.getTableDirs(FSUtils.getCurrentFileSystem(getConf()), rootdir);
        }
        hfcc.checkTables(tableDirs);
        hfcc.report(errors);
      }

      // First step: check that the HFiles themselves are well-formed before anything else.


      // check and fix table integrity, region consistency.
      int code = onlineHbck();

      // onlineHbck() is called here to perform the online checks.

      setRetCode(code);
      // If we have changed the HBase state it is better to run hbck again
      // to see if we haven't broken something else in the process.
      // We run it only once more because otherwise we can easily fall into
      // an infinite loop.
      if (shouldRerun()) {
        try {
          LOG.info("Sleeping " + sleepBeforeRerun + "ms before re-checking after fix...");
          Thread.sleep(sleepBeforeRerun);
        } catch (InterruptedException ie) {
          return this;
        }
        // Just report
        setFixAssignments(false);
        setFixMeta(false);
        setFixHdfsHoles(false);
        setFixHdfsOverlaps(false);
        setFixVersionFile(false);
        setFixTableOrphans(false);
        errors.resetErrors();
        code = onlineHbck();
        setRetCode(code);
      }
    } finally {
      IOUtils.cleanup(null, connection, meta, admin);
    }
    return this;
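The control flow of the excerpt above (run with fixes enabled, then, if anything was changed, run exactly once more in report-only mode) can be isolated in a few lines. This is a sketch under assumptions: `Checker` and `Result` are hypothetical stand-ins for the hbck internals, and the sleep between the two passes is left out.

```java
/** Hypothetical illustration of "fix once, then re-check report-only"; not hbck code. */
final class FixThenRecheckSketch {
    /** Outcome of one pass: how many errors were seen and whether state was modified. */
    static final class Result {
        final int errors;
        final boolean changedState;

        Result(int errors, boolean changedState) {
            this.errors = errors;
            this.changedState = changedState;
        }
    }

    /** One full check pass; when fix is false the pass must only report. */
    interface Checker {
        Result run(boolean fix);
    }

    /**
     * Runs one pass and, only if that pass modified something, one more report-only pass.
     * The second pass can never trigger a third one, which rules out an endless fix loop.
     */
    static int runOnceThenRecheck(Checker checker, boolean fixRequested) {
        Result first = checker.run(fixRequested);
        if (!first.changedState) {
            return first.errors;
        }
        return checker.run(false).errors;
    }
}
```

The error count that is finally returned comes from the verification pass, so it reflects the state after the fixes rather than before them.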

---------------------------------------------------------------------------------------------------------

/**
   * Contacts the master and prints out cluster-wide information
   * @return 0 on success, non-zero on failure
   */
  public int onlineHbck() throws IOException, KeeperException, InterruptedException, ServiceException {
    // print hbase server version
    errors.print("Version: " + status.getHBaseVersion());
    offlineHdfsIntegrityRepair();

    // This checks whether the storage layout of the HBase tables on HDFS is well-formed.

    // turn the balancer off
    boolean oldBalancer = admin.setBalancerRunning(false, true);
    try {
      onlineConsistencyRepair();
    }
    finally {
      admin.setBalancerRunning(oldBalancer, false);
    }

    if (checkRegionBoundaries) {
      checkRegionBoundaries();
    }

    offlineReferenceFileRepair();

    checkAndFixTableLocks();

    // Check (and fix if requested) orphaned table ZNodes
    checkAndFixOrphanedTableZNodes();

    // Remove the hbck lock
    unlockHbck();

    // Print table summary
    printTableSummary(tablesInfo);
    return errors.summarize();
  }
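onlineHbck() switches the balancer off, remembers the previous setting, and restores it in a finally block so that a failed repair cannot leave the balancer disabled. The sketch below shows just that pattern. `Switch` is a hypothetical interface standing in for the admin call; like the call in the excerpt, its `set` method is assumed to return the previous value.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of "disable, work, always restore"; not hbck code. */
final class BalancerGuardSketch {
    /** A boolean setting whose setter returns the value it had before the call. */
    interface Switch {
        boolean set(boolean on);
    }

    /** Turns the switch off, runs the work, and restores the old value even on failure. */
    static <T> T withSwitchOff(Switch balancer, Supplier<T> work) {
        boolean previous = balancer.set(false);
        try {
            return work.get();
        } finally {
            balancer.set(previous);
        }
    }
}
```

Restoring the remembered value rather than unconditionally switching back on matters: if an operator had already disabled the balancer, it stays disabled after the run.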

--------------------------------------checkAndFixConsistency();-------------------------

  private void checkRegionConsistencyConcurrently(
      final List<CheckRegionConsistencyWorkItem> workItems)
      throws IOException, KeeperException, InterruptedException {
    if (workItems.isEmpty()) {
      return;  // nothing to check
    }
    // workItems are the concrete tasks that do the repair work
    List<Future<Void>> workFutures = executor.invokeAll(workItems);
    for (Future<Void> f : workFutures) {
      try {
        f.get();
      } catch (ExecutionException e1) {
        LOG.warn("Could not check region consistency ", e1.getCause());
        if (e1.getCause() instanceof IOException) {
          throw (IOException) e1.getCause();
        } else if (e1.getCause() instanceof KeeperException) {
          throw (KeeperException) e1.getCause();
        } else if (e1.getCause() instanceof InterruptedException) {
          throw (InterruptedException) e1.getCause();
        } else {
          throw new IOException(e1.getCause());
        }
      }
    }
  }
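The method above submits all work items at once and then unwraps each ExecutionException so that callers see the original exception type. A self-contained version of the same technique, using only the JDK, might look like this. It is a sketch: the class name, the thread-count parameter and the use of plain `Callable<Void>` items are assumptions, and the KeeperException branch is omitted because that type comes from ZooKeeper.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration of invokeAll plus cause unwrapping; not hbck code. */
final class ConcurrentCheckSketch {
    /** Runs all items on a fixed pool and rethrows the first failure with its original type. */
    static void runAll(List<Callable<Void>> items, int threads)
            throws IOException, InterruptedException {
        if (items.isEmpty()) {
            return; // nothing to check
        }
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every item has completed or failed.
            for (Future<Void> f : executor.invokeAll(items)) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    Throwable cause = e.getCause();
                    if (cause instanceof IOException) {
                        throw (IOException) cause;
                    } else if (cause instanceof InterruptedException) {
                        throw (InterruptedException) cause;
                    } else {
                        throw new IOException(cause); // wrap anything unexpected
                    }
                }
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Because invokeAll waits for all items, a failure in one item does not cancel the others; it is only reported once the futures are inspected in order.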

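6. Thoughts
As things stand, HBase does not keep its metadata in one place, so a failed operation can leave the copies inconsistent. hbck is provided as a way to repair this, but it neither checks nor repairs the metadata of the new region group feature; that is left for future improvement.
More generally, spreading the data out like this remains a challenge for HBase's consistency.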
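As an illustration of what a region group check could compare, the sketch below diffs a table-to-group mapping held in memory against a persisted copy. It is entirely hypothetical: the class, the method and the shape of the two maps are assumptions made for this example, not part of the region group patch or of hbck.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration of a group-metadata diff; not part of HBase. */
final class GroupMetadataDiffSketch {
    /**
     * Returns every table whose group differs between the two views.
     * Keys are table names; values describe both sides, with null meaning "absent".
     */
    static SortedMap<String, String> diff(Map<String, String> inMemory, Map<String, String> persisted) {
        Set<String> tables = new TreeSet<>(inMemory.keySet());
        tables.addAll(persisted.keySet());
        SortedMap<String, String> mismatches = new TreeMap<>();
        for (String table : tables) {
            String memoryGroup = inMemory.get(table);
            String persistedGroup = persisted.get(table);
            if (!Objects.equals(memoryGroup, persistedGroup)) {
                mismatches.put(table, "memory=" + memoryGroup + ", persisted=" + persistedGroup);
            }
        }
        return mismatches;
    }
}
```

An empty result means the two views agree; a stale web page like the one described at the start of this article would show up as one or more entries.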
