
Apache HBase: The Complete Guide

Basic Concepts

Coprocessor

 A Coprocessor is essentially an analysis component similar to MapReduce, but it greatly simplifies the MapReduce model: requests are executed independently and in parallel in each Region, and a framework is provided that lets users flexibly define custom Coprocessors.

Programming Tips

Make Full Use of CellUtil
// Matching directly on byte[] is more efficient
// Bad: cf.equals(Bytes.toString(CellUtil.cloneFamily(cell)))
CellUtil.matchingFamily(cell, cf) && CellUtil.matchingQualifier(cell, col)
// Likewise, prefer `Bytes.equals` over `String#equals`
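For example, counting matching cells in scan results could look like the following (a sketch: countMatches is an illustrative helper, and the surrounding table/scan setup is assumed):

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;

static long countMatches(Iterable<Result> results, byte[] cf, byte[] col) {
    long count = 0;
    for (Result result : results) {
        for (Cell cell : result.rawCells()) {
            // byte[]-level comparison: no String is materialized per cell
            if (CellUtil.matchingFamily(cell, cf) && CellUtil.matchingQualifier(cell, col)) {
                count++;
            }
        }
    }
    return count;
}

A ResultScanner can be passed in directly, since it implements Iterable<Result>.
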
Exploit the Parallelism of Coprocessors
// In scenarios where it is hard to distribute table data evenly, you can pre-split the regions as [00, 01, 02, ..., 99] and disable automatic splitting (see: Common Commands - Region Splitting), which guarantees that each Region holds only a single xx prefix. When loading data, prepend the xx prefixes to the rowkeys in round-robin fashion, and no Region becomes a hotspot
// Inside the coprocessor, first obtain the xx prefix, then prepend it to the startKey/endKey when building the Scan
static String getStartKeyPrefix(HRegion region) {
    if (region == null) throw new RuntimeException("Region is null!");
    byte[] startKey = region.getStartKey();
    if (startKey == null || startKey.length == 0) return "00";
    String startKeyStr = Bytes.toString(startKey);
    return isEmpty(startKeyStr) ? "00" : startKeyStr.substring(0, 2);
}

private static boolean isEmpty(final String s) {
    return s == null || s.length() == 0;
}
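
Building on getStartKeyPrefix, prepending the prefix while constructing the Scan might look like this (a sketch: buildPrefixedScan is an illustrative name, and the 0.98-era Scan#setStartRow/setStopRow API is assumed):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.util.Bytes;

// Scope the scan to the Region's own two-character prefix,
// so the computation stays local to that Region
static Scan buildPrefixedScan(HRegion region, String startKey, String endKey) {
    String prefix = getStartKeyPrefix(region);
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes(prefix + startKey));
    scan.setStopRow(Bytes.toBytes(prefix + endKey));
    return scan;
}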
Handle Exceptions in Coprocessor Code Properly

 If an exception is thrown inside a coprocessor and the hbase.coprocessor.abortonerror parameter is not enabled, the coprocessor is simply removed from the environment it was loaded into. Otherwise, the behavior depends on the exception type: an IOException is thrown directly; a DoNotRetryIOException is thrown without any retry; everything else is retried 10 times by default (hard-coded in AsyncConnectionImpl#RETRY_TIMER). You therefore need to handle exceptions appropriately for your own business scenario.
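
One defensive pattern (a sketch based on these semantics, not code from the original post) is to rethrow unexpected errors as DoNotRetryIOException, so the client fails fast instead of burning 10 retries:

import java.io.IOException;
import java.util.concurrent.Callable;
import org.apache.hadoop.hbase.DoNotRetryIOException;

static <T> T runGuarded(Callable<T> body) throws IOException {
    try {
        return body.call();
    } catch (IOException ioe) {
        throw ioe;                                     // propagated as-is
    } catch (Exception e) {
        // Unexpected failure: fail fast, suppress client-side retries
        throw new DoNotRetryIOException("coprocessor failure", e);
    }
}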

Logging
// Only the Apache Commons Log class works here; otherwise nothing will be printed
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

private static final Log log = LogFactory.getLog(CoprocessorImpl.class.getName());

Deployment

# First, upload the coprocessor jar
$ hadoop fs -copyFromLocal /home/hbase/script/coprocessor-0.0.1.jar hdfs://yuzhouwan/hbase/coprocessor/
$ hadoop fs -ls hdfs://yuzhouwan/hbase/coprocessor/

# Unload the old coprocessor
$ alter 'yuzhouwan', METHOD => 'table_att_unset', NAME =>'coprocessor$1'
# Attach the new coprocessor
$ alter 'yuzhouwan', METHOD => 'table_att', 'coprocessor' => 'hdfs://yuzhouwan/hbase/coprocessor/coprocessor-0.0.1.jar|com.yuzhouwan.hbase.coprocessor.Aggregation|111|'

# The coprocessor's runtime status can be observed in the RegionServer logs
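
The attachment can also be done programmatically (a sketch assuming the HBase 1.x admin API; on 0.98, HBaseAdmin would be used instead of ConnectionFactory):

import java.util.HashMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class AttachCoprocessor {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName tn = TableName.valueOf("yuzhouwan");
            HTableDescriptor desc = admin.getTableDescriptor(tn);
            // jar path, class name and priority mirror the shell alter above
            desc.addCoprocessor("com.yuzhouwan.hbase.coprocessor.Aggregation",
                    new Path("hdfs://yuzhouwan/hbase/coprocessor/coprocessor-0.0.1.jar"),
                    111, new HashMap<String, String>());
            admin.modifyTable(tn, desc);
        }
    }
}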

Common Commands

Cluster Operations

$ su - hbase
$ start-hbase.sh

# HMaster    ThriftServer
$ jps | grep -v Jps
  32538 ThriftServer
  9383 HMaster
  8423 HRegionServer

# BackUp HMaster    ThriftServer
$ jps | grep -v Jps
  24450 jar
  21882 HMaster
  2296 HRegionServer
  14598 ThriftServer
  5998 Jstat

# BackUp HMaster    ThriftServer
$ jps | grep -v Jps
  31119 Bootstrap
  8775 HMaster
  25289 Bootstrap
  14823 Bootstrap
  12671 Jstat
  9052 ThriftServer
  26921 HRegionServer

# HRegionServer
$ jps | grep -v Jps
  29356 hbase-monitor-process-0.0.3-jar-with-dependencies.jar    # monitor
  11023 Jstat
  26135 HRegionServer


$ export -p | egrep -i "(hadoop|hbase)"
  declare -x HADOOP_HOME="/home/bigdata/software/hadoop"
  declare -x HBASE_HOME="/home/bigdata/software/hbase"
  declare -x PATH="/usr/local/anaconda/bin:/usr/local/R-3.2.1/bin:/home/bigdata/software/java/bin:/home/bigdata/software/hadoop/bin:/home/bigdata/software/hive/bin:/home/bigdata/software/sqoop/bin:/home/bigdata/software/hbase/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin"


$ java -XX:+PrintFlagsFinal -version | grep MaxHeapSize
    uintx MaxHeapSize                              := 32126271488     {product}           # 29.919921875 GB
  java version "1.7.0_60-ea"
  Java(TM) SE Runtime Environment (build 1.7.0_60-ea-b15)
  Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)


$ top
  top - 11:37:03 up 545 days, 18:45,  5 users,  load average: 8.74, 10.39, 10.96
  Tasks: 653 total,   1 running, 652 sleeping,   0 stopped,   0 zombie
  Cpu(s): 32.9%us,  0.7%sy,  0.0%ni, 66.3%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
  Mem:  264484056k total, 260853032k used,  3631024k free,  2235248k buffers
  Swap: 10485756k total, 10485756k used,        0k free, 94307776k cached
  # Memory: 252 GB


# `hbase classpath` returns all HBase-related dependencies
$ java -classpath ~/opt/hbase/soft/yuzhouwan.jar:`hbase classpath` com.yuzhouwan.hbase.MainApp


# Usage
Usage: hbase [<options>] <command> [<args>]
Options:
  --config DIR    Configuration direction to use. Default: ./conf
  --hosts HOSTS   Override the list in 'regionservers' file

Commands:
Some commands take arguments. Pass no args or -h for usage.
  shell           Run the HBase shell
  hbck            Run the hbase 'fsck' tool
  hlog            Write-ahead-log analyzer
  hfile           Store file analyzer
  zkcli           Run the ZooKeeper shell
  upgrade         Upgrade hbase
  master          Run an HBase HMaster node
  regionserver    Run an HBase HRegionServer node
  zookeeper       Run a Zookeeper server
  rest            Run an HBase REST server
  thrift          Run the HBase Thrift server
  thrift2         Run the HBase Thrift2 server
  clean           Run the HBase clean up script
  classpath       Dump hbase CLASSPATH
  mapredcp        Dump CLASSPATH entries required by mapreduce
  pe              Run PerformanceEvaluation
  ltt             Run LoadTestTool
  version         Print the version
  CLASSNAME       Run the class named CLASSNAME


# HBase version info
$ hbase version
  2017-01-13 11:05:07,580 INFO  [main] util.VersionInfo: HBase 0.98.8-hadoop2
  2017-01-13 11:05:07,580 INFO  [main] util.VersionInfo: Subversion file:///e/hbase_compile/hbase-0.98.8 -r Unknown
  2017-01-13 11:05:07,581 INFO  [main] util.VersionInfo: Compiled by 14074019 on Mon Dec 26 20:17:32     2016


$ hadoop fs -ls /hbase
  drwxr-xr-x   - hbase hbase          0 2017-03-01 00:05 /hbase/.hbase-snapshot
  drwxr-xr-x   - hbase hbase          0 2016-10-26 16:42 /hbase/.hbck
  drwxr-xr-x   - hbase hbase          0 2016-12-19 13:02 /hbase/.tmp
  drwxr-xr-x   - hbase hbase          0 2017-01-22 20:18 /hbase/WALs
  drwxr-xr-x   - hbase hbase          0 2015-09-18 09:34 /hbase/archive
  drwxr-xr-x   - hbase hbase          0 2016-10-18 09:44 /hbase/coprocessor
  drwxr-xr-x   - hbase hbase          0 2015-09-15 17:21 /hbase/corrupt
  drwxr-xr-x   - hbase hbase          0 2017-02-20 14:34 /hbase/data
  -rw-r--r--   2 hbase hbase         42 2015-09-14 12:10 /hbase/hbase.id
  -rw-r--r--   2 hbase hbase          7 2015-09-14 12:10 /hbase/hbase.version
  drwxr-xr-x   - hbase hbase          0 2016-06-28 12:14 /hbase/inputdir
  drwxr-xr-x   - hbase hbase          0 2017-03-01 10:40 /hbase/oldWALs
  -rw-r--r--   2 hbase hbase     345610 2015-12-08 16:54 /hbase/test_bulkload.txt


$ hadoop fs -ls /hbase/WALs
  drwxr-xr-x   - hbase hbase          0 2016-12-27 16:08 /hbase/WALs/yuzhouwan03,60020,1482741120018-splitting
  drwxr-xr-x   - hbase hbase          0 2017-03-01 10:36 /hbase/WALs/yuzhouwan03,60020,1483442645857
  drwxr-xr-x   - hbase hbase          0 2017-03-01 10:37 /hbase/WALs/yuzhouwan02,60020,1483491016710
  drwxr-xr-x   - hbase hbase          0 2017-03-01 10:37 /hbase/WALs/yuzhouwan01,60020,1483443835926
  drwxr-xr-x   - hbase hbase          0 2017-03-01 10:36 /hbase/WALs/yuzhouwan03,60020,1483444682422
  drwxr-xr-x   - hbase hbase          0 2017-03-01 10:16 /hbase/WALs/yuzhouwan04,60020,1485087488577
  drwxr-xr-x   - hbase hbase          0 2017-03-01 10:37 /hbase/WALs/yuzhouwan05,60020,1484790306754
  drwxr-xr-x   - hbase hbase          0 2017-03-01 10:37 /hbase/WALs/yuzhouwan06,60020,1484931966988


$ hadoop fs -ls /hbase/WALs/yuzhouwan01,60020,1483443835926

  -rw-r--r--   3 hbase hbase  127540109 2017-03-01 09:49 /hbase/WALs/yuzhouwan01,60020,1483443835926/yuzhouwan01%2C60020%2C1483443835926.1488330961720
  # ...
  -rw-r--r--   3 hbase hbase         83 2017-03-01 10:37 /hbase/WALs/yuzhouwan01,60020,1483443835926/yuzhouwan01%2C60020%2C1483443835926.1488335822133


# log
$ vim /home/hbase/logs/hbase-hbase-regionserver-yuzhouwan03.log


# HBase batch processing
$ echo "<command>" | hbase shell
$ hbase shell ../script/batch.hbase


# HBase shell
$ hbase shell


$ status
  1 servers, 0 dead, 41.0000 average load


$ zk_dump
  HBase is rooted at /hbase
  Active master address: yuzhouwan03,60000,1481009498847
  Backup master addresses:
   yuzhouwan02,60000,1481009591957
   yuzhouwan01,60000,1481009567346
  Region server holding hbase:meta: yuzhouwan03,60020,1483442645857
  Region servers:
   yuzhouwan02,60020,1483491016710
   # ...
  /hbase/replication: 
  /hbase/replication/peers: 
  /hbase/replication/peers/1: yuzhouwan03,yuzhouwan02,yuzhouwan01:2016:/hbase
  /hbase/replication/peers/1/peer-state: ENABLED
  /hbase/replication/rs: 
  /hbase/replication/rs/yuzhouwan03,60020,1483442645857: 
  /hbase/replication/rs/yuzhouwan03,60020,1483442645857/1: 
  /hbase/replication/rs/yuzhouwan03,60020,1483442645857/1/yuzhouwan03%2C60020%2C1483442645857.1488334114131: 116838271
  /hbase/replication/rs/1485152902048.SyncUpTool.replication.org,1234,1: 
  /hbase/replication/rs/yuzhouwan06,60020,1484931966988: 
  /hbase/replication/rs/yuzhouwan06,60020,1484931966988/1: 
  # ...
  Quorum Server Statistics:
   yuzhouwan02:2015
    Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
    Clients:
     /yuzhouwan:62003[1](queued=0,recved=625845,sent=625845)
     # ...
     /yuzhouwan:11151[1](queued=0,recved=8828,sent=8828)  
    Latency min/avg/max: 0/0/1
    Received: 161
    Sent: 162
    Connections: 168
    Outstanding: 0
    Zxid: 0xc062e91c6
    Mode: follower
    Node count: 25428
   yuzhouwan03:2015
    Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
    Clients:
     /yuzhouwan:39582[1](queued=0,recved=399812,sent=399812)
     # ...
     /yuzhouwan:58770[1](queued=0,recved=3234,sent=3234)

$ stop-hbase.sh

CRUD Operations

$ list
  TABLE
  mytable
  yuzhouwan
  # ...
  20 row(s) in 1.4080 seconds


$ create 'yuzhouwan', {NAME => 'info', VERSIONS => 3}, {NAME => 'data', VERSIONS => 1}
  0 row(s) in 0.2650 seconds
  => Hbase::Table - yuzhouwan


$ put 'yuzhouwan', 'rk0001', 'info:name', 'Benedict Jin'
$ put 'yuzhouwan', 'rk0001', 'info:gender', 'Man'
$ put 'yuzhouwan', 'rk0001', 'data:pic', '[picture]'


$ get 'yuzhouwan', 'rk0001', {FILTER => "ValueFilter(=, 'binary:[picture]')"}
  COLUMN                                              CELL
  data:pic                                           timestamp=1479092170498, value=[picture]
  1 row(s) in 0.0200 seconds


$ get 'yuzhouwan', 'rk0001', {FILTER => "QualifierFilter(=, 'substring:a')"}
  COLUMN                                              CELL
  info:name                                          timestamp=1479092160236, value=Benedict Jin
  1 row(s) in 0.0050 seconds


$ scan 'yuzhouwan', {FILTER => "QualifierFilter(=, 'substring:a')"}
  ROW                                                 COLUMN+CELL
  rk0001                                             column=info:name, timestamp=1479092160236, value=Benedict Jin
  1 row(s) in 0.0140 seconds


# Query by timestamp
$ scan 'yuzhouwan', { TIMERANGE => [0, 1416083300000] }
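
The equivalent in the Java client (a sketch; Scan#setTimeRange takes the same half-open [min, max) bounds):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;

static Scan scanByTimeRange() throws IOException {
    Scan scan = new Scan();
    scan.setTimeRange(0L, 1416083300000L);   // same bounds as the shell example
    return scan;
}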


# [rk0001, rk0003)
$ put 'yuzhouwan', 'rk0003', 'info:name', 'asdf2014'
$ scan 'yuzhouwan', {COLUMNS => 'info', STARTROW => 'rk0001', ENDROW => 'rk0003'}


# row keys starting with 'rk'
$ put 'yuzhouwan', 'aha_rk0003', 'info:name', 'Jin'
$ scan 'yuzhouwan', {FILTER => "PrefixFilter('rk')"}
  ROW                                                 COLUMN+CELL
  rk0001                                             column=data:pic, timestamp=1479092170498, value=[picture]
  rk0001                                             column=info:gender, timestamp=1479092166019, value=Man
  rk0001                                             column=info:name, timestamp=1479092160236, value=Benedict Jin
  rk0003                                             column=info:name, timestamp=1479092728688, value=asdf2014
  2 row(s) in 0.0150 seconds


$ delete 'yuzhouwan', 'rk0001', 'info:gender'
$ get 'yuzhouwan', 'rk0001'
  COLUMN                                              CELL
  data:pic                                           timestamp=1479092170498, value=[picture]
  info:name                                          timestamp=1479092160236, value=Benedict Jin
  2 row(s) in 0.0100 seconds


$ disable 'yuzhouwan'
$ drop 'yuzhouwan'

Modifying Rows and Columns

# Alter a table
$ disable 'yuzhouwan'
# Add column families
$ alter 'yuzhouwan', NAME => 'f1'
$ alter 'yuzhouwan', NAME => 'f2'
  Updating all regions with the new schema...
  1/1 regions updated.
  Done.
  0 row(s) in 1.3020 seconds


# Modify a column qualifier (CQ)
$ create 'yuzhouwan', {NAME => 'info'}
$ put 'yuzhouwan', 'rk00001', 'info:name', 'China'

$ get 'yuzhouwan', 'rk00001', {COLUMN => 'info:name'}, 'value'
$ put 'yuzhouwan', 'rk00001', 'info:address', 'value'

$ scan 'yuzhouwan'
  ROW                                                 COLUMN+CELL
   rk00001                                            column=info:address, timestamp=1480556328381, value=value
  1 row(s) in 0.0220 seconds


# Delete column families
$ alter 'yuzhouwan', {NAME => 'f3'}, {NAME => 'f4'}
$ alter 'yuzhouwan', {NAME => 'f5'}, {NAME => 'f1', METHOD => 'delete'}, {NAME => 'f2', METHOD => 'delete'}, {NAME => 'f3', METHOD => 'delete'}, {NAME => 'f4', METHOD => 'delete'}

# Cannot go down to CQ granularity; alter 'ns_rec:tb_mem_tag', {NAME => 'cf_tag:partyIdType', METHOD => 'delete'} does not work

# Delete a row
$ deleteall <table>, <rowkey>

Truncating Table Data

# Truncate table data
$ describe 'yuzhouwan'
  Table yuzhouwan is ENABLED
  COLUMN FAMILIES DESCRIPTION
  {NAME => 'data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
  {NAME => 'f5', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE'
  , BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
  {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
  3 row(s) in 0.0230 seconds

# Introduced in 0.98: truncates the table data while preserving the region boundaries
$ truncate_preserve 'yuzhouwan'

# truncate performs a drop table followed by a create table
$ truncate 'yuzhouwan'
$ scan 'yuzhouwan'
  ROW                                                 COLUMN+CELL
  0 row(s) in 0.3170 seconds

Renaming a Table

# Note: a snapshot name cannot contain characters such as ':', i.e. there is no need to distinguish the namespace
$ disable 'yuzhouwan'
$ snapshot 'yuzhouwan', 'yuzhouwan_snapshot'
$ clone_snapshot 'yuzhouwan_snapshot', 'ns_site:yuzhouwan'
$ delete_snapshot 'yuzhouwan_snapshot'
$ drop 'yuzhouwan'
$ grant 'site', 'CXWR', 'ns_site:yuzhouwan'

$ user_permission 'yuzhouwan'
  User                                                Table,Family,Qualifier:Permission
   site                                               default,yuzhouwan,,: [Permission: actions=CREATE,EXEC,WRITE,READ]
   hbase                                              default,yuzhouwan,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]

$ disable 'ns_site:yuzhouwan'
$ drop 'ns_site:yuzhouwan'

$ exists 'ns_site:yuzhouwan'
  Table ns_site:yuzhouwan does not exist
  0 row(s) in 0.0200 seconds

Modifying Table Properties

$ disable 'yuzhouwan'

# versions
$ alter 'yuzhouwan', NAME => 'f', VERSIONS => 5

# ttl (note: the timeout applies per CF, not at the Table level, and the unit is seconds)
$ alter 'yuzhouwan', NAME => 'f', TTL => 20

$ enable 'yuzhouwan'
$ describe 'yuzhouwan'

Compression Algorithms

# With the 'SNAPPY' compression algorithm, an error occurs: ERROR: java.io.IOException: Compression algorithm 'snappy' previously failed test.
# Try LZ4 instead (lower compression ratio, high speed; already the default codec in Spark 2.x)
$ create 'yuzhouwan', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}, {NAME => 'v', COMPRESSION => 'LZ4', BLOOMFILTER => 'NONE', DATA_BLOCK_ENCODING => 'FAST_DIFF'}

$ describe 'yuzhouwan'
  Table yuzhouwan is ENABLED
  COLUMN FAMILIES DESCRIPTION
  {NAME => 'v', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'LZ4', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
  1 row(s) in 0.0280 seconds

Access Control

# ACL
# R - read
# W - write
# X - execute
# C - create
# A - admin
$ grant 'benedict', 'WRXC', 'yuzhouwan'
$ echo "scan 'hbase:acl'" | hbase shell > acl.txt
  yuzhouwan column=l:benedict, timestamp=1496216745249, value=WRXC
  yuzhouwan column=l:hbase, timestamp=1496216737326, value=RWXCA

$ user_permission                  # without <table_name>, returns everything in the 'hbase:acl' table
$ user_permission 'yuzhouwan'
    User                                 Table,Family,Qualifier:Permission
   hbase                               default,yuzhouwan,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
   benedict                            default,yuzhouwan,,: [Permission: actions=WRITE,READ,EXEC,CREATE]
  2 row(s) in 0.0510 seconds

$ revoke 'benedict', 'yuzhouwan'

Region Splitting

# splits
$ create 'yuzhouwan', {NAME => 'f'}, SPLITS => ['1', '2', '3']                      # 5 regions
$ alter 'yuzhouwan', SPLITS => ['1', '2', '3', '4', '5', '6', '7', '8', '9']        # does not work

# Disable automatic splitting
$ alter 'yuzhouwan', {METHOD => 'table_att', SPLIT_POLICY => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}

# Configures whether the Master balances Region counts across the RegionServers
# When a RegionServer is maintained or restarted, the balancer is turned off, which can leave Regions unevenly distributed across the RegionServers; re-enable the balancer manually afterwards
$ balance_switch true
$ balance_switch false

Namespaces

# namespace
$ list_namespace_tables 'hbase'
  TABLE
  acl
  meta
  namespace
  3 row(s) in 0.0050 seconds

$ list_namespace
  NAMESPACE
  default
  hbase
  # ...
  50 row(s) in 0.3710 seconds

$ create_namespace 'www'
$ exists 'www:yuzhouwan.site'
$ create 'www:yuzhouwan.site', {NAME => 'info', VERSIONS=> 9}, SPLITS => ['1','2','3','4','5','6','7','8','9']
$ alter_namespace 'www', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}

$ drop_namespace 'www'

Manual Split

$ create 'yuzhouwan', {NAME => 'info', VERSIONS => 3}, {NAME => 'data', VERSIONS => 1}
$ put 'yuzhouwan', 'rk0001', 'info:name', 'Benedict Jin'
$ put 'yuzhouwan', 'rk0001', 'info:gender', 'Man'
$ put 'yuzhouwan', 'rk0001', 'data:pic', '[picture]'
$ put 'yuzhouwan', 'rk0002', 'info:name', 'Yuzhouwan'

# Usage:
#   split 'tableName'
#   split 'namespace:tableName'
#   split 'regionName' # format: 'tableName,startKey,id'
#   split 'tableName', 'splitKey'
#   split 'regionName', 'splitKey'
$ split 'yuzhouwan', 'rk0002'
  # Name                                                            Region Server        Start Key   End Key    Locality    Requests
  yuzhouwan,,1500964657548.bd21cdf7ae9e2d8e5b2ed3730eb8b738.        yuzhouwan01:60020                rk0002     1.0         0
  yuzhouwan,rk0002,1500964657548.76f95590aed5d39291a087c5e8e83833.  yuzhouwan02:60020    rk0002                 1.0         2

Phoenix Commands

# Run an external SQL script
$ sqlline.py <hbase.zookeeper.quorum host without port>:/phoenix sql.txt

Practical Techniques

Importing Hive Data (Bulkload)

 Bulkload parses the RCFile according to the Hive table's schema, generates HBase HFiles with a MapReduce job, and finally imports the HFiles into HBase directly through the bulkload mechanism, i.e. they are placed straight into HDFS. This is far more efficient than importing row by row through the API (as a rule, Hive data is loaded into HBase via bulkload).
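
The final load step could be driven like this (a sketch assuming the HBase 1.x LoadIncrementalHFiles API; the MapReduce job that writes the HFiles under the hypothetical output path is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            TableName tn = TableName.valueOf("yuzhouwan");
            // Atomically move the pre-generated HFiles into the table's regions
            new LoadIncrementalHFiles(conf).doBulkLoad(
                    new Path("hdfs://yuzhouwan/hbase/bulkload/output"),  // hypothetical HFile dir
                    conn.getAdmin(), conn.getTable(tn), conn.getRegionLocator(tn));
        }
    }
}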

Cross-cluster Replication (CopyTable + Replication)

Related Commands

Command                  Comment
add_peer                 Adds a replication peer. ID identifies the peer; CLUSTER_KEY uses the format hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
list_peers               Lists all replication peers
enable_peer              Enables a peer. A peer added via add_peer is enabled by default; after disable_peer has made it unavailable, enable_peer makes it available again
disable_peer             Disables a peer
remove_peer              Removes a peer
set_peer_tableCFs        Sets which tables a peer replicates. By default, a peer added via add_peer replicates every table in the cluster; to replicate only certain tables, use set_peer_tableCFs, whose granularity goes down to column families. Tables are separated by ';' and column families by ','. e.g. set_peer_tableCFs '2', "table1; table2:cf1,cf2; table3:cfA,cfB". The same command can also reset a peer to replicate all tables
append_peer_tableCFs     Adds tables to the set a peer replicates
remove_peer_tableCFs     Removes tables from the set a peer replicates
show_peer_tableCFs       Shows which tables a peer replicates; an empty result means all tables are replicated
list_replicated_tables   Lists all replicated tables
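
Peers can also be managed from Java (a sketch assuming the 0.98-era ReplicationAdmin API, reusing the peer from the zk_dump above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

public class AddPeerExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        ReplicationAdmin replication = new ReplicationAdmin(conf);
        try {
            // CLUSTER_KEY format: quorum:clientPort:znodeParent
            replication.addPeer("1", "yuzhouwan03,yuzhouwan02,yuzhouwan01:2016:/hbase");
        } finally {
            replication.close();
        }
    }
}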

Monitoring Replication

HBase Shell
$ status 'replication'
Metrics

Source side

Metrics Name         Comment
sizeOfLogQueue       Number of WAL files still waiting to be processed
ageOfLastShippedOp   Delay of the last replication shipment
shippedBatches       Number of batches shipped
shippedKBs           KB of data shipped
shippedOps           Number of entries shipped
logEditsRead         Number of logEdits read
logReadInBytes       Bytes of log data read
logEditsFiltered     Number of logEdits actually filtered out

Sink side

Metrics Name               Comment
sink.ageOfLastAppliedOp    Delay of the last applied operation
sink.appliedBatches        Number of batches applied
sink.appliedOps            Number of entries applied

Complete Procedure

CopyTable
# Determine the migration time window
2017-01-01 00:00:00(1483200000000)          2017-05-01 00:00:00(1493568000000)
# Convert the times into 13-digit, millisecond-level unix timestamps
# Online conversion tool: http://tool.chinaz.com/Tools/unixtime.aspx
# Or use the shell
$ echo "`date -d "2017-01-01 00:00:00" +%s`000"
$ echo "`date -d "2017-05-01 00:00:00" +%s`000"
# No need to worry about boundary issues: the range is [starttime, endtime)
# Run on the source cluster (to leave starttime unbounded, add --starttime=0)
$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1483200000000 --endtime=1493568000000 --peer.adr=<aim zk address>,<aim zk address>,...:<aim zk port>:/<hbase parent path> <table name>
# Check data consistency (run on both clusters and compare the RowCounts)
$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <table name> --endtime=1493568000000
# Check further (run on both clusters and compare the byte counts)
$ hadoop fs -du hdfs://<base path>/hbase/data/<namespace>/<table name>
Replication
# Run on the small cluster
# Run list_peers first, to avoid peer id conflicts
$ list_peers
$ add_peer '<peer id>', "<big cluster zk address>,<big cluster zk address>,...:<big cluster zk port>:/<hbase parent path>"

# Enable REPLICATION_SCOPE on the table
$ disable '<table name>'
# 1: on; 0: off (default)
$ alter '<table name>', {NAME => '<column family>', REPLICATION_SCOPE => '1'}
$ enable '<table name>'
Troubleshooting
# Run on the source cluster
$ hbase hbck
# If problems are found, run: hbase hbck --repair
# Once clean, run in `hbase shell`:
$ balance_switch true

Disabling Automatic Splitting

$ alter 'yuzhouwan', {METHOD => 'table_att', SPLIT_POLICY => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}

Fetching Selected Metrics via JMX

# Syntax
http://namenode:50070/jmx?qry=<metric name>

# For example, return only the NameNodeInfo bean
http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo
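
A metric can be pulled with any HTTP client; for instance (a sketch using only the JDK; the host name is illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JmxFetcher {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://namenode:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);   // raw JSON; feed it to any JSON parser
            }
        } finally {
            conn.disconnect();
        }
    }
}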

Architecture

Overview Diagram

Pitfalls

Table is neither in disabled nor in enabled state

Description

 After a normal create-table statement completes, the process hangs on the enable table step.

Solution

# Inspection shows the table is neither in the `enabled` state nor in the `disabled` state
$ is_enabled 'yuzhouwan'
  false
$ is_disabled 'yuzhouwan'
  false
$ hbase zkcli
$ delete /hbase/table/yuzhouwan
$ hbase hbck -fixMeta -fixAssignments
# Restart the active HMaster
$ is_enabled 'yuzhouwan'
  true
$ disable 'yuzhouwan'

Solution

# Biased locking has been enabled by default since JDK 6
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=0
# But it is ill-suited to highly concurrent workloads; Cassandra already disables it by default (https://github.com/apache/cassandra/blob/trunk/conf/jvm.options#L116)
-XX:-UseBiasedLocking

Hexadecimal Values Not Recognized in the Shell

Solution

# Just wrap the value in double quotes
$ put 'yuzhouwan', 'rowkey01', 'cf:age', "\xFF"  # 255
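
In the Java client no escaping is needed, since raw bytes are passed directly (a sketch assuming the 1.x Put API):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// The value is a raw byte, so 0xFF (255) needs no quoting tricks
Put put = new Put(Bytes.toBytes("rowkey01"));
put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("age"), new byte[]{(byte) 0xFF});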

Performance Tuning

Community Follow-up

For details, see the post 《開源社群



Post author:Benedict Jin
Post link: https://yuzhouwan.com/posts/45888/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stated otherwise.