HBase Backup: Export and Import

HBase replication only covers data written after replication has been configured; that is, only data inserted into the HBase master cluster after replication is set up gets copied to the slave cluster. For historical data written before that point, replication cannot help. This article shows how to back up such historical data with HBase's built-in Export and Import utilities.

1) Export the HBase table data to a specified directory in HDFS with the following commands:

$ cd $HBASE_HOME/
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export test_table /data/test_table

Here, $HBASE_HOME is the HBase home directory, test_table is the table to export, and /data/test_table is the target directory in HDFS.
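
For reference, the Export tool also takes optional trailing arguments for the number of versions to export and a start/end timestamp range, which is handy for incremental backups. The output directory, version count, and timestamps below are made-up values for illustration only, not part of the run shown in this article:

$ cd $HBASE_HOME/
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export test_table /data/test_table_inc 1 1406736000000 1407427200000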

The full output is long, so only the final portion is shown here:

2014-08-11 16:49:44,484 INFO  [main] mapreduce.Job: Running job: job_1407491918245_0021
2014-08-11 16:49:51,658 INFO  [main] mapreduce.Job: Job job_1407491918245_0021 running in uber mode : false
2014-08-11 16:49:51,659 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2014-08-11 16:49:57,706 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2014-08-11 16:49:57,715 INFO  [main] mapreduce.Job: Job job_1407491918245_0021 completed successfully
2014-08-11 16:49:57,789 INFO  [main] mapreduce.Job: Counters: 37
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=118223
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=84
HDFS: Number of bytes written=243
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters 
Launched map tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=9152
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=84
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=201
CPU time spent (ms)=5210
Physical memory (bytes) snapshot=377470976
Virtual memory (bytes) snapshot=1863364608
Total committed heap usage (bytes)=1029177344
HBase Counters
BYTES_IN_REMOTE_RESULTS=87
BYTES_IN_RESULTS=87
MILLIS_BETWEEN_NEXTS=444
NOT_SERVING_REGION_EXCEPTION=0
NUM_SCANNER_RESTARTS=0
REGIONS_SCANNED=1
REMOTE_RPC_CALLS=3
REMOTE_RPC_RETRIES=0
RPC_CALLS=3
RPC_RETRIES=0
File Input Format Counters 
Bytes Read=0
File Output Format Counters 
Bytes Written=243

Take a look at the export directory specified above:

$ cd $HADOOP_HOME/
$ bin/hadoop fs -ls /data/test_table

Here, $HADOOP_HOME is the Hadoop home directory. The result looks like this:

Found 2 items
-rw-r--r--   3 hbase supergroup          0 2014-08-11 16:49 /data/test_table/_SUCCESS
-rw-r--r--   3 hbase supergroup        243 2014-08-11 16:49 /data/test_table/part-m-00000
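
The export directory holds a SequenceFile (part-m-00000) alongside the _SUCCESS marker. If the import will run on a cluster that cannot read the master cluster's HDFS directly, one option is to copy the directory across first with distcp; the destination address below is a placeholder, not a host from this setup:

$ cd $HADOOP_HOME/
$ bin/hadoop distcp hdfs://l-master.data/data/test_table hdfs://slave-namenode/data/test_table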

Run the following HBase shell commands to take a look at the data in test_table:

$ cd $HBASE_HOME/
$ bin/hbase shell
2014-08-11 17:05:52,589 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014

hbase(main):001:0> describe 'test_table'
DESCRIPTION                                                                              ENABLED
 'test_table', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',      true
 REPLICATION_SCOPE => '1', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647',
 MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 1.3400 seconds

hbase(main):002:0> scan 'test_table'
ROW                                                    COLUMN+CELL                                                                                                                                                   
 r1                                                    column=cf:q1, timestamp=1406788229440, value=va1                                                                                                              
 r2                                                    column=cf:q1, timestamp=1406788265646, value=va2                                                                                                              
 r3                                                    column=cf:q1, timestamp=1406788474301, value=va3                                                                                                              
3 row(s) in 0.0560 seconds

This completes the export of the HBase table data. Next comes the import.

2) Import the data exported to HDFS into a table that has already been created in HBase. Note that this table may use a different name from the original, but its schema must be identical. Here we pick a new name, test_copy. Create the table as follows:

$ cd $HBASE_HOME/
$ bin/hbase shell
2014-08-11 17:05:52,589 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014

hbase(main):001:0> create 'test_copy', 'cf'
0 row(s) in 1.1980 seconds

=> Hbase::Table - test_copy
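
The source table in this example uses only the default column-family settings, so a plain create is enough. If the source table had non-default settings (for instance, extra versions or compression), you would mirror them when creating the target table; the table name and attributes below are purely illustrative:

hbase(main):002:0> create 'test_copy_versions', {NAME => 'cf', VERSIONS => 3, COMPRESSION => 'SNAPPY'}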

Next, run the import command:

$ cd $HBASE_HOME/
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import test_copy hdfs://l-master.data/data/test_table

Here, test_copy is the table to import into, and hdfs://l-master.data/data/test_table is the full path in the master cluster's HDFS to which test_table was exported earlier.
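
Import replays the exported data as ordinary puts, so for a very large table it can put noticeable write load on the target cluster. As far as I know, the Import tool in this HBase line can also write HFiles for bulk loading instead, via the import.bulk.output property; the HFile output path below is a placeholder:

$ cd $HBASE_HOME/
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import -Dimport.bulk.output=/data/test_copy_hfiles test_copy hdfs://l-master.data/data/test_table
$ bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /data/test_copy_hfiles test_copy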

The import command produces output like the following; since it is long, only the last portion is shown:

2014-08-11 17:13:08,706 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2014-08-11 17:13:08,710 INFO  [main] mapreduce.Job: Job job_1407728839061_0014 completed successfully
2014-08-11 17:13:08,715 INFO  [main] mapreduce.Job: Counters: 27
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=117256
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=356
HDFS: Number of bytes written=0
HDFS: Number of read operations=3
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters 
Launched map tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=6510
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=113
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=21
CPU time spent (ms)=1110
Physical memory (bytes) snapshot=379494400
Virtual memory (bytes) snapshot=1855762432
Total committed heap usage (bytes)=1029177344
File Input Format Counters 
Bytes Read=243
File Output Format Counters 
Bytes Written=0

Next, check whether the data in the slave cluster's test_copy table matches test_table on the master cluster, using the HBase shell:

$ cd $HBASE_HOME/
$ bin/hbase shell
2014-08-11 17:15:52,117 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.2-hadoop2, r1591526, Wed Apr 30 20:17:33 PDT 2014

hbase(main):001:0> scan 'test_copy'
ROW                                                    COLUMN+CELL                                                                                                                                                   
 r1                                                    column=cf:q1, timestamp=1406788229440, value=va1                                                                                                              
 r2                                                    column=cf:q1, timestamp=1406788265646, value=va2                                                                                                              
 r3                                                    column=cf:q1, timestamp=1406788474301, value=va3                                                                                                              
3 row(s) in 0.3640 seconds

Comparing the two scans shows that the data in the two tables is exactly the same.
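
A full scan is only practical for a tiny table like this one. For larger tables, a quick sanity check is to compare row counts on both sides with the bundled RowCounter MapReduce job (or the count command in the HBase shell), for example:

$ cd $HBASE_HOME/
$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter test_copy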

Reposted from: http://www.tuicool.com/articles/FjeMNr