Another way to copy data between HBase clusters
阿新 • • Published: 2018-12-23
2012-08-14
http://abloz.com date:2012.8.14
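The previous article, 《hbase 複製備份資料》 (copying and backing up HBase data), described using the CopyTable tool to copy data between clusters. There is also a more brute-force way to share an HBase table backup, which is useful because the two clusters are sometimes not connected to each other.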
1. Copy the HBase table from the source HBase cluster to a local directory
It is best to stop HBase first; otherwise some data may be lost, because edits still held in the MemStore have not yet been written to files on HDFS.
$ hadoop fs -get /hbase/toplist_ware_total_1009_201232 toplist_ware_total_1009_201232
Compress it:
$ tar zcvf topl.tar.gz toplist_ware_total_1009_201232
Copy it to the target machine:
$ scp topl.tar.gz [email protected]:~/.
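The three steps above can be collected into one small script. This is only a sketch under assumptions: the names TABLE, ARCHIVE, REMOTE, run and DRY_RUN are illustrative and do not come from the original session, and REMOTE is a hypothetical placeholder for the login on the target machine. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch: export one HBase table directory from HDFS and ship it elsewhere.
# TABLE, ARCHIVE, REMOTE, run and DRY_RUN are illustrative names (assumptions).
TABLE="${TABLE:-toplist_ware_total_1009_201232}"
ARCHIVE="${ARCHIVE:-topl.tar.gz}"
REMOTE="${REMOTE:-user@target-host}"   # hypothetical placeholder login
DRY_RUN="${DRY_RUN:-1}"                # 1 = print only, 0 = execute

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

export_table() {
  # Copy the table directory out of HDFS into the current local directory.
  run hadoop fs -get "/hbase/$TABLE" "$TABLE" || return 1
  # Pack it into a single compressed archive.
  run tar zcvf "$ARCHIVE" "$TABLE" || return 1
  # Send the archive to the home directory on the target machine.
  run scp "$ARCHIVE" "${REMOTE}:~/." || return 1
}

export_table
```

Printing first makes it easy to check the paths before anything touches HDFS; stopping HBase beforehand is still a manual step here.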
2. Import into the target HBase
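Extract the archive:
$ tar zxvf topl.tar.gz
If the target HBase already has this table, disable and drop it first. If the table directory still exists on HDFS, delete it with hadoop fs -rmr /hbase/table before copying the new one in, to avoid corrupting the data.
Put the table directory into the cluster:
$ hadoop fs -put toplist_ware_total_1009_201232 /hbase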
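The import steps above can be sketched the same way. As before, TABLE, ARCHIVE, run and DRY_RUN are illustrative names rather than anything from the original session, and the script only prints the commands unless DRY_RUN=0 is set. Disabling and dropping an existing table in the hbase shell is assumed to have been done by hand and is not part of this sketch.

```shell
#!/usr/bin/env bash
# Sketch: load an exported table directory into the target cluster's HDFS.
# TABLE, ARCHIVE, run and DRY_RUN are illustrative names (assumptions).
# If the table already exists in HBase, disable and drop it in the
# hbase shell first; that step is not performed here.
TABLE="${TABLE:-toplist_ware_total_1009_201232}"
ARCHIVE="${ARCHIVE:-topl.tar.gz}"
DRY_RUN="${DRY_RUN:-1}"                # 1 = print only, 0 = execute

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

import_table() {
  # Unpack the archive copied over from the source cluster.
  run tar zxvf "$ARCHIVE" || return 1
  # Remove a stale copy of the table directory. The command fails when the
  # directory does not exist, which is fine, so its result is ignored.
  run hadoop fs -rmr "/hbase/$TABLE" || true
  # Upload the table directory under /hbase.
  run hadoop fs -put "$TABLE" /hbase || return 1
}

import_table
```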
At this point the table can be listed, but scan reports an error:
hbase(main):055:0> list 'toplist_ware_total_1009_201232'
TABLE
toplist_ware_total_1009_201232
1 row(s) in 0.0220 seconds
hbase(main):062:0> scan 'toplist_ware_total_1009_201232'
ROW COLUMN+CELL
ERROR: Unknown table toplist_ware_total_1009_201232!
The .META. table has no records for the table:
hbase(main):064:0> scan '.META.'
The output contains no rows starting with toplist_ware_total_1009_201232.
3. Repair the .META. table and reassign the data to the RegionServers
If you run the reassignment before the .META. table has been repaired, it reports an error:
$ hbase hbck -fixAssignments
...
ERROR: Region { meta => null, hdfs => hdfs://h185:54310/hbase/toplist_ware_total_1009_201232/0403552001eb2a31990e443dcae74ee8, deployed => } on HDFS, but not listed in META or deployed on any region server
...
Repair the .META. table first:
$ hbase hbck -fixMeta
...
ERROR: Region { meta => null, hdfs => hdfs://h185:54310/hbase/toplist_ware_total_1009_201232/0403552001eb2a31990e443dcae74ee8, deployed => } on HDFS, but not listed in META or deployed on any region server
12/08/14 18:25:15 INFO util.HBaseFsck: Patching .META. with .regioninfo: {NAME => 'toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8.', STARTKEY => '', ENDKEY => '', ENCODED => 0403552001eb2a31990e443dcae74ee8,}
...
The .META. table now contains the table's records, but scan still fails:
hbase(main):065:0> scan '.META.'
ROW COLUMN+CELL
...
toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8. column=info:regioninfo, timestamp=1344939930752, value={NAME => 'toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8.', STARTKEY => '', ENDKEY => '', ENCODED => 0403552001eb2a31990e443dcae74ee8,}
16 row(s) in 0.0550 seconds
The scan still fails:
hbase(main):066:0> scan 'toplist_ware_total_1009_201232'
ROW COLUMN+CELL
ERROR: org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in .META. for region toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8. containing row
Reassign the region to the region servers:
$ hbase hbck -fixAssignments
...
ERROR: Region { meta => toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8., hdfs => hdfs://h185:54310/hbase/toplist_ware_total_1009_201232/0403552001eb2a31990e443dcae74ee8, deployed => } not deployed on any region server.
Trying to fix unassigned region...
12/08/14 18:28:01 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {NAME => 'toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8.', STARTKEY => '', ENDKEY => '', ENCODED => 0403552001eb2a31990e443dcae74ee8,}
12/08/14 18:28:02 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {NAME => 'toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8.', STARTKEY => '', ENDKEY => '', ENCODED => 0403552001eb2a31990e443dcae74ee8,}
12/08/14 18:28:04 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {NAME => 'toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8.', STARTKEY => '', ENDKEY => '', ENCODED => 0403552001eb2a31990e443dcae74ee8,}
12/08/14 18:28:05 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {NAME => 'toplist_ware_total_1009_201232,,1344187094829.0403552001eb2a31990e443dcae74ee8.', STARTKEY => '', ENDKEY => '', ENCODED => 0403552001eb2a31990e443dcae74ee8,}
...
The scan now succeeds:
hbase(main):067:0> scan 'toplist_ware_total_1009_201232'
ROW COLUMN+CELL
0000000001 column=info:loginid, timestamp=1344187147972, value=jjm167258611
0000000001 column=info:nick, timestamp=1344187147972, value=?xE9x97xB4?xE6xB5xA3?
0000000001 column=info:score, timestamp=1344187147972, value=200
0000000001 column=info:userid, timestamp=1344187147972, value=167258611
...
330 row(s) in 0.8630 seconds
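The repair order that worked above (fix .META. first, then assign the regions) can be sketched as a short script. The names run and DRY_RUN are illustrative assumptions, not part of the original session; the two hbck options are the ones used above. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/usr/bin/env bash
# Sketch: repair a table whose files were copied straight into /hbase.
# run and DRY_RUN are illustrative names (assumptions).
DRY_RUN="${DRY_RUN:-1}"                # 1 = print only, 0 = execute

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

repair_table() {
  # Step 1: patch .META. from the .regioninfo files found on HDFS.
  # Running the assignment before this step reports the region as
  # "not listed in META", as shown in the session above.
  run hbase hbck -fixMeta || return 1
  # Step 2: assign the regions that are now listed in .META.
  run hbase hbck -fixAssignments || return 1
}

repair_table
```

After both steps, a scan of the table in the hbase shell is the simplest check that the region has been deployed.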
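If the target cluster is empty, you can simply copy the whole /hbase directory out of the source HBase, then on the target system run hadoop fs -rmr /hbase or hadoop fs -mv /hbase /hbase1, and finally upload the copy with hadoop fs -put hbase /.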
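Unless marked as a repost, all content is original. This site follows the Creative Commons (CC) license; please credit the source when reposting.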