
Distributed Data Migration Tool

Data Migration Project

This project is a distributed migration service for MySQL. It includes full data migration, incremental data migration, and real-time data checking. The service can migrate large data sets efficiently and accurately. Our company uses it for scaling distributed databases out and in, for disaster recovery, and for business service upgrades.

Flow Chart

flow-chart

System Frame

frame

Full data migration

1. Two ways to extract

Way 1: Extract directly from the master, with a configurable threshold to limit the IO and load placed on the master.
Way 2: Clone a slave from the master and extract from the slave.
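The threshold-based throttling in Way 1 can be pictured as a simple rate limiter around the extraction loop. This is a minimal sketch under assumed semantics (a rows-per-second budget); the class and parameter names are illustrative, not the project's actual API:

```python
import time

class RowThrottle:
    """Caps extraction speed so the master's IO and load stay under a
    configured threshold. Illustrative only, not the project's real API."""

    def __init__(self, max_rows_per_sec):
        self.max_rows_per_sec = max_rows_per_sec
        self.window_start = time.monotonic()
        self.rows_in_window = 0

    def account(self, rows):
        """Record `rows` just fetched; if the current one-second window is
        over budget, sleep out the remainder of the window, then reset."""
        self.rows_in_window += rows
        if self.rows_in_window >= self.max_rows_per_sec:
            elapsed = time.monotonic() - self.window_start
            if elapsed < 1.0:
                time.sleep(1.0 - elapsed)
            self.window_start = time.monotonic()
            self.rows_in_window = 0

# The extraction loop would call throttle.account(len(batch)) after each fetch.
throttle = RowThrottle(max_rows_per_sec=50_000)
```

In Way 2 no throttle is needed, since the extraction load lands on a dedicated slave instead of the serving master.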

2. Indexes

Our rule is that any table over 10 thousand rows or 100 MB must have an index. We generally extract data by primary key.
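Extracting by primary key typically means keyset pagination: each query resumes after the last key seen, so no offset scan is needed. A minimal sketch, where `fetch_page` is a hypothetical stand-in for a `SELECT ... WHERE pk > ? ORDER BY pk LIMIT ?` query:

```python
def extract_by_pk(fetch_page, start_pk=0, page_size=1000):
    """Keyset pagination over a table's primary key.

    `fetch_page(last_pk, limit)` must return up to `limit` rows with
    pk > last_pk in ascending pk order (an empty list when exhausted).
    """
    last_pk = start_pk
    while True:
        rows = fetch_page(last_pk, page_size)
        if not rows:
            break
        yield from rows
        last_pk = rows[-1]["pk"]

# Demo against an in-memory "table" of 2500 rows.
table = [{"pk": i} for i in range(1, 2501)]

def fetch_page(last_pk, limit):
    return [r for r in table if r["pk"] > last_pk][:limit]

extracted = list(extract_by_pk(fetch_page, page_size=1000))
```

Because each page is bounded by the index, this stays fast even on large tables, which is why tables above the threshold must have an index.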

3. Performance

We divide a schema into many per-table tasks, and each node processes a subset of them. A master node coordinates the progress of all slave nodes.
Within one table, we split the data into chunks by index and run multiple threads, one per chunk.
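The per-table chunking can be sketched as splitting the primary-key range into fixed-size intervals and copying them on a thread pool. The function names and the `copy_chunk` callback are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def make_chunks(min_pk, max_pk, chunk_size):
    """Split the inclusive pk range [min_pk, max_pk] into (lo, hi) chunks."""
    chunks = []
    lo = min_pk
    while lo <= max_pk:
        hi = min(lo + chunk_size - 1, max_pk)
        chunks.append((lo, hi))
        lo = hi + 1
    return chunks

def migrate_table(copy_chunk, min_pk, max_pk, chunk_size=10_000, workers=4):
    """Copy every chunk of one table in parallel; returns the chunk count.

    `copy_chunk(lo, hi)` is a placeholder for the real per-chunk copy
    (e.g. SELECT the range from the source, INSERT into the target).
    """
    chunks = make_chunks(min_pk, max_pk, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda c: copy_chunk(*c), chunks))
    return len(chunks)
```

Across tables the same idea repeats one level up: the master node hands whole-table tasks to worker nodes, and each worker parallelizes within its tables like this.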

Incremental data migration

1. Log the binlog position

The incremental service starts from a binlog position recorded by the full-migration task.
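The handoff between the two phases only needs the binlog coordinates (file name and offset, e.g. from `SHOW MASTER STATUS`) persisted somewhere durable. A minimal sketch with an assumed JSON checkpoint file:

```python
import json
import os
import tempfile

def save_position(path, log_file, log_pos):
    """Persist the binlog coordinates captured when the full copy begins;
    the incremental service resumes streaming from exactly this point."""
    with open(path, "w") as f:
        json.dump({"file": log_file, "pos": log_pos}, f)

def load_position(path):
    with open(path) as f:
        d = json.load(f)
    return d["file"], d["pos"]

# Demo: record a position at the start of the full copy, read it back later.
checkpoint = os.path.join(tempfile.mkdtemp(), "binlog_position.json")
save_position(checkpoint, "mysql-bin.000012", 4096)
```

Because the position is taken before the full copy starts, replaying the binlog from it may re-apply rows the copy already moved; binlog replay of row events is effectively idempotent for this purpose, so that overlap is safe.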

2. Capture incremental data

We extended canal, a MySQL binlog server. The incremental service subscribes to a binlog position and pulls data from canal over RPC.

3. Performance

We use one thread per database task to preserve data ordering, and we pull and apply data in batches to improve throughput.
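The single-consumer-with-batching pattern can be sketched as below; `apply_batch` is a hypothetical placeholder for the real batched write to the target database:

```python
def apply_in_batches(events, apply_batch, batch_size=200):
    """Consume an ordered event stream on a single thread, flushing to the
    target in batches. One consumer per database keeps commit order;
    batching amortizes the per-round-trip cost. Returns events applied."""
    batch = []
    applied = 0
    for ev in events:
        batch.append(ev)
        if len(batch) >= batch_size:
            apply_batch(batch)
            applied += len(batch)
            batch = []
    if batch:                # flush the final partial batch
        apply_batch(batch)
        applied += len(batch)
    return applied
```

The trade-off is throughput versus ordering: parallel appliers would be faster but could reorder writes within a database, so ordering wins and batching recovers most of the lost throughput.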

Full data check

1. When to start?

The incremental service notifies the check service to start once it has caught up and is in sync with the master.

2. How to check full data?

First, log the binlog position.
Second, extract and chunk the data by index.
Finally, compare and repair the data.

3. How to compare data?

We sign each row with MD5 and compare the signatures on the target against the master.
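The row-signature comparison can be sketched as hashing a canonical rendering of each row and diffing by primary key. The row layout (dicts with a `pk` column) is an assumption for illustration:

```python
import hashlib

def row_signature(row):
    """MD5 over a canonical rendering of the row's column values.
    The real service would fix column order and encoding explicitly."""
    payload = "|".join(str(v) for v in row.values())
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def diff_chunk(source_rows, target_rows):
    """Compare one chunk's rows by pk; return the pks that need repair:
    rows whose signatures differ, plus rows missing on the target."""
    target_sigs = {r["pk"]: row_signature(r) for r in target_rows}
    bad = []
    for r in source_rows:
        if target_sigs.get(r["pk"]) != row_signature(r):
            bad.append(r["pk"])
    return bad
```

Comparing fixed-length digests instead of full rows keeps the data moved between checker and databases small, which matters when the chunks are large.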

Incremental data check

1. When to start?

It starts from the recorded binlog position once the full data check has finished.

2. How to compare data?

We subscribe to both the target and the master and compare their binlog data within a given range.
The incremental migration service logs the target and master binlog positions and notifies the incremental check service to consume them.
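One way to picture the range-based comparison: within one checkpointed window, index the target's binlog events by (table, pk) and flag any source event that is missing or different on the target. The event shape here is an assumption, not canal's actual wire format:

```python
def check_window(source_events, target_events):
    """Compare one bounded window of binlog events from master and target.

    Events are assumed to carry a table name, a primary key, and the row
    image. Returns the (table, pk) keys that look inconsistent."""
    target_index = {(e["table"], e["pk"]): e["row"] for e in target_events}
    suspects = []
    for e in source_events:
        key = (e["table"], e["pk"])
        if target_index.get(key) != e["row"]:
            suspects.append(key)
    return suspects
```

Bounding the comparison to a window between two logged positions is what makes this check incremental: each notification from the migration service closes one window and opens the next.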

Distribution and recovery

The migration services use a master-slave architecture. The master node allocates work and transfers failed tasks.
Slave nodes report their progress to the master, which persists it. If a slave fails, the master chooses another node to continue the task from the point of interruption.
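The failover step can be sketched as the master moving a failed node's tasks to healthy nodes along with their last checkpointed progress, so work resumes instead of restarting. All names here are illustrative assumptions about the scheduler's state:

```python
def reassign(tasks, progress, failed_node, healthy_nodes):
    """Hand each task owned by `failed_node` to a healthy node.

    `tasks` maps task name -> owning node (mutated in place);
    `progress` maps task name -> last checkpoint the master persisted.
    Returns {task: (new_owner, resume_point)} for the moved tasks."""
    moved = {}
    for task, owner in tasks.items():
        if owner == failed_node:
            # Round-robin the orphaned tasks across the healthy nodes.
            new_owner = healthy_nodes[len(moved) % len(healthy_nodes)]
            tasks[task] = new_owner
            moved[task] = (new_owner, progress.get(task, 0))
    return moved
```

Because progress lives on the master rather than on the workers, the resume point survives the worker's failure, which is what enables continuation "from the point of interruption".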

Data Synchronization Tool

Introduction

A MySQL data synchronization tool with full migration, incremental migration, data verification, and customization features. It can migrate large data sets efficiently and in real time, switch over smoothly, and guarantee data correctness. We use it internally for scaling distributed databases out and in, for cross-datacenter disaster recovery, and for business upgrades.

Migration Flow

flow-chart

System Architecture

frame

Full migration

1. Two extraction options

Option 1: Run extraction directly on the master, with a configurable migration threshold to limit the master's IO and load.
Option 2: Clone a slave from the recorded position and run the full sync against it.

2. Indexes for large data

In the full-migration service, small tables (under 100 thousand rows and 500 MB) may be copied without an index, but large tables must have one configured. By default, Extract fetches data in batches by primary key.

3. Performance

Multiple nodes share the migration work, with the table as the smallest unit of work, assigned by the master node.
Each table is split into chunks by index, and the chunks are migrated in parallel.

Incremental migration

1. Record the position

When the full migration starts, the binlog position is recorded; the incremental service migrates from that point.

2. Fetching incremental data

We extended Alibaba's canal service and pull incremental data by subscribing to the binlog.

3. Performance

To preserve data ordering, we use a single thread per database, pulling and applying data in batches.

Full data check

1. When is it triggered?

When the incremental migration service finds it has caught up with the master, it notifies the check service to start the full check.

2. Full data

Record the binlog position, split the data into chunks, and merge.

3. Data merge

Query the source and target databases, sign the rows, compare them, and repair any differences.

Incremental data check

1. When is it triggered?

After the full check completes, the incremental check runs from the recorded binlog position.

2. How is the merge done?

Subscribe to the binlogs of the source and target databases and compare and verify them within a given range.
The incremental migration records the binlog positions of both databases, and the check service consumes them via notifications.

Distribution and disaster recovery

The migration tool uses a master-slave cluster architecture. The master node assigns work and handles failover.
Migration nodes report their progress; if a node fails mid-task, another node can resume from the checkpoint.