
MongoDB in Practice: Best Practices for Dealing with Data Holes


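Background: One day a colleague from the development team came over to report that the MongoDB data files had grown so large they were about to fill the disk. One database in particular took up the most space, even though the same database holds very little data in the production environment.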

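Analysis: The development environment sees a large volume of test insert/delete/update operations. Because MongoDB writes data sequentially, its storageSize does not shrink after we delete some of the useless data, which leaves a large number of data holes (free space trapped inside the data files).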

Solutions

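1. Use MongoDB's built-in compact command:

  db.collectionName.runCommand("compact")
 This compacts at the collection level, so it only removes fragmentation inside that collection. MongoDB allocates data at the DB level, however, so the effect is limited.
 compact runs online against a live node, and its disk I/O is fairly high, which can affect the online service.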
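
As a rough illustration, the same compact command could be issued from Python. This is a minimal sketch, assuming Python 3 and pymongo are available; the helper names build_compact_command and run_compact are hypothetical, and whether the force option is needed on a primary depends on the server version, so treat that flag as an assumption to verify.

```python
def build_compact_command(collection_name, force=False):
    """Build the command document for MongoDB's `compact` command."""
    # The command name must be the first key of the document.
    command = {"compact": collection_name}
    if force:
        # Assumption: some server versions require force to compact on a primary.
        command["force"] = True
    return command


def run_compact(uri, db_name, collection_name, force=False):
    """Send compact to the node at `uri`; needs pymongo and a reachable server."""
    # Imported lazily so that build_compact_command has no third-party dependency.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return client[db_name].command(build_compact_command(collection_name, force))
    finally:
        client.close()
```

For the experiment in this article, a call might look like run_compact("mongodb://192.168.2.130:27017", "test", "students"), ideally at a quiet time because of the I/O cost mentioned above.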

2. Rolling shrink through the replica set (offline)

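1. Check that every node is running normally (ps -ef | grep mongod).
2. Log in to the primary to be processed, step it down with rs.stepDown(), and confirm the step-down with rs.status().
3. Once the switchover has succeeded, stop the node and delete its data files, for example: rm -fr /mongodb/data/*
4. Start the node again, for example: /mongodb/bin/mongod --config /mongodb/mongodb.conf
5. Follow the process and the data synchronization progress in the log.
6. After synchronization completes, run rs.stepDown() on the new primary to step it down, so that it can be shrunk in the same way.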

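This approach achieves a 100% shrink rate and leaves the data completely free of fragmentation. It does add operational cost, and when the replica set has only 2 members,
there is also a single-point risk for a period of time (this is exactly what happened in the experiment below). The difference before and after the offline shrink is very noticeable.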

System environment (a test environment with only 1 replica to keep things simple; do not do this in production!)

Primary: 192.168.2.130:27017
Secondary: 192.168.2.138:27017

1. Install mongo

2. Configure parameters (on both machines)

[root@localhost bin]# vi mongodb.conf

dbpath=/u01/mongodb/data
port=27017
oplogSize = 2048
logpath=/u01/mongodb/logs/mongodb.log
logappend = true
fork = true
nojournal = true
bind_ip=0.0.0.0
shardsvr=true
replSet=repl1

3. Start mongo

[root@localhost bin]# ./mongod --config ./mongodb.conf

4. Initialize the replica set

[root@localhost bin]# ./mongo  192.168.2.130:27017

> cfg = {"_id":"repl1","members":[ {"_id":0, "host":"192.168.2.130:27017"}, {"_id":1, "host":"192.168.2.138:27017"}]}; 
{
        "_id" : "repl1",
        "members" : [
                {
                        "_id" : 0,
                        "host" : "192.168.2.130:27017"
                },
                {
                        "_id" : 1,
                        "host" : "192.168.2.138:27017"
                }
        ]
}
>  
> rs.initiate(cfg)
{ "ok" : 1 }

5. Insert data

# Preparation: install the pip tool and pymongo
[root@localhost mongodb]# tar zxvf setuptools-0.6c11.tar.gz
[root@localhost mongodb]# cd setuptools-0.6c11
[root@localhost setuptools-0.6c11]# python setup.py install

[root@localhost mongodb]# wget "https://pypi.python.org/packages/source/p/pip/pip-1.5.4.tar.gz#md5=834b2904f92d46aaa333267fb1c922bb" --no-check-certificate
[root@localhost mongodb]# tar zxvf pip-1.5.4.tar.gz
[root@localhost mongodb]# cd pip-1.5.4
[root@localhost pip-1.5.4]# python setup.py install


[root@localhost mongodb]# pip install pymongo

# Insert script
vi insert.py
#!/usr/bin/python
import random
from pymongo import MongoClient

# Connect to the primary listed in the system environment above
client = MongoClient('192.168.2.130', 27017)

test = client.test
students = test.students
students_count = students.count()
print "student count is ", students_count

for i in xrange(0,5000000):
    classid = random.randint(1,4)
    age = random.randint(10, 30)
    student = {"classid": classid, "age": age, "name": "fujun"}
    students.insert_one(student)
    print i

students_count = students.count()
print "student count is ", students_count

# Run the insert; several terminal windows can run it in parallel
[root@localhost mongodb]# python insert.py
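
Inserting one document at a time is slow at this scale. The following is a sketch of a batched loader, not the script used in this experiment: it assumes Python 3 and pymongo 3.7 or later, and the names student_batches and load_students are hypothetical. The document shape matches insert.py above.

```python
import random


def student_batches(total, batch_size, rng=None):
    """Yield lists of student documents, at most batch_size documents per list."""
    if rng is None:
        rng = random.Random()
    batch = []
    for _ in range(total):
        batch.append({
            "classid": rng.randint(1, 4),
            "age": rng.randint(10, 30),
            "name": "fujun",
        })
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def load_students(uri, total=5000000, batch_size=1000):
    """Insert `total` documents into test.students in batches; needs pymongo."""
    from pymongo import MongoClient  # lazy import: only needed when connecting

    client = MongoClient(uri)
    try:
        students = client.test.students
        for batch in student_batches(total, batch_size):
            students.insert_many(batch, ordered=False)
        return students.count_documents({})
    finally:
        client.close()
```

A call such as load_students("mongodb://192.168.2.130:27017") would target the primary of this test environment; the batch size of 1000 is an arbitrary starting point.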

# Insertion finished: about 20 million records
repl1:PRIMARY> db.students.find().count() 
21511778

# Check the replica set status
repl1:PRIMARY> rs.status()
{
        "set" : "repl1",
        "date" : ISODate("2018-03-17T02:42:01.126Z"),
        "myState" : 1,
        "term" : NumberLong(3),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "192.168.2.130:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 60089,
                        "optime" : {
                                "ts" : Timestamp(1521231587, 19),
                                "t" : NumberLong(3)
                        },
                        "optimeDate" : ISODate("2018-03-16T20:19:47Z"),
                        "electionTime" : Timestamp(1521213635, 1),
                        "electionDate" : ISODate("2018-03-16T15:20:35Z"),
                        "configVersion" : 1,
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "192.168.2.138:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 40892,
                        "optime" : {
                                "ts" : Timestamp(1521231587, 19),
                                "t" : NumberLong(3)
                        },
                        "optimeDate" : ISODate("2018-03-16T20:19:47Z"),
                        "lastHeartbeat" : ISODate("2018-03-17T02:42:01Z"),
                        "lastHeartbeatRecv" : ISODate("2018-03-17T02:42:00.376Z"),
                        "pingMs" : NumberLong(1),
                        "syncingTo" : "192.168.2.130:27017",
                        "configVersion" : 1
                }
        ],
        "ok" : 1
}

# Check storageSize and fileSize:
repl1:PRIMARY> db.stats() 
{
        "db" : "test",
        "collections" : 1,
        "objects" : 21511778,
        "avgObjSize" : 60,
        "dataSize" : 1290706680,
        "storageSize" : 410132480,
        "numExtents" : 0,
        "indexes" : 1,
        "indexSize" : 217350144,
        "ok" : 1
}

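# In principle, storageSize = dataSize + size(deleted documents): deleting documents does not make storageSize smaller. In the output above storageSize is smaller than dataSize because dataSize is the uncompressed size, while WiredTiger (the storage engine that appears in the log later in this experiment) compresses data on disk.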

6. Create data holes

# Delete 3/4 of the data
repl1:PRIMARY> db.students.remove({"classid":{"$lt":4}}) 

WriteResult({ "nRemoved" : 9050980 })

# About 5 million documents remain
repl1:PRIMARY> db.students.find().count()
5376340


# storageSize has not shrunk.
repl1:PRIMARY> db.stats()
{
        "db" : "test",
        "collections" : 1,
        "objects" : 5376340,
        "avgObjSize" : 60,
        "dataSize" : 322580400,
        "storageSize" : 425127936,
        "numExtents" : 0,
        "indexes" : 1,
        "indexSize" : 219590656,
        "ok" : 1
}
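
One way to quantify the holes directly, rather than inferring them from storageSize, is to read how much of the collection file WiredTiger reports as free. The sketch below assumes the collection stats document (for example from the collStats command) contains a wiredTiger section with a block-manager entry named "file bytes available for reuse"; the helper names and the sample numbers in the usage note are hypothetical.

```python
def reusable_bytes(coll_stats):
    """Return WiredTiger's 'file bytes available for reuse', or None if absent."""
    block_manager = coll_stats.get("wiredTiger", {}).get("block-manager", {})
    return block_manager.get("file bytes available for reuse")


def hole_ratio(coll_stats):
    """Fraction of the collection file that is free space, or None if unknown."""
    free = reusable_bytes(coll_stats)
    storage = coll_stats.get("storageSize")
    if free is None or not storage:
        return None
    return free / storage
```

With pymongo, the input could come from db.command("collStats", "students"). A made-up stats document with storageSize 1000 and 750 reusable bytes would give a ratio of 0.75, meaning three quarters of the file is hole space.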

7. Step down and stop the primary

repl1:PRIMARY> rs.stepDown()
2018-03-17T13:46:49.588+0800 E QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host '192.168.2.130:27017'  :
DB.prototype.runCommand@src/mongo/shell/db.js:135:1
DB.prototype.adminCommand@src/mongo/shell/db.js:153:16
rs.stepDown@src/mongo/shell/utils.js:1202:12
@(shell):1:1

2018-03-17T13:46:49.600+0800 I NETWORK  [thread1] trying reconnect to 192.168.2.130:27017 (192.168.2.130) failed
2018-03-17T13:46:49.618+0800 I NETWORK  [thread1] reconnect 192.168.2.130:27017 (192.168.2.130) ok
repl1:SECONDARY> 

# Stop 192.168.2.130
[root@localhost mongodb]# mongod --shutdown --config=/u01/mongodb/mongodb.conf
killing process with pid: 32322

8. Delete the data files on 192.168.2.130

[root@localhost data]# rm -rf /u01/mongodb/data/*

9. Restart 192.168.2.130

[root@localhost data]# mongod  --config=/u01/mongodb/mongodb.conf
about to fork child process, waiting until server is ready for connections.
forked process: 48079
child process started successfully, parent exiting

10. Check the log and follow the data synchronization

[root@localhost mongodb]# cd /u01/mongodb/logs/
[root@localhost logs]# tail -50f  mongodb.log

2018-03-17T13:53:31.077+0800 I REPL     [replExecDBWorker-0] Starting replication applier threads
2018-03-17T13:53:31.077+0800 I REPL     [ReplicationExecutor] 
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] ** WARNING: This replica set is running without journaling enabled but the 
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] **          writeConcernMajorityJournalDefault option to the replica set config 
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] **          is set to true. The writeConcernMajorityJournalDefault 
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] **          option to the replica set config must be set to false 
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] **          or w:majority write concerns will never complete.
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] 
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "repl1", version: 1, protocolVersion: 1, members: [ { _id: 0, host: "192.168.2.130:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "192.168.2.138:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5aab9f92276e84b589038188') } }
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] This node is 192.168.2.130:27017 in the config
2018-03-17T13:53:31.078+0800 I REPL     [ReplicationExecutor] transition to STARTUP2
2018-03-17T13:53:31.079+0800 I REPL     [rsSync] ******
2018-03-17T13:53:31.079+0800 I REPL     [rsSync] creating replication oplog of size: 2048MB...
2018-03-17T13:53:31.298+0800 I REPL     [ReplicationExecutor] Member 192.168.2.138:27017 is now in state SECONDARY
2018-03-17T13:53:31.915+0800 I STORAGE  [rsSync] Starting WiredTigerRecordStoreThread local.oplog.rs
2018-03-17T13:53:31.915+0800 I STORAGE  [rsSync] The size storer reports that the oplog contains 0 records totaling to 0 bytes
2018-03-17T13:53:31.916+0800 I STORAGE  [rsSync] Scanning the oplog to determine where to place markers for truncation
2018-03-17T13:53:34.635+0800 I REPL     [rsSync] ******
2018-03-17T13:53:34.635+0800 I REPL     [rsSync] initial sync pending
2018-03-17T13:53:37.219+0800 I REPL     [ReplicationExecutor] syncing from: 192.168.2.138:27017
2018-03-17T13:53:39.164+0800 I REPL     [rsSync] initial sync drop all databases
2018-03-17T13:53:39.165+0800 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 1
2018-03-17T13:53:39.165+0800 I REPL     [rsSync] initial sync clone all databases
2018-03-17T13:53:39.281+0800 I REPL     [rsSync] fetching and creating collections for test
2018-03-17T13:53:47.061+0800 I REPL     [rsSync] initial sync cloning db: test
2018-03-17T14:19:07.020+0800 I STORAGE  [rsSync] clone test.students 70015
2018-03-17T14:19:07.021+0800 I STORAGE  [rsSync] 70105 objects cloned so far from collection test.students
2018-03-17T14:20:34.083+0800 I STORAGE  [rsSync] 139955 objects cloned so far from collection test.students
2018-03-17T14:20:34.084+0800 I STORAGE  [rsSync] clone test.students 140031
2018-03-17T14:21:38.094+0800 I STORAGE  [rsSync] 489459 objects cloned so far from collection test.students
2018-03-17T14:21:38.094+0800 I STORAGE  [rsSync] clone test.students 489471
2018-03-17T14:22:51.519+0800 I STORAGE  [rsSync] 769113 objects cloned so far from collection test.students
2018-03-17T14:22:51.519+0800 I STORAGE  [rsSync] clone test.students 769151
2018-03-17T14:24:07.410+0800 I STORAGE  [rsSync] 978790 objects cloned so far from collection test.students
2018-03-17T14:24:07.410+0800 I STORAGE  [rsSync] clone test.students 978815
2018-03-17T14:25:10.198+0800 I STORAGE  [rsSync] 1468121 objects cloned so far from collection test.students
2018-03-17T14:25:10.198+0800 I STORAGE  [rsSync] clone test.students 1468159
2018-03-17T14:25:44.018+0800 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:48042 #7 (4 connections now open)
2018-03-17T14:26:01.100+0800 I NETWORK  [conn5] end connection 127.0.0.1:48032 (3 connections now open)
2018-03-17T14:26:01.100+0800 I NETWORK  [conn7] end connection 127.0.0.1:48042 (3 connections now open)
2018-03-17T14:26:16.318+0800 I STORAGE  [rsSync] 1887602 objects cloned so far from collection test.students
2018-03-17T14:26:16.318+0800 I STORAGE  [rsSync] clone test.students 1887615
2018-03-17T14:27:19.201+0800 I STORAGE  [rsSync] clone test.students 2307071
2018-03-17T14:27:19.201+0800 I STORAGE  [rsSync] 2307083 objects cloned so far from collection test.students
2018-03-17T14:28:22.898+0800 I STORAGE  [rsSync] clone test.students 3145855
2018-03-17T14:28:22.899+0800 I STORAGE  [rsSync] 3145918 objects cloned so far from collection test.students
2018-03-17T14:29:25.991+0800 I STORAGE  [rsSync] 3565272 objects cloned so far from collection test.students
2018-03-17T14:29:25.991+0800 I STORAGE  [rsSync] clone test.students 3565311
2018-03-17T14:30:28.780+0800 I STORAGE  [rsSync] 3984753 objects cloned so far from collection test.students
2018-03-17T14:30:28.780+0800 I STORAGE  [rsSync] clone test.students 3984767
2018-03-17T14:31:28.001+0800 I STORAGE  [rsSync] clone test.students 4667519
2018-03-17T14:31:29.457+0800 I STORAGE  [rsSync] 4683761 objects cloned so far from collection test.students
2018-03-17T14:32:34.884+0800 I STORAGE  [rsSync] clone test.students 5103231
2018-03-17T14:32:34.884+0800 I STORAGE  [rsSync] 5103242 objects cloned so far from collection test.students
2018-03-17T14:33:46.017+0800 I STORAGE  [rsSync] 5452746 objects cloned so far from collection test.students
2018-03-17T14:33:46.017+0800 I STORAGE  [rsSync] clone test.students 5452799
2018-03-17T14:34:47.770+0800 I STORAGE  [rsSync] clone test.students 5872127
2018-03-17T14:34:47.770+0800 I STORAGE  [rsSync] 5872227 objects cloned so far from collection test.students
2018-03-17T14:35:52.885+0800 I STORAGE  [rsSync] 6291581 objects cloned so far from collection test.students
2018-03-17T14:35:52.885+0800 I STORAGE  [rsSync] clone test.students 6291583
2018-03-17T14:36:57.050+0800 I STORAGE  [rsSync] 6571235 objects cloned so far from collection test.students
2018-03-17T14:36:57.050+0800 I STORAGE  [rsSync] clone test.students 6571263
2018-03-17T14:37:59.514+0800 I STORAGE  [rsSync] 8248905 objects cloned so far from collection test.students
2018-03-17T14:37:59.515+0800 I STORAGE  [rsSync] clone test.students 8248959
2018-03-17T14:38:37.698+0800 I INDEX    [rsSync] build index on: test.students properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.students" }
2018-03-17T14:38:37.698+0800 I INDEX    [rsSync]         building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2018-03-17T14:38:40.000+0800 I -        [rsSync]   Index Build: 1511900/12074974 12%
2018-03-17T14:38:43.000+0800 I -        [rsSync]   Index Build: 3571300/12074974 29%
2018-03-17T14:38:46.000+0800 I -        [rsSync]   Index Build: 5640600/12074974 46%
2018-03-17T14:38:49.001+0800 I -        [rsSync]   Index Build: 7599000/12074974 62%
2018-03-17T14:38:52.001+0800 I -        [rsSync]   Index Build: 9471000/12074974 78%
2018-03-17T14:38:55.001+0800 I -        [rsSync]   Index Build: 11341800/12074974 93%
2018-03-17T14:39:27.000+0800 I -        [rsSync]   Index: (2/3) BTree Bottom Up Progress: 8961500/12074974 74%
2018-03-17T14:39:30.222+0800 I INDEX    [rsSync]         done building bottom layer, going to commit
2018-03-17T14:39:31.465+0800 I NETWORK  [initandlisten] connection accepted from 192.168.2.138:50802 #8 (3 connections now open)
2018-03-17T14:39:31.960+0800 I COMMAND  [conn1] command local.replset.election command: replSetRequestVotes { replSetRequestVotes: 1, setName: "repl1", dryRun: false, term: 212, candidateIndex: 1, configVersion: 1, lastCommittedOp: { ts: Timestamp 1521264812000|7795, t: 193 } } keyUpdates:0 writeConflicts:0 numYields:0 reslen:63 locks:{ Global: { acquireCount: { r: 4, w: 2 } }, Database: { acquireCount: { r: 1, W: 2 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_command 3219ms
2018-03-17T14:39:31.983+0800 I INDEX    [rsSync] build index done.  scanned 12074974 total records. 54 secs
2018-03-17T14:39:31.984+0800 I REPL     [rsSync] initial sync data copy, starting syncup
2018-03-17T14:39:31.984+0800 I REPL     [rsSync] oplog sync 1 of 3
2018-03-17T14:39:32.797+0800 I REPL     [ReplicationExecutor] syncing from: 192.168.2.138:27017
2018-03-17T14:39:33.749+0800 I REPL     [ReplicationExecutor] Member 192.168.2.138:27017 is now in state PRIMARY
2018-03-17T14:46:16.481+0800 I REPL [rsSync] oplog sync 2 of 3
2018-03-17T14:46:16.708+0800 I REPL [rsSync] initial sync building indexes
2018-03-17T14:46:16.708+0800 I REPL [rsSync] initial sync cloning indexes for : test
2018-03-17T14:46:17.236+0800 I STORAGE  [rsSync] copying indexes for: { name: "students", options: {} }
2018-03-17T14:46:19.079+0800 I REPL [rsSync] oplog sync 3 of 3
2018-03-17T14:46:19.083+0800 I REPL [rsSync] initial sync finishing up
2018-03-17T14:46:19.483+0800 I REPL [rsSync] initial sync done
2018-03-17T14:46:19.491+0800 I REPL [rsSync] initial sync succeeded after 1 attempt(s).
2018-03-17T14:46:19.491+0800 I REPL [ReplicationExecutor] transition to RECOVERING
2018-03-17T14:46:19.495+0800 I REPL [ReplicationExecutor] transition to SECONDARY
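
Following the clone counters by eye in a long log is tedious. As a small sketch, the "objects cloned so far" lines could be extracted like this; the regular expression is written against the log format shown above and may need adjusting for other MongoDB versions, and the name clone_progress is hypothetical.

```python
import re

# Matches lines such as:
# 2018-03-17T14:19:07.021+0800 I STORAGE  [rsSync] 70105 objects cloned so far from collection test.students
CLONE_PROGRESS = re.compile(
    r"^(?P<ts>\S+)\s.*?\s(?P<count>\d+) objects cloned so far "
    r"from collection (?P<ns>\S+)"
)


def clone_progress(lines):
    """Return (timestamp, namespace, cloned_count) for every progress line."""
    progress = []
    for line in lines:
        match = CLONE_PROGRESS.search(line)
        if match:
            progress.append(
                (match.group("ts"), match.group("ns"), int(match.group("count")))
            )
    return progress
```

Feeding it the lines of /u01/mongodb/logs/mongodb.log yields one tuple per progress message, which makes it easy to estimate the clone rate between two timestamps.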

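The log shows that the synchronization took a long time. The reasons:

1. Before 192.168.2.130 stepped down, replication to the secondary had not finished. As a result, 192.168.2.138 was not promoted to primary after the step-down, which left the set with a single point of failure!
2. After the data on 192.168.2.130 was deleted and the node was restarted, 192.168.2.138 kept applying the replicated delete operations while 130 stayed in the STARTUP2 state.
3. Only after 192.168.2.138 finished its deletes did it become primary. (Because replication was incomplete when the data files on 130 were deleted, the data that 138 ended up with was not the latest!)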

repl1:PRIMARY> db.students.find().count()
12074974

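The query result shows that the data is not current: the latest data should contain only about 5 million documents. This also confirms that replication in a mongo replica set is asynchronous, so always check the state of the secondary before stepping down the primary!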
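
That check can be sketched as a small function over the document returned by rs.status() (the replSetGetStatus command). It assumes the members carry stateStr and optimeDate fields as in the output shown earlier, with optimeDate parsed into datetime objects as pymongo does; the function names are hypothetical. Since optimeDate has one-second granularity, a zero lag is a necessary sign rather than proof of being fully caught up.

```python
from datetime import datetime, timedelta


def secondary_lags(status):
    """Map each SECONDARY's name to how far its optimeDate trails the primary's."""
    members = status.get("members", [])
    primaries = [m for m in members if m.get("stateStr") == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one PRIMARY in the status document")
    primary_optime = primaries[0]["optimeDate"]
    return {
        m["name"]: primary_optime - m["optimeDate"]
        for m in members
        if m.get("stateStr") == "SECONDARY"
    }


def safe_to_step_down(status, max_lag=timedelta(seconds=0)):
    """True only if at least one secondary exists and none trails by more than max_lag."""
    lags = secondary_lags(status)
    return bool(lags) and all(lag <= max_lag for lag in lags.values())
```

With pymongo, the input could be obtained with client.admin.command("replSetGetStatus"); only when safe_to_step_down returns True would the primary be stepped down and its data files removed.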

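I later ran the delete again and tracked the secondary's replication progress. This time the switchover was very quick, and the file size dropped considerably!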

11. Check storageSize again: it has dropped markedly

repl1:PRIMARY> db.stats() 
{
        "db" : "test",
        "collections" : 2,
        "objects" : 5376341,
        "avgObjSize" : 60.000029946017186,
        "dataSize" : 322580621,
        "storageSize" : 102256640,
        "numExtents" : 0,
        "indexes" : 2,
        "indexSize" : 54321152,
        "ok" : 1
}
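
To put a number on the drop, the sizes from the db.stats() output in step 6 (after the delete, before the resync) can be compared with the output above. The helper below is a plain percentage calculation; its name is hypothetical, and the constants are copied from the two outputs in this article.

```python
def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) * 100.0 / before


# Values taken from the db.stats() outputs in step 6 and step 11.
STORAGE_BEFORE, STORAGE_AFTER = 425127936, 102256640
INDEX_BEFORE, INDEX_AFTER = 219590656, 54321152

storage_drop = reduction_percent(STORAGE_BEFORE, STORAGE_AFTER)
index_drop = reduction_percent(INDEX_BEFORE, INDEX_AFTER)
print("storageSize shrank by %.1f%%, indexSize by %.1f%%" % (storage_drop, index_drop))
```

Both values come out at roughly three quarters, which lines up with about 3/4 of the documents having been deleted.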

12. Step down 192.168.2.138

repl1:PRIMARY> rs.stepDown()
2018-03-17T02:51:29.554-0400 E QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host '127.0.0.1:27017'  :
DB.prototype.runCommand@src/mongo/shell/db.js:135:1
DB.prototype.adminCommand@src/mongo/shell/db.js:153:16
rs.stepDown@src/mongo/shell/utils.js:1202:12
@(shell):1:1

2018-03-17T02:51:29.652-0400 I NETWORK  [thread1] trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2018-03-17T02:51:29.791-0400 I NETWORK  [thread1] reconnect 127.0.0.1:27017 (127.0.0.1) ok

13. Repeat the steps performed on 130 to shrink the space on 138
