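A Hands-On Look at Kafka's Two Log Cleanup Policies
Preface
Kafka is a log-based stream processing platform. A topic can have multiple partitions, and the partition is the basic unit of replication. On a single broker, partition data can be spread across several disk directories (each partition's directory lives under exactly one of them). The relevant setting is: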
# A comma separated list of directories under which to store log files
log.dirs=/home/storm/dev/kafka-logs
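On disk, each partition's log is split into segments. The default segment size is 1 GB. The segment is the basic unit of log cleanup, and the segment currently being written to (the active segment) is never cleaned.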
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
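The rolling behaviour described above can be sketched in a few lines of Python. This is a simplified model, not Kafka's implementation: it rolls on size only (a real broker also rolls on time and when an index fills up), and the `PartitionLog` class and the 112-byte record size are made up for illustration. The one detail taken from Kafka itself is the file naming: a segment file is named after the first offset it contains, zero-padded to 20 digits.

```python
from dataclasses import dataclass, field


def segment_file_name(base_offset: int) -> str:
    """A segment file is named after its first offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}.log"


@dataclass
class PartitionLog:
    """Hypothetical, size-only model of one partition's segment list."""
    segment_bytes: int                            # plays the role of log.segment.bytes
    segments: list = field(default_factory=list)  # items are [base_offset, size_in_bytes]
    next_offset: int = 0

    def append(self, record_size: int) -> None:
        # Roll to a new segment when none exists yet, or when the active one
        # cannot take this record without exceeding segment_bytes.
        if not self.segments or self.segments[-1][1] + record_size > self.segment_bytes:
            self.segments.append([self.next_offset, 0])
        self.segments[-1][1] += record_size
        self.next_offset += 1

    def file_names(self) -> list:
        return [segment_file_name(base) for base, _ in self.segments]


log = PartitionLog(segment_bytes=25600)
for _ in range(600):
    log.append(record_size=112)  # arbitrary illustrative record size
print(log.file_names())
```

With these numbers each segment holds 228 records, so the three segments start at offsets 0, 228 and 456, and the last one is the active segment.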
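Log cleanup
When log.cleaner.enable=true is set, the Kafka broker starts a pool of cleaner threads that run cleanup tasks periodically. In releases after Kafka 0.9.0, log.cleaner.enable defaults to true. Two cleanup policies (log.cleanup.policy) are supported: delete and compact. The default is delete.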
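The compact cleanup policy (log compaction)
Log compaction ensures that, within a single partition of a topic, only the most recent value for each key is retained. To delete a key, send a tombstone message: (key, null). To demonstrate the process, change the broker configuration: shrink the segment size and switch the cleanup policy to compact.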
# 25KB
log.segment.bytes=25600
log.cleanup.policy=compact
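As a rough mental model of what compaction does to the closed segments, here is a small Python sketch. The `compact` function is hypothetical and deliberately simplified: it drops tombstones immediately, whereas a real broker keeps them for a while (delete.retention.ms) so that consumers can still observe the deletion, and it ignores the rule that the active segment is left untouched.

```python
def compact(records):
    """Keep only the latest record per key; a value of None is a tombstone.

    records: (offset, key, value) tuples in offset order.
    Surviving records keep their original offsets, which is why
    a compacted log has gaps in its offset sequence.
    """
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, key, value)  # later offsets overwrite earlier ones
    # Simplification: tombstoned keys are dropped right away.
    survivors = [rec for rec in latest.values() if rec[2] is not None]
    return sorted(survivors)


records = [
    (0, "k1", "a"),
    (1, "k2", "b"),
    (2, "k1", "c"),   # supersedes offset 0
    (3, "k2", None),  # tombstone: removes k2
    (4, "k3", "d"),
]
print(compact(records))
```

Only offsets 2 and 4 survive in this example: k1 keeps its newest value, k2 is removed by its tombstone, and k3 was written once.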
Send a batch of keyed messages:
➜ test-0 ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test --property "parse.key=true" --property "key.separator=:" < msg.txt
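The contents of msg.txt are not shown in the original post. With parse.key=true and key.separator=:, the console producer expects one key:value pair per line, so a file of that shape could be generated with a short script such as the one below. The key format, the number of distinct keys and the 100-character values are assumptions chosen for illustration; reusing a small set of keys gives compaction something to remove.

```python
import random
import string


def make_lines(count, distinct_keys, value_len=100, seed=0):
    """Build 'key:value' lines; keys repeat so that compaction has work to do."""
    rng = random.Random(seed)
    lines = []
    for i in range(count):
        key = f"k{i % distinct_keys:03d}"  # hypothetical 4-character key, e.g. k007
        value = "".join(rng.choices(string.ascii_lowercase, k=value_len))
        lines.append(f"{key}:{value}")
    return lines


if __name__ == "__main__":
    # Redirect stdout to a file to produce the input, e.g. `python gen.py > msg.txt`
    # (gen.py is just a placeholder name for this script).
    print("\n".join(make_lines(count=500, distinct_keys=10)))
```

The values use lowercase letters only, so they never contain the ":" separator.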
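The structure of the log files can then be seen in the log directory. The first listing was taken before the cleaner ran, the second right after: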
➜ kafka-logs cd test-0
➜ test-0 ls -alh
total 160K
drwxrwxr-x  2 storm storm 4.0K Sep 11 18:47 .
drwxrwxr-x 53 storm storm 4.0K Sep 11 18:47 ..
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index
-rw-rw-r--  1 storm storm   78 Sep 11 17:27 00000000000000000000.log
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000153.index
-rw-rw-r--  1 storm storm  175 Sep 11 17:27 00000000000000000153.log
-rw-rw-r--  1 storm storm   10 Sep 11 17:27 00000000000000000153.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000153.timeindex
-rw-rw-r--  1 storm storm    8 Sep 11 18:47 00000000000000000296.index
-rw-rw-r--  1 storm storm  25K Sep 11 17:27 00000000000000000296.log
-rw-rw-r--  1 storm storm   10 Sep 11 17:27 00000000000000000296.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000296.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index
-rw-rw-r--  1 storm storm  16K Sep 11 18:47 00000000000000000522.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000522.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000522.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index
-rw-rw-r--  1 storm storm  16K Sep 11 18:47 00000000000000000665.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000665.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.index
-rw-rw-r--  1 storm storm  25K Sep 11 18:47 00000000000000000808.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000808.snapshot
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.timeindex
-rw-rw-r--  1 storm storm    8 Sep 11 10:44 leader-epoch-checkpoint
➜ test-0 ls -alh
total 164K
drwxrwxr-x  2 storm storm 4.0K Sep 11 18:48 .
drwxrwxr-x 53 storm storm 4.0K Sep 11 18:48 ..
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index.deleted
-rw-rw-r--  1 storm storm   73 Sep 11 17:27 00000000000000000000.log
-rw-rw-r--  1 storm storm   78 Sep 11 17:27 00000000000000000000.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000153.index.deleted
-rw-rw-r--  1 storm storm  175 Sep 11 17:27 00000000000000000153.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000153.timeindex.deleted
-rw-rw-r--  1 storm storm    8 Sep 11 18:47 00000000000000000296.index.deleted
-rw-rw-r--  1 storm storm  25K Sep 11 17:27 00000000000000000296.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000296.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.log
-rw-rw-r--  1 storm storm  16K Sep 11 18:47 00000000000000000522.log.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.timeindex
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000522.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index.deleted
-rw-rw-r--  1 storm storm  175 Sep 11 18:47 00000000000000000665.log
-rw-rw-r--  1 storm storm  16K Sep 11 18:47 00000000000000000665.log.deleted
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000665.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex.deleted
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.index
-rw-rw-r--  1 storm storm  25K Sep 11 18:47 00000000000000000808.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000808.snapshot
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.timeindex
-rw-rw-r--  1 storm storm    8 Sep 11 10:44 leader-epoch-checkpoint
Apart from the active segment, all earlier segments have been cleaned (compacted). The gaps in the offsets make this visible:
➜ kafka_2.11-2.0.0 ./bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /home/storm/dev/kafka-logs/test-0/00000000000000000000.log
Dumping /home/storm/dev/kafka-logs/test-0/00000000000000000000.log
Starting offset: 0
offset: 521 position: 0 CreateTime: 1536658031117 isvalid: true keysize: 4 valuesize: 0 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
➜ kafka_2.11-2.0.0 ./bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /home/storm/dev/kafka-logs/test-0/00000000000000000665.log
Dumping /home/storm/dev/kafka-logs/test-0/00000000000000000665.log
Starting offset: 665
offset: 807 position: 0 CreateTime: 1536662844868 isvalid: true keysize: 4 valuesize: 100 magic: 2 compresscodec: NONE producerId: -1 producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []
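To check for offset gaps without reading the dump by eye, the record lines can be parsed with a small script. The sketch below assumes the "name: value" layout shown in the output above; the helper names are made up, and the regular expression is only meant for the per-record lines (lines such as "Starting offset: 0" should be filtered out first).

```python
import re

# Matches "name: value" pairs such as "offset: 521" or "headerKeys: []".
FIELD = re.compile(r"(\w+): (\S+)")


def parse_record_line(line):
    """Turn one per-record line of DumpLogSegments output into a dict of strings."""
    return dict(FIELD.findall(line))


def offset_gaps(offsets):
    """Return (previous, next) pairs where consecutive offsets are not adjacent."""
    return [(a, b) for a, b in zip(offsets, offsets[1:]) if b != a + 1]


line = ("offset: 521 position: 0 CreateTime: 1536658031117 isvalid: true "
        "keysize: 4 valuesize: 0 magic: 2 compresscodec: NONE producerId: -1 "
        "producerEpoch: -1 sequence: -1 isTransactional: false headerKeys: []")
record = parse_record_line(line)
print(record["offset"], record["valuesize"])
print(offset_gaps([521, 807]))
```

Applied to the two dumps above, the only surviving offsets are 521 and 807, so everything in between was removed by compaction.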
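Segment files renamed with the .deleted suffix are removed from disk later by a background task. The wait is controlled by the broker setting log.segment.delete.delay.ms (60000 ms by default, i.e. one minute), so they do not linger for a full day. The listing below, taken the next morning, shows that they are gone: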
➜ test-0 pwd
/home/storm/dev/kafka-logs/test-0
➜ test-0 date
Wed Sep 12 09:48:45 CST 2018
➜ test-0 ls -alh
total 72K
drwxrwxr-x  2 storm storm 4.0K Sep 11 18:48 .
drwxrwxr-x 53 storm storm 4.0K Sep 11 19:51 ..
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index
-rw-rw-r--  1 storm storm   73 Sep 11 17:27 00000000000000000000.log
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.log
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.timeindex
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index
-rw-rw-r--  1 storm storm  175 Sep 11 18:47 00000000000000000665.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000665.snapshot
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.index
-rw-rw-r--  1 storm storm  25K Sep 11 18:47 00000000000000000808.log
-rw-rw-r--  1 storm storm   10 Sep 11 18:47 00000000000000000808.snapshot
-rw-rw-r--  1 storm storm  10M Sep 11 18:47 00000000000000000808.timeindex
-rw-rw-r--  1 storm storm    8 Sep 11 10:44 leader-epoch-checkpoint
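The delete cleanup policy (default)
Now for the delete policy. This is the retention behaviour you get by default: once the log exceeds a given size or age, old segments are deleted. The broker settings involved are log.retention.bytes and log.retention.hours (or, at finer granularity, log.retention.minutes and log.retention.ms; when more than one is set, ms takes precedence over minutes, and minutes over hours). The defaults are: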
# Set this according to your own needs (-1 means no size limit)
log.retention.bytes=-1
# The default retention time is 7 days
log.retention.hours=168
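The way these settings combine can be sketched as follows. This is a simplified model rather than broker code: the function names and the dictionary layout of a segment are invented for the example, and each segment's age is measured from a single "newest timestamp" field. It does mirror three rules of the delete policy: log.retention.ms wins over log.retention.minutes, which wins over log.retention.hours; segments are removed oldest first; and the active segment is never removed.

```python
def effective_retention_ms(ms=None, minutes=None, hours=168):
    """Resolve the retention time: ms beats minutes, minutes beat hours."""
    if ms is not None:
        return ms
    if minutes is not None:
        return minutes * 60 * 1000
    return hours * 60 * 60 * 1000


def segments_to_delete(segments, now_ms, retention_ms, retention_bytes=-1):
    """Return base offsets of segments eligible for deletion.

    segments: oldest-first list of dicts with keys
    'base_offset', 'size' (bytes) and 'newest_ts_ms'.
    The last entry is the active segment and is never deleted.
    A negative retention_ms or retention_bytes disables that limit.
    """
    closed = segments[:-1]
    doomed = []
    # Time-based: walk from the oldest segment until one is still young enough.
    for seg in closed:
        if retention_ms < 0 or now_ms - seg["newest_ts_ms"] <= retention_ms:
            break
        doomed.append(seg)
    # Size-based: keep dropping the oldest segments while the log stays at or above the limit.
    if retention_bytes >= 0:
        kept_size = sum(s["size"] for s in segments) - sum(s["size"] for s in doomed)
        excess = kept_size - retention_bytes
        for seg in closed[len(doomed):]:
            if excess - seg["size"] < 0:
                break
            doomed.append(seg)
            excess -= seg["size"]
    return [s["base_offset"] for s in doomed]


MINUTE = 60 * 1000
segments = [
    {"base_offset": 0, "size": 100, "newest_ts_ms": 0},
    {"base_offset": 10, "size": 100, "newest_ts_ms": 50 * MINUTE},
    {"base_offset": 20, "size": 100, "newest_ts_ms": 90 * MINUTE},  # active segment
]
retention = effective_retention_ms(minutes=60)
print(segments_to_delete(segments, now_ms=100 * MINUTE, retention_ms=retention))
```

In this example only the segment starting at offset 0 is older than 60 minutes, so it is the only one selected. Passing retention_ms=-1 with retention_bytes=100 instead selects the segments at offsets 0 and 10, leaving just the active segment.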
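To make the effect of deletion visible, the retention time is reduced to 60 minutes here. Apart from the active segment, all earlier segments are then deleted: they are first renamed with a .deleted suffix and physically removed shortly afterwards (after log.segment.delete.delay.ms, 60000 ms by default). The second listing below, taken the following day, shows only the active segment left.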
# The minimum age of a log file to be eligible for deletion due to age
log.retention.minutes=60
➜ kafka-logs ls -alh test-0
total 220K
drwxrwxr-x  2 storm storm 4.0K Sep 13 11:12 .
drwxrwxr-x 53 storm storm 4.0K Sep 13 11:12 ..
-rw-rw-r--  1 storm storm    0 Sep 11 17:27 00000000000000000000.index.deleted
-rw-rw-r--  1 storm storm   73 Sep 11 17:27 00000000000000000000.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 17:27 00000000000000000000.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.index.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.log.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000522.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 11 18:47 00000000000000000665.index.deleted
-rw-rw-r--  1 storm storm  175 Sep 11 18:47 00000000000000000665.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 11 18:47 00000000000000000665.timeindex.deleted
-rw-rw-r--  1 storm storm    8 Sep 12 10:50 00000000000000000808.index.deleted
-rw-rw-r--  1 storm storm  25K Sep 11 18:47 00000000000000000808.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:50 00000000000000000808.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 12 10:50 00000000000000001034.index.deleted
-rw-rw-r--  1 storm storm  16K Sep 12 10:50 00000000000000001034.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:50 00000000000000001034.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 12 10:50 00000000000000001177.index.deleted
-rw-rw-r--  1 storm storm  16K Sep 12 10:50 00000000000000001177.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:50 00000000000000001177.timeindex.deleted
-rw-rw-r--  1 storm storm    8 Sep 12 10:51 00000000000000001320.index.deleted
-rw-rw-r--  1 storm storm  25K Sep 12 10:50 00000000000000001320.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:51 00000000000000001320.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 12 10:51 00000000000000001546.index.deleted
-rw-rw-r--  1 storm storm  16K Sep 12 10:51 00000000000000001546.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:51 00000000000000001546.timeindex.deleted
-rw-rw-r--  1 storm storm    0 Sep 12 10:51 00000000000000001689.index.deleted
-rw-rw-r--  1 storm storm  16K Sep 12 10:51 00000000000000001689.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 12 10:51 00000000000000001689.timeindex.deleted
-rw-rw-r--  1 storm storm    8 Sep 13 11:12 00000000000000001832.index.deleted
-rw-rw-r--  1 storm storm  25K Sep 12 10:51 00000000000000001832.log.deleted
-rw-rw-r--  1 storm storm   12 Sep 13 11:12 00000000000000001832.timeindex.deleted
-rw-rw-r--  1 storm storm  10M Sep 13 11:12 00000000000000002058.index
-rw-rw-r--  1 storm storm    0 Sep 13 11:12 00000000000000002058.log
-rw-rw-r--  1 storm storm   10 Sep 13 11:08 00000000000000002058.snapshot
-rw-rw-r--  1 storm storm  10M Sep 13 11:12 00000000000000002058.timeindex
-rw-rw-r--  1 storm storm   11 Sep 13 11:12 leader-epoch-checkpoint
➜ kafka-logs date
Fri Sep 14 09:19:41 CST 2018
➜ kafka-logs ls -alh test-0
total 16K
drwxrwxr-x  2 storm storm 4.0K Sep 13 11:13 .
drwxrwxr-x 53 storm storm 4.0K Sep 14 09:19 ..
-rw-rw-r--  1 storm storm  10M Sep 13 11:12 00000000000000002058.index
-rw-rw-r--  1 storm storm    0 Sep 13 11:12 00000000000000002058.log
-rw-rw-r--  1 storm storm   10 Sep 13 11:08 00000000000000002058.snapshot
-rw-rw-r--  1 storm storm  10M Sep 13 11:12 00000000000000002058.timeindex
-rw-rw-r--  1 storm storm   11 Sep 13 11:12 leader-epoch-checkpoint
References
Kafka Architecture: Log Compaction