Kafka process crash: OOM root-cause analysis and fix
1. Symptoms
The Kafka process had been down for some time for various reasons (it may, for example, have been killed manually). When we later tried to start Kafka, the startup failed: the process would not come up. The log showed an OOM error similar to the following:
```
[2018-11-27 13:54:58,726] INFO Recovering unflushed segment 0 in log dcs_async_scr_info-4. (kafka.log.Log)
[2018-11-27 13:54:58,727] INFO Completed load of log dcs_async_scr_info-4 with log end offset 0 (kafka.log.Log)
[2018-11-27 13:54:58,731] INFO Recovering unflushed segment 92049 in log dcs_async_dev_sid_info-6. (kafka.log.Log)
[2018-11-27 13:54:58,737] INFO Completed load of log dcs_async_dev_sid_info-6 with log end offset 92146 (kafka.log.Log)
...skipping...
[2018-11-27 13:54:59,450] INFO Recovering unflushed segment 0 in log dcs_storm_collect_info-3. (kafka.log.Log)
[2018-11-27 13:54:59,552] ERROR There was an error in one of the threads during logs loading: java.io.IOException: Map failed (kafka.log.LogManager)
[2018-11-27 13:54:59,554] FATAL [Kafka Server 0], Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.io.IOException: Map failed
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:919)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:286)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.Utils$.inLock(Utils.scala:535)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.LogSegment.recover(LogSegment.scala:170)
        at kafka.log.Log.recoverLog(Log.scala:205)
        at kafka.log.Log.loadSegments(Log.scala:177)
        at kafka.log.Log.<init>(Log.scala:67)
        at kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$7$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:142)
        at kafka.utils.Utils$$anon$1.run(Utils.scala:54)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Map failed
        at sun.nio.ch.FileChannelImpl.map0(Native Method)
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:916)
        ... 15 more
[2018-11-27 13:54:59,556] INFO [Kafka Server 0], shutting down (kafka.server.KafkaServer)
```
2. Resource monitoring
Since the log reported an out-of-memory error, we captured the process's resource consumption during startup and observed the following.
2.1 Check the memory state
Because the error was an OOM, the first thing we captured was the memory state (`free -m`):
```
             total       used       free     shared    buffers     cached
Mem:          7761       5794       1966         16        150       3048
-/+ buffers/cache:        2595       5165
Swap:            0          0          0
```
We captured this several times during the startup attempt, and every snapshot showed that plenty of memory was still available.
2.2 Check the memory-mapped files
Kafka mmaps the index file of every partition log segment to speed up file reads and writes. We also know that the number of memory maps a process can hold is limited, so we captured the map count to see whether it exceeded that limit.
Captured result:
```
2055 7f1549679000-7f154a079000 rw-s 00000000 fd:01 1704740 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-2/00000000000000000000.index
2056 7f154a079000-7f154aa79000 rw-s 00000000 fd:01 1704740 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-2/00000000000000000000.index
2057 7f154aa79000-7f154b479000 rw-s 00000000 fd:01 1704740 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-2/00000000000000000000.index
2058 7f154b479000-7f154be79000 rw-s 00000000 fd:01 1704505 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info-14/00000000000000000000.index
2059 7f154be79000-7f154c879000 rw-s 00000000 fd:01 1704505 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info-14/00000000000000000000.index
2060 7f154c879000-7f154d279000 rw-s 00000000 fd:01 1704505 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info-14/00000000000000000000.index
2061 7f154d279000-7f154dc79000 rw-s 00000000 fd:01 1704840 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-47/00000000000000000000.index
2062 7f154dd2b000-7f154e72b000 rw-s 00000000 fd:01 1704840 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-47/00000000000000000000.index
2063 7f154e72b000-7f154f12b000 rw-s 00000000 fd:01 1704840 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-47/00000000000000000000.index
2064 7f154f12b000-7f154fb2b000 rw-s 00000000 fd:01 1704530 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info_gray-6/00000000000000000000.index
2065 7f154fb2b000-7f155052b000 rw-s 00000000 fd:01 1704530 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info_gray-6/00000000000000000000.index
2066 7f15505d9000-7f1550fd9000 rw-s 00000000 fd:01 1704530 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_storm_collect_info_gray-6/00000000000000000000.index
2067 7f1550fd9000-7f15519d9000 rw-s 00000000 fd:01 1704852 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-28/00000000000000000000.index
2068 7f15519d9000-7f15523d9000 rw-s 00000000 fd:01 1704852 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-28/00000000000000000000.index
2069 7f15523d9000-7f1552dd9000 rw-s 00000000 fd:01 1704852 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-28/00000000000000000000.index
2070 7f1552dd9000-7f15537d9000 rw-s 00000000 fd:01 1704783 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-15/00000000000000000000.index
2071 7f15537d9000-7f15541d9000 rw-s 00000000 fd:01 1704783 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-15/00000000000000000000.index
2072 7f15541d9000-7f1554bd9000 rw-s 00000000 fd:01 1704783 /home/dcs_extas/usr/kafka_test_env/kafka-logs/__consumer_offsets-15/00000000000000000000.index
2073 7f1554bd9000-7f15555d9000 rw-s 00000000 fd:01 1704990 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_dev_info_batch_del-7/00000000000000057329.index
2074 7f15555d9000-7f15555da000 rw-s 00000000 fd:01 1704990 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_dev_info_batch_del-7/00000000000000057329.index
2075 7f15555da000-7f1555fda000 rw-s 00000000 fd:01 1704990 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_dev_info_batch_del-7/00000000000000057329.index
2076 7f1555fda000-7f15569da000 rw-s 00000000 fd:01 1704990 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_dev_info_batch_del-7/00000000000000057329.index
2077 7f15569da000-7f15573da000 rw-s 00000000 fd:01 1705044 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-10/00000000000000183949.index
2078 7f15573da000-7f15573db000 rw-s 00000000 fd:01 1705044 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-10/00000000000000183949.index
2079 7f15573db000-7f1557ddb000 rw-s 00000000 fd:01 1705044 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-10/00000000000000183949.index
2080 7f1557ddb000-7f15587db000 rw-s 00000000 fd:01 1705044 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-10/00000000000000183949.index
2081 7f15587db000-7f15591db000 rw-s 00000000 fd:01 1704654 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-5/00000000000000000000.index
2082 7f15591db000-7f1559bdb000 rw-s 00000000 fd:01 1704654 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-5/00000000000000000000.index
2083 7f1559bdb000-7f155a5db000 rw-s 00000000 fd:01 1704654 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-5/00000000000000000000.index
2084 7f155a5db000-7f155afdb000 rw-s 00000000 fd:01 1704682 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-9/00000000000000091638.index
2085 7f155afdb000-7f155afdc000 rw-s 00000000 fd:01 1704682 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-9/00000000000000091638.index
2086 7f155afdc000-7f155b9dc000 rw-s 00000000 fd:01 1704682 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-9/00000000000000091638.index
2087 7f155b9dc000-7f155c3dc000 rw-s 00000000 fd:01 1704682 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_device_to_db-9/00000000000000091638.index
2088 7f155c3dc000-7f155cddc000 rw-s 00000000 fd:01 1704870 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-13/00000000000000000000.index
2089 7f155cddc000-7f155d7dc000 rw-s 00000000 fd:01 1704870 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-13/00000000000000000000.index
2090 7f155d7dc000-7f155e1dc000 rw-s 00000000 fd:01 1704870 /home/dcs_extas/usr/kafka_test_env/kafka-logs/dcs_async_scr_info-13/00000000000000000000.index
```
This is only a fragment of the index-file mappings; there are other mapped files as well (loaded jar files and so on), which are omitted here.
Counting the maps gives a total of 600-odd. Now check the system limit:
```
cat /proc/sys/vm/max_map_count
65530
```
So the map count is far below the system limit.
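For reference, a minimal way to repeat this check against a running broker (a sketch; it assumes the broker is the only `Kafka` process reported by `jps`, as in the capture script in section 4):

```bash
# Count the broker's memory mappings and compare against the system limit
PID=$(jps | awk '/Kafka/ {print $1}')
grep -c '\.index' /proc/$PID/maps   # mappings of segment index files only
wc -l < /proc/$PID/maps             # total mappings for the process
cat /proc/sys/vm/max_map_count      # per-process mapping limit
```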
2.3 Check the process status
We then captured the process status:
```
cat /proc/$PID/status
```
VmPeak and VmSize stand out: both are quite large, close to 7 GB.
```
----------------------------------------Tue Nov 27 13:54:59 CST 2018 catch process status info start--------------------------------
Name:   java
State:  S (sleeping)
Tgid:   7492
Pid:    7492
PPid:   7491
TracerPid:      0
Uid:    1006    1006    1006    1006
Gid:    1000    1000    1000    1000
FDSize: 256
Groups: 16 33 1000
VmPeak:  6899700 kB
VmSize:  6899700 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    106168 kB
VmRSS:    106104 kB
VmData:  2416668 kB
VmStk:       140 kB
VmExe:         4 kB
VmLib:     14460 kB
VmPTE:      1248 kB
VmSwap:        0 kB
Threads:        17
SigQ:   0/127537
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000003
SigCgt: 2000000181005ccc
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed:   f
Cpus_allowed_list:      0-3
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:      0
voluntary_ctxt_switches:        7
nonvoluntary_ctxt_switches:     3
----------------------------------------Tue Nov 27 13:54:59 CST 2018 catch process status info end--------------------------------
```
We therefore suspected that the failure was caused by excessive virtual-memory usage, as reflected by VmPeak/VmSize.
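For quick re-checks, the virtual-memory fields can be pulled out of the full dump directly (a sketch, reusing the same `$PID` as above):

```bash
# Show only the virtual-memory related fields of the process status
grep -E 'VmPeak|VmSize|VmRSS' /proc/$PID/status
```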
2.4 Check the ulimit settings
Check ulimit (`ulimit -a`):
```
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127537
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 7331164
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127537
virtual memory          (kbytes, -v) 6899920
file locks                      (-x) unlimited
```
The virtual memory limit is also just under 7 GB (6899920 kB), essentially the same as the VmPeak/VmSize observed above, so the process was almost certainly running out of virtual address space. This also lines up with the mappings in section 2.2: most of the index mmaps in that dump are full 10 MB regions (Kafka pre-allocates index files to `log.index.size.max.bytes`, 10 MB by default), so several hundred such mappings alone account for multiple gigabytes of virtual address space.
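Note that `ulimit -a` reports the limits of the current shell; the limits that actually apply to the already-running broker process can be read from `/proc` (a sketch, values reported in bytes):

```bash
# Address-space limit in effect for the broker process itself
grep 'Max address space' /proc/$PID/limits
```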
3. Fix: raise the virtual memory limit
After obtaining root privileges, we raised `ulimit -v`:
```
ulimit -v unlimited
```
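Since ulimit settings apply per process and are inherited from the launching shell, the raised limit has to be in effect in the shell that starts the broker. A minimal sketch, reusing the start command from the script in section 4:

```bash
# Raise the virtual-memory limit, then start the broker from the same shell
# so the Kafka JVM inherits the new limit.
ulimit -v unlimited
nohup bin/kafka-server-start.sh config/server.properties > /dev/null 2>&1 &
```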
With the new limit in place, Kafka started cleanly. Checking the status again:
```
Name:   java
State:  S (sleeping)
Tgid:   19687
Pid:    19687
PPid:   1
TracerPid:      0
Uid:    1006    1006    1006    1006
Gid:    1000    1000    1000    1000
FDSize: 512
Groups: 16 33 1000
VmPeak:  7566500 kB
VmSize:  4704148 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    261668 kB
VmRSS:    261668 kB
VmData:  2453684 kB
VmStk:       140 kB
VmExe:         4 kB
VmLib:     14460 kB
VmPTE:      1788 kB
VmSwap:        0 kB
Threads:        59
SigQ:   0/127537
```
VmPeak did indeed exceed 7 GB (7566500 kB), which confirms that the earlier failures were caused by the virtual-memory limit.
Summary
Kafka's design and implementation rely heavily on the page cache and on memory-mapped files, and memory mapping consumes virtual address space. For every log segment of every partition of every topic, Kafka creates an index file, and all of these index files are mmapped into memory. If there are too many index files, the process can run out of virtual memory. Two things mitigate this:
- Raise the system virtual-memory limit, e.g. `ulimit -v unlimited` as in this article.
- Configure `log.segment.bytes` sensibly. Do not make it too small, otherwise in production the broker will roll far too many segment files, and with them too many index files, until the mmap calls eventually fail with OOM. As a rule of thumb, set `log.segment.bytes` to no less than 200 MB, and `log.retention.bytes` to 1 GB or more (a multiple of `log.segment.bytes`); see the sketch after this list.
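A sketch of what this might look like in `config/server.properties` (the values are illustrative, not taken from the incident above; tune them for your own retention and throughput needs):

```properties
# Larger segments mean fewer segment files, and therefore fewer mmapped index files.
# 512 MB per segment:
log.segment.bytes=536870912
# 2 GB retained per partition, a multiple of the segment size:
log.retention.bytes=2147483648
# Each index file is pre-allocated (and mmapped) at this size; 10 MB is the default:
log.index.size.max.bytes=10485760
```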
4. Capture scripts
catchKafkaMemInfo.sh
```bash
#!/bin/bash
# Capture memory-related diagnostics (free memory, /proc status, /proc maps)
# for the running Kafka broker.
PID=`jps|grep Kafka|grep -v grep|awk '{print $1}'`
STATUS_FILE=process_status_$PID.log
MAPS_FILE=process_maps_$PID.log
FREEMEM_FILE=process_free_$PID.log

if [ "x$PID" == "x" ] ; then
    echo "no pid found for kafka"
else
    echo "catch status of pid:$PID"
    NOW_TIME=`date`

    echo "----------------------------------------$NOW_TIME catch process mem info start--------------------------------" >> $FREEMEM_FILE
    free -m >> $FREEMEM_FILE
    echo "----------------------------------------$NOW_TIME catch process mem info end--------------------------------" >> $FREEMEM_FILE

    echo "----------------------------------------$NOW_TIME catch process status info start--------------------------------" >> $STATUS_FILE
    cat /proc/$PID/status >> $STATUS_FILE
    echo "----------------------------------------$NOW_TIME catch process status info end--------------------------------" >> $STATUS_FILE
    echo "process status file has been created as file $STATUS_FILE"

    echo "----------------------------------------$NOW_TIME catch process map info start--------------------------------" >> $MAPS_FILE
    cat /proc/$PID/maps >> $MAPS_FILE
    echo "----------------------------------------$NOW_TIME catch process map info end--------------------------------" >> $MAPS_FILE
    MAP_SIZE=`wc -l $MAPS_FILE|awk '{print $1}'`
    echo "process maps file has been created as file $MAPS_FILE, map size:$MAP_SIZE"
fi
```
startKafkaAndCatch.sh
```bash
#!/bin/bash
# Start the Kafka broker in the background and sample its resource usage during
# startup. Run from the Kafka installation directory, with catchKafkaMemInfo.sh
# in the same directory.

# Start the Kafka server
nohup bin/kafka-server-start.sh config/server.properties > /dev/null 2>&1 &

# Capture the resource state roughly every 100 ms
sh catchKafkaMemInfo.sh
for i in {1..10}
do
    echo "catch now..."
    sh catchKafkaMemInfo.sh
    sleep 0.1
done
```
These scripts mainly capture the map count, memory usage, and related process information.
Explanation of the relevant `status` fields:
- VmPeak: peak virtual memory usage
- VmSize: current virtual memory usage
- VmLck: current mlocked memory
- VmHWM: peak resident set size
- VmRSS: resident set size
- VmData: size of the "data" segment
- VmStk: size of the stack
- VmExe: size of the "text" segment
- VmLib: shared library usage
- VmPTE: page table entries size
- VmSwap: swap space used