
Why is the logstash process's CPU usage at 100%?


There is a process on the machine with very high CPU usage, close to 100%:

Tasks: 120 total, 2 running, 118 sleeping, 0 stopped, 0 zombie
%Cpu(s): 99.0 us, 0.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
KiB Mem: 4048308 total, 3301480 used, 746828 free, 33500 buffers
KiB Swap: 0 total, 0 used, 0 free. 149784 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18400 admin 20 0 2201508 304712 3948 S 98.1 7.5 17335:45 java
17782 root 20 0 3130780 893164 4476 S 0.7 22.1 120:19.01 java

7 root      20   0       0      0      0 S  0.3  0.0   4:01.17 rcu_sched

1114 root 20 0 154200 9676 2360 S 0.3 0.2 40:41.38 AliYunDun
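(As a side note, the same ranking can be pulled non-interactively, which is handier in scripts. This is a generic sketch, not a command from the original session:)

# list the five biggest CPU consumers, header included
ps -eo pid,user,%cpu,%mem,etime,comm --sort=-%cpu | head -n 6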

Let's see what this process is. It turns out to be a logstash instance whose job is to read data from kafka and write it into elasticsearch.


cat /proc/18400/cmdline | sed -e 's/\x00/ /g'
/opt/dtstack/java/bin/java
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Xmx128m -Xss2048k
-Djffi.boot.library.path=/opt/dtstack/logstash-2.1.1/vendor/jruby/lib/jni
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/opt/dtstack/logstash/heapdump.hprof
-Xbootclasspath/a:/opt/dtstack/logstash-2.1.1/vendor/jruby/lib/jruby.jar
-classpath : -Djruby.home=/opt/dtstack/logstash-2.1.1/vendor/jruby
-Djruby.lib=/opt/dtstack/logstash-2.1.1/vendor/jruby/lib
-Djruby.script=jruby
-Djruby.shell=/bin/sh org.jruby.Main
--1.9 /opt/dtstack/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent
-f /home/admin/logstash/dtlog/conf/logindexer-kafka.conf
-l /home/admin/logstash/dtlog/log/logindexer-kafka.log
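Note that /proc/<pid>/cmdline separates arguments with NUL bytes, so the sed above relies on GNU sed understanding the \x00 escape. A simpler alternative (a generic sketch, not from the original post) is tr:

# replace the NUL separators with spaces and add a trailing newline
tr '\0' ' ' < /proc/18400/cmdline; echo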

To see which thread is eating the CPU, use top's -H option to show per-thread statistics and -p to restrict it to the given pid.

One thread whose COMMAND shows kafka has consistently high CPU usage.

top -H -p 18400
top - 12:08:19 up 19 days, 57 min, 4 users, load average: 1.04, 1.19, 1.26
Threads: 31 total, 1 running, 30 sleeping, 0 stopped, 0 zombie
%Cpu(s): 99.0 us, 0.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
KiB Mem: 4048308 total, 3321036 used, 727272 free, 36228 buffers
KiB Swap: 0 total, 0 used, 0 free. 167372 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18435 admin 20 0 2201508 304920 4120 R 97.9 7.5 17276:37
17837 admin 20 0 2201508 304920 4120 S 0.3 7.5 27:30.14
18400 admin 20 0 2201508 304920 4120 S 0.0 7.5 0:00.00 java
18413 admin 20 0 2201508 304920 4120 S 0.0 7.5 1:58.54 LogStash::Runne
18414 admin 20 0 2201508 304920 4120 S 0.0 7.5 0:00.00 java
18415 admin 20 0 2201508 304920 4120 S 0.0 7.5 0:23.43 java
18416 admin 20 0 2201508 304920 4120 S 0.0 7.5 1:42.88 java
18417 admin 20 0 2201508 304920 4120 S 0.0 7.5 0:00.35 java
18418 admin 20 0 2201508 304920 4120 S 0.0 7.5 0:00.00 java

We can use jstack to look at that thread's call stack, though the frames themselves weren't very enlightening.


jstack -F 18400 > tmp1.log
grep -A 50 18435 tmp1.log
Thread 18435: (state = IN_JAVA)

 - org.jruby.runtime.callsite.CachingCallSite.call(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject, org.jruby.runtime.builtin.IRubyObject) @bci=36, line=134 (Compiled frame; information may be imprecise)
 - org.jruby.ast.CallNoArgNode.interpret(org.jruby.Ruby, org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject, org.jruby.runtime.Block) @bci=18, line=60 (Compiled frame)
 - org.jruby.ast.WhileNode.interpret(org.jruby.Ruby, org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject, org.jruby.runtime.Block) @bci=23, line=127 (Compiled frame)
 - org.jruby.ast.NewlineNode.interpret(org.jruby.Ruby, org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject, org.jruby.runtime.Block) @bci=41, line=105 (Compiled frame)
 - org.jruby.ast.BlockNode.interpret(org.jruby.Ruby, org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject, org.jruby.runtime.Block) @bci=33, line=71 (Compiled
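One detail worth noting (not part of the original steps): jstack -F prints thread ids in decimal, which is why grepping for 18435 works here. A regular jstack dump shows the OS thread id as a hexadecimal nid instead, so the tid from top has to be converted first, for example:

# 18435 decimal is 0x4803 hex; search the dump for nid=0x4803
printf 'nid=0x%x\n' 18435
jstack 18400 | grep -A 50 "$(printf 'nid=0x%x' 18435)"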

Search the web to see whether anyone else has run into this problem, with the keywords:

logstash kafka cpu high

It turns out someone has indeed hit this problem:

http://stackoverflow.com/questions/34486960/logstash-kafka-input-filter-with-high-cpu-usage

The logstash-input-kafka plugin had a bug in its tight loop which unnecessarily checked for an empty queue and skipped to the next iteration instead of blocking.

This has been fixed in this pull request and version 2.0.3 of the plugin has been released with it.

To test this, please update the plugin using:

bin/plugin install --version 2.0.3 logstash-input-kafka

Check the plugin versions installed on our machine:

$ /opt/dtstack/logstash/bin/plugin list --verbose kafka
logstash-input-kafka (2.0.2)
logstash-output-kafka (2.0.1)

Try upgrading the plugin:

$ /opt/dtstack/logstash/bin/plugin install logstash-input-kafka-2.0.3.gem
Validating logstash-input-kafka-2.0.3.gem
Installing logstash-input-kafka
Installation successful
$ /opt/dtstack/logstash/bin/plugin list --verbose kafka
logstash-input-kafka (2.0.3)
logstash-output-kafka (2.0.1)
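
The new plugin version only takes effect after logstash is restarted. Roughly, based on the paths seen in the cmdline dump above (a sketch, not the exact commands used at the time):

# stop the old instance, then start it again with the same config and log file
kill 18400
nohup /opt/dtstack/logstash/bin/logstash agent \
  -f /home/admin/logstash/dtlog/conf/logindexer-kafka.conf \
  -l /home/admin/logstash/dtlog/log/logindexer-kafka.log &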

After the restart, CPU usage is back to normal.
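
To double-check, the per-thread view can be inspected again; the pid changes after the restart, so look it up first (a generic check, assuming a single matching process):

top -H -p "$(pgrep -f logindexer-kafka.conf)"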
