1. 程式人生 > >記一次線上請求緩慢問題處理

記一次線上請求緩慢問題處理

9月的某天晚上,接到通知,線上環境請求非常慢,頁面開啟要1分鐘,遂按照系統引數,資料庫,應用,web伺服器狀態逐步排查的步驟,最終定位到問題並得以解決,此處做個檢查流程梳理,供大家參考。

問題描述:正式環境請求響應異常慢,頁面開啟需要1分鐘左右時間等。

第一步 系統引數檢查  1 df -h 檢查硬碟空間 

重點看 Use這一欄,可以看到空間足夠,硬碟空間ok

2 free 檢查記憶體使用情況 

對比total和used兩欄,或者直接看free這一欄,預設是單位是KB  free [-b -k -m] [-o] [-s delay] [-t] [-V]  -b -k -m:分別以位元組(KB、MB)為單位顯示記憶體使用情況  可以發現記憶體也是ok  3 top命令檢查執行緒的cpu負載情況 

重點看CPU這一欄,我這個圖是後來補的圖,現場的圖忘記儲存了,當時發現有個java執行緒的CPU的使用率超過了100%,此處可發現有問題  4 通過jstack 命令檢視執行緒記憶體資訊  jstack pid

[email protected]_16_1_214 webapps]# jstack 9495 9495: Unable to open socket file: target process not responding or HotSpot VM not loaded The -F option can be used when the target process is not responding

[email protected]_16_1_214 logs]# tail -n 800 catalina.out      at java.lang.Thread.run(Thread.java:745)

"updatePromotionStatusTask-8-exe0" daemon prio=10 tid=0x00000000017cb000 nid=0x257b runnable [0x00007f9475ca8000]    java.lang.Thread.State: RUNNABLE     at java.net.SocketInputStream.socketRead0(Native Method)     at java.net.SocketInputStream.read(SocketInputStream.java:152)     at java.net.SocketInputStream.read(SocketInputStream.java:122)     at java.net.SocketInputStream.read(SocketInputStream.java:108)     at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:196)     at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)     at redis.clients.jedis.Protocol.process(Protocol.java:151)     at redis.clients.jedis.Protocol.read(Protocol.java:215)     at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)     at redis.clients.jedis.Connection.getIntegerReply(Connection.java:265)     at redis.clients.jedis.Jedis.hexists(Jedis.java:779)     at cn.htd.common.dao.util.RedisDB.getHash(RedisDB.java:255)     at cn.htd.common.util.DictionaryUtils.getDictionaryOptList(DictionaryUtils.java:250)     at cn.htd.common.util.DictionaryUtils.getValueByCode(DictionaryUtils.java:207)     at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask.execute(UpdatePromotionStatusTask.java:143)     at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask.execute(UpdatePromotionStatusTask.java:32)     at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask$$FastClassBySpringCGLIB$$b13b01ff.invoke(<generated>)     at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)     at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:720)     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)     at org.springframework.aop.framework.adapter.AfterReturningAdviceInterceptor.invoke(AfterReturningAdviceInterceptor.java:52)     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)     at cn.htd.common.interceptor.MethodAccessLogInterceptor.invoke(MethodAccessLogInterceptor.java:48)     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)     at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)     at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)     at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:655)     at cn.htd.promotion.cpc.biz.task.UpdatePromotionStatusTask$$EnhancerBySpringCGLIB$$688d5abb.execute(<generated>)     at com.taobao.pamirs.schedule.taskmanager.TBScheduleProcessorSleep.run(TBScheduleProcessorSleep.java:239)     at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.214:28098-thread-25" daemon prio=10 tid=0x00007f9408032000 nid=0x257a waiting on condition [0x00007f9475daa000]    java.lang.Thread.State: WAITING (parking)     at sun.misc.Unsafe.park(Native Method)     - parking to wait for  <0x00000000c5fefe50> (a java.util.concurrent.SynchronousQueue$TransferStack)     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)     at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)     at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)     at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)     at java.lang.Thread.run(Thread.java:745)

"DubboServerHandler-172.16.1.214:28098-thread-24" daemon prio=10 tid=0x00007f9408030000 nid=0x2577 waiting on condition [0x00007f9475eab000]    java.lang.Thread.State: WAITING (parking)     at sun.misc.Unsafe.park(Native Method)     - parking to wait for  <0x00000000c5fefe50> (a java.util.concurrent.SynchronousQueue$TransferStack)     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)     at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)     at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)     at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:925)     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)     at java.lang.Thread.run(Thread.java:745)

"cleanPromotionBargainScheduleTask-3-exe4" daemon prio=10 tid=0x00007f948002c800 nid=0x2576 in Object.wait() [0x00007f9475fac000]    java.lang.Thread.State: WAITING (on object monitor)     at java.lang.Object.wait(Native Method)     at java.lang.Object.wait(Object.java:503)     at com.taobao.pamirs.schedule.taskmanager.LockObject.waitCurrentThread(LockObject.java:14)     - locked <0x00000000c60fabf0> (a java.lang.Object)     at com.taobao.pamirs.schedule.taskmanager.TBScheduleProcessorSleep.run(TBScheduleProcessorSleep.java:296)     at java.lang.Thread.run(Thread.java:745) JNI global references: 214

Heap  PSYoungGen      total 278528K, used 253517K [0x00000000ec000000, 0x0000000100000000, 0x0000000100000000)   eden space 253952K, 99% used [0x00000000ec000000,0x00000000fb7953a0,0x00000000fb800000)   from space 24576K, 0% used [0x00000000fb800000,0x00000000fb800000,0x00000000fd000000)   to   space 38400K, 0% used [0x00000000fda80000,0x00000000fda80000,0x0000000100000000)  ParOldGen       total 655360K, used 655313K [0x00000000c4000000, 0x00000000ec000000, 0x00000000ec000000)   object space 655360K, 99% used [0x00000000c4000000,0x00000000ebff47c8,0x00000000ec000000)  PSPermGen       total 51200K, used 50805K [0x00000000bee00000, 0x00000000c2000000, 0x00000000c4000000)   object space 51200K, 99% used [0x00000000bee00000,0x00000000c1f9d6e0,0x00000000c2000000) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 看最後幾行,可以看到PSYoungGen ,ParOldGen,PSPermGen都已達到99%的使用率,由此可推斷應用中存線上程阻塞,那麼阻塞在什麼地方呢,仔細檢視上面的日誌資訊,可以看到  t redis.clients.jedis.Jedis.hexists(Jedis.java:779)  at cn.htd.common.dao.util.RedisDB.getHash(RedisDB.java:255)  at cn.htd.common.util.DictionaryUtils.getDictionaryOptList(DictionaryUtils.java:250)  這兩行程式碼可以看到是redis獲取資料時出了問題,那麼是不是redis有問題呢,後證實確實是redis出了問題。  我們首選會推測redis服務是不是掛了,趕緊通過本地連線上redis,發現可以連通,也可以正常檢視key資訊。  第二步 檢查redis服務  1 先看redis伺服器負載,發現CPU居高不下  2 連上redis服務,通過keys * 檢視儲存情況,發現多出來7w+的促銷券的key,且還在增加

由此,推斷出此時由於redis在執行大量的插入操作,而redis是單執行緒的,我們出異常的應用剛好有個頻繁的定時任務要讀redis,reids伺服器正忙,定時任務執行緩慢,不斷堆積,從而擠爆了記憶體空間,導致應用響應緩慢。