Troubleshooting a consumption slowdown and message backlog in Ali ONS MQ after the consumer has been running for a while

1: Symptoms:

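After the instance had been running for a while (about 20 h), its consumption rate dropped; after a restart consumption was normal, and then it degraded again. I first suspected the ONS client version and upgraded from 1.2.7 to the latest, 1.7.7.final; that had no effect. Experience told me the code itself was not well optimized: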

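Step 1: I simply assumed GC was not keeping up and that some big objects were to blame. ./jmap showed MsgContent instances; searching the project turned up a ConcurrentHashMap<String, MsgContent> that was only ever added to and never removed from. So I added a remove and also set value = null to help GC. It made little difference: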

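public static ConcurrentHashMap<String, MsgContent> map = new ConcurrentHashMap<String, MsgContent>();

// Walk the values in the map and remove every entry whose time is more than two minutes old.
public static void removeInvalidKey(ConcurrentHashMap<String, MsgContent> map) {
    long now = System.currentTimeMillis();
    for (MsgContent value : map.values()) {
        if (now - value.getTime() > 2 * 60 * 1000) {
            // Remove from the map that was passed in (the original always used MsgMatch.map); entries are keyed by uid.
            map.remove(value.getUid());
            // Intended to help GC, but this only clears the loop variable;
            // it is the remove() above that makes the entry collectable.
            value = null;
        }
    }
}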

jmap histogram output (excerpt):

 num     #instances         #bytes  class name
----------------------------------------------
   1:        651850      208798320  [C
   2:        651267       15630408  java.lang.String
   3:         71571       10226008  <constMethodKlass>
   4:         71571        9172944  <methodKlass>
   5:          6020        6965584  <constantPoolKlass>
   6:         20793        5553840  [I
   7:        153195        4902240  java.util.HashMap$Entry
   8:         24879        4784448  [B
   9:        189633        4551192  java.util.concurrent.ConcurrentLinkedDeque$Node
  10:          6020        4496624  <instanceKlassKlass>
  11:          5076        4044384  <constantPoolCacheKlass>
  12:         78356        2507392  java.util.concurrent.ConcurrentHashMap$HashEntry
  13:         64274        2506768  com.xxx.xxx.access.mysql.entity.MsgContent
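
The expiry sweep from step 1 can also be written without the explicit loop. The following is only a sketch, assuming Java 8 or later (the stack traces in this post come from an older JVM) and a hypothetical, stripped-down MsgContent that has just the two fields the sweep needs. Passing the current time in as a parameter is my own choice, to make the method easy to test.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the real MsgContent entity: only uid and time.
class MsgContent {
    private final String uid;
    private final long time;

    MsgContent(String uid, long time) {
        this.uid = uid;
        this.time = time;
    }

    String getUid() { return uid; }
    long getTime() { return time; }
}

class ExpirySweep {
    // Entries older than two minutes are considered invalid.
    static final long MAX_AGE_MILLIS = 2 * 60 * 1000L;

    // Removes every entry whose time is more than MAX_AGE_MILLIS before `now`.
    // Returns the change in map size, which is only approximate if other
    // threads are writing to the map at the same time.
    static int removeExpired(ConcurrentHashMap<String, MsgContent> map, long now) {
        int before = map.size();
        // removeIf on the values view removes the matching entries from the map itself.
        map.values().removeIf(v -> now - v.getTime() > MAX_AGE_MILLIS);
        return before - map.size();
    }
}
```

Once an entry is removed and nothing else references the MsgContent, it becomes eligible for collection; no extra nulling is needed.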

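Step 2: after step 1 the instance ran normally for longer (30 h), but I still had not found the root cause. With nothing else to go on, I took a thread stack snapshot and found the consumer threads all RUNNABLE at the same place (by now the JVM was starting to reveal the cause), as shown: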

"ConsumeMessageThread_7" prio=10 tid=0x00007f6498008000 nid=0x43 runnable [0x00007f6558c82000]
   java.lang.Thread.State: RUNNABLE
	at java.util.concurrent.ConcurrentLinkedDeque.contains(ConcurrentLinkedDeque.java:1085)
	at com.xxx.xxxx.access.alimq.EvMsgRtListener.consume(EvMsgRtListener.java:169)
	at com.aliyun.openservices.ons.api.impl.rocketmq.ConsumerImpl$MessageListenerImpl.consumeMessage(ConsumerImpl.java:97)
	at com.aliyun.openservices.shade.com.alibaba.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x000000070841ea38> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"ConsumeMessageThread_5" prio=10 tid=0x00007f6498004000 nid=0x42 runnable [0x00007f6558d83000]
   java.lang.Thread.State: RUNNABLE
	at java.util.concurrent.ConcurrentLinkedDeque.contains(ConcurrentLinkedDeque.java:1085)
	at com.xxx.xxxx.access.alimq.EvMsgRtListener.consume(EvMsgRtListener.java:169)
	at com.aliyun.openservices.ons.api.impl.rocketmq.ConsumerImpl$MessageListenerImpl.consumeMessage(ConsumerImpl.java:97)
	at com.aliyun.openservices.shade.com.alibaba.rocketmq.client.impl.consumer.ConsumeMessageConcurrentlyService$ConsumeRequest.run(ConsumeMessageConcurrentlyService.java:417)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x000000070841f708> (a java.util.concurrent.ThreadPoolExecutor$Worker)

 The code at this point:

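 The queue here is maintained by a scheduled task that iterates over it and removes keys. On a ConcurrentLinkedDeque these operations are a serious performance drain: contains() and remove(Object) are linear scans of the linked list, so each call gets slower as the queue grows, and every consumer thread pays that cost for every message. For details, see the article "An introduction to the common collections in the Java Collections Framework: their characteristics, use cases, and implementation principles".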
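
The listener code is not shown here, so the following is only a guess at the shape of a fix: if the deque is used mainly to answer "have I seen this key?", a hash-based concurrent set answers that in expected constant time. The class name SeenKeys and its methods are hypothetical, and ConcurrentHashMap.newKeySet() assumes Java 8 or later.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical replacement for a deque that is only used for membership checks.
class SeenKeys {
    // Hash-based concurrent set: add/contains/remove are expected O(1),
    // whereas ConcurrentLinkedDeque.contains walks the whole list.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Returns true only for the first caller that registers this key.
    boolean markIfNew(String key) {
        return seen.add(key);
    }

    boolean contains(String key) {
        return seen.contains(key);
    }

    // Called by the cleanup task; returns true if the key was present.
    boolean forget(String key) {
        return seen.remove(key);
    }

    int size() {
        return seen.size();
    }
}
```

If insertion order also matters (for example, to expire the oldest keys first), the deque can be kept for ordering while membership checks go through the set.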

 

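 At this point the problem had been found, and commenting out that code solved it. To summarize: I have run into the same scenario several times before, where CPU spikes after the service has been running for a while and consumption capacity drops. Another cause was a remote HTTP call with SocketTimeout(5000): changing 5000 ms to 1 s shortens the wait and keeps the consumer threads from blocking for a long time on a slow response.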

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpPost http = new HttpPost(url);

/**
 * setConnectTimeout: timeout for establishing the connection, in milliseconds.
 * setConnectionRequestTimeout: timeout for obtaining a connection from the connection manager, in milliseconds.
 * setSocketTimeout: maximum time to wait for response data, in milliseconds. If the endpoint
 * sends no data within this time, the call is abandoned.
 */
// The socket timeout shown here is the original 5000 ms; the fix described above lowers it to 1000 ms.
RequestConfig requestConfig = RequestConfig.custom().setConnectTimeout(5000).setConnectionRequestTimeout(1000)
        .setSocketTimeout(5000).build();
http.setConfig(requestConfig);
HttpEntity inEntity = EntityBuilder.create().setText(json).setContentType(ContentType.APPLICATION_JSON).build();
http.setEntity(inEntity);
CloseableHttpResponse response = httpclient.execute(http);
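
To see why the read timeout bounds how long a consumer thread stays stuck, here is a small self-contained sketch. It deliberately uses the JDK's HttpURLConnection instead of Apache HttpClient so that it runs without dependencies. The class ReadTimeoutDemo and the "silent server" (a ServerSocket that accepts connections at the TCP level but never answers) are my own illustration, not part of the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

class ReadTimeoutDemo {
    // Returns the HTTP status, or -1 if no response arrives within readTimeoutMillis.
    static int statusOrTimeout(String url, int connectTimeoutMillis, int readTimeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(Proxy.NO_PROXY);
        conn.setConnectTimeout(connectTimeoutMillis);
        conn.setReadTimeout(readTimeoutMillis);
        try {
            return conn.getResponseCode();
        } catch (SocketTimeoutException e) {
            return -1; // the calling thread is released here instead of waiting longer
        } finally {
            conn.disconnect();
        }
    }

    // Calls a local server that never replies and returns how many milliseconds
    // the caller was blocked, or -1 if the call unexpectedly got a response.
    static long blockedMillis(int readTimeoutMillis) {
        try (ServerSocket silent = new ServerSocket(0)) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            long start = System.nanoTime();
            int status = statusOrTimeout(url, 1000, readTimeoutMillis);
            long elapsed = (System.nanoTime() - start) / 1_000_000L;
            return status == -1 ? elapsed : -1L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling ReadTimeoutDemo.blockedMillis(1000) and then blockedMillis(5000) should show the blocked time growing roughly with the timeout, which is the time each consumer thread would lose per slow call.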

P.S. On the subject of caching: when designing a cache, be clear about the performance and the trade-offs of each component:

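The simplest option is a HashMap. As mentioned above, the question when clearing out invalid data is how to make sure it is actually garbage-collected, so that accumulated data does not lead to an overflow. A good alternative is WeakHashMap, a hash table that holds its keys through weak references; as a serious cache, though, it falls somewhat short in features. See "The difference between WeakHashMap and HashMap", and for more detail, "About ReferenceQueue".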
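
A minimal sketch of the WeakHashMap idea follows; the wrapper class WeakCacheSketch is hypothetical. Two things to keep in mind: WeakHashMap is not thread-safe, and only the keys are weakly referenced, so an entry can disappear only once nothing else strongly references its key.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical single-threaded cache wrapper around WeakHashMap.
class WeakCacheSketch {
    // Keys are held through weak references; values are held strongly
    // until the entry for a collected key is expunged.
    private final Map<Object, String> cache = new WeakHashMap<>();

    void put(Object key, String value) {
        cache.put(key, value);
    }

    String get(Object key) {
        return cache.get(key);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        WeakCacheSketch c = new WeakCacheSketch();
        Object key = new Object();
        c.put(key, "payload");
        System.out.println(c.get(key)); // still present: key is strongly reachable
        key = null;                     // drop the only strong reference
        System.gc();                    // only a hint; the entry may or may not be gone afterwards
        System.out.println(c.size());
    }
}
```

With String keys, as in the map from step 1, this only helps if the key strings are not interned or held elsewhere; string literals, for example, stay strongly reachable and their entries are never cleared.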