1. 程式人生 > >thrift java.io.IOException: Connection reset by peer引發的oom

thrift java.io.IOException: Connection reset by peer引發的oom

server 退出前的異常log資訊:

[2015-06-20 10:45:56,713 WARN] Got an IOException in internalRead!
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:142)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:539)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
        at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.select(TThreadedSelectorServer.java:590)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:545)
[2015-06-20 10:46:08,110 ERROR] run() exiting due to uncaught error
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:371)
        at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.select(TThreadedSelectorServer.java:590)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:545)

一個thrift rpc 請求接收過程如下:(參考中文註釋)

/**
     * Give this FrameBuffer a chance to read. The selector loop should have
     * received a read event for this FrameBuffer.
     * 
     * @return true if the connection should live on, false if it should be
     *         closed
     */
    public boolean read() {
      if (state_ == FrameBufferState.READING_FRAME_SIZE) {
        // try to read the frame size completely 
		// 讀取 frame 頭,即frame 的size,根據 thrift 協議定義,應該是請求體前4個位元組
        if (!internalRead()) { 
          return false;
        }

        // if the frame size has been read completely, then prepare to read the
        // actual frame.
        if (buffer_.remaining() == 0) {
          // pull out the frame size as an integer.
          int frameSize = buffer_.getInt(0);
          if (frameSize <= 0) {
            LOGGER.error("Read an invalid frame size of " + frameSize
                + ". Are you using TFramedTransport on the client side?");
            return false;
          }

          // if this frame will always be too large for this server, log the
          // error and close the connection.
		  // frame 長度異常判斷
          if (frameSize > MAX_READ_BUFFER_BYTES) {   
            LOGGER.error("Read a frame size of " + frameSize
                + ", which is bigger than the maximum allowable buffer size for ALL connections.");
            return false;
          }

          // if this frame will push us over the memory limit, then return.
          // with luck, more memory will free up the next time around.
		   // frame 長度異常判斷
          if (readBufferBytesAllocated.get() + frameSize > MAX_READ_BUFFER_BYTES) {
            return true; 
          }

          // increment the amount of memory allocated to read buffers
          readBufferBytesAllocated.addAndGet(frameSize + 4); 

          // reallocate the readbuffer as a frame-sized buffer
		  // 為 frame 分配足夠大記憶體空間, 從 log 列印資訊來看 server 就是在這個呼叫點 oom 了
          buffer_ = ByteBuffer.allocate(frameSize + 4);  
          buffer_.putInt(frameSize);

          state_ = FrameBufferState.READING_FRAME;
        } else {
          // this skips the check of READING_FRAME state below, since we can't
          // possibly go on to that state if there's data left to be read at
          // this one.
          return true;
        }
      }

      // it is possible to fall through from the READING_FRAME_SIZE section
      // to READING_FRAME if there's already some frame data available once
      // READING_FRAME_SIZE is complete.

      if (state_ == FrameBufferState.READING_FRAME) {
		// 讀取 frame 體
        if (!internalRead()) {
          return false; 
        }

        // since we're already in the select loop here for sure, we can just
        // modify our selection key directly.
        if (buffer_.remaining() == 0) {
          // get rid of the read select interests
          selectionKey_.interestOps(0);
          state_ = FrameBufferState.READ_FRAME_COMPLETE;
        }

        return true;
      }

      // if we fall through to this point, then the state must be invalid.
      LOGGER.error("Read was called but state is invalid (" + state_ + ")");
      return false;
    } 
	
    /**
     * Perform a read into buffer.
     * 
     * @return true if the read succeeded, false if there was an error or the
     *         connection closed.
     */
	 // 讀 SocketChannel
    private boolean internalRead() {
      try {
        if (trans_.read(buffer_) < 0) {
          return false;
        }
        return true;
      } catch (IOException e) {
        LOGGER.warn("Got an IOException in internalRead!", e);
        return false;
      }
    }

程式碼分析到這裡問題就可以定位到是 frame size(MAX_READ_BUFFER_BYTES) 過大導致的異常退出。

 public static abstract class AbstractNonblockingServerArgs<T extends AbstractNonblockingServerArgs<T>> extends AbstractServerArgs<T> {
	  // 預設是 Long.MAX_VALUE,肯定 OOM
    public long maxReadBufferBytes = Long.MAX_VALUE;

    public AbstractNonblockingServerArgs(TNonblockingServerTransport transport) {
      super(transport);
      transportFactory(new TFramedTransport.Factory());
    }
  }
  
 /**
   * The maximum amount of memory we will allocate to client IO buffers at a
   * time. Without this limit, the server will gladly allocate client buffers
   * right into an out of memory exception, rather than waiting.
   */
   // 該註釋指出了問題所在
  final long MAX_READ_BUFFER_BYTES;

  // 可以通過建構函式設定MAX_READ_BUFFER_BYTES
  public AbstractNonblockingServer(AbstractNonblockingServerArgs args) {
    super(args);
    MAX_READ_BUFFER_BYTES = args.maxReadBufferBytes;
  }

問題復現(驗證)

  1. echo “something”    nc 到 thrift 埠
  2. telnet 到 thrift 埠,隨意敲幾個字元

thrift server log中列印幾行錯誤資訊後即退出。

原因:

異常的客戶端連線到 thrift server 埠,隨機的傳送資料導致 thrift協議解析到異常大的 frameSize。(內網攻擊)

解決辦法:

使用 TFramedTransport,設定初始化函式中maxLength,預設值DEFAULT_MAX_LENGTH = 16384000;
其他請務必設定AbstractNonblockingServerArgs.maxReadBufferBytes,預設值Long.MAX_VALUE;

例如:

服務端修改

客戶端

參考:

http://www.concurrent.work/java/thrift/case/one-java-thrift-case-caused-by-default-parameters/

http://www.voidcn.com/article/p-nvaknkpy-bpn.html