
JVM Primer: Using Off-Heap Memory to Reduce Full GC


Tags: JVM

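Problem: most mainstream internet companies run their online server JVMs with the CMS collector (e.g. Taobao, LinkedIn, Vdian). CMS runs much of its GC work concurrently with application threads to cut STW (Stop The World) time, but it is far from perfect. In particular, when a Concurrent Mode Failure occurs, the collector falls back from concurrent collection to a serial, single-threaded one, which causes a very long Stop The World pause (for details see JVM Primer: Memory Allocation, GC Principles and Garbage Collectors).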

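Solution: GCIH suggests an idea. Move long-lived objects (such as a local cache) into off-heap memory (also called direct memory). This reduces the number of objects CMS has to manage, lowers the number of Full GCs, and thus improves system response time.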

Background

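The idea originally comes from the GCIH feature of TaobaoJVM, Taobao's customized build of OpenJDK (see 撒迦's talk JVM定製改進@淘寶, "JVM customization and improvements at Taobao"). GCIH (GC Invisible Heap) carves a region out of the CMS old generation. That memory is still inside the Java heap, but it is no longer managed by GC. Long-lived Java objects are placed there, outside the GC-managed part of the heap, and GC cannot see or manage the Java objects inside GCIH: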

(Figure omitted. Image source: 撒迦's slides for JVM定製改進@淘寶.)

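  • This brings two benefits:
    1. Less memory for GC to manage:
      Because GCIH carves a chunk out of the old generation, the GC-managed area becomes smaller. This noticeably reduces GC work, improves GC efficiency, and shortens Full GC STW pauses. And because this memory still belongs to the heap, objects in it are accessed in the same way and at the same speed as before, with no serialization/deserialization cost.
    2. GCIH contents can be shared across processes:
      Because this region is no longer part of a JVM's runtime data, objects inside GCIH can be shared by multiple JVM instances (e.g. several MR jobs on one server can share one copy of the cache data). As a result, a single server can host more VM instances.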

(The actual test data and charts are in 撒迦's slides.)

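Most internet companies, however, cannot dedicate engineers to customizing the JVM for their own workloads the way Alibaba does. So we can only envy the performance gains of GCIH without being able to enjoy them. The standard JVM does expose interfaces (ByteBuffer or Unsafe) for requesting off-heap memory directly from the operating system, and GC does not manage that memory either. We can therefore use JVM off-heap memory to emulate GCIH. The drawback, compared with GCIH, is that we have to pay a serialize/deserialize cost.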
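Here is a minimal sketch of the Unsafe route. The class UnsafeOffHeapDemo and its methods are hypothetical names for illustration.

It assumes a HotSpot-based JDK where sun.misc.Unsafe can still be obtained by reflecting on its private theUnsafe field. This is an unsupported internal API, and recent JDKs deprecate its memory-access methods.

Nothing allocated here is tracked by GC, so the memory must be released explicitly with freeMemory().

```java
import java.lang.reflect.Field;

import sun.misc.Unsafe;

// Hypothetical demo class: raw off-heap memory through the internal sun.misc.Unsafe API.
final class UnsafeOffHeapDemo {

    private static final int LONG_SIZE = 8; // bytes per long

    // Unsafe.getUnsafe() may reject application classes, so read the private singleton field instead.
    static Unsafe unsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Writes 0..n-1 into freshly allocated native memory, reads them back and sums them.
    static long sumOffHeap(int n) {
        Unsafe u = unsafe();
        long address = u.allocateMemory((long) n * LONG_SIZE); // uninitialized, invisible to GC
        try {
            for (int i = 0; i < n; i++) {
                u.putLong(address + (long) i * LONG_SIZE, i);
            }
            long sum = 0;
            for (int i = 0; i < n; i++) {
                sum += u.getLong(address + (long) i * LONG_SIZE);
            }
            return sum;
        } finally {
            u.freeMemory(address); // GC will never release this memory for us
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(1000)); // 0 + 1 + ... + 999 = 499500
    }
}
```

The try/finally mirrors what a real off-heap cache has to guarantee by hand: every allocateMemory() needs a matching freeMemory(), otherwise the native memory leaks for the lifetime of the process.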

JVM Off-Heap Memory

If you look at the Java runtime data areas described in JVM Primer: JVM Memory Model, you will not find an off-heap memory area among them.

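That is because off-heap memory is neither part of the JVM runtime data areas nor a memory area defined by the Java Virtual Machine Specification; it is managed directly by the operating system.
Before JDK 1.4 there was no officially supported way to access this memory. The only option was to obtain the Unsafe class via reflection and call allocateMemory()/freeMemory() to allocate and release it. JDK 1.4 added NIO, which introduced an I/O model based on Channels and Buffers. NIO can allocate off-heap memory directly through native libraries. It then operates on that memory through a DirectByteBuffer object, stored in the Java heap, that serves as a reference to it. ByteBuffer provides the following commonly used methods for working with off-heap memory: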

| API | Description |
| --- | --- |
| static ByteBuffer allocateDirect(int capacity) | Allocates a new direct byte buffer. |
| ByteBuffer put(byte b) | Relative put method (optional operation). |
| ByteBuffer put(byte[] src) | Relative bulk put method (optional operation). |
| ByteBuffer putXxx(Xxx value) | Relative put method for writing a Char/Double/Float/Int/Long/Short value (optional operation). |
| ByteBuffer get(byte[] dst) | Relative bulk get method. |
| Xxx getXxx() | Relative get method for reading a Char/Double/Float/Int/Long/Short value. |
| XxxBuffer asXxxBuffer() | Creates a view of this byte buffer as a Char/Double/Float/Int/Long/Short buffer. |
| ByteBuffer asReadOnlyBuffer() | Creates a new, read-only byte buffer that shares this buffer's content. |
| boolean isDirect() | Tells whether or not this byte buffer is direct. |
| ByteBuffer duplicate() | Creates a new byte buffer that shares this buffer's content. |
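The sketch below walks through several of the methods in the table on a direct buffer. The class DirectBufferApiDemo is a hypothetical name.

The one step the table does not spell out is flip(). After writing, it sets the limit to the current position and moves the position back to 0, so the same buffer can be read from the beginning. The absolute put(int, byte) used in the last method is also standard ByteBuffer API, just not listed above.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class exercising several methods from the table on a direct buffer.
final class DirectBufferApiDemo {

    // putXxx()/put(byte[]) then flip() then getXxx()/get(byte[]): write a small record, read it back.
    static String roundTrip() {
        ByteBuffer buf = ByteBuffer.allocateDirect(64);   // backing memory is outside the Java heap
        buf.putInt(42);                                   // 4 bytes
        buf.putLong(123456789L);                          // 8 bytes
        buf.put("feed".getBytes(StandardCharsets.UTF_8)); // 4 bytes
        buf.flip();                                       // limit = 16, position = 0: switch to reading

        int id = buf.getInt();
        long time = buf.getLong();
        byte[] text = new byte[buf.remaining()];
        buf.get(text);
        return id + "/" + time + "/" + new String(text, StandardCharsets.UTF_8);
    }

    // asReadOnlyBuffer() shares the content but rejects writes.
    static boolean readOnlyRejectsWrites() {
        ByteBuffer readOnly = ByteBuffer.allocateDirect(8).asReadOnlyBuffer();
        try {
            readOnly.put((byte) 1);
            return false;
        } catch (ReadOnlyBufferException expected) {
            return true;
        }
    }

    // duplicate() shares the same off-heap memory, and the copy is still direct.
    static boolean duplicateSharesContent() {
        ByteBuffer original = ByteBuffer.allocateDirect(4);
        ByteBuffer copy = original.duplicate();
        original.put(0, (byte) 7); // absolute put: does not move the position
        return copy.isDirect() && copy.get(0) == 7;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // 42/123456789/feed
        System.out.println(readOnlyRejectsWrites() + " " + duplicateSharesContent()); // true true
    }
}
```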

Below we use the standard JDK API to implement a local cache in off-heap memory.

Example 1: Off-Heap Cache with the JDK API

Note: the main logic is all in invoke(). AbstractAppInvoker is a custom performance-testing framework, described in detail later.

/**
 * @author jifang
 * @since 2016/12/31 6:05 PM.
 */
public class DirectByteBufferApp extends AbstractAppInvoker {

    @Test
    @Override
    public void invoke(Object... param) {
        Map<String, FeedDO> map = createInHeapMap(SIZE);

        // move in off-heap
        byte[] bytes = serializer.serialize(map);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip();

        // for gc
        map = null;
        bytes = null;
        System.out.println("write down");
        // move out from off-heap
        byte[] offHeapBytes = new byte[buffer.limit()];
        buffer.get(offHeapBytes);
        Map<String, FeedDO> deserMap = serializer.deserialize(offHeapBytes);
        for (int i = 0; i < SIZE; ++i) {
            String key = "key-" + i;
            FeedDO feedDO = deserMap.get(key);
            checkValid(feedDO);

            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }

        free(buffer);
    }

    private Map<String, FeedDO> createInHeapMap(int size) {
        long createTime = System.currentTimeMillis();

        Map<String, FeedDO> map = new ConcurrentHashMap<>(size);
        for (int i = 0; i < size; ++i) {
            String key = "key-" + i;
            FeedDO value = createFeed(i, key, createTime);
            map.put(key, value);
        }

        return map;
    }
}

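The JDK's off-heap API only gives us a ByteBuffer, which is essentially a one-dimensional byte array. The JDK provides no practical data structures on top of off-heap memory (such as an off-heap Map or Set). So to implement a cache, write() first has to put() the data into an on-heap HashMap, then serialize the whole Map and move it into direct memory; reading from the cache is the reverse. Because the HashMap must still be built on the heap, this can trigger many Full GCs. The approach does use off-heap memory, but its performance is poor and it cannot exploit the advantages of off-heap memory.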
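Fortunately, the open-source community has produced a series of excellent off-heap frameworks such as Ehcache, MapDB and Chronicle Map. They let us access off-heap memory through a simple API while keeping the extra overhead low.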

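Of these, Ehcache is the most powerful. It provides four cache tiers (in-heap, off-heap, on-disk and clustered). BigMemory, implemented in Ehcache's enterprise products (BigMemory Max / BigMemory Go), was also a pioneer of off-heap memory in Java.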
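The cost of the JDK-only approach comes from treating the whole map as one serialized blob. The alternative is to serialize and store each entry on its own, so a get() only copies one value back onto the heap.

This is only an illustration under simplifying assumptions, not how Ehcache, MapDB or Chronicle Map are actually implemented:
- the class OffHeapByteStore is hypothetical;
- keys and the offset index stay on the heap;
- values are already-serialized byte[];
- there is no compaction, eviction or thread safety.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not how MapDB/Ehcache/Chronicle Map work internally:
// each value is stored separately in one direct buffer, and a small on-heap index
// maps key -> {offset, length}. No compaction, no eviction, not thread-safe.
final class OffHeapByteStore {

    private final ByteBuffer arena;                            // value bytes, off-heap
    private final Map<String, int[]> index = new HashMap<>(); // key -> {offset, length}, on-heap

    OffHeapByteStore(int capacityBytes) {
        this.arena = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Appends one already-serialized value; an overwritten key simply leaks its old bytes.
    void put(String key, byte[] value) {
        if (value.length > arena.remaining()) {
            throw new IllegalStateException("off-heap arena is full");
        }
        int offset = arena.position();
        arena.put(value);
        index.put(key, new int[]{offset, value.length});
    }

    // Copies one value back onto the heap; only this entry is touched, not the whole map.
    byte[] get(String key) {
        int[] slot = index.get(key);
        if (slot == null) {
            return null;
        }
        byte[] out = new byte[slot[1]];
        ByteBuffer view = arena.duplicate(); // own position/limit, same memory
        view.position(slot[0]);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        OffHeapByteStore store = new OffHeapByteStore(1 << 20);
        for (int i = 0; i < 1000; i++) {
            store.put("key-" + i, ("value-" + i).getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(new String(store.get("key-42"), StandardCharsets.UTF_8)); // value-42
    }
}
```

Reading through duplicate() keeps the arena's own write position untouched. The remaining on-heap index is the part that real frameworks go further to move off the heap as well.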

Example 2: Off-Heap Cache with the MapDB API

public class MapDBApp extends AbstractAppInvoker {

    private static HTreeMap<String, FeedDO> mapDBCache;

    static {
        mapDBCache = DBMaker.hashMapSegmentedMemoryDirect()
                .expireMaxSize(SIZE)
                .make();
    }

    @Test
    @Override
    public void invoke(Object... param) {

        for (int i = 0; i < SIZE; ++i) {
            String key = "key-" + i;
            FeedDO feed = createFeed(i, key, System.currentTimeMillis());

            mapDBCache.put(key, feed);
        }

        System.out.println("write down");
        for (int i = 0; i < SIZE; ++i) {
            String key = "key-" + i;
            FeedDO feedDO = mapDBCache.get(key);
            checkValid(feedDO);

            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }
    }
}

Results & Analysis

  • DirectByteBufferApp
 S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
0.00   0.00   5.22  78.57  59.85     19    2.902    13    7.251   10.153
  • MapDBApp (last jstat sample)
 S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
0.00   0.03   8.02   0.38  44.46    171    0.238     0    0.000    0.238

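Running DirectByteBufferApp.invoke() produces many Full GCs. The in-heap map needs a very large contiguous array, and moving it into and out of off-heap memory creates further large byte[] copies on the heap. The old generation therefore fills up quickly, and Full GCs become frequent.
Running MapDBApp.invoke(), in contrast, shows direct memory growing steadily, yet not a single Full GC occurs.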
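The direct-memory growth mentioned above can also be observed from inside the JVM. Here is a minimal sketch, assuming JDK 7 or later; the helper name DirectMemoryProbe is hypothetical.

The platform BufferPoolMXBean named "direct" reports memory taken by buffers from ByteBuffer.allocateDirect(). Memory obtained directly through Unsafe.allocateMemory() is not counted in that pool. Whether a given framework's storage shows up here therefore depends on how it allocates.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper: watch direct-buffer usage from inside the JVM.
final class DirectMemoryProbe {

    // Bytes currently used by the JVM's "direct" buffer pool, or -1 if no such pool is reported.
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MB outside the heap
        long after = directMemoryUsed();
        System.out.println("direct pool: " + before + " -> " + after + " bytes (buffer capacity "
                + buffer.capacity() + ")");
    }
}
```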

Experiment: Reducing Full GC with Off-Heap Memory

Test Environment

  • java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
  • VM Options
-Xmx512M
-XX:MaxDirectMemorySize=512M
-XX:+PrintGC
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80
-XX:+UseCMSInitiatingOccupancyOnly
  • Test data
    1.7 million feed entries (FeedDO).

Test Code

Group 1: in-heap, affected by GC, no serialization

  • ConcurrentHashMapApp
public class ConcurrentHashMapApp extends AbstractAppInvoker {

    private static final Map<String, FeedDO> cache = new ConcurrentHashMap<>();

    @Test
    @Override
    public void invoke(Object... param) {

        // write
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            FeedDO feedDO = createFeed(i, key, System.currentTimeMillis());
            cache.put(key, feedDO);
        }

        System.out.println("write down");
        // read
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            FeedDO feedDO = cache.get(key);
            checkValid(feedDO);

            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }
    }
}

GuavaCacheApp is similar; see the full project for its code.

Group 2: off-heap, not affected by GC, serialization required

  • EhcacheApp
public class EhcacheApp extends AbstractAppInvoker {

    private static Cache<String, FeedDO> cache;

    static {
        ResourcePools resourcePools = ResourcePoolsBuilder.newResourcePoolsBuilder()
                .heap(1000, EntryUnit.ENTRIES)
                .offheap(480, MemoryUnit.MB)
                .build();

        CacheConfiguration<String, FeedDO> configuration = CacheConfigurationBuilder
                .newCacheConfigurationBuilder(String.class, FeedDO.class, resourcePools)
                .build();

        cache = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("cacher", configuration)
                .build(true)
                .getCache("cacher", String.class, FeedDO.class);

    }

    @Test
    @Override
    public void invoke(Object... param) {
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            FeedDO feedDO = createFeed(i, key, System.currentTimeMillis());
            cache.put(key, feedDO);
        }

        System.out.println("write down");
        // read
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            Object o = cache.get(key);
            checkValid(o);

            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }
    }
}

MapDBApp is the same as above.

Group 3: out-of-process, not affected by GC, serialization required, affected by inter-process communication

  • LocalRedisApp
public class LocalRedisApp extends AbstractAppInvoker {

    private static final Jedis cache = new Jedis("localhost", 6379);

    private static final IObjectSerializer serializer = new Hessian2Serializer();

    @Test
    @Override
    public void invoke(Object... param) {
        // write
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            FeedDO feedDO = createFeed(i, key, System.currentTimeMillis());

            byte[] value = serializer.serialize(feedDO);
            cache.set(key.getBytes(), value);

            if (i % 10000 == 0) {
                System.out.println("write " + i);
            }
        }

        System.out.println("write down");
        // read
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            byte[] value = cache.get(key.getBytes());
            FeedDO feedDO = serializer.deserialize(value);
            checkValid(feedDO);

            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }
    }
}

RemoteRedisApp is similar; see the full project for its code.

Test Results

| Metric | ConcurrentMap | Guava | MapDB | Ehcache | LocalRedis | NetworkRedis |
| --- | --- | --- | --- | --- | --- | --- |
| TTC | 32166ms/32s | 47520ms/47s | 40272ms/40s | 30814ms/31s | 176382ms/176s | 1h+ |
| Minor C/T | 31/1.522 | 29/1.312 | 511/0.557 | 297/0.430 | 421/0.415 | - |
| Full C/T | 24/23.212 | 36/41.751 | 0/0.000 | 0/0.000 | 0/0.000 | - |

Notes:
- TTC: Total Time Cost
- C/T: Count/Time (time in seconds)

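Analysis

Comparing the groups above, we can summarize:

  • Moving long-lived large objects (such as a cache) off the heap can greatly reduce both the number of Full GCs and the time they take;
  • Storing objects off-heap costs serialization/deserialization;
  • Putting the cache in a separate (e.g. distributed) cache costs inter-process or network communication (UNIX domain sockets / TCP/IP).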
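The first point can be roughly quantified using only the numbers in the results table. Divide each cache's Full GC time by its total time cost. For the two in-heap caches, Full GC accounts for most of the run.

```latex
\text{ConcurrentMap: } \frac{23.212\ \text{s}}{32.166\ \text{s}} \approx 72\%
\qquad
\text{Guava: } \frac{41.751\ \text{s}}{47.520\ \text{s}} \approx 88\%
\qquad
\text{MapDB, Ehcache: } \frac{0\ \text{s}}{40.272\ \text{s}} = \frac{0\ \text{s}}{30.814\ \text{s}} = 0\%
```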

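P.S.:
I honestly did not expect the off-heap Ehcache to outperform the in-heap HashMap/Guava O(∩_∩)O~. With this data volume and this heap size, the in-heap caches simply caused too many Full GCs; with a larger heap the result would almost certainly be different. So before using off-heap memory to reduce Full GC, first consider whether you can simply make the heap larger.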

Appendix: Performance Testing Framework

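When main() starts, it scans all classes under the com.vdian.se.apps package that extend AbstractAppInvoker and uses Javassist to create a proxy object for each of them. When invoke() is called, the proxy first checks whether the method is annotated with @Test (here we borrow the annotation already defined by JUnit). It records the execution time before and after the call, and finally prints the timing statistics of all implementation classes for comparison.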
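The idea behind the framework can be sketched with nothing but the JDK: wrap each app in a proxy that times invoke() only when the method is marked with an annotation. This is a swapped-in technique, not the author's implementation:
- it uses java.lang.reflect.Proxy instead of Javassist/commons-proxy, so it needs an interface;
- the AppInvoker interface here is hypothetical (the real AbstractAppInvoker is an abstract class);
- a hypothetical @Profiled annotation stands in for JUnit's @Test.

The annotation is looked up on the target's own method, because annotations on the implementing method are not visible through the interface's Method object.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker annotation standing in for JUnit's @Test.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Profiled {
}

// Hypothetical interface: JDK proxies can only implement interfaces.
interface AppInvoker {
    void invoke(Object... param);
}

final class TimingProxies {
    static final Map<String, Long> STATISTICS = new LinkedHashMap<>();

    static AppInvoker wrap(final AppInvoker target) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                // Check the annotation on the target's own method, not on the interface method.
                Method impl = target.getClass().getMethod(method.getName(), method.getParameterTypes());
                if (!impl.isAnnotationPresent(Profiled.class)) {
                    return call(method, target, args);
                }
                long start = System.nanoTime();
                try {
                    return call(method, target, args);
                } finally {
                    long costMs = (System.nanoTime() - start) / 1_000_000;
                    STATISTICS.put(target.getClass().getSimpleName() + "." + method.getName(), costMs);
                }
            }
        };
        return (AppInvoker) Proxy.newProxyInstance(
                AppInvoker.class.getClassLoader(), new Class<?>[]{AppInvoker.class}, handler);
    }

    // Unwraps InvocationTargetException so callers see the original exception.
    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}

// Hypothetical app used only to demonstrate the proxy.
final class SleepyApp implements AppInvoker {
    @Profiled
    @Override
    public void invoke(Object... param) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        TimingProxies.wrap(new SleepyApp()).invoke();
        System.out.println(TimingProxies.STATISTICS);
    }
}
```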

  • Dependencies
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-proxy</artifactId>
    <version>${commons.proxy.version}</version>
</dependency>
<dependency>
    <groupId>org.javassist</groupId>
    <artifactId>javassist</artifactId>
    <version>${javassist.version}</version>
</dependency>
<dependency>
    <groupId>com.caucho</groupId>
    <artifactId>hessian</artifactId>
    <version>${hessian.version}</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>${guava.version}</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>${junit.version}</version>
</dependency>

Launcher class: OffHeapStarter

/**
 * @author jifang
 * @since 2017/1/1 10:47 AM.
 */
public class OffHeapStarter {

    private static final Map<String, Long> STATISTICS_MAP = new HashMap<>();

    public static void main(String[] args) throws IOException, IllegalAccessException, InstantiationException {
        Set<Class<?>> classes = PackageScanUtil.scanPackage("com.vdian.se.apps");
        for (Class<?> clazz : classes) {
            AbstractAppInvoker invoker = createProxyInvoker(clazz.newInstance());
            invoker.invoke();

            //System.gc();
        }

        System.out.println("********************* statistics **********************");
        for (Map.Entry<String, Long> entry : STATISTICS_MAP.entrySet()) {
            System.out.println("method [" + entry.getKey() + "] total cost [" + entry.getValue() + "]ms");
        }
    }

    private static AbstractAppInvoker createProxyInvoker(Object invoker) {
        ProxyFactory factory = new JavassistProxyFactory();
        Class<?> superclass = invoker.getClass().getSuperclass();
        Object proxy = factory
                .createInterceptorProxy(invoker, new ProfileInterceptor(), new Class[]{superclass});
        return (AbstractAppInvoker) proxy;
    }

    private static class ProfileInterceptor implements Interceptor {

        @Override
        public Object intercept(Invocation invocation) throws Throwable {
            Class<?> clazz = invocation.getProxy().getClass();
            Method method = clazz.getMethod(invocation.getMethod().getName(), Object[].class);

            Object result = null;
            if (method.isAnnotationPresent(Test.class)
                    && method.getName().equals("invoke")) {

                String methodName = String.format("%s.%s", clazz.getSimpleName(), method.getName());
                System.out.println("method [" + methodName + "] start invoke");

                long start = System.currentTimeMillis();
                result = invocation.proceed();
                long cost = System.currentTimeMillis() - start;

                System.out.println("method [" + methodName + "] total cost [" + cost + "]ms");

                STATISTICS_MAP.put(methodName, cost);
            }

            return result;
        }
    }
}
  • Package scanning utility: PackageScanUtil
public class PackageScanUtil {

    private static final String CLASS_SUFFIX = ".class";

    private static final String FILE_PROTOCOL = "file";

    public static Set<Class<?>> scanPackage(String packageName) throws IOException {

        Set<Class<?>> classes = new HashSet<>();
        String packageDir = packageName.replace('.', '/');
        Enumeration<URL> packageResources = Thread.currentThread().getContextClassLoader().getResources(packageDir);
        while (packageResources.hasMoreElements()) {
            URL packageResource = packageResources.nextElement();

            String protocol = packageResource.getProtocol();
            // only scan classes inside the project (file: URLs), not classes packaged in jars
            if (FILE_PROTOCOL.equals(protocol)) {
                String packageDirPath = URLDecoder.decode(packageResource.getPath(), "UTF-8");
                scanProjectPackage(packageName, packageDirPath, classes);
            }
        }

        return classes;
    }

    private static void scanProjectPackage(String packageName, String packageDirPath, Set<Class<?>> classes) {

        File packageDirFile = new File(packageDirPath);
        if (packageDirFile.exists() && packageDirFile.isDirectory()) {

            File[] subFiles = packageDirFile.listFiles(new FileFilter() {
                @Override
                public boolean accept(File pathname) {
                    return pathname.isDirectory() || pathname.getName().endsWith(CLASS_SUFFIX);
                }
            });

            for (File subFile : subFiles) {
                if (!subFile.isDirectory()) {
                    String className = trimClassSuffix(subFile.getName());
                    String classNameWithPackage = packageName + "." + className;

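                    Class<?> clazz;
                    try {
                        clazz = Class.forName(classNameWithPackage);
                    } catch (ClassNotFoundException e) {
                        // skip classes that cannot be loaded; an assert here is a no-op without -ea
                        // and would let a NullPointerException surface below
                        continue;
                    }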

                    Class<?> superclass = clazz.getSuperclass();
                    if (superclass == AbstractAppInvoker.class) {
                        classes.add(clazz);
                    }
                }
            }
        }
    }

    // trim .class suffix
    private static String trimClassSuffix(String classNameWithSuffix) {
        int endIndex = classNameWithSuffix.length() - CLASS_SUFFIX.length();
        return classNameWithSuffix.substring(0, endIndex);
    }
}

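Note: this only scans class files in a single directory level of the project. For a more powerful package scanner, see the Spring source code or PackageScanUtil in the Touch source code.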
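Lifting the single-level limitation mentioned in the note is mostly a matter of recursion, because a subdirectory under a file: classpath entry corresponds to a subpackage. The sketch below only collects class names; RecursiveClassNameScanner is a hypothetical name.

Loading those names with Class.forName() and keeping the classes whose superclass is AbstractAppInvoker would work as in PackageScanUtil. Like the original, it does not handle classes packaged in jars.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: like scanProjectPackage(), but it also descends into subpackages.
final class RecursiveClassNameScanner {

    private static final String CLASS_SUFFIX = ".class";

    // Returns binary class names (usable with Class.forName) for every .class file under packageDir.
    static List<String> scan(File packageDir, String packageName) {
        List<String> result = new ArrayList<>();
        File[] children = packageDir.listFiles();
        if (children == null) { // not a directory, or it could not be read
            return result;
        }
        for (File child : children) {
            String name = child.getName();
            if (child.isDirectory()) {
                result.addAll(scan(child, packageName + "." + name)); // subdirectory = subpackage
            } else if (name.endsWith(CLASS_SUFFIX)) {
                result.add(packageName + "." + name.substring(0, name.length() - CLASS_SUFFIX.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // usage: java RecursiveClassNameScanner <classes-dir-of-package> <package.name>
        if (args.length == 2) {
            System.out.println(scan(new File(args[0]), args[1]));
        }
    }
}
```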

AppInvoker base class: AbstractAppInvoker

Provides common test parameters & utility functions.

public abstract class AbstractAppInvoker {

    protected static final int SIZE = 1_700_000;

    protected static final IObjectSerializer serializer = new Hessian2Serializer();

    protected static FeedDO createFeed(long id, String userId, long createTime) {

        return new FeedDO(id, userId, (int) id, userId + "_" + id, createTime);
    }

    protected static void free(ByteBuffer byteBuffer) {
        if (byteBuffer.isDirect()) {
            // sun.nio.ch.DirectBuffer is a JDK-internal API: fine on JDK 7/8, restricted on newer JDKs
            ((DirectBuffer) byteBuffer).cleaner().clean();
        }
    }

    protected static void checkValid(Object obj) {
        if (obj == null) {
            throw new RuntimeException("cache invalid");
        }
    }

    protected static void sleep(int time, String beforeMsg) {
        if (!Strings.isNullOrEmpty(beforeMsg)) {
            System.out.println(beforeMsg);
        }

        try {
            Thread.sleep(time);
        } catch (InterruptedException ignored) {
            // no op
        }
    }


    /**
     * To be implemented by subclasses & invoked externally.
     *
     * @param param optional test parameters
     */
    public abstract void invoke(Object... param);
}

Serialization/Deserialization Interface and Implementation

public interface IObjectSerializer {

    <T> byte[] serialize(T obj);

    <T> T deserialize(byte[] bytes);
}
public class Hessian2Serializer implements IObjectSerializer {

    private static final Logger LOGGER = LoggerFactory.getLogger(Hessian2Serializer.class);

    @Override
    public <T> byte[] serialize(T obj) {
        if (obj != null) {
            try (ByteArrayOutputStream os = new ByteArrayOutputStream()) {

                Hessian2Output out = new Hessian2Output(os);
                out.writeObject(obj);
                out.close();
                return os.toByteArray();

            } catch (IOException e) {
                LOGGER.error("Hessian serialize error ", e);
                throw new CacherException(e);
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    @Override
    public <T> T deserialize(byte[] bytes) {
        if (bytes != null) {
            try (ByteArrayInputStream is = new ByteArrayInputStream(bytes)) {

                Hessian2Input in = new Hessian2Input(is);
                T obj = (T) in.readObject();
                in.close();

                return obj;

            } catch (IOException e) {
                LOGGER.error("Hessian deserialize error ", e);
                throw new CacherException(e);
            }
        }
        return null;
    }
}
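For readers without Hessian on the classpath, the IObjectSerializer contract can also be satisfied with built-in Java serialization. This is a hypothetical alternative (JdkSerializer), not what the experiment used, and it makes no claim about performance.

It requires every serialized object graph, FeedDO included, to implement java.io.Serializable. The interface is repeated so the sketch compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Same shape as the IObjectSerializer interface above, repeated so this sketch compiles on its own.
interface IObjectSerializer {

    <T> byte[] serialize(T obj);

    <T> T deserialize(byte[] bytes);
}

// Hypothetical JDK-only alternative to Hessian2Serializer using built-in Java serialization.
final class JdkSerializer implements IObjectSerializer {

    @Override
    public <T> byte[] serialize(T obj) {
        if (obj == null) {
            return null;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj); // obj and everything it references must be Serializable
        } catch (IOException e) {
            throw new IllegalStateException("serialize error", e);
        }
        return bytes.toByteArray(); // read only after close() has flushed the object stream
    }

    @SuppressWarnings("unchecked")
    @Override
    public <T> T deserialize(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialize error", e);
        }
    }
}
```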

GC Statistics Tool

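#!/bin/bash

# find the pid of the JVM whose main class matches $1, then sample GC utilization every 400 ms, 10000 times
pid=$(jps | grep "$1" | awk '{print $1}')
jstat -gcutil "${pid}" 400 10000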
  • Usage
    sh jstat-uti.sh <your-main-class>
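As an in-process complement to the jstat script, the same kind of counters can be read through GarbageCollectorMXBean from java.lang.management. The class GcStats is hypothetical; one option would be to take a snapshot before and after invocation.proceed() in ProfileInterceptor.

With -XX:+UseConcMarkSweepGC on HotSpot, the beans are typically named "ParNew" and "ConcurrentMarkSweep". Their counts and times are reported by the JVM itself and need not match jstat's YGC/FGC columns exactly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: per-collector GC counters read from inside the JVM.
final class GcStats {

    // Collector name -> {collection count, accumulated collection time in ms}; -1 means undefined.
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            stats.put(gc.getName(), new long[]{gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return stats;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : snapshot().entrySet()) {
            System.out.println(e.getKey() + ": count=" + e.getValue()[0] + ", time=" + e.getValue()[1] + "ms");
        }
    }
}
```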

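Addendum: why were there so few Minor GCs for the in-heap caches in the experiment?
I can't give a definitive answer yet. Some readers suggested that a CMS Full GC brings a Minor GC along with it, which jstat counts directly as part of the Full GC, but the detailed GC logs did not reveal anything conclusive either. If you know the answer, please leave a comment below. Thanks in advance O(∩_∩)O~.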