
Become a Java GC Expert (5): The Principles of Java Application Performance Tuning

    This is the fifth article in the series "Become a Java GC Expert". In the first article, Understanding Java Garbage Collection, we learned about the processes of the different GC algorithms, how GC works, what the Young and Old Generations are, the 5 types of GC in JDK 7 that you should know about, and the performance implications of each of these GC types.

    In the second article, How to Monitor Java Garbage Collection, we explained how the JVM actually runs garbage collection in real time, how we can monitor GC, and which tools we can use to make this process faster and more effective.

    In the third article, How to Tune Java Garbage Collection, we used real cases as examples to show some of the best options you can use for GC tuning. We also explained how to minimize the number of objects passed to the Old area, how to decrease Full GC time, and how to set the GC type and the memory size.

    In the fourth article, MaxClients in Apache and its effect on Tomcat during Full GC, we explained the importance of Apache's MaxClients parameter, which significantly affects overall system performance when GC occurs.

    In this fifth article, I will explain the principles of Java application performance tuning. Specifically, I will explain what is required to tune the performance of a Java application and the steps you need to perform to identify whether your application needs tuning. I will also explain the problems you may encounter during performance tuning. The article ends with recommendations that will help you make better decisions when tuning Java applications.

Overview

    Not every application requires tuning. If an application performs as well as expected, you don't need to spend additional effort enhancing its performance. However, it is unrealistic to expect an application to reach its target performance as soon as debugging is finished. This is when tuning is required. Regardless of the implementation language, tuning an application requires high expertise and concentration. Also, the method you used to tune one application may not work for another, because each application behaves differently and uses resources differently. For this reason, tuning an application requires more fundamental knowledge than writing one: for example, knowledge of virtual machines, operating systems and computer architecture. When you apply such knowledge to the application domain, you can tune an application successfully.

    Sometimes Java application tuning requires only changing JVM options, such as the garbage collector, but sometimes it requires changing the application source code. Whichever method you choose, you first need to monitor how the Java application executes. For this reason, this article deals with the following questions:

  • How can I monitor a Java application?
  • What JVM options should I give?
  • How can I know whether I need to modify the source code?

Knowledge Required to Tune the Performance of Java Applications

    Java applications run inside the Java Virtual Machine (JVM). Therefore, to tune a Java application, you need to understand how the JVM operates. I have previously blogged about Understanding JVM Internals, where you can find useful insights about the JVM.

    In this article, knowledge of how the JVM operates mainly means knowledge of Garbage Collection (GC) and HotSpot. Although knowledge of GC or HotSpot alone may not be enough to tune every kind of Java application, these two factors influence the performance of Java applications in most cases.

    Note that, from the operating system's perspective, the JVM is also just an application process. To create an environment in which a JVM can operate well, you should understand how the OS allocates resources to processes. This means that to tune the performance of Java applications, you should understand the OS and hardware as well as the JVM itself.

    Knowledge of the Java language domain is also important. You should understand locking and concurrency and be familiar with class loading and object creation.

    When you carry out Java application performance tuning, you should approach it by integrating all this knowledge.

The Process of Java Application Performance Tuning 

    Figure 1 shows a flow chart from the book <Java Performance> co-authored by Charlie Hunt and Binu John. This chart shows the process of Java application performance tuning.


Figure 1: The Process of Tuning the Performance of Java Applications.

    The above process is not a one-time process. You may need to repeat it until the tuning is completed. This also applies to determining an expected performance value. In the process of tuning, sometimes you should lower the expected performance value, and sometimes raise it.

JVM distribution model

    A JVM distribution model concerns whether to run Java applications on a single JVM or on multiple JVMs. You can decide this based on availability, responsiveness and maintainability. When running JVMs on multiple servers, you can also decide whether to run multiple JVMs on each server or a single JVM per server. For example, for each server you can decide whether to run a single JVM with an 8 GB heap or four JVMs with a 2 GB heap each. Of course, you can decide the number of JVMs running on a single server depending on the number of cores and the characteristics of the application. In terms of responsiveness, a 2 GB heap may be more advantageous than an 8 GB heap for the same application, because a full garbage collection takes less time on a 2 GB heap. With an 8 GB heap, however, full GCs occur less frequently. If the application uses an internal cache, a larger heap can also improve responsiveness by increasing the cache hit rate. Therefore, choose a distribution model by taking into account the characteristics of the application and how you will compensate for the disadvantages of the model you choose.

JVM architecture

    Selecting a JVM means deciding whether to use a 32-bit JVM or a 64-bit JVM. Under the same conditions, you had better choose a 32-bit JVM, because a 32-bit JVM performs better than a 64-bit JVM. However, the maximum logical heap size of a 32-bit JVM is 4 GB. (In practice, the heap actually allocatable to a 32-bit JVM is only 2-3 GB on both 32-bit and 64-bit OSes.) A 64-bit JVM is appropriate when a larger heap is required.

Table 1: Performance Comparison (source).
Benchmark               Time (sec)   Factor
C++ Opt                 23           1.0x
C++ Dbg                 197          8.6x
Java 64-bit             134          5.8x
Java 32-bit             290          12.6x
Java 32-bit GC*         106          4.6x
Java 32-bit SPEC GC*    89           3.7x
Scala                   82           3.6x
Scala low-level*        67           2.9x
Scala low-level GC*     58           2.5x
Go 6g                   161          7.0x
Go Pro*                 126          5.5x
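    Before deciding between a 32-bit and a 64-bit JVM, it can help to confirm which one a server is actually running and how much heap it may use. The following is a minimal sketch, not from the original article. It assumes a HotSpot-based JVM, because the `sun.arch.data.model` system property is HotSpot-specific and may be absent on other JVMs (the code then reports "unknown"). The class name `JvmArchCheck` is hypothetical.

```java
import java.lang.management.ManagementFactory;

// Sketch: report the JVM data model (32/64-bit) and the maximum heap size.
// Assumption: "sun.arch.data.model" is a HotSpot-specific property; if it is
// missing, "unknown" is returned instead.
final class JvmArchCheck {

    /** Returns "32", "64", or "unknown". */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    /** Maximum heap the JVM will try to use, in megabytes (roughly the -Xmx value). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM data model : " + dataModel());
        System.out.println("os.arch        : " + System.getProperty("os.arch"));
        System.out.println("VM name        : " + ManagementFactory.getRuntimeMXBean().getVmName());
        System.out.println("Max heap (MB)  : " + maxHeapMb());
    }
}
```

    Running it with the same -Xmx the application uses shows whether the requested heap actually fits in the chosen JVM.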

    The next step is to run the application and measure its performance. This process includes tuning GC, changing OS settings and modifying code. For these tasks, you can use a system monitoring tool or a profiling tool.

    Note that tuning for responsiveness and tuning for throughput can require different approaches. Even if throughput per unit time is high, responsiveness suffers if stop-the-world pauses, such as full garbage collections, occur from time to time. You also need to consider trade-offs, and not only between responsiveness and throughput: you may need to use more CPU to reduce memory usage, or accept lower responsiveness or throughput. Since the opposite cases can occur as well, approach tuning according to your priorities.

    The flow chart in Figure 1 above shows a performance tuning approach for almost all kinds of Java applications, including Swing applications. However, it is somewhat unsuitable for server applications for Internet services such as those our company, NHN, develops. The flow chart in Figure 2 below is a simpler procedure, based on Figure 1, that is better suited to NHN.

Figure 2: A Recommended Procedure for Tuning NHN's Java Applications.

    In the flow chart above, "Select JVM" means using a 32-bit JVM whenever possible, except when you need a 64-bit JVM to hold a cache of several GB.

    Now, based on the flow chart in Figure 2, let's look at what to do in each step.

JVM Options

    I will explain how to specify suitable JVM options, mainly for web application servers. Although it does not apply to every case, the best GC algorithm for web server applications in particular is the Concurrent Mark Sweep (CMS) GC, because what matters most is low latency. Of course, with CMS a very long stop-the-world pause can sometimes occur due to fragmentation. Nevertheless, this problem can usually be resolved by adjusting the new area size or the fraction ratio (for example, -XX:CMSInitiatingOccupancyFraction).

    Specifying the new area size is as important as specifying the entire heap size. You had better either specify the ratio of the new area to the entire heap with -XX:NewRatio or specify the desired new area size with the -XX:NewSize option. The new area size matters because most objects do not survive long. In web applications, most objects, except cache data, are created while generating the HttpResponse to an HttpRequest. This hardly ever takes more than a second, which means the lifetime of these objects does not exceed a second either. If the new area is not large enough, these objects are moved to the old area to make room for newly created objects. GC in the old area is much more expensive than GC in the new area; therefore, it is good to make the new area sufficiently large.

    If the new area size exceeds a certain level, however, responsiveness will suffer. This is because garbage collection in the new area basically copies data from one survivor area to the other, and a stop-the-world pause occurs during GC of the new area just as it does for the old area. As the new area grows, the survivor areas grow too, and so does the amount of data to copy. Given these characteristics, it is good to set a suitable new area size by referring to the default NewRatio of the HotSpot JVM for each OS.

Table 2: NewRatio by OS and option.
OS and option     Default -XX:NewRatio
Sparc -server     2
Sparc -client     8
x86 -server       8
x86 -client       12

    If NewRatio is specified, the new area becomes 1/(NewRatio + 1) of the entire heap. You will notice that the NewRatio for Sparc -server is very small. This is because, when these defaults were chosen, Sparc systems were used for more high-end purposes than x86. Now x86 servers are common and their performance has improved. Thus it is better to specify 2 or 3, which is similar to the Sparc -server value.

    You can also specify NewSize and MaxNewSize instead of NewRatio. The new area is created with the size specified by NewSize and can grow up to the size specified by MaxNewSize. The Eden and Survivor areas grow according to the (specified or default) ratio. Just as you specify the same size for -Xms and -Xmx, it is a very good choice to specify the same size for NewSize and MaxNewSize.
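    To check what NewSize, MaxNewSize or NewRatio actually produced at runtime, you can list the heap memory pools through the standard java.lang.management API. The sketch below is an illustration rather than part of the original procedure; pool names depend on the collector in use (with the CMS setup recommended later they are typically names such as "Par Eden Space" and "CMS Old Gen"), and the class name `HeapPools` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Sketch: print committed and maximum sizes of each heap pool (Eden,
// Survivor, Old) so the effective new area size can be compared with the
// configured options. Pool names vary by garbage collector.
final class HeapPools {

    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace or Code Cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() >> 20) + " MB";
            lines.add(String.format("%-25s committed=%d MB max=%s",
                    pool.getName(), usage.getCommitted() >> 20, max));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeHeapPools().forEach(System.out::println);
    }
}
```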

    If you have specified both NewRatio and NewSize, the larger of the two resulting sizes is used. Therefore, when a heap has been created, you can express the initial new area size as follows:

min(MaxNewSize, max(NewSize, heap/(NewRatio+1)))
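    The formula above can be checked with a few numbers. This is a small sketch that only mirrors the article's formula; it does not model the alignment and rounding HotSpot applies internally. Sizes are in megabytes, the 2 GB heap is a hypothetical example, and the class name `NewAreaSize` is illustrative.

```java
// Sketch of the initial new-area sizing rule:
//   min(MaxNewSize, max(NewSize, heap / (NewRatio + 1)))
// All sizes are in megabytes; HotSpot's internal alignment is not modeled.
final class NewAreaSize {

    static long initialNewSizeMb(long heapMb, int newRatio, long newSizeMb, long maxNewSizeMb) {
        long fromRatio = heapMb / (newRatio + 1);      // share implied by NewRatio
        long atLeast = Math.max(newSizeMb, fromRatio); // NewSize acts as a floor
        return Math.min(maxNewSizeMb, atLeast);        // MaxNewSize acts as a ceiling
    }

    public static void main(String[] args) {
        // Hypothetical 2 GB heap, NewRatio=2, no explicit NewSize/MaxNewSize limits.
        System.out.println(initialNewSizeMb(2048, 2, 0, Long.MAX_VALUE)); // 682
        // Same heap with the x86 -server default NewRatio=8: a much smaller new area.
        System.out.println(initialNewSizeMb(2048, 8, 0, Long.MAX_VALUE)); // 227
        // NewSize=512 MB overrides the smaller ratio-derived value.
        System.out.println(initialNewSizeMb(2048, 8, 512, 1024));         // 512
    }
}
```

    The second case shows why the x86 -server default of 8 leaves a comparatively small new area, and why a value of 2 or 3 is suggested above.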

    However, it is impossible to determine the appropriate entire heap size and new area size in a single attempt. Based on my experience running web server applications at NHN, I recommend running Java applications with the following JVM options. After monitoring the application's performance with these options, you can switch to a more suitable GC algorithm or options.

Table 3: Recommended JVM options.
Type                         Option
Operation mode               -server
Entire heap size             Specify the same value for -Xms and -Xmx.
New area size                -XX:NewRatio: a value of 2 to 4, or
                             -XX:NewSize=? -XX:MaxNewSize=?. Specifying NewSize instead of NewRatio is also a good choice.
Perm size                    -XX:PermSize=256m -XX:MaxPermSize=256m. Set a value large enough not to cause problems in operation; it does not affect performance.
GC log                       -Xloggc:$CATALINA_BASE/logs/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps. Leaving a GC log does not noticeably affect the performance of Java applications, so you are recommended to keep GC logs as much as possible.
GC algorithm                 -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75. This is only a generally recommendable configuration; other choices could be better depending on the characteristics of the application.
Heap dump on an OOM error    -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$CATALINA_BASE/logs
Action after an OOM error    -XX:OnOutOfMemoryError=$CATALINA_HOME/bin/stop.sh or -XX:OnOutOfMemoryError=$CATALINA_HOME/bin/restart.sh. After leaving a heap dump, take an appropriate action according to your management policy.
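    Options like these are easily lost or overridden in startup scripts, so it can be worth confirming from inside the process which flags the JVM actually received. The sketch below uses the standard RuntimeMXBean; the class name `ShowJvmOptions` and the warning check are hypothetical additions for illustration. Note that JDK 9 and later replace -Xloggc/-XX:+PrintGCDetails with unified logging (-Xlog:gc...), so the check accepts either form.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: list the JVM options this process was started with, to confirm
// that settings such as -Xms/-Xmx, -XX:NewRatio and the GC log flags from
// Table 3 really reached the JVM.
final class ShowJvmOptions {

    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** True if some argument starts with the given prefix, e.g. "-Xmx". */
    static boolean hasOption(String prefix) {
        return jvmArguments().stream().anyMatch(arg -> arg.startsWith(prefix));
    }

    public static void main(String[] args) {
        jvmArguments().forEach(System.out::println);
        // JDK 9+ uses unified logging (-Xlog:gc...) instead of -Xloggc.
        if (!hasOption("-Xloggc") && !hasOption("-Xlog:gc")) {
            System.out.println("warning: no GC log option found");
        }
    }
}
```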

Measuring the Performance of Applications

    The information you need to acquire to understand an application's performance is as follows:

  • TPS (OPS): the information required to understand the performance of an application conceptually.
  • Requests Per Second (RPS): strictly speaking, RPS is different from responsiveness, but you can treat it as responsiveness. Through RPS, you can check how long it takes for the user to see the result.
  • RPS standard deviation: aim for an even RPS where possible. If a large deviation occurs, you need to check GC tuning or interworking systems.
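    The metrics above are simple to compute from per-second request counts collected during a load test. The sketch below uses hypothetical sample values and a population standard deviation; the class name `RpsStats` is illustrative. Both samples have the same mean, but the "spiky" one, with a dip followed by a catch-up burst such as a GC pause might cause, has a much larger deviation.

```java
// Sketch: mean RPS and its standard deviation from per-second request
// counts. The sample arrays are hypothetical, not measured data.
final class RpsStats {

    static double mean(long[] samples) {
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Population standard deviation of the samples. */
    static double stdDev(long[] samples) {
        double m = mean(samples);
        double squares = 0;
        for (long s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / samples.length);
    }

    public static void main(String[] args) {
        long[] steady = {500, 510, 490, 505, 495};
        long[] spiky = {500, 510, 120, 505, 865}; // a dip, then a catch-up burst
        System.out.printf("steady: mean=%.1f sd=%.1f%n", mean(steady), stdDev(steady));
        System.out.printf("spiky:  mean=%.1f sd=%.1f%n", mean(spiky), stdDev(spiky));
    }
}
```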

    To obtain a more accurate performance result, measure only after warming up the application sufficiently, so that HotSpot's JIT compiler has had time to compile the bytecode. In general, you can measure realistic performance values after applying load to a given feature for at least 10 minutes with the nGrinder load testing tool.
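    The effect of warm-up can be illustrated in miniature by timing a workload before and after running it many times untimed. This is only a sketch of the idea: the iteration counts and `sampleWork` are hypothetical stand-ins for real request handling, a real test should still run under load for minutes as described above, and for serious microbenchmarks a dedicated harness such as JMH handles JIT effects more rigorously.

```java
import java.util.function.LongSupplier;

// Sketch: run a workload untimed to let the JIT compile it, then time it.
// Iteration counts are arbitrary illustration values.
final class WarmUpBenchmark {

    // Results are stored here so the JIT cannot treat the work as dead code.
    static volatile long blackhole;

    /** Calls work warmupIterations times untimed, then returns mean ns per call over measuredIterations calls. */
    static double measureNanosPerCall(LongSupplier work, int warmupIterations, int measuredIterations) {
        long sink = 0;
        for (int i = 0; i < warmupIterations; i++) {
            sink += work.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            sink += work.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / measuredIterations;
    }

    // Hypothetical workload standing in for a real request handler.
    static long sampleWork() {
        long acc = 0;
        for (int i = 0; i < 100; i++) {
            acc += Integer.toString(i).hashCode();
        }
        return acc;
    }

    public static void main(String[] args) {
        double first = measureNanosPerCall(WarmUpBenchmark::sampleWork, 0, 1_000);
        double warmed = measureNanosPerCall(WarmUpBenchmark::sampleWork, 50_000, 1_000);
        System.out.printf("no warm-up:    %.0f ns/call%n", first);
        System.out.printf("after warm-up: %.0f ns/call%n", warmed);
    }
}
```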

Tuning in Earnest

    You don't need to tune the performance of an application if the nGrinder results meet your expectations. If the performance does not meet them, you need to tune the application to resolve the problems. The approach for each case follows.

In the event stop-the-world pauses are long

    A long stop-the-world time can result from inappropriate GC options or an incorrect implementation. You can determine the cause from the results of a profiler or a heap dump, that is, by checking the types and numbers of objects in the heap. If you find many unnecessary objects, you had better modify the source code. If you find no particular problem in how objects are created, you had better simply change the GC options.
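    Besides -XX:+HeapDumpOnOutOfMemoryError, a heap dump can be taken on demand, either with the `jmap -dump` command or from inside the JVM. The sketch below assumes a HotSpot JVM, since it relies on the HotSpot-specific `com.sun.management.HotSpotDiagnosticMXBean`; the class name `HeapDumper` and the temporary file location are hypothetical. Recent JDKs require the file name to end in ".hprof" and refuse to overwrite an existing file.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write a heap dump so the types and counts of objects can be
// examined in a heap analyzer. HotSpot-specific API.
final class HeapDumper {

    /** Dumps the heap to file; with live=true only reachable objects are written. */
    static void dumpHeap(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dir.resolve("app.hprof"); // must not exist yet
        dumpHeap(file, true);
        System.out.println("heap dump written to " + file + " (" + Files.size(file) + " bytes)");
    }
}
```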

    To adjust GC options appropriately, you need to collect GC logs over a sufficient period of time and understand in which situations stop-the-world pauses take a long time. For more information on selecting appropriate GC options, read my colleague's blog post How to Monitor Java Garbage Collection.
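    Alongside the GC log, cumulative collection counts and times are available per collector through GarbageCollectorMXBean. Sampling them periodically and comparing deltas gives a quick picture of how much time goes to GC over a long run, though it lacks the per-pause detail of the log. This is a sketch; the one-second interval and the class name `GcStats` are hypothetical, and collector names depend on the GC in use (for example "ParNew" and "ConcurrentMarkSweep" with the CMS setup in Table 3).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: snapshot cumulative GC counts and times for each collector and
// print the deltas over an interval. Complements, not replaces, the GC log.
final class GcStats {

    /** Collector name -> {collection count, accumulated time in ms}; -1 means unsupported. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 1_000; // hypothetical; use minutes or hours in practice
        Map<String, long[]> before = snapshot();
        Thread.sleep(intervalMs);
        Map<String, long[]> after = snapshot();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] prev = before.get(e.getKey());
            System.out.printf("%-25s +%d collections, +%d ms%n",
                    e.getKey(), e.getValue()[0] - prev[0], e.getValue()[1] - prev[1]);
        }
    }
}
```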

In the event CPU usage rate is low

    When blocking occurs, both TPS and CPU usage decrease. This may result from problems in interworking systems or from concurrency issues. To analyze this, you can examine thread dumps or use a profiler. For more information on thread dump analysis, read How to Analyze Java Thread Dumps.
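    Thread dumps are usually taken with `jstack`, but the same information is available in-process through ThreadMXBean. The sketch below lists threads that are BLOCKED on a monitor, with the lock and its owner; repeating it a few times shows whether the same lock keeps appearing. The class name `BlockedThreads` and the demo in `main` (one thread holding a lock while another waits for it) are hypothetical, and because the demo is timing-based its output may vary.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Sketch: in-process thread dump that reports BLOCKED threads together with
// the lock they are waiting for and the thread that owns it.
final class BlockedThreads {

    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " is blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        Thread holder = new Thread(() -> {
            synchronized (lock) {
                try {
                    Thread.sleep(500); // hold the lock for a while
                } catch (InterruptedException ignored) {
                }
            }
        }, "holder");
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // entered only after "holder" releases the lock
            }
        }, "waiter");
        holder.start();
        Thread.sleep(50); // give "holder" time to acquire the lock
        waiter.start();
        Thread.sleep(50); // give "waiter" time to block
        blockedThreads().forEach(System.out::println);
        holder.join();
        waiter.join();
    }
}
```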

    You can conduct a very accurate lock analysis with a commercial profiler. In most cases, however, the CPU analyzer in jvisualvm gives satisfactory results.

In the event CPU usage rate is high

    If TPS is low but CPU usage is high, the cause is likely an inefficient implementation. In this case, use a profiler to locate the bottlenecks, for example jvisualvm, Eclipse TPTP or JProbe.
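    Before attaching a profiler, it can help to see which threads consume the CPU. The sketch below ranks live threads by accumulated CPU time using ThreadMXBean; it is an illustration, not a replacement for a profiler. CPU time is cumulative since each thread started, so on a running server compare two snapshots rather than relying on one. The class name `TopCpuThreads` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: list the threads with the most accumulated CPU time.
final class TopCpuThreads {

    static List<String> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> result = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot measure per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        List<long[]> rows = new ArrayList<>(); // {thread id, CPU nanoseconds}
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if the thread has terminated
            if (cpu >= 0) {
                rows.add(new long[] {id, cpu});
            }
        }
        rows.sort(Comparator.comparingLong((long[] r) -> r[1]).reversed());
        for (long[] r : rows.subList(0, Math.min(limit, rows.size()))) {
            ThreadInfo info = mx.getThreadInfo(r[0]);
            String name = info == null ? "<terminated>" : info.getThreadName();
            result.add(String.format("%-30s %d ms", name, r[1] / 1_000_000));
        }
        return result;
    }

    public static void main(String[] args) {
        topThreads(5).forEach(System.out::println);
    }
}
```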

Approach for Tuning

    You are advised to use the following approach to tune applications.

    First, check whether performance tuning is necessary at all. Measuring performance is not easy, and you are not guaranteed to obtain a satisfactory result every time. Therefore, if the application already meets its target performance, you don't need to invest further in performance.

    The problem lies in only a single place, and all you have to do is fix it. The Pareto principle applies to performance tuning as well. This does not mean that the low performance of a feature necessarily results from a single problem. Rather, it means that when tuning, you should focus on the one factor that has the biggest influence on performance. You can handle other problems after fixing the most important one. Try to fix just one problem at a time.

    You should consider the balloon effect: decide what to give up in order to gain something. You can improve responsiveness by applying a cache, but as the cache grows, the time a full GC takes grows as well. In general, if you want to keep memory usage low, throughput or responsiveness may deteriorate. Thus, you need to decide what is most important and what is less important.

    So far, you have read about the method for Java application performance tuning. To present a concrete procedure for performance measurement, I had to omit some details. Nevertheless, I think this approach covers most cases of tuning Java web server applications.

    Good luck with performance tuning!

    By Se Hoon Park, Senior Software Engineer at Web Platform Development Lab, NHN Corporation.


Original: http://www.cubrid.org/blog/dev-platform/the-principles-of-java-application-performance-tuning/
