Java Virtual Machine
Dynamic Compilation

The compilation process for a Java application differs from that of statically compiled languages like C or C++. A static compiler converts source code directly into machine code that can be executed on the target platform, and different hardware platforms require different compilers. The Java compiler instead converts Java source code into portable JVM bytecodes, the "virtual machine instructions" understood by the JVM. Unlike static compilers, javac does very little optimization -- the optimizations that a static compiler would perform at compile time are performed instead by the runtime while the program executes.
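
You can see the difference for yourself by compiling a trivial class and disassembling the resulting class file with the standard javap tool; the file name Hello.java below is just a placeholder:

javac Hello.java      # produces Hello.class containing JVM bytecodes, not native machine code
javap -c Hello        # prints the bytecodes (e.g. getstatic, invokevirtual) for each method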

HotSpot dynamic compilation

The HotSpot execution process combines interpretation, profiling, and dynamic compilation. Rather than converting all bytecodes into machine code before they are executed, HotSpot first runs as an interpreter and compiles only the "hot" code -- the code executed most frequently. As it executes, it gathers profiling data that is used to decide which code sections are executed frequently enough to merit compilation.

HotSpot comes with two compilers: the client compiler and the server compiler. The default is to use the client compiler; you can select the server compiler by specifying the -server switch when starting the JVM. The server compiler has been optimized to maximize peak operating speed, and is intended for long-running server applications. The client compiler has been optimized to reduce application startup time and memory footprint, employing fewer complex optimizations than the server compiler, and accordingly requiring less time for compilation.

After a code path has been interpreted a certain number of times, it is compiled into machine code. The JVM continues profiling, however, and may recompile the code later with a higher level of optimization if it decides the code path is particularly hot or if further profiling data suggests opportunities for additional optimization. The JVM may recompile the same bytecodes many times in a single application execution. To watch this happen, invoke the JVM with the -XX:+PrintCompilation flag, which causes the compiler (client or server) to print a short message every time it runs.
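
For example, assuming your benchmark's main class is named MyBenchmark (a placeholder name), you could run:

java -server -XX:+PrintCompilation MyBenchmark

Each line of output names a method as it is compiled, so you can see when, and how often, compilation happens during the run.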

Compilation and Writing Benchmarks

One of the challenges of writing good benchmarks is that optimizing compilers are adept at spotting dead code -- code that has no effect on the outcome of the program execution. But benchmark programs often don't produce any output, which means some, or all, of your code can be optimized away without you realizing it, at which point you're measuring less execution than you think you are. In particular, many microbenchmarks perform much "better" when run with -server than with -client, not because the server compiler is faster (though it often is) but because the server compiler is more adept at optimizing away blocks of dead code.

The following benchmark is intended to measure concurrent thread performance, but instead measures something completely different.

public class StupidThreadTest {
    // The result of this loop is never used, so the JIT compiler is free to
    // treat the whole method as dead code and eliminate it.
    public static void doSomeStuff() {
        double uselessSum = 0;
        for (int i = 0; i < 1000; i++) {
            for (int j = 0; j < 1000; j++) {
                uselessSum += (double) i + (double) j;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        doSomeStuff();   // intended to warm up the method before timing

        int nThreads = Integer.parseInt(args[0]);
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++)
            threads[i] = new Thread(new Runnable() {
                public void run() { doSomeStuff(); }
            });
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads.length; i++)
            threads[i].start();
        for (int i = 0; i < threads.length; i++)
            threads[i].join();
        long end = System.currentTimeMillis();
        System.out.println("Time: " + (end - start) + "ms");
    }
}


The doSomeStuff() method is supposed to give the threads something to do. However, the compiler can determine that all the code in doSomeStuff is dead, and optimize it all away because uselessSum is never used. Once the code inside the loop goes away, the loops can go away, too, leaving doSomeStuff entirely empty. The server compiler does more optimization and can detect that the entirety of doSomeStuff is dead code. While many programs do see a speedup with the server JVM, the speedup you see here is simply a measure of a badly written benchmark, not the blazing performance of the server JVM.
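
One way to keep the compiler from discarding the work is to make the program actually consume the result. The sketch below is one possible fix, not the only one; the class name LessStupidThreadTest and the results-array approach are my own choices. Each thread writes its result into its own array slot, and the total is printed after the timed section, so the computation is no longer dead code.

public class LessStupidThreadTest {
    // Now returns its result instead of throwing it away, so callers can use it.
    public static double doSomeStuff() {
        double sum = 0;
        for (int i = 0; i < 1000; i++)
            for (int j = 0; j < 1000; j++)
                sum += (double) i + (double) j;
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        double warmup = doSomeStuff();   // warm-up call; its result is used below

        int nThreads = Integer.parseInt(args[0]);
        final double[] results = new double[nThreads];   // one slot per thread
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final int idx = i;
            threads[i] = new Thread(new Runnable() {
                public void run() { results[idx] = doSomeStuff(); }
            });
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads.length; i++)
            threads[i].start();
        for (int i = 0; i < threads.length; i++)
            threads[i].join();
        long end = System.currentTimeMillis();

        // Printing the results makes the work observable, so it cannot be
        // eliminated as dead code.
        double total = warmup;
        for (double r : results)
            total += r;
        System.out.println("Time: " + (end - start) + "ms, total=" + total);
    }
}

With this version, any timing difference between -client and -server should reflect how fast the loops actually run, not how aggressively they are deleted.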

Tips

If you're looking to measure the performance of idiom X, you generally want to measure its compiled performance, not its interpreted performance. (You want to know how fast X will be in the field.) To do so requires "warming up" the JVM -- executing your target operation enough times that the compiler will have had time to run and replace the interpreted code with compiled code before starting to time the execution.

The compiler runs at hard-to-predict times, the JVM switches from interpreted to compiled code at will, and the same code path may be compiled and recompiled more than once during a run. If you don't account for the timing of these events, they can seriously distort your timing results.

How much warmup is enough? You don't know. The best you can do is run your benchmarks with -XX:+PrintCompilation, observe what causes the compiler to kick in, then restructure your benchmark program to ensure that all of this compilation occurs before you start timing and that no further compilation occurs in the middle of your timing loops.
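
A minimal warm-up harness might look like the following sketch; the method name work() and the iteration counts are arbitrary placeholders, not tuned values. Run the operation untimed until -XX:+PrintCompilation shows no further compilation, then start the timed loop.

public class WarmupExample {
    // The operation being benchmarked; stands in for "idiom X".
    static double work() {
        double sum = 0;
        for (int i = 0; i < 10000; i++)
            sum += Math.sqrt(i);
        return sum;
    }

    public static void main(String[] args) {
        double sink = 0;

        // Warm-up phase: run the operation enough times that the JIT should
        // have compiled it before measurement starts. Verify with
        // -XX:+PrintCompilation that no compilation occurs after this loop.
        for (int i = 0; i < 20000; i++)
            sink += work();

        // Timed phase: ideally only compiled code runs here.
        long start = System.nanoTime();
        for (int i = 0; i < 20000; i++)
            sink += work();
        long elapsed = System.nanoTime() - start;

        // Using sink prevents the work itself from being optimized away.
        System.out.println("elapsed: " + elapsed + " ns (sink=" + sink + ")");
    }
}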

If you run your benchmarks with -verbose:gc, you can see how much time was spent in garbage collection and adjust your timing data accordingly. Even better, you can run your program for a long, long time, ensuring that you trigger many garbage collections, more accurately amortizing the allocation and garbage collection cost.
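
For example, with the same placeholder class name as before:

java -verbose:gc MyBenchmark

The GC log lines report how long each collection took, so you can subtract that time from, or amortize it across, your measured interval.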

References

Dynamic compilation and performance measurement: http://www.ibm.com/developerworks/java/library/j-jtp12214/index.html