Difference between revisions of "Java Virtual Machine"

From Wiki Notes @ WuJiewen.com, by Jiewen Wu
Jump to: navigation, search
m (Tips)
m
Line 56: Line 56:
  
 
If you run your benchmarks with '''-verbose:gc''', you can see how much time was spent in garbage collection and adjust your timing data accordingly. Even better, you can run your program for a long, long time, ensuring that you trigger many garbage collections, more accurately amortizing the allocation and garbage collection cost.
 
If you run your benchmarks with '''-verbose:gc''', you can see how much time was spent in garbage collection and adjust your timing data accordingly. Even better, you can run your program for a long, long time, ensuring that you trigger many garbage collections, more accurately amortizing the allocation and garbage collection cost.
 +
 +
==Garbage Collection==
 +
The benefits of garbage collection are indisputable -- increased reliability, decoupling of memory management from class interface design, and less developer time spent chasing memory management errors. The well-known problems of dangling pointers and memory leaks simply do not occur in Java programs. (Java programs can exhibit a form of memory leak, more accurately called unintentional object retention.) However, garbage collection is not without its costs -- among them performance impact, pauses, configuration complexity, and nondeterministic finalization.
 +
 +
===The Basic Algorithm===
 +
Memory blocks can be reached in one of two ways -- if the user program holds a reference to that block in a '''root''' (a reference to an object held in a static variable or in a local variable on an active stack frame), or if there is a reference to that block held in another reachable block. In a Java program. The set of reachable objects is the transitive closure of the root set under the points-to relation. The problem faced by all garbage collection algorithms is the same -- identify blocks of memory that have been dispensed by the allocator, but are '''unreachable''' by the user program.
 +
====Reference counting====
 +
 +
The most straightforward garbage collection strategy is reference counting. Reference counting is simple, but requires significant assistance from the compiler and imposes overhead on the mutator (the term for the user program, from the perspective of the garbage collector). Each object has an associated reference count -- the number of active references to that object. If an object's reference count is zero, it is garbage (unreachable from the user program) and can be recycled. Every time a pointer reference is modified, such as through an assignment statement, or when a reference goes out of scope, the compiler must generate code to update the referenced object's reference count. If an object's reference count goes to zero, the runtime can reclaim the block immediately (and decrement the reference counts of any blocks that the reclaimed block references), or place it on a queue for deferred collection.
 +
 +
Many ANSI C++ library classes, such as string, employ reference counting to provide the appearance of garbage collection. By overloading the assignment operator and exploiting the deterministic finalization provided by C++ scoping, C++ programs can use the string class as if it were garbage collected. Reference counting is rarely used in production garbage collectors for a number of reasons, such as its inability to reclaim unreachable cyclic structures (objects that reference each other directly or indirectly, like a circularly linked list or a tree that contains back-pointers to the parent node).
  
 
==References==
 
==References==
 
[http://www.ibm.com/developerworks/java/library/j-jtp12214/index.html Dynamic compilation and performance measurement]
 
[http://www.ibm.com/developerworks/java/library/j-jtp12214/index.html Dynamic compilation and performance measurement]

Revision as of 10:42, 14 March 2011

Dynamic Compilation

The compilation process for a Java application is different from that of statically compiled languages like C or C++. A static compiler converts source code directly to machine code that can be directly executed on the target platform, and different hardware platforms require different compilers. The Java compiler converts Java source code into portable JVM bytecodes, which are "virtual machine instructions" for the JVM. Unlike static compilers, javac does very little optimization -- the optimizations that would be done by the compiler in a statically compiled language are performed instead by the runtime when the program is executed.

HotSpot dynamic compilation

The HotSpot execution process combines interpretation, profiling, and dynamic compilation. Rather than convert all bytescodes into machine code before they are executed, HotSpot first runs as an interpreter and only compiles the "hot" code -- the code executed most frequently. As it executes, it gathers profiling data, used to decide which code sections are being executed frequently enough to merit compilation.

HotSpot comes with two compilers: the client compiler and the server compiler. The default is to use the client compiler; you can select the server compiler by specifying the -server switch when starting the JVM. The server compiler has been optimized to maximize peak operating speed, and is intended for long-running server applications. The client compiler has been optimized to reduce application startup time and memory footprint, employing fewer complex optimizations than the server compiler, and accordingly requiring less time for compilation.

After interpreting a code path a certain number of times, it is compiled into machine code. But the JVM continues profiling, and may recompile the code again later with a higher level of optimization if it decides the code path is particularly hot or future profiling data suggests opportunities for additional optimization. The JVM may recompile the same bytecodes many times in a single application execution. Try invoking the JVM with the -XX:+PrintCompilation flag, which causes the compiler (client or server) to print a short message every time it runs.

Compilation and Writing Benchmarks

One of the challenges of writing good benchmarks is that optimizing compilers are adept at spotting dead code -- code that has no effect on the outcome of the program execution. But benchmark programs often don't produce any output, which means some, or all, of your code can be optimized away without you realizing it, at which point you're measuring less execution than you think you are. In particular, many microbenchmarks perform much "better" when run with -server than with -client, not because the server compiler is faster (though it often is) but because the server compiler is more adept at optimizing away blocks of dead code.

The following benchmark intended to measure concurrent thread performance, but that instead measures something completely different.

public class StupidThreadTest {
   public static void doSomeStuff() {
       double uselessSum = 0;
       for (int i=0; i<1000; i++) {
           for (int j=0;j<1000; j++) {
               uselessSum += (double) i + (double) j;
           }
       }
   }
   public static void main(String[] args) throws InterruptedException {
       doSomeStuff();
       
       int nThreads = Integer.parseInt(args[0]);
       Thread[] threads = new Thread[nThreads];
       for (int i=0; i<nThreads; i++)
           threads[i] = new Thread(new Runnable() {
               public void run() { doSomeStuff(); }
           });
       long start = System.currentTimeMillis();
       for (int i = 0; i < threads.length; i++)
           threads[i].start();
       for (int i = 0; i < threads.length; i++)
           threads[i].join();
       long end = System.currentTimeMillis();
       System.out.println("Time: " + (end-start) + "ms");
   }
}


The doSomeStuff() method is supposed to give the threads something to do. However, the compiler can determine that all the code in doSomeStuff is dead, and optimize it all away because uselessSum is never used. Once the code inside the loop goes away, the loops can go away, too, leaving doSomeStuff entirely empty. The server compiler does more optimization and can detect that the entirety of doSomeStuff is dead code. While many programs do see a speedup with the server JVM, the speedup you see here is simply a measure of a badly written benchmark, not the blazing performance of the server JVM.

Tips

If you're looking to measure the performance of idiom X, you generally want to measure its compiled performance, not its interpreted performance. (You want to know how fast X will be in the field.) To do so requires "warming up" the JVM -- executing your target operation enough times that the compiler will have had time to run and replace the interpreted code with compiled code before starting to time the execution.

The compiler runs at less predictable times, the JVM switches from interpreted to compiled code at will, and the same code path may be compiled and recompiled more than once during a run. If you don't account for the timing of these events, they can seriously distort your timing results.

How much warmup is enough? You don't know. The best you can do is run your benchmarks with -XX:+PrintCompilation, observe what causes the compiler to kick in, then restructure your benchmark program to ensure that all of this compilation occurs before you start timing and that no further compilation occurs in the middle of your timing loops.

If you run your benchmarks with -verbose:gc, you can see how much time was spent in garbage collection and adjust your timing data accordingly. Even better, you can run your program for a long, long time, ensuring that you trigger many garbage collections, more accurately amortizing the allocation and garbage collection cost.

Garbage Collection

The benefits of garbage collection are indisputable -- increased reliability, decoupling of memory management from class interface design, and less developer time spent chasing memory management errors. The well-known problems of dangling pointers and memory leaks simply do not occur in Java programs. (Java programs can exhibit a form of memory leak, more accurately called unintentional object retention.) However, garbage collection is not without its costs -- among them performance impact, pauses, configuration complexity, and nondeterministic finalization.

The Basic Algorithm

Memory blocks can be reached in one of two ways -- if the user program holds a reference to that block in a root (a reference to an object held in a static variable or in a local variable on an active stack frame), or if there is a reference to that block held in another reachable block. In a Java program. The set of reachable objects is the transitive closure of the root set under the points-to relation. The problem faced by all garbage collection algorithms is the same -- identify blocks of memory that have been dispensed by the allocator, but are unreachable by the user program.

Reference counting

The most straightforward garbage collection strategy is reference counting. Reference counting is simple, but requires significant assistance from the compiler and imposes overhead on the mutator (the term for the user program, from the perspective of the garbage collector). Each object has an associated reference count -- the number of active references to that object. If an object's reference count is zero, it is garbage (unreachable from the user program) and can be recycled. Every time a pointer reference is modified, such as through an assignment statement, or when a reference goes out of scope, the compiler must generate code to update the referenced object's reference count. If an object's reference count goes to zero, the runtime can reclaim the block immediately (and decrement the reference counts of any blocks that the reclaimed block references), or place it on a queue for deferred collection.

Many ANSI C++ library classes, such as string, employ reference counting to provide the appearance of garbage collection. By overloading the assignment operator and exploiting the deterministic finalization provided by C++ scoping, C++ programs can use the string class as if it were garbage collected. Reference counting is rarely used in production garbage collectors for a number of reasons, such as its inability to reclaim unreachable cyclic structures (objects that reference each other directly or indirectly, like a circularly linked list or a tree that contains back-pointers to the parent node).

References

Dynamic compilation and performance measurement