Difference between revisions of "Java Generics"

From Wiki Notes @ WuJiewen.com, by Jiewen Wu

Revision as of 12:20, 14 March 2011

tutorial.pdf

Generics are a means of expressing type constraints on the behavior of a class or method in terms of unknown types, such as "whatever the types of parameters x and y of this method are, they must be the same type," "you must provide a parameter of the same type to both of these methods," or "the return value of foo() is the same type as the parameter of bar()."

Generics are not covariant

Arrays are covariant; because Integer is a subtype of Number, the array type Integer[] is a subtype of Number[], and therefore an Integer[] value can be supplied wherever a value of Number[] is required. (More formally, if Number is a supertype of Integer, then Number[] is a supertype of Integer[].) On the other hand, generics are not covariant; List<Integer> is not a subtype of List<Number>, and attempting to supply a List<Integer> where a List<Number> is demanded is a type error. It turns out there's a good reason it doesn't work that way: It would break the type safety generics were supposed to provide.
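The difference can be seen in a small sketch. The class and method names below are illustrative only and do not come from the original notes. The array assignment compiles and only fails when a value of the wrong type is stored, while the equivalent generic assignment is rejected by the compiler before the program runs.

```java
import java.util.ArrayList;
import java.util.List;

class CovarianceDemo {
    // Returns true if storing a Double through a Number[] view of an
    // Integer[] is rejected at run time.
    static boolean arrayStoreFails() {
        Integer[] ints = new Integer[1];
        Number[] nums = ints;                 // allowed: arrays are covariant
        try {
            nums[0] = Double.valueOf(3.14);   // compiles, but the array really holds Integers
            return false;
        } catch (ArrayStoreException e) {
            return true;                      // the JVM checks the store at run time
        }
    }

    public static void main(String[] args) {
        System.out.println(arrayStoreFails());

        List<Integer> intList = new ArrayList<Integer>();
        // List<Number> numList = intList;    // does not compile: generics are not covariant
        // If it did compile, numList.add(3.14) would put a Double into a List<Integer>.
        intList.add(42);
        System.out.println(intList);
    }
}
```

Arrays can afford covariance because every array store is checked at run time; generic collections have no such check, so the compiler has to refuse the assignment up front.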

Wildcards

Wildcards — ? — are a means of expressing a type constraint in terms of an unknown type. They were not part of the original design for generics (derived from the Generic Java (GJ) project); they were added as the design process played out over the five years between the formation of JSR 14 and its final release.

The wildcard type List<?> is different from both the raw type List and the concrete type List<Object>. To say a variable x has type List<?> means that there exists some type T for which x is of type List<T>, that x is homogeneous even though we don't know what particular type its elements have. It's not that the contents can be anything, it's that we don't know what the type constraints on the contents are — but we know that there is a constraint. On the other hand, the raw type List is heterogeneous; we are not able to place any type constraints on its elements, and the concrete type List<Object> means that we explicitly know that it can contain any object.
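As a rough illustration of the distinction, the sketch below uses two hypothetical methods, countNonNull() and addAnything(), which are not part of the original notes. A List<?> parameter accepts a list of any element type but only allows reads as Object, whereas a List<Object> parameter accepts only lists that are declared to hold any object.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardVsObjectDemo {
    // Accepts a list of any element type; elements can only be read as Object.
    static int countNonNull(List<?> list) {
        int n = 0;
        for (Object o : list) {
            if (o != null) {
                n++;
            }
        }
        // list.add("x");   // does not compile: the element type is unknown
        return n;
    }

    // Accepts only a List<Object>; any object may be added to it.
    static void addAnything(List<Object> list) {
        list.add("text");
        list.add(Integer.valueOf(1));
    }

    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        strings.add("a");
        System.out.println(countNonNull(strings)); // a List<String> is a List<?>
        // addAnything(strings);                   // does not compile: not a List<Object>

        List<Object> objects = new ArrayList<Object>();
        addAnything(objects);
        System.out.println(countNonNull(objects));
    }
}
```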

One benefit of wildcards is that they allow you to write code that can operate on variables of generic types without knowing their exact type bound.

Bounded Wildcards

The "? extends T" generic type specifiers are for dealing with the lack of covariance — they let classes declare when method arguments or return values are covariant (or the opposite, contravariant). The most common mistake with bounded wildcards is forgetting to use them at all.

Most bounded wildcards are bounded above; the "? extends T" notation places an upper bound on the type. It is also possible, though less common, to place a lower bound on the type with the notation "? super T", meaning "T or any supertype of T". Lower-bounded wildcards show up when you want to specify a callback object, such as a comparator, or a data structure into which you are going to place a value.
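The two typical uses of a lower bound mentioned above, a data structure that receives values and a callback such as a comparator, might look like the following sketch. The methods fill() and max() are hypothetical helpers written for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LowerBoundDemo {
    // A structure we only put values into: any List of Integer or of a
    // supertype of Integer (Number, Object) is an acceptable sink.
    static void fill(List<? super Integer> sink, int count) {
        for (int i = 0; i < count; i++) {
            sink.add(i);
        }
    }

    // A callback that consumes T values: a comparator written for a
    // supertype of T can still compare two T instances.
    static <T> T max(List<? extends T> items, Comparator<? super T> order) {
        T best = null;
        for (T item : items) {
            if (best == null || order.compare(item, best) > 0) {
                best = item;
            }
        }
        return best; // null for an empty list
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<Number>();
        fill(numbers, 3);                      // a List<Number> accepts Integers
        System.out.println(numbers);

        Comparator<Number> byValue = new Comparator<Number>() {
            public int compare(Number a, Number b) {
                return Double.compare(a.doubleValue(), b.doubleValue());
            }
        };
        List<Integer> ints = new ArrayList<Integer>();
        fill(ints, 5);
        Integer largest = max(ints, byValue);  // a Comparator<Number> orders Integers
        System.out.println(largest);
    }
}
```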

Because the language supports both upper- and lower-bounded wildcards, how do we know which one to use, and when?

There's a simple rule, called the get-put principle, which tells us which kind of wildcard to use. The get-put principle, as stated in Naftalin and Wadler's book on generics, Java Generics and Collections, says:

   Use an extends wildcard when you only get values out of a structure,
   use a super wildcard when you only put values into a structure,
   and don't use a wildcard when you do both.

The get-put principle is easiest to understand when applied to container classes like Box or the Collections classes, because the notion of getting or putting connects naturally with what these classes do: store things. So, if we wanted to apply the get-put principle to create a method that copies from one Box to another, the most general form would be as follows.

 public static <T> void copy(Box<? extends T> from, Box<? super T> to) {
   to.put(from.get());
 }
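To see what the two wildcards buy us, here is a sketch of a call to copy(). The nested Box interface mirrors the one used in these notes, and SimpleBox is a hypothetical implementation added only so the example can run.

```java
class CopyDemo {
    interface Box<T> {
        T get();
        void put(T element);
    }

    // Hypothetical implementation that holds a single element.
    static class SimpleBox<T> implements Box<T> {
        private T element;
        public T get() { return element; }
        public void put(T element) { this.element = element; }
    }

    // We only get from 'from' (extends) and only put into 'to' (super).
    static <T> void copy(Box<? extends T> from, Box<? super T> to) {
        to.put(from.get());
    }

    public static void main(String[] args) {
        Box<Integer> source = new SimpleBox<Integer>();
        source.put(42);
        Box<Number> target = new SimpleBox<Number>();
        copy(source, target);   // Box<Integer> into Box<Number>
        System.out.println(target.get());
    }
}
```

Had copy() been declared as copy(Box<T> from, Box<T> to), the call in main() would not compile, because no single T makes a Box<Integer> and a Box<Number> the same type.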

We can see the get-put principle at work in the declaration for Collections.sort().

public static <T extends Comparable<? super T>> void sort(List<T> list) { ... }

Here, we can sort a List that is parameterized by any type that implements Comparable. But rather than restricting the domain of sort() to lists whose elements are comparable to themselves, we can go further — we can sort lists of elements that know how to compare themselves to their supertypes, too. Because we are putting values into the comparator to determine the relative ordering of two elements, the get-put principle tells us we want to use a super wildcard here.

The seeming circular reference — where T extends something parameterized by T — is not really circular at all. It is simply expressing the constraint that to be able to sort a List<T>, T has to implement the interface Comparable<X>, where X is T or one of its supertypes.
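A sketch of the case this bound is designed for: a subclass that inherits its ordering from a supertype. Fruit and Apple are hypothetical classes introduced for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortDemo {
    // Hypothetical hierarchy: ordering is defined once, on the supertype.
    static class Fruit implements Comparable<Fruit> {
        final int weight;
        Fruit(int weight) { this.weight = weight; }
        public int compareTo(Fruit other) {
            return Integer.compare(weight, other.weight);
        }
    }

    // Apple implements Comparable<Fruit>, not Comparable<Apple>.
    static class Apple extends Fruit {
        Apple(int weight) { super(weight); }
    }

    // Sorts a few apples and returns their weights in sorted order.
    static List<Integer> sortedWeights() {
        List<Apple> apples = new ArrayList<Apple>();
        apples.add(new Apple(30));
        apples.add(new Apple(10));
        apples.add(new Apple(20));
        // Accepted because Comparable<Fruit> matches Comparable<? super Apple>.
        // With the narrower bound <T extends Comparable<T>> this call would not compile.
        Collections.sort(apples);
        List<Integer> weights = new ArrayList<Integer>();
        for (Apple a : apples) {
            weights.add(a.weight);
        }
        return weights;
    }

    public static void main(String[] args) {
        System.out.println(sortedWeights());
    }
}
```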

Generics Basics

Type Erasure

Because generics are implemented almost entirely in the Java compiler, and not in the runtime, nearly all type information about generic types has been "erased" by the time the bytecode is generated. That is, when a generic type is instantiated, the compiler translates it by a technique called type erasure: a process in which the compiler removes all information related to type parameters and type arguments within a class or method. Type erasure enables Java applications that use generics to maintain binary compatibility with Java libraries and applications that were created before generics.

For instance, Box<String> is translated to type Box, the raw type — a raw type is a generic class or interface name without any type arguments. This means that you can't find out what type of Object a generic class is using at runtime. The following operations are not possible:

   public class MyClass<E> {
       // An instance method: a static method could not refer to E at all.
       public void myMethod(Object item) {
           if (item instanceof E) {  //Compiler error
               ...
           }
           E item2 = new E();   //Compiler error
           E[] iArray = new E[10]; //Compiler error
           E obj = (E)new Object(); //Unchecked cast warning
       }
   }

The operations shown above are meaningless at runtime because the compiler removes all information about the actual type argument (represented by the type parameter E) at compile time. There's no way to express a constraint like "E must have a no-argument constructor" (which E item2 = new E(); would need) or "E must have a copy constructor" using generics, so accessing constructors for classes represented by generic type parameters is out. What about clone()? Let's say Foo was defined to make E extend Cloneable:

class Foo<E extends Cloneable> { 
  public void doSomething(E param) {
    E copy = (E) param.clone();  // illegal 
  }
}

Unfortunately, you still can't call param.clone(). The clone() method has protected access in Object, so to call it you must go through a reference to a class that has overridden clone() to be public. But E is not known to redeclare clone() as public, so cloning is also out.

Type erasure exists so that new code may continue to interface with legacy code. Using a raw type for any other reason is considered bad programming practice and should be avoided whenever possible.
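One way to observe the erasure described in this section is to compare the runtime classes of two differently parameterized lists. The class name ErasureDemo is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Returns true if a List<String> and a List<Integer> share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> ints = new ArrayList<Integer>();
        // Both objects are plain ArrayList instances at run time;
        // the type arguments exist only at compile time.
        return strings.getClass() == ints.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        Object o = new ArrayList<String>();
        // if (o instanceof List<String>) { }  // rejected: the type argument cannot be checked at run time
        System.out.println(o instanceof List<?>); // an unbounded wildcard is allowed
    }
}
```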

Use Wildcards

Look at this interface Box.

public interface Box<T> {
    public T get();
    public void put(T element);
}

We use the wildcards in the unbox method.

public void unbox(Box<?> box) {
    System.out.println(box.get());
}

unbox() can call the get() method, and it can call any of the methods inherited from Object (such as hashCode()). The only thing it cannot do is call the put() method, and this is because it cannot verify the safety of such an operation without knowing the type parameter T for this Box instance. Because box is a Box<?>, and not a raw Box, the compiler knows that there is some T that serves as a type parameter for box, but because it doesn't know what that T is, it will not let you call put() because it cannot verify that doing so will not violate the type safety constraints for Box. (Actually, you can call put() in one special case: when you pass the null literal. We may not know what type T represents, but we know that the null literal is a valid value for any reference type.)
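The following sketch shows both halves of that rule. The nested Box interface repeats the one defined above so that the example is self-contained; SimpleBox, describe() and clear() are hypothetical names introduced for the illustration.

```java
class UnboxDemo {
    interface Box<T> {
        T get();
        void put(T element);
    }

    // Hypothetical implementation that holds a single element.
    static class SimpleBox<T> implements Box<T> {
        private T element;
        public T get() { return element; }
        public void put(T element) { this.element = element; }
    }

    // Reading through a Box<?> works; the value comes back as Object.
    static String describe(Box<?> box) {
        Object value = box.get();
        return String.valueOf(value);
    }

    // Writing through a Box<?> is only possible with the null literal.
    static void clear(Box<?> box) {
        box.put(null);       // allowed: null is valid for every reference type
        // box.put("x");     // does not compile: the element type is unknown
    }

    public static void main(String[] args) {
        SimpleBox<String> box = new SimpleBox<String>();
        box.put("hello");
        System.out.println(describe(box));
        clear(box);
        System.out.println(describe(box));
    }
}
```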

It might be tempting to write the following rebox().

public void rebox(Box<?> box) {
    box.put(box.get());
}

However, the compiler rejects this code with an error like the following:

Rebox.java:8: put(capture#337 of ?) in Box<capture#337 of ?> cannot be applied
  to (java.lang.Object)
   box.put(box.get());
      ^
1 error


When the compiler encounters a variable with a wildcard in its type, such as the box parameter of rebox(), it knows that there must have been some T for which box is a Box<T>. It does not know what type T represents, but it can create a placeholder for that type to refer to the type that T must be. That placeholder is called the capture of that particular wildcard. In this case, the compiler has assigned the name "capture#337 of ?" to the wildcard in the type of box. Each occurrence of a wildcard in each variable declaration gets a different capture, so in the generic declaration foo(Pair<?,?> x, Pair<?,?> y), the compiler would assign a different name to the capture of each of the four wildcards because there is no relationship between any of the unknown type parameters.

In this case, because ? essentially means "? extends Object," the compiler has already concluded that the type of box.get() is Object, not "capture#337 of ?," and it can't statically verify that an Object is an acceptable value for the type identified by the placeholder "capture#337 of ?."

Generic Methods

The following implementation of rebox(), along with a generic helper method, does the trick:

public void rebox(Box<?> box) {
    reboxHelper(box);
}
private <V> void reboxHelper(Box<V> box) {
    box.put(box.get());
}


Generic methods introduce additional type parameters (placed in angle brackets before the return type), which are usually used to formulate type constraints between the parameters and/or return value of the method. In the case of reboxHelper(), however, the generic method does not use the type parameter to specify a type constraint; it allows the compiler (through type inference) to give a name to the type parameter of box's type.

When rebox() calls reboxHelper(), it knows that doing so is safe because its own box parameter must be a Box<T> for some unknown T. Because the type parameter V is introduced in the method signature and not tied to any other type parameter, it can stand for any unknown type as well, so a Box<T> for some unknown T might as well be a Box<V> for some unknown V. (This is similar to the principle of alpha conversion in the lambda calculus, which allows you to rename bound variables.) Now the expression box.get() in reboxHelper() no longer has type Object; it has type V, and it is allowable to pass a V to Box<V>.put().

We could have declared rebox() as a generic method in the first place, like reboxHelper(), but that is considered bad API design style. The governing design principle here is "don't give something a name if you're never going to refer to it by name." In the case of generic methods, if a type parameter appears only once in the method signature, then it probably should be a wildcard rather than a named type parameter. Because the name can always be resurrected with a private capture helper if needed, this approach gives you the opportunity to keep APIs clean without throwing useful information away.
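The same pattern applies outside of Box. As a sketch, here is a hypothetical swapFirstTwo() method on List<?> that keeps the wildcard in its public signature and recovers a name for the element type in a private capture helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CaptureHelperDemo {
    // Public API: the element type is never referred to by name,
    // so the signature uses a wildcard.
    static void swapFirstTwo(List<?> list) {
        swapHelper(list);   // the wildcard is captured and bound to E
    }

    // Private capture helper: E names the captured type, so values
    // read from the list can be written back into it.
    private static <E> void swapHelper(List<E> list) {
        E first = list.get(0);
        list.set(0, list.get(1));
        list.set(1, first);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>(Arrays.asList("a", "b", "c"));
        swapFirstTwo(names);
        System.out.println(names);
    }
}
```

Writing the body of swapHelper() directly inside swapFirstTwo() would fail for the same reason the first version of rebox() does: list.get(0) would have type Object, which cannot be passed to set() on a List<?>.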

Type Inferences

The compiler will try to infer the most specific type it can for the type parameters when resolving a call to a generic method. For example, when an Integer argument is passed where a parameter of type T is expected, the compiler could infer that T is Integer, Number, Serializable, or Object, but it chooses Integer because that is the most specific type that fits the constraints.
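A minimal sketch of this behavior, assuming a hypothetical generic method identity() that simply returns its argument:

```java
class InferenceDemo {
    // Hypothetical generic method used only to illustrate inference.
    static <T> T identity(T arg) {
        return arg;
    }

    public static void main(String[] args) {
        Integer i = 3;
        // T is inferred as Integer, so no cast is needed on the result.
        Integer same = identity(i);
        // A less specific type is still legal if it is requested explicitly.
        Number asNumber = InferenceDemo.<Number>identity(i);
        System.out.println(same + " " + asNumber);
    }
}
```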

You can use type inference to reduce some of the redundancy when constructing generic instances. For example, using our Box class, creating a Box<String> requires you to specify the type parameter String twice:

Box<String> box = new BoxImpl<String>();

This violation of the DRY principle (Don't Repeat Yourself) can be irksome. However, if the implementation class BoxImpl provides a generic factory method, you can reduce this redundancy in client code:

A generic factory method that allows you to avoid redundantly specifying type parameters

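public class BoxImpl<T> implements Box<T> {
   private T element;
   public T get() { return element; }
   public void put(T element) { this.element = element; }

   // Generic factory method: V is inferred at the call site.
   public static <V> Box<V> make() {
       return new BoxImpl<V>();
   }
}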


If you instantiate a Box using the BoxImpl.make() factory, you need only specify the type parameter once:

Box<String> myBox = BoxImpl.make();


The generic make() method returns a Box<V> for some type V, and the return value is being used in a context that requires a Box<String>. The compiler determines that String is the most specific type that V could take on that satisfies the type constraints. You still have the option of manually specifying the value of V as follows:

Box<String> myBox = BoxImpl.<String>make();
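Putting the pieces together, a self-contained sketch of the factory in use might look like this. Box and BoxImpl are nested here only to keep the example in one file, and roundTrip() is a hypothetical method added for the demonstration.

```java
class FactoryDemo {
    interface Box<T> {
        T get();
        void put(T element);
    }

    static class BoxImpl<T> implements Box<T> {
        private T element;
        public T get() { return element; }
        public void put(T element) { this.element = element; }

        // Generic factory method: V is chosen by the caller's context.
        static <V> Box<V> make() {
            return new BoxImpl<V>();
        }
    }

    // Passes a string through two boxes, one created with each calling style.
    static String roundTrip(String s) {
        Box<String> inferred = BoxImpl.make();          // V inferred from the assignment target
        Box<String> explicit = BoxImpl.<String>make();  // V written out by hand
        inferred.put(s);
        explicit.put(inferred.get());
        return explicit.get();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
        // In Java 7 and later, the diamond form removes the same redundancy
        // without a factory method:
        Box<String> diamond = new BoxImpl<>();
        diamond.put("world");
        System.out.println(diamond.get());
    }
}
```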

References