Mutability, Arrays and the Cost of Temporary Objects in Java

In his 2001 must-read book, Effective Java, Joshua Bloch said in one item “Favor immutability”. Java theory and practice: To mutate or not to mutate? provides an excellent overview of what this means and why it matters. It states:

An immutable object is one whose externally visible state cannot change after it is instantiated.

A Brief Overview of Immutability

Let’s say you want to create a class to model arbitrary precision rational numbers (ie fractions). A mutable version might start out:

public class BigRational {
  private BigInteger numerator = BigInteger.ZERO;
  private BigInteger denominator = BigInteger.ONE;

  // constructors and so on

  public BigRational add(BigRational other) {
    if (numerator.signum() == 0) {
      numerator = other.numerator;
      denominator = other.denominator;
    } else if (other.numerator.signum() == 0) {
      // no action required
    } else if (denominator.equals(other.denominator)) {
      numerator = numerator.add(other.numerator);
    } else {
      // this could be optimized for greatest common divisor
      numerator = numerator.multiply(other.denominator).add(other.numerator.multiply(denominator));
      denominator = denominator.multiply(other.denominator);
    }
    return this;
  }

  // etc
}

This is how many classes naively start. The problem comes when this class is used as a data member or a parameter. Consider this class:

public class MyClass {
  private BigRational data = new BigRational();

  public BigRational getData() {
    return data;
  }
}

Doing this will modify the internal state of the class:

MyClass mc = new MyClass();
BigRational rational = mc.getData();
rational.multiply(rational);

Obviously this behaviour isn’t desirable, which leads to the practice of defensive copying, a practice familiar to any C programmer. Each time this getter is called a temporary copy is created so the internal state of the class isn’t violated.

One of the biggest early errors in Java’s design was that the Date class is mutable. This means that any API that uses dates either has to use a non-standard date class or defensively copy date instances. Oddly, Java got String, BigInteger and BigDecimal all right as they’re all immutable. Even stranger, the later Calendar class (introduced in JDK 1.1) was also made mutable.

An immutable version would look something like:

public class BigRational {
  private final BigInteger numerator;
  private final BigInteger denominator;

  public BigRational() {
    this(BigInteger.ZERO);
  }

  public BigRational(BigInteger integer) {
    this(integer, BigInteger.ONE);
  }

  public BigRational(BigInteger numerator, BigInteger denominator) {
    if (denominator.signum() == 0) {
      throw new IllegalArgumentException("denominator cannot be zero");
    }
    if (numerator.signum() == 0) {
      this.numerator = BigInteger.ZERO;
      this.denominator = BigInteger.ONE;
    } else {
      this.numerator = numerator;
      this.denominator = denominator;
    }
  }

  public BigRational multiply(BigRational other) {
    if (numerator.signum() == 0 || other.numerator.signum() == 0) {
      return new BigRational(BigInteger.ZERO);
    } else if (denominator.equals(other.denominator)) {
      return new BigRational(numerator.add(other.numerator), denominator);
    } else {
      return new BigRational(numerator.multiply(other.denominator).add(other.numerator.multiply(denominator)), enominator.multiply(other.denominator));
    }
  }

  // etc
}

Arrays are Mutable

The big problem with all this is that Java arrays are mutable. So for example:

public void doStuff(String args[]) {
  args[0] = "Hello world";
}

...

String arr[] = new String[] { "one", "two", "three" };
doStuff(arr);
System.out.println(arr[0]); // Hello world

This is one big reason why you should use Lists instead of arrays in almost all circumstances where you have a choice. Lists can be made immutable:

List<String> list = new ArrayList<String>();
list.add("one");
list.add("two");
list.add("three");
final List<String> immutableList = Collections.unmodifiableList(list);

On a side note, this rather verbose syntax gets a little easier in Java 7 with collection literals, for example:

List<String> list = ["one", "two", "three"];
final List<String> immutableList = Collections.unmodifiableList(list);

Enum Values

Java 5 introduced typesafe enums, largely based on Joshua Bloch’s proposal (that used classes). The unofficial versions had lots of potential issues (eg having to implement readResolve() to cater to serialization creating new instances). In my opinion, Java’s enums are one of (increasingly few) significantly better language constructs in Java compared to, say, C# as Java’s enums aren’t just thinly wrapped integers (as they are in C/C++/.Net) and can also have behaviour.

Java enums have a static method called values() which returns an array of all instances of that enum. After the lessons of the Date class, this particular decision was nothing short of shocking. A List would have been a far more sensible choice. Internally this means the array of instances must be defensively copied each time it is called forcing you to write code like this repeatedly:

public enum Season {
  SPRING, SUMMER, AUTUMN, WINTER;

  private static final List<Season> VALUES =
    Collections.unmodifiableList(
      new ArrayList<Season>(Arrays.asList(values())));

  public static List<Season> getValues() { return VALUES; }
}

Is This Really Necessary?

There is nothing inherently wrong with temporary objects. It’s a question of degree. So creating several hundred (or thousand) temporary objects isn’t any big deal. At some point there is such a thing as too many.

I will demonstrate two things here:

  1. The cost of temporary arrays; and
  2. The correct way to generate random enums.

This post was prompted by Java: Pick a random value from an enum? where the poster created a Random on every iteration. Some years ago this was bad because Randoms were seeded with the current time (in milliseconds or even seconds) so you wouldn’t get a particularly random distribution if called in a short space of time and it’s worth answering the question of fairness.

This enum will be used:

public enum Season {
  SPRING, SUMMER, AUTUMN, WINTER;

  private static final List<Season> VALUES1 =
      Collections.unmodifiableList(
          new ArrayList<Season>(Arrays.asList(values())));
  private static final Season[] VALUES2 = values();
  private static final int SIZE = VALUES2.length;
  
  private static final Random RANDOM = new Random();

  public static Season random1() {
    return values()[new Random().nextInt(SIZE)];
  }

  public static Season random2() {
    return values()[RANDOM.nextInt(SIZE)];
  }

  public static Season random3() {
    return VALUES1.get(new Random().nextInt(SIZE));
  }

  public static Season random4() {
    return VALUES1.get(RANDOM.nextInt(SIZE));
  }

  public static Season random5() {
    return VALUES2[new Random().nextInt(SIZE)];
  }

  public static Season random6() {
    return VALUES2[RANDOM.nextInt(SIZE)];
  }
}

with the following test harness:

public class Temporary {
  private static final int COUNT = 30000000;

  public static void main(String args[]) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    MemoryMonitor monitor = new MemoryMonitor();
    executor.scheduleAtFixedRate(monitor, 0, 10, TimeUnit.MILLISECONDS);
    int[] tally = new int[4];
    long baseline = usedMemory();
    long start = System.nanoTime();
    for (int i=0; i<COUNT; i++) {
      tally[Season.random1().ordinal()]++;
    }
    long end = System.nanoTime();
    executor.shutdown();
    try {
      executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
    long memoryUsed = monitor.peak() - baseline;
    for (Season season : Season.values()) {
      System.out.printf("%s: %,d%n", season, tally[season.ordinal()]);
    }
    System.out.printf("%nCompleted %,d iterations in %,.3f seconds using %,d bytes%n",
        COUNT, ((end - start) / 1000000) / 1000.0d, memoryUsed
    );
  }

  private static long usedMemory() {
    Runtime runtime = Runtime.getRuntime();
    return runtime.totalMemory() - runtime.freeMemory();
  }

  private static void waitForEnter() {
    try {
      new BufferedReader(new InputStreamReader(System.in)).readLine();
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

and

public class MemoryMonitor implements Runnable {
  private final Runtime runtime = Runtime.getRuntime();
  private final List<Long> usage = new ArrayList<Long>();

  @Override
  public void run() {
    usage.add(runtime.totalMemory() - runtime.freeMemory());
  }

  public List<Long> usage() {
    return usage;
  }

  public long peak() {
    return Collections.max(usage);
  }
}

with each method being run in turn.

The Results

Method Run time (seconds) Peak Memory Usage (bytes)
random1 9.746 681,288
random2 5.914 665,592
random3 5.123 669,408
random4 1.476 18,376
random5 4.593 661,368
random6 1.056 18,376

From this we can draw several conclusions:

  1. Creating a Random on every invocation is fair (in distribution terms) but has a high cost in temporary objects and CPU time (by a factor of 2-5);
  2. The garbage collector is working as both the arrays and the temporary Random objects contribute to the memory usage but Java is (partially) handling both being created. What this probably means is that both created is triggering a GC;
  3. Using a static copy of the enum values is 2-5x as fast;
  4. A static array copy is about 20-40% quicker than a static List copy; and
  5. The more optimized version uses 30x less memory and runs 10x quicker than the least optimized version.

Conclusion

For significant use of an enum’s values() method, it’s a no brainer: create and use a static copy instead. It’s faster and uses way less memory. On non-trivial applications it will also mean less memory fragmentation and less (possibly expensive) GCs, which is a significant issue with high-usage Web applications.

5 comments:

Anonymous said...

I'd forgotten how Java is dirty.

Lawrence Krubner said...

I suppose it is these sorts of problems that has been driving the interest in Erlang, Clojure and all the rest.

simi said...

hi,

First of all. Thanks very much for your useful post.

I just came across your blog and wanted to drop you a note telling you how impressed I was with the information you have posted here.

Please let me introduce you some info related to this post and I hope that it is useful for .Net community.

There is a good C# resource site, Have alook

http://csharptalk.com

simi

Josh Smeaton said...

This is an old post, and maybe comments are irrelevant now, however I wanted to question the comparison of enums between Java and c#.

You say that enums within c# are just thinly wrapped ints, and that is correct. Though I think the comparison with Java enums is invalid. By the look of it, Java enums more closely resemble structs in c#, which are value types and therefore immutable. Would you agree?

William Shields said...

@Josh it's true that Java enums are more object (struct) than scalar value but they are compared like a scalar so you do:

if (season1 == season2) ...

not

if (season1.equals(season2)) ...

Can structs in C# have behaviour? I don't actually know. The "value type" part is more likely about the fact that it can be stored on the stack rather than the heap, which is extremely useful in some instances.

For example: they've ported Git to Java but there are things C can do that Java is nowhere as near as efficient as and one of them is being able to pass 20 byte SHA1 value objects on the stack, which C# can do.

Post a Comment