[ACCEPTED]-What is the most efficient Java Collections library?-collections

Accepted answer
Score: 105

The question is (now) about storing lots 46 of data, which can be represented using 45 primitive types like int, in a Map. Some of 44 the answers here are very misleading in 43 my opinion. Let's see why.

I modified the 42 benchmark from trove to measure both runtime 41 and memory consumption. I also added PCJ to 40 this benchmark, which is another collections 39 library for primitive types (I use that 38 one extensively). The 'official' trove benchmark 37 does not compare IntIntMaps to Java Collection's 36 Map<Integer, Integer>, probably storing Integers and storing ints is not 35 the same from a technical point of view. But 34 a user might not care about this technical 33 detail, he wants to store data representable 32 with ints efficiently.

First the relevant part 31 of the code:

new Operation() {

     private long usedMem() {
        System.gc();
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
     }

     // trove
     public void ours() {
        long mem = usedMem();
        TIntIntHashMap ours = new TIntIntHashMap(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
           ours.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("trove " + mem + " bytes");
        ours.clear();
     }

     public void pcj() {
        long mem = usedMem();
        IntKeyIntMap map = new IntKeyIntOpenHashMap(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
           map.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("pcj " + mem + " bytes");
        map.clear();
     }

     // java collections
     public void theirs() {
        long mem = usedMem();
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
           map.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("java " + mem + " bytes");
        map.clear();
     }

I assume the data comes as primitive 30 ints, which seems sane. But this implies a runtime 29 penalty for java util, because of the auto-boxing, which 28 is not neccessary for the primitive collections 27 frameworks.

The runtime results (without 26 gc() calls, of course) on WinXP, jdk1.6.0_10:

                      100000 put operations      100000 contains operations 
java collections             1938 ms                        203 ms
trove                         234 ms                        125 ms
pcj                           516 ms                         94 ms

While 25 this might already seem drastic, this is 24 not the reason to use such a framework.

The 23 reason is memory performance. The results 22 for a Map containing 100000 int entries:

java collections        oscillates between 6644536 and 7168840 bytes
trove                                      1853296 bytes
pcj                                        1866112 bytes

Java 21 Collections needs more than three times the memory compared to 20 the primitive collection frameworks. I.e. you 19 can keep three times as much data in memory, without 18 resorting to disk IO which lowers runtime 17 performance by magnitudes. And this matters. Read 16 highscalability to find out why.

In my experience high memory 15 consumption is the biggest performance issue 14 with Java, which of course results in worse 13 runtime performance as well. Primitive collection 12 frameworks can really help here.

So: No, java.util 11 is not the answer. And "adding functionality" to 10 Java collections is not the point when asking 9 about efficiency. Also the modern JDK collections 8 do not "out-perform even the specialized 7 Trove collections".

Disclaimer: The 6 benchmark here is far from complete, nor 5 is it perfect. It is meant to drive home 4 the point, which I have experienced in many 3 projects. Primitive collections are useful 2 enough to tolerate fishy API - if you work 1 with lots of data.

Score: 73

From inspection, it looks like Trove is 17 just a library of collections for primitive 16 types - it's not like it's meant to be adding 15 a lot of functionality over the normal collections 14 in the JDK.

Personally (and I'm biased) I 13 love Guava (including the former Google Java 12 Collections project). It makes various tasks 11 (including collections) a lot easier, in 10 a way which is at least reasonably efficient. Given 9 that collection operations rarely form a 8 bottleneck in my code (in my experience) this 7 is "better" than a collections 6 API which may be more efficient but doesn't 5 make my code as readable.

Given that the 4 overlap between Trove and the Guava is pretty 3 much nil, perhaps you could clarify what 2 you're actually looking for from a collections 1 library.

Score: 50

I know this is an old post and there are 19 a ton of answers here. But, The answers 18 above are superficial and over simplified 17 in terms of suggesting a library. There 16 is no one library that does well across 15 the various benchmarks presented here. The 14 only conclusion I derive is if you care 13 about performance and memory and specifically 12 dealing with primitive types, its more than 11 worth looking at the non jdk alternatives.

Here 10 is a more sound analysis, in terms of benchmark 9 mechanics and the libraries covered. This is 8 a thread in the mahout dev list.

The libraries 7 covered are

  • HPPC
  • Trove
  • FastUtil
  • Mahout ( Colt )
  • Java Collections

Update June 2015: Unfortunately, the original 6 benchmarks are no longer available and besides 5 its a bit outdated. Here is a fairly recent 4 (Jan 2015) benchmarks done by someone else. It 3 is not as comprehensive nor does it have 2 the interactive exploratory tools as the 1 original link.

Score: 20

As other commentators have noticed, the 6 definition of "efficient" casts 5 a wide net. However no one has yet mentioned 4 the Javolution library.

Some of the highlights:

  • Javolution classes are fast, very fast (e.g. Text insertion/deletion in O[Log(n)] instead of O[n] for standard StringBuffer/StringBuilder).
  • All Javolution classes are hard real-time compliant and have highly deterministic behavior (in the microsecond range). Furthermore (unlike the standard library), Javolution is RTSJ safe (no memory clash or memory leak when used with Java Real-Time extension).
  • Javolution's real-time collection classes (map, list, table and set) can be used in place of most standard collection classes and provide additional functionality.
  • The Javolution collections provide concurrency guarantees to make implementation of parallel algorithms easier.

The Javolution 3 distribution includes a benchmark suite 2 so you can see how they stack up against 1 other libraries/the built-in collections.

Score: 16

Some collection libs to consider:

I would 20 first and foremost reach for the JDK collection 19 library. It covers most common things you 18 need to do and is obviously already available 17 to you.

Google Collections is probably the 16 best high-quality library outside the JDK. It's 15 heavily used and well supported.

Apache Commons 14 Collections is older and suffers a bit from 13 the "too many cooks" problem but 12 has a lot of useful stuff as well.

Trove 11 has very specialized collections for cases 10 like primitive keys/values. These days 9 we find that on modern JDKs and with the 8 Java 5+ collections and concurrent use cases, the 7 JDK collections out-perform even the specialized 6 Trove collections.

If you have really high 5 concurrency use cases, you should definitely 4 check out stuff like the NonBlockingHashMap 3 in the high-scale lib, which is a lock-free 2 implementation and can stomp on ConcurrentHashMap 1 if you have the right use case for it.

Score: 6

java.util

Sorry for the obvious answer, but for most 1 uses, the default Java Collections are more than sufficient.

Score: 6

To store millions of String in a map, take a look 1 at http://code.google.com/p/flatmap

Score: 4

I'm developer of happy-collections from 1 happy-collections on source-forge

  1. Event based collections
  2. Unmodifiable
  3. SortedList
  4. Cache
Score: 3

ConcurrentHashMap as well as the java.util.concurrent package should be mentioned, if 3 you plan to use the HashMap in multiple 2 threads. small memory footprint is assued, since 1 this is part of standard java.

Score: 3

Depends on how we define "efficient".

Every 14 data structure has its own Big-Oh behavior 13 for reading, writing, iterating, memory 12 footprint, etc. A linked list in one library 11 is likely to be the same as any other. And 10 a hash map will be faster for reading O(1) than 9 a linked list O(n).

But when I read the answers 8 to the question "Most useful free Java libraries?" I 7 noticed that trove is hardly mentioned.

This 6 doesn't sound like "most efficient". It 5 sounds like "most popular" to me.

Just some 4 feedback - I've never heard of it, and I 3 don't know anyone who has used it. Collections 2 built into the JDK, Google, or Apache Commons 1 are well-known to me.

Score: 3

Trove offers a few advantages.

  • smaller memory footprint, it doesn't used Map.Entry objects
  • you can use hash strategies instead keys for maps, this saves memory and means you don't need to define a new key each time you want to cache an object on a new set of its attributes
  • it has primitive collection types
  • think it has some form of internal iterator

That said, a 5 lot has been done to improve jdk collections 4 since trove was written.

It's the hashing 3 strategies that make it appealing to me 2 though... Google for trove and read their 1 overview.

Score: 2

If you want to store millions of records 14 in a hash table, chances are that you will 13 run into memory problems. This happened 12 to me when I tried creating a map with 2.3 11 million String objects, for example. I went 10 with BerkeleyDB, which is very mature and performs 9 well. They have a Java API that wraps the 8 Collections API, so you can easily create 7 arbitrarily large maps with very little 6 memory footprint. Access will be slower 5 though (as it is stored on disk).

Follow-up question: is there 4 a decent (and efficient), well maintained, library 3 for immutable collections? Clojure has excellent 2 support for this, and it would be nice to 1 have something similar for Java.

More Related questions