Any hard data on GC vs explicit memory management performance? [garbage-collection]

Accepted answer
Score: 26

This turns into another flame war with a lot of "my gut feeling" arguments. Here is some hard data for a change (the papers contain details, benchmarks, graphs, etc.):

http://www.cs.umass.edu/~emery/pubs/04-17.pdf says:

"Conclusion. The 28 controversy over garbage collection’s performance 27 impact has long overshadowed the software 26 engineering benefi it provides.This paper 25 introduces a tracing and simulation-based 24 oracular memory manager. Using this framework, we 23 execute a range of unaltered Java benchmarks 22 using both garbage collection and explicit 21 memory management. Comparing runtime, space 20 consumption, and virtual memory footprints, we 19 find that when space is plentiful, the runtime 18 performance of garbage collection can be 17 competitive with explicit memory management, and 16 can even outperform it by up to 4%. We fi 15 that copying garbage collection canrequire 14 six times the physical memory as the Lea 13 or Kingsley allocators to provide comparable 12 performance."

When you have enough memory, copying GC becomes faster than explicit free(): http://www.cs.ucsb.edu/~grze/papers/gc/appel87garbage.pdf (the intuition: a copying collector's work is proportional to the live data only, so with a sufficiently large heap the amortized cost per allocation approaches zero).

It also depends on the language you use. Java will have to do a lot of rewriting (stack, objects, generations) on each collection, and writing a multithreaded GC that doesn't have to stop the world in the JVM would be a great achievement. On the other hand, you get that almost for free in Haskell, where GC time is rarely >5% while allocation time is almost 0. It really depends on what you're doing and in what environment.

Score: 17

The cost of memory allocation is generally much lower in a garbage-collected memory model than when just using new or malloc explicitly, because garbage collectors generally pre-allocate this memory. However, explicit memory models may also do this (using memory pools or memory areas), making the cost of memory allocation equivalent to a pointer addition.
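To illustrate the "pointer addition" point, here is a minimal sketch of a bump (arena) allocator, using Java purely for illustration; a real pool would also deal with alignment, growth, and per-type storage:

    // Toy arena ("bump") allocator: allocating is a single offset
    // increment, the "pointer addition" cost mentioned above.
    final class Arena {
        private final byte[] memory;
        private int top = 0; // next free offset

        Arena(int capacity) { memory = new byte[capacity]; }

        /** Reserves size bytes; returns their offset, or -1 if full. */
        int allocate(int size) {
            if (top + size > memory.length) return -1;
            int offset = top;
            top += size; // the entire cost of an allocation
            return offset;
        }

        /** Individual blocks are never freed; everything goes at once. */
        void reset() { top = 0; }
    }

The trade-off is the one implied above: you give up per-object free and instead release the whole region in one step.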

As Raymond Chen and Rico Mariani pointed out, managed languages tend to outperform unmanaged languages in the general case. However, with enough optimization effort, the unmanaged language can and will eventually beat the GC/JITted language.

The same thing is also evident in the Computer Language Shootout: even though C++ tends to rank higher than Java most of the time, you'll often see C++ implementations jumping through various hoops (such as object pools) to achieve optimal performance. Garbage-collected languages, however, tend to have easier-to-follow and more straightforward implementations, because the GC is better at allocating (small chunks of) memory.
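For illustration, the object-pool "hoop" meant here looks roughly like this (a hedged Java sketch; the Shootout entries in question do the equivalent in C++):

    import java.util.ArrayDeque;

    // Toy object pool: reuse buffers instead of allocating fresh ones,
    // trading code clarity for fewer allocations.
    final class BufferPool {
        private final ArrayDeque<byte[]> free = new ArrayDeque<>();
        private final int bufferSize;

        BufferPool(int bufferSize) { this.bufferSize = bufferSize; }

        byte[] acquire() {
            byte[] b = free.poll();            // reuse if available
            return (b != null) ? b : new byte[bufferSize];
        }

        void release(byte[] b) {
            free.push(b); // caller must not touch b afterwards
        }
    }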

However, performance isn't the biggest difference when it comes to GC vs non-GC; arguably it's the deterministic finalization (or RAII) of non-GC (and reference-counted) languages that is the biggest argument for explicit memory management, because this is generally used for purposes other than memory management (such as releasing locks, closing file or window handles, et cetera). 'Recently', however, C# introduced the using / IDisposable construct to do exactly this.
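Java's analogue, for comparison, is try-with-resources over AutoCloseable; a minimal sketch:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    class FirstLine {
        static String firstLine(String path) throws IOException {
            // The reader is closed deterministically when the block
            // exits, regardless of when (or whether) the GC ever runs.
            try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                return in.readLine();
            }
        }
    }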

Another problem with garbage collection is that the systems it relies on tend to be rather complex in order to prevent memory leaks. However, this also makes it much more difficult to debug and track down a memory leak once you do have one (yes, even garbage-collected languages can have memory leaks).
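A minimal sketch of the typical shape of such a leak (Java; names are illustrative). The GC can only reclaim unreachable objects, so anything kept reachable by a long-lived collection leaks even though it will never be used again:

    import java.util.ArrayList;
    import java.util.List;

    class LeakyCache {
        // Grows forever: every entry stays strongly reachable, so even
        // a perfect GC must keep it. "Reachable but never used again"
        // is the classic memory leak in a garbage-collected language.
        private static final List<byte[]> CACHE = new ArrayList<>();

        static void handleRequest() {
            CACHE.add(new byte[64 * 1024]); // cached, never evicted
        }
    }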

On the flip side, a garbage-collected language can do the most optimal thing at the most optimal time (or approximately so) without burdening the developer with that task. This means that developing in a GC language might be more natural, letting you focus more on the real problem.

Score: 7

Here's an experiment that I like to run:

  1. Start up a program written in a garbage-collected environment (like .NET or Java).
  2. Start up a similar program written in a non-garbage-collected environment (like C or C++).
  3. Use the programs and see which one is more responsive.

Objectivity improvement: get your grandmother to do step 3.

It's all well and good to quote the theoretical performance of optimal GC implementations, but the fact of the matter is that in real-world scenarios programs written in garbage-collected languages do not perform as well as native applications. This is why large projects where performance translates directly into user experience are still programmed in C++. The classic example of this is game programming.

Another, perhaps counterintuitive, example of this is the Eclipse IDE. While it may be written in Java, the entire graphical subsystem had to be rewritten to produce acceptable performance. The solution: make GUI elements lightweight wrappers around native (C/C++) components (SWT).

I understand the draw of garbage-collected environments. Memory management is hard to get right, and it is a lot of work. The bottom line, though, is this: knowing how your program is supposed to behave gives you (the programmer) an edge over a machine trying to guess.

Score: 6

Berger's paper is cited a lot, but it compares real garbage collectors against a purely theoretical, offline, optimal algorithm. So while it may tell you something about theoretical limits, it says very little about the performance of real garbage collectors versus real implementations of malloc and free. A study that I like better took real programs and compared explicit malloc and free with Hans Boehm's conservative garbage collector:

The Measured Cost of Conservative Garbage Collection by Ben Zorn

This study isn't perfect, and Zorn is careful to note that if the programs knew they were using a garbage collector, some could be made faster. But the hard data is this: in real programs originally written to use malloc and free, the garbage-collected versions run at about the same speed but require twice as much memory. Zorn argues fairly convincingly that if you know you have GC, you can make things faster, but it's hard to eliminate the memory penalty.

I learned more from this careful experimental study than from Berger's study of an unimplementable, idealized memory manager.

Score: 2

There are quite a few different arguments given here. I want to start by making clear that you cannot really make a 1:1 comparison. Each approach has its pros and cons, and any code snippet will be more appropriate for one or the other system. Which is to say, conversely, that you must know whether you have a GC or not in order to write efficient code.

My argument is: you must know your environment and code accordingly. That is what makes your code efficient. Moving from one paradigm to the other while coding in the same style will cost you more efficiency than whatever the GC itself helps with or takes away.

Case:

A program makes thousands of heap memory allocations for short-lived objects. That is, it allocates and deallocates many times, with objects of different sizes.

In a non-GC environment, each allocation ends up calling malloc, which requires searching the list of free memory fragments for the most suitable one (according to the specific malloc implementation). The memory is used and then freed with free [or with new/delete in C++...]. The cost of memory management is the cost of locating the fragments.

In a GC environment with a moving GC, as in Java or .NET, all free memory is contiguous after each GC run. The cost of acquiring memory for an object is cheap, really cheap (<10 CPU instructions in the Java VM). On each GC run, only the live objects are located and moved to the beginning of the appropriate memory region (usually a different region than the original one). The cost of memory management is primarily the cost of moving all reachable (live) objects. Now, the premise is that most objects are short-lived, so in the end the cost may be smaller than that of a non-GC system. One million objects allocated and freed (forgotten) within a single GC run amount to no extra cost.
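A minimal sketch of that premise (Java; illustrative). Every temporary below dies before it can leave the nursery, so a moving generational collector never has to copy it; the allocations themselves are just pointer bumps:

    class ShortLived {
        static long checksum(int n) {
            long sum = 0;
            for (int i = 0; i < n; i++) {
                // A fresh, immediately forgotten object per iteration.
                // Dead objects are never traced or moved: the collector
                // only copies survivors out of the nursery.
                int[] scratch = new int[16];
                scratch[0] = i;
                sum += scratch[0];
            }
            return sum;
        }
    }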

Conclusion: in GC languages you can create all local objects on the heap; they are cheap. In non-GC systems, on the other hand, a long series of allocations, deallocations and reallocations is expensive: the memory becomes fragmented and the cost of malloc increases... In non-GC systems you should use the stack as much as possible, and the heap only out of necessity.

That has another implication. People used to one of the two memory systems will tend to write inefficient programs in the other. They are used to idioms that are probably bad in the other system.

A clear example is an unmanaged programmer who is used to allocating an object once and then reusing it (resetting its internal pointers with new elements as required); he is used to that way of thinking: allocation is expensive, reusing is cheap. Now, if the exact same code is moved to a generational GC environment (Java, .NET; both use moving generational GCs), you get a funny effect. In a Java generational GC, the system performs minor collections only on the younger generations, processing the older generations only during full collections. But an object in the young generation can be referred to by objects in the older generation, so extra work has to be performed to keep track of these old-to-young references. In particular, the Java 1.4.1 garbage collector marks the memory card (a sub-part of a page) where the old object resides, and then includes all the marked cards for processing during the minor collection, effectively increasing the amount of work the GC has to perform and possibly impacting performance.

The object was alive during 1, 2, 3... GC runs, and it was moved around that many times; finally it is promoted to the old generation, where it will not be moved on each GC run but can simply sit there... but alas, the programmer forces the object to become young again. It is moved once more, and it will keep being moved on each GC run until it becomes old again.
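A hedged Java sketch of that idiom (names illustrative): the reused holder is old, the elements it is refilled with are young, so every refill creates old-to-young references for the card-marking machinery to track:

    import java.util.ArrayList;
    import java.util.List;

    class ReuseAntiPattern {
        // Long-lived, so quickly promoted to the old generation.
        private static final List<int[]> holder = new ArrayList<>();

        static void processBatch(int size) {
            holder.clear(); // "reuse: reset internal pointers"
            for (int i = 0; i < size; i++) {
                // Each add writes a reference to a brand-new (young)
                // object into an old object: a card is dirtied and must
                // be rescanned on every minor collection.
                holder.add(new int[] { i });
            }
        }
        // Under a generational GC, a fresh local list per batch would
        // usually be cheaper than this "allocation is expensive" idiom.
    }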

To make a sensible comparison, you would need programmers who know the environment to write different pieces of code solving the same problem with the same algorithms but with different mindsets about memory management, and then compare the results.

Score: 1

GC will always be slower than the extreme alternative: perfect, non-deterministic memory management.

The questions are:

  • Are the differences large enough to quibble about?
  • Are the drawbacks of one technique enough to cause us to seriously consider the other?

There are other areas in which managed subsystems have won out over unmanaged ones:

In general, a program will always run slower on a multitasking operating system than on a uni-tasking one -- or on a computer with no OS.

In general, a program will always run slower on a system with virtual memory than on one without.

Except in extreme circumstances, do we seriously consider computer systems without VM and without an OS?

Score: 1

After a discussion on another answer, a disadvantage that turned up is that GC can alter the asymptotic complexity. Soru's comment stated it implicitly, without examples, and one comment is not enough to explain it. Thanks to Jon Harrop for the example he suggested and for his useful comments on this answer. However, a good GC should still amortize the costs (given enough memory, as always), as I explain at the end.

With a GC system, when your nice O(N) code turns out to thrash the GC in a pathological way that makes it O(heap size), it can be harder to work out what is going wrong.

First, this happens often when the size of the working set is close to the maximum heap size. The GC is invoked too often and thus everything slows down. Install the Scala plugin for Eclipse and you'll feel it.

Usually, however, having enough memory and a generational GC prevents this, as long as your algorithm just produces lots of garbage that is quickly detectable as such: it won't survive long enough to get out of the nursery.

Here's an example of an exception to that: take "map f list", and assume each application of f consumes live memory by saving a reference to the returned value in a hash table. The asymptotic complexity, here, should still be O(1) per element.
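A hedged Java rendering of that scenario (names illustrative): every mapped result stays reachable from a long-lived table, so none of it can die in the nursery:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Function;

    class RetainingMap {
        // Every result f(x) stays reachable from here, so the nursery
        // can never reclaim it: live memory grows with every call.
        static final Map<Integer, byte[]> table = new HashMap<>();

        static List<byte[]> map(Function<Integer, byte[]> f, List<Integer> xs) {
            List<byte[]> out = new ArrayList<>(xs.size());
            for (Integer x : xs) {
                byte[] y = f.apply(x); // O(1) work per element...
                table.put(x, y);       // ...but all of it is kept live
                out.add(y);
            }
            return out;
        }
    }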

The generational GC won't be able to free memory by collecting the nursery, so a major collection (O(heap content)) is invoked somewhat periodically. Hence, we get a runtime at least proportional to (heap content size) * n * (space consumed by each call to f) / (nursery size).
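Spelled out as a formula (notation mine, matching the quantities above): write H for the heap content size, s for the space consumed by each call to f, and N for the nursery size. The nursery fills after roughly N/s calls, nothing in it can be freed, so a major collection costing on the order of H is triggered about n*s/N times over n calls:

    \[
      T \;\ge\; \frac{n\,s}{N} \cdot cH
        \;=\; \Theta\!\left(\frac{H\,n\,s}{N}\right)
    \]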

The GC will actually increase the nursery size up to the specified maximum, and then the above happens again for n big enough. However, if the specified maximum is Big-Theta(max heap size), and thus Omega(heap content size), major collections become infrequent, and the cost of minor collections is proportional to the data produced (and thus to the runtime needed to produce it). This is similar to copying a container to resize it: by growing it enough, you can amortize the cost of copying (or of GC) and make it comparable to the cost needed to cause the copy.

This whole answer does not take into consideration the issue of pause times: collections become infrequent but are long. It also implicitly assumes that stack scanning time is constant; it should indeed be, unless your algorithm is non-tail-recursive and therefore risks problems anyway on the big inputs considered here.

Finally, this is not about incremental garbage collectors. They solve the problem at the root, but they are used mostly for real-time GC because of the overhead they add to memory reads. Azul Systems solved this on their own custom hardware with their Pauseless GC, thanks to hardware support for optimizing that overhead. They also recently claim to have solved the problem for x86 as well; see this "press release" and this article. But that work is definitely still under development and does not run without changes to the OS. When it's done, and if the performance is as they suggest, maybe the problem will be solved for us normal mortals, too.

Score: 1

As an experiment to measure the impact on GC languages, some people took some Java programs, traced the execution, and then replaced garbage collection with explicit memory management. According to this review of the article on Lambda the Ultimate, they found that GC was always slower.

Well, no. They took measurements of a handful of object-oriented Java programs running on a non-production research VM (Jikes) that uses really unusual GC algorithms, all of which were decades out of date. Then they profiled each program for a given input and worked out the earliest point each free could have been inserted (i.e., resulting in a program that is no longer generally correct, because on other inputs it can free too early).

I don't have a problem with any of that: their results are mildly interesting but do not surprise me (Java is an exceptionally bad language for this, and the GC algorithms they used were toys compared to modern production VMs like HotSpot or .NET). However, they pretended that these results generalize, which is absolutely ludicrous. There is no reason to believe that their toy GCs are representative of production GCs, that OOP is representative of all programs, or that their "oracular" memory management is representative of manual memory management.

Virtual memory issues made GC look even worse, since the collector regularly touches far more memory pages than the program itself does at that point, and therefore causes a lot of swapping.

Yes. Don't do that.

This is all experimental to me. Has anybody, in particular in the context of C++, performed a comprehensive benchmark of GC performance compared to explicit memory management?

C++ cannot express efficient GC algorithms (e.g. there is no way to walk the stack or change the calling convention), so it will never be competitive at this. If you must use C++, forget about GC. If you want a decent GC, forget about C++.

It would be particularly interesting to compare how various big open-source projects, for example, perform with or without GC. Has anybody heard of such results?

That is impossible to do in a useful way, because the presence or absence of a GC is a fundamental design decision that everything else is built upon.

Score: 0

In theory, a well-profiled program can inform an intelligent GC subsystem to attain the described speedups over manual memory management. These speedups may not be visible without long runtimes to amortize the fixed startup cost of the GC.

In practice, you will likely NOT realize these speedups with present-day GC implementations. Furthermore, you will NOT get a definitive answer, because there will always be pathologically bad scenarios for both cases.

Score: 0

One pragmatic issue is that with explicit MM it is generally a lot easier to profile, identify the bottleneck, and resolve it.

With a GC system, when your nice O(N) code turns out to thrash the GC in a pathological way that makes it O(heap size), it can be harder to work out what is going wrong. Sometimes it is even as hard as fixing a memory corruption bug.

Score: 0

The fundamental difference between GC and any malloc implementation is that the GC keeps track of allocated objects, while malloc basically keeps track of deallocated objects (via "free lists", etc., which are needed to collect freed memory blocks and quickly return them on subsequent mallocs); some malloc implementations, by design, have no way at all (in their internals) to enumerate the allocated blocks. As a consequence, any possible GC implementation (no matter how good it may be) will always have complexity O(somefunc(N)), where N is the count of allocated objects, while most malloc implementations have complexity O(1). Thus, as the number of (simultaneously live) allocated objects grows, the performance degradation of any GC is inevitable (though performance can of course be traded for extra memory consumption). [After all, it is obvious that free memory blocks always have zero maintenance overhead, in contrast with allocated blocks/objects.] This is the fundamental flaw of any GC: it maintains garbage :), while malloc maintains non-garbage (:.
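To make the free-list contrast concrete, here is a toy sketch (Java purely for illustration; real mallocs are far more sophisticated). Both alloc and free are O(1) pops and pushes on the chain of free blocks, and allocated blocks are not tracked at all:

    // Toy fixed-size-block allocator built on a free list.
    final class FreeListPool {
        private final byte[][] blocks;
        private final int[] next; // next free index, -1 = end of list
        private int head;         // first free block, -1 = exhausted

        FreeListPool(int count, int blockSize) {
            blocks = new byte[count][blockSize];
            next = new int[count];
            for (int i = 0; i < count; i++)
                next[i] = (i + 1 < count) ? i + 1 : -1;
            head = 0;
        }

        int alloc() { // O(1): pop the head of the free list
            int i = head;
            if (i == -1) throw new OutOfMemoryError("pool exhausted");
            head = next[i];
            return i; // note: the pool no longer tracks this block
        }

        void free(int i) { // O(1): push back onto the free list
            next[i] = head;
            head = i;
        }

        byte[] get(int i) { return blocks[i]; }
    }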

P.S. By malloc I mean not just the specific C function of that name, but any explicit memory allocation routine. Also, I'd like to note that programming languages without built-in GC offer many ways to wrap explicit malloc/free (new/delete) calls (for example, std::shared_ptr in C++ or ARC in Obj-C), which make program code look similar to that of GC-powered languages, but which from a performance point of view are much closer (almost equivalent) to explicit memory allocation. (I know that even simple reference counting can be considered a form of garbage collection, but in this post by GC I mean a more feature-rich implementation (at least with automatic detection of reference cycles), which requires keeping track of all allocated objects; so I don't consider wrappers like std::shared_ptr to be GC, at least in this post.)
