[bitc-dev] Pools in lieu of GC
Jonathan S. Shapiro
shap at eros-os.com
Wed Mar 4 22:29:33 EST 2009
On Wed, Mar 4, 2009 at 9:22 PM, Pal-Kristian Engstad
<pal_engstad at naughtydog.com> wrote:
> Jonathan S. Shapiro wrote:
>> What people tend to forget is that GC is just as fast as hand allocation.
> I wondered about this statement -- how can that be true? If we assume
> that the programmer did some careful analysis of the code, determining
> when memory can be reclaimed, how can that be as fast as having code
> (albeit with type information) that scans memory to see if it is still
reachable? Mind you, I'm not arguing that it can't be done with fairly
low overhead, but there still must be some overhead? So can you try to
substantiate that claim, Jonathan?
GC is a big topic. I'm fairly well read on it, but I don't consider
myself a real expert. That being said, let me compare one commonly
used GC scheme with manual allocation, which may give you a sense.
A manual allocator incurs a *very* large cost per allocation and per
free. malloc() must find an appropriately sized block to allocate.
Both operations must [de]coalesce blocks of memory to maintain those
blocks. The list management alone is surprisingly complicated, and
list traversals in general have surprisingly bad cache behavior (your
neighbor is almost never in cache). So the overhead of these
operations is quite high, and the procedure call for malloc cannot be
avoided. The cost of that procedure call alone (the call itself, not
the internal execution) is already 10x the cost of allocating in a
modern GC system.
The reason is that GC-based allocators do not need to deal with
fragmentation. An allocation consists of checking whether the current
heap is big enough (if not, invoke GC), recording the current end of
heap pointer (which is the address of the new object) and adding the
object size to create the new end of heap pointer. That's three to
four instructions, which are invariably inlined, after which they can
be "boxcarred" through standard common subexpression elimination. All
of the cost in a GC system is incurred by the collector. That cost
works out in most programs to be about the same as the combined cost
of the corresponding malloc() and free() calls.
That was all true when safe languages were very prone to heap
allocation. Modern compilers do optimizations to allocate things on
the stack when possible, and that helps further.
A copying collector is relatively cache friendly where free() is not.
It also has the side effect of improving cache and TLB locality in
comparison to a malloc()/free() heap, which actually improves the
performance of the running program.