97 points by surprisetalk 7 hours ago | 6 comments
keith_analog 2 hours ago
My favorite quote from Alan Perlis: “Symmetry is a complexity-reducing concept (co-routines include subroutines); seek it everywhere.”
summa_tech 1 hour ago
Unfortunately, the symmetry provided by higher levels of your software stack is not always reflected at the lower levels, so the symmetry has to be created as an abstraction. (Subroutines are more efficient to implement on real hardware than co-routines.) Abstractions are leaky, and this in turn means the superficial symmetry merely masks the underlying asymmetry.

If done exceptionally well, the only visible evidence of the leak will be substantial performance disparities between seemingly symmetric features (well done! this is unusual).

If done with the normal level of software design quality, the evidence will show up as quirky behavior, imperfect symmetry, "well this one always works but the other one is not reentrant", etc.

kazinator 1 hour ago
Reference counting does not trace dead objects. Most of its activity is concerned with maintaining reference counts on live objects.

When a reference count hits zero, that's when refcounting begins to be concerned with a dead object; and that part of its operation corresponds to the sweep activity in garbage collection, not to the tracing of live objects.

It is not a dual to garbage collection concerned with its negative spaces; it's simply a less general (we could legitimately say lesser) garbage collection that doesn't deal with cycles on its own.
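
To make that split concrete, here is a toy hand-rolled sketch in Rust (illustrative only, not a real allocator; Obj, dec_ref and the index-based "heap" are made up for this example). Everything outside the zero branch is bookkeeping on live objects; only the zero branch ever touches a dead one, and that is the sweep-like part.

    struct Obj {
        rc: u32,
        children: Vec<usize>, // outgoing references, as indices into the heap table
    }

    // Decrementing a count is work done on a live object; only when the count
    // hits zero does the sweep-like part run, releasing the object and its edges.
    fn dec_ref(heap: &mut Vec<Obj>, i: usize) {
        heap[i].rc -= 1;
        if heap[i].rc == 0 {
            for c in std::mem::take(&mut heap[i].children) {
                dec_ref(heap, c);
            }
            // Here the slot would be handed back to the allocator.
        }
    }

    fn main() {
        // a -> b; dropping the only reference to a frees both.
        let mut heap = vec![
            Obj { rc: 1, children: vec![1] }, // a
            Obj { rc: 1, children: vec![] },  // b
        ];
        dec_ref(&mut heap, 0);
        assert!(heap.iter().all(|o| o.rc == 0));
    }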

moregrist 44 minutes ago
> it's simply a less general (we could legitimately say lesser) garbage collection that doesn't deal with cycles on its own.

There are different implementation and performance trade-offs associated with both. I’ll focus on the two that are most meaningful to me.

Reference counting can be added as a library to languages that don’t want or can’t have a precise garbage collector. If you work in C++ (or Rust), it’s a very viable way to ensure you have some measure of non-manual cleanup while maintaining precise resource control.

Similarly, when performance matters, reference counting is essentially deterministic and much easier to understand and model.

In a lot of situations, garbage collection is an overall better strategy, but it’s not a strict superset, and not always the right choice.
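
A small Rust sketch of that point (Rc here; C++'s shared_ptr is the analogue): the counting comes from a library type rather than a runtime, and the cleanup point is deterministic — the resource is released exactly where the last reference is dropped, not at some future collection pause. The Connection type is just a made-up stand-in resource for illustration.

    use std::rc::Rc;

    struct Connection(u32); // stand-in for a resource that needs prompt cleanup

    impl Drop for Connection {
        fn drop(&mut self) {
            // Runs at a predictable point: when the last Rc is dropped.
            println!("connection {} closed", self.0);
        }
    }

    fn main() {
        let a = Rc::new(Connection(1));
        {
            let b = Rc::clone(&a);                     // count = 2, no runtime involved
            println!("holders: {}", Rc::strong_count(&b));
        }                                              // b dropped, count = 1, still open
        drop(a);                                       // count = 0: Drop runs right here
        println!("after close");
    }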

alimw 4 hours ago
Is it possible that, by knowing less about garbage collection in Java, this person might have arrived at the same solution earlier? After all, his initial construction of a tracing garbage collector was wasted effort.
pdubroy 3 hours ago
(OP here) It’s possible, but I doubt it. Perhaps the way I wrote it makes it sound like I was thinking about it as a GC problem from the beginning, but I wasn’t. It wasn’t until I started seeing it as being like GC that (a) I realized that my naive solution was akin to tracing GC, and (b) I came up with the reference counting solution.
cmrdporcupine 5 hours ago
Had similar epiphanies many years ago (ugh, I'm old) when I was playing around writing a garbage-collected persistent (in the 'stored on [spinny spinny] disk' sense, not the FP sense of the word) programming language / runtime. This was back when it was roughly infeasible to hold large "worlds" of objects purely in memory on the machines of the time, so intelligently paging objects in and out was imperative. (Aside: I think with the recent doubling... tripling of RAM prices, this area of thinking is becoming imperative again)...

In any case, if one is doing GC in such a language, a full tracing collector (whether copying or mark & sweep) is madness, as to find live references means walking nearly the entire heap including the portions living in secondary storage, and now you're in a world of pain.

In this case, an intelligent cycle-collecting garbage collector in the Bacon style was the answer. You keep an in-memory table of reference counts, and you only trace when you hit cycles. [and hopefully design your language semantics to discourage cycles]
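
For a rough idea of the mechanics, here is a compressed, synchronous sketch of the Bacon–Rajan trial-deletion idea over a toy arena with an explicit in-memory count table. This is not the actual system described above; the names (Heap, Node, Color) and the index-based layout are made up for illustration. Decrements that don't reach zero buffer the node as a candidate root; a later pass traces only from those candidates, tentatively subtracting internal edges to see whether a subgraph is still held from outside.

    #[derive(Clone, Copy, PartialEq)]
    enum Color { Black, Gray, White, Purple }

    struct Node {
        rc: u32,              // reference count kept in the in-memory table
        color: Color,
        children: Vec<usize>, // outgoing references, as arena indices
        freed: bool,
    }

    struct Heap {
        nodes: Vec<Node>,
        roots: Vec<usize>,    // candidate cycle roots, colored purple
    }

    impl Heap {
        fn decrement(&mut self, i: usize) {
            self.nodes[i].rc -= 1;
            if self.nodes[i].rc == 0 {
                self.release(i);
            } else if self.nodes[i].color != Color::Purple {
                // A non-zero decrement may have left a dead cycle: buffer as a candidate.
                self.nodes[i].color = Color::Purple;
                self.roots.push(i);
            }
        }
        fn release(&mut self, i: usize) {
            for c in std::mem::take(&mut self.nodes[i].children) {
                self.decrement(c);
            }
            self.nodes[i].freed = true; // slot goes back to the allocator
            self.nodes[i].color = Color::Black;
        }

        // Trace only from the buffered candidates, never the whole heap.
        fn collect_cycles(&mut self) {
            let roots = std::mem::take(&mut self.roots);
            for &r in &roots {
                if self.nodes[r].color == Color::Purple && !self.nodes[r].freed {
                    self.mark_gray(r); // trial deletion of internal edges
                }
            }
            for &r in &roots {
                if !self.nodes[r].freed { self.scan(r); }
            }
            for &r in &roots {
                if !self.nodes[r].freed { self.collect_white(r); }
            }
        }
        fn mark_gray(&mut self, i: usize) {
            if self.nodes[i].color != Color::Gray {
                self.nodes[i].color = Color::Gray;
                for c in self.nodes[i].children.clone() {
                    self.nodes[c].rc -= 1; // tentatively remove an internal edge
                    self.mark_gray(c);
                }
            }
        }
        fn scan(&mut self, i: usize) {
            if self.nodes[i].color != Color::Gray { return; }
            if self.nodes[i].rc > 0 {
                self.scan_black(i); // still reachable from outside the candidates
            } else {
                self.nodes[i].color = Color::White;
                for c in self.nodes[i].children.clone() { self.scan(c); }
            }
        }
        fn scan_black(&mut self, i: usize) {
            self.nodes[i].color = Color::Black;
            for c in self.nodes[i].children.clone() {
                self.nodes[c].rc += 1; // undo the trial deletion
                if self.nodes[c].color != Color::Black { self.scan_black(c); }
            }
        }
        fn collect_white(&mut self, i: usize) {
            if self.nodes[i].color != Color::White { return; }
            self.nodes[i].color = Color::Black;
            for c in self.nodes[i].children.clone() { self.collect_white(c); }
            self.nodes[i].freed = true; // member of a dead cycle
        }
    }

    fn main() {
        // Two nodes referencing each other, plus one external reference to node 0.
        let mut heap = Heap {
            nodes: vec![
                Node { rc: 2, color: Color::Black, children: vec![1], freed: false },
                Node { rc: 1, color: Color::Black, children: vec![0], freed: false },
            ],
            roots: Vec::new(),
        };
        heap.decrement(0);     // drop the external reference; no count hits zero
        heap.collect_cycles(); // trial deletion finds the dead 0 <-> 1 cycle
        assert!(heap.nodes[0].freed && heap.nodes[1].freed);
    }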

lisper 1 hour ago
> you only trace when you hit cycles

How do you tell when you've hit a cycle?

> hopefully design your language semantics to discourage cycles

Why? Cyclical structures can be very useful. For example, it can be very handy in many situations for a contained object to have a back-pointer to its container.

[UPDATE] Two responses have now pointed out that this particular case can be handled with weak pointers. But then I can just point to a general graph as an example where that won't work.

zozbot234 1 hour ago
> For example, it can be very handy in many situations for a contained object to have a back-pointer to its container.

That's not a true cycle, it's just a back link for which "weak" reference counts suffice. The containment relation implies that the container "owns" the object, so we don't need to worry about the case where the container might just go away without dropping its contents first.

(Edit: I agree that when dealing with a truly general graph some form of tracing is the best approach. These problem domains are where tracing GC really helps.)
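
A minimal Rust sketch of that ownership shape (std::rc::Rc/Weak here; shared_ptr/weak_ptr is the C++ analogue; the Container/Item types are invented for the example): the container owns its items through strong references, each item keeps only a weak back link, so dropping the container tears everything down with no cycle to collect.

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Container {
        items: RefCell<Vec<Rc<Item>>>, // strong: the container owns its items
    }

    struct Item {
        parent: Weak<Container>,       // weak back link: no ownership cycle
        name: String,
    }

    fn main() {
        let container = Rc::new(Container { items: RefCell::new(Vec::new()) });
        let item = Rc::new(Item {
            parent: Rc::downgrade(&container),
            name: "widget".into(),
        });
        container.items.borrow_mut().push(item);

        // The back link is usable while the container is alive...
        let first = Rc::clone(&container.items.borrow()[0]);
        println!("{} has a live parent: {}", first.name, first.parent.upgrade().is_some());

        // ...and because it is weak, dropping the container leaks nothing.
        drop(container);
        assert!(first.parent.upgrade().is_none());
        println!("container gone; {} is parentless but still alive", first.name);
    }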

lisper 1 hour ago
OK, then I'll pick a different example: a general graph.
mwkaufma 46 minutes ago
"How do you apply algo X to a problem which has been narrowly-tailored and/or under-specified to specifically exclude X" isn't exactly a constructive inquiry.
vlovich123 1 hour ago
> For example, it can be very handy in many situations for a contained object to have a back-pointer to its container.

Does it frequently need an owning reference, though, or would a weak reference suffice? Usually the latter is enough.

lisper 1 hour ago
A fair point, but then you're still putting the burden on the programmer to figure out where a weak reference is appropriate.

But then I'll just choose a different example, like a general graph.

nxobject 5 hours ago
I’m curious - did fragmentation end up being a significant issue, whether in memory or offloaded?
cmrdporcupine 5 hours ago
I never got far enough to push that into a production system but I suspect it would have, yes.

I can see how a periodic compacting phase could be useful in a system like that.

In the DB world there's good research around similar topics, e.g. LeanStore and Umbra -- Umbra in particular does some nice things with variable-sized buffers that I believe are expected to help with fragmentation: https://db.in.tum.de/~freitag/papers/p29-neumann-cidr20.pdf
