Friday 6 July 2018

egg Garbage Collection 2

In the first part of this thread, I introduced a less intrusive tracing garbage collector that I'm working on for the egg programming language. One of the questions I left hanging was determining which nodes in the basket are "roots". That is, which nodes are pointed to directly from outside the basket?

Roots and Concurrency

At some stage in the future, I'd like to switch the garbage collection to be concurrent. This poses additional problems. Imagine you are inserting a parent node and a child node into the basket and the parent is a root. How do you do this whilst the garbage collector is concurrently running? You have to be very careful of the order of operation so that the collector doesn't accidentally evict your nodes before you've had the chance to finalise all the links.

One solution is to ensure all the new nodes are roots and then downgrade most of them after linking them together. This implies that downgrading nodes, from root to non-root, needs to be an efficient operation.

Another potential requirement is "locking" nodes when a subsystem other than the garbage collector is using them; you don't want the collector to pull the rug from under their feet.

Hard and Soft References

To try to solve these issues, I've been experimenting with "soft" and "hard" references. These shouldn't be confused with "weak" references; that's an orthogonal concern. A soft reference is a link between two nodes in the same basket; these are the links that the tracing garbage collector follows. A hard reference is a link to a node that uses traditional reference counting. Unlike soft references, hard references do not need to know which node they are pointing from, only where they are pointing to. Indeed, hard references do not even need to be to or from nodes within a basket. In these respects, hard references are similar to "std::shared_ptr" in C++.

In egg, the "Collectable" base class can be the target of both soft and hard references. To be the target of a soft reference, the "Collectable" node must be a member of exactly one basket. When the node was added to it's basket, the basket obtained a single hard reference to it. This single hard reference is maintained no matter how many soft references there are to it from within the basket; it is only relinquished when the node is evicted from the basket.

Here's an overview:
  • When a node is added to a basket, the basket takes a hard reference to it.
  • Nodes can have zero or more additional hard references to them.
  • The garbage collector only considers nodes for eviction if they have a hard reference count equal to one, i.e. the only hard reference to them is from the basket.
  • Nodes that have hard references in addition to the single reference from the basket are considered "roots" and are effectively locked.
  • Nodes can only be added to a basket by supplying a hard reference. This overcomes the race condition causing premature eviction of partially constructed networks.
  • Nodes can be made a "root" simply by creating an external hard reference to it. When the last external hard reference is lost, the node is no longer a "root" and is a candidate for eviction from the basket if it is not accessible from some other root.

Practical Considerations

When I started implementing soft references, I found they are incredibly difficult to construct properly. The reason is that soft references need to know both the source and target of the reference. If you are constructing an instance, "source", of a class that contains a soft reference to another node, "target", you can easily end up creating a reference between "source" and the partially constructed "target". If an exception is thrown within the constructor (or a function called from it) it can be very difficult to untangle the links safely.

The solution I'm using at the moment creates all the links as hard references and then demotes them to soft references later on. This has the added advantage of not adding nodes to the basket at all in the event of an exception being thrown part-way through the initial creation of the network.

Another facility I added to make constructing soft references less error-prone is "basket inference". Usually, the sequence of events is:
  1. Add the target to the basket, if it's not already added
  2. Add the source to the basket, if it's not already added
  3. Create a soft reference from the source to the target
Instead, the basket (which must be the same for both source and target) is inferred where possible; the source or target are added to the basket as necessary. This usually works because it's highly unlikely that neither the source nor the target have been added to a basket already.

No comments:

Post a Comment