The .NET Garbage Collector is a generational collector: it groups objects into generations and collects each generation at a different rate. The main purpose of generations is performance. Simply put, having the GC walk every object reference tree on every collection is far too expensive. The idea behind generational collection is that, with the heap segmented by object age, the GC can visit objects with a short lifespan more often than objects with a long lifespan. Most objects are short-lived, such as temporaries referenced only by local variables and parameters (which go out of scope quickly), so the GC reclaims memory faster and more efficiently by collecting these objects more frequently than long-lived ones.
So how does the GC determine the lifespan of an object? Does it even do that? How does the GC manage its generations? What exactly is a generation?
GC Roots and Generations
Generations are logical views of the Garbage Collector heap (see Part 2). The GC has 3 generations: gen-0, gen-1 and gen-2. Gen-0 and gen-1 are known as the ephemeral generations because they hold small objects that have a short lifespan. Generations are also ordered by age: gen-0 is younger than gen-1, and gen-1 is younger than gen-2. When a generation is collected, all younger generations are collected as well:
- Collect (gen-0) = Collect (gen-0)
- Collect (gen-1) = Collect (gen-0) + Collect (gen-1)
- Collect (gen-2) = Collect (gen-0) + Collect (gen-1) + Collect (gen-2)
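One way to observe this rule is through the per-generation collection counters. The following is a minimal sketch, not a benchmark: the class and method names (`CollectionCountDemo`, `CountDeltas`) are made up for illustration, and it assumes no other thread triggers a collection while it runs. It forces a collection of a given generation with `GC.Collect(int)` and reports how much `GC.CollectionCount` grew for every generation.

```csharp
using System;

public static class CollectionCountDemo
{
    // Hypothetical helper: forces a collection of `generation` and returns,
    // for every generation, how much its collection counter increased.
    public static int[] CountDeltas(int generation)
    {
        int max = GC.MaxGeneration;
        var before = new int[max + 1];
        for (int g = 0; g <= max; g++)
            before[g] = GC.CollectionCount(g);

        // Forces a collection of generation 0 through `generation`.
        GC.Collect(generation);

        var delta = new int[max + 1];
        for (int g = 0; g <= max; g++)
            delta[g] = GC.CollectionCount(g) - before[g];
        return delta;
    }

    public static void Main()
    {
        for (int gen = 0; gen <= GC.MaxGeneration; gen++)
        {
            int[] d = CountDeltas(gen);
            Console.WriteLine($"GC.Collect({gen}) -> counter deltas: {string.Join(", ", d)}");
        }
    }
}
```

In a typical run, the counters of the requested generation and of every younger generation each go up by at least one, while the counters of older generations stay put unless the runtime happens to start another collection on its own.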
Gen-0 and gen-1 collections are very fast because the portion of the heap they cover is relatively small, while gen-2 collections can be relatively slow. A gen-2 collection is also referred to as a full garbage collection because the whole heap is collected. When it’s time for a gen-2 collection, the GC stops program execution and finds all the roots into the GC heap. Starting from each root, the GC visits every object it can reach, follows every reference those objects hold to other objects, and marks each one as it moves through the heap. At the end of the process, the GC has found all reachable objects in the system, so everything else is unreachable and can be collected. The golden rules GC collections never break are:
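- The GC never collects an object that a GC root references directly.
- The GC never collects an object that is reachable from a GC root; only unreachable objects in the generations being collected are reclaimed.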
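Reachability can be probed with `WeakReference`, which tracks an object without keeping it alive. The sketch below is illustrative only: `ReachabilityDemo`, `Allocate` and `Run` are hypothetical names, and the outcome for the unrooted object depends on the runtime and build configuration (JIT optimizations or conservative stack scanning can extend an object's lifetime), so treat the "collected" result as typical rather than guaranteed.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ReachabilityDemo
{
    // A static field acts as a GC root: the object it references stays reachable.
    private static readonly object s_rooted = new object();

    // Allocation happens in a separate, non-inlined method so that no local
    // variable in the caller accidentally keeps the temporary object reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Allocate(out WeakReference rooted, out WeakReference unrooted)
    {
        rooted = new WeakReference(s_rooted);        // target is held by a root
        unrooted = new WeakReference(new object());  // nothing else references the target
    }

    // Returns whether each tracked object is still alive after a full collection.
    public static (bool RootedAlive, bool UnrootedAlive) Run()
    {
        Allocate(out WeakReference rooted, out WeakReference unrooted);
        GC.Collect(); // full, blocking collection
        return (rooted.IsAlive, unrooted.IsAlive);
    }

    public static void Main()
    {
        var (rootedAlive, unrootedAlive) = Run();
        Console.WriteLine($"Object referenced by a static field alive: {rootedAlive}");
        Console.WriteLine($"Object with no remaining references alive: {unrootedAlive}");
    }
}
```

The object referenced by the static field always survives, because a root still points to it. The other object has no path from any root once `Allocate` returns, so it is eligible for collection.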
In Garbage Collection Part 2, I talked briefly about GC roots, but not in much depth. There are many categories of GC roots, but the most common and significant ones are static variables, global variables, and local variables on the stack that point to objects on the heap. Here is a rather messy picture of how things look when it comes to GC roots:
Now, in reality the heap doesn’t look that bad, nor is it organized in such a hectic manner. The picture is only meant to illustrate how GC roots point into the heap. Later in this article I’ll cover the innards of the heap in more detail.
Every time the GC collects, objects that survive the generational collection just completed (because they can be reached from at least one GC root) get promoted to an older (higher) generation. This promotion mechanism ensures that, on each GC cycle, the younger the generation, the shorter the lifetime of the objects in it. Generational promotion is rather simple and works as follows:
- Objects that survive gen-0 collection will be considered gen-1 objects and GC will attempt collecting them when it runs gen-1 collection.
- Objects that survive gen-1 collection will be considered gen-2 objects and GC will attempt collecting them when it runs gen-2 collection.
- Objects that survive a gen-2 collection are still considered gen-2 objects until they are collected.
- Only the GC can promote objects between generations. Developers can only allocate objects; new small objects start in gen-0 and the GC takes care of the rest.
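The promotion steps above can be watched with `GC.GetGeneration`. This is a minimal sketch under the assumption that each forced `GC.Collect()` promotes the surviving object by one generation, which is the usual behavior but not a contract; `PromotionDemo` and `ObserveGenerations` are names invented for this example.

```csharp
using System;

public static class PromotionDemo
{
    // Records the generation of one live object before any collection,
    // after a first forced collection, and after a second one.
    public static int[] ObserveGenerations()
    {
        var survivor = new object(); // small object, allocated by the program
        var seen = new int[3];

        seen[0] = GC.GetGeneration(survivor);
        GC.Collect();                          // survivor is reachable, so it survives
        seen[1] = GC.GetGeneration(survivor);
        GC.Collect();
        seen[2] = GC.GetGeneration(survivor);

        GC.KeepAlive(survivor); // keep the local reachable until this point
        return seen;
    }

    public static void Main()
    {
        int[] seen = ObserveGenerations();
        Console.WriteLine($"After allocation:        gen-{seen[0]}");
        Console.WriteLine($"After first collection:  gen-{seen[1]}");
        Console.WriteLine($"After second collection: gen-{seen[2]}");
    }
}
```

Note that the program never chooses the generation itself: it only allocates the object and keeps a reference to it, and the reported generation changes purely as a result of surviving collections.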
GC Heap Structure
As mentioned in a previous article, the managed heap is logically divided into 2 heaps, the Small Object Heap (SOH) and the Large Object Heap (LOH), and memory is allocated in segments. The next figure is a more accurate view of the managed heap (in contrast to the previous figure).
Because the objects collected during gen-0 and gen-1 collections have a short lifespan, these 2 generations are known as the ephemeral generations. All gen-0 and gen-1 objects live in the ephemeral memory segment, which is always the newest segment acquired by the GC. Every time the GC requests more memory from the OS and a new segment is allocated, the new segment becomes the ephemeral segment and the old ephemeral segment is designated for gen-2 objects.
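The SOH/LOH split also shows up in the generation the runtime reports for a freshly allocated object. The sketch below assumes the default large-object threshold of 85,000 bytes, which this article does not state and which can be configured differently; `LargeObjectDemo` and `Generations` are hypothetical names.

```csharp
using System;

public static class LargeObjectDemo
{
    // Returns the generation reported for a small array and for a large array
    // immediately after allocation.
    public static (int Small, int Large) Generations()
    {
        var small = new byte[1_000];    // well below the threshold: allocated on the SOH
        var large = new byte[100_000];  // above the assumed 85,000-byte threshold: allocated on the LOH

        var result = (GC.GetGeneration(small), GC.GetGeneration(large));

        // Keep both arrays reachable until their generations have been read.
        GC.KeepAlive(small);
        GC.KeepAlive(large);
        return result;
    }

    public static void Main()
    {
        var (small, large) = Generations();
        Console.WriteLine($"Small array reported as gen-{small}");
        Console.WriteLine($"Large array reported as gen-{large} (GC.MaxGeneration is {GC.MaxGeneration})");
    }
}
```

Large objects are typically reported as belonging to the oldest generation right away, because the LOH is only collected as part of a full collection.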
Facts about GC generations
- The GC has 3 generations: gen-0, gen-1 and gen-2.
- Gen-0 and gen-1 are known as the ephemeral generations.
- Gen-2 collections are known as Full Garbage Collection.
- Objects that survive collections get promoted to higher generations.
- Gen-2 collections are a lot more expensive and happen less often than gen-0 and gen-1 collections.
- The managed heap is logically divided into the SOH and the LOH.
- Memory is allocated in segments on the managed heap.
- The newest segment allocated is always the ephemeral segment.
Side reading: Check out this great article by Maoni Stephens titled Large Object Heap Uncovered, where she talks about many of the topics in this article.