Viewing entries tagged
garbage collection

1 Comment

Garbage Collection – Pt 3: Generations

The .NET Garbage Collector is a generational collector, meaning it collects objects in different generations. The main purpose of generations is performance. Simply put, having the GC perform a full garbage collection on every object reference tree on every collection is way too expensive. The idea behind generational collections is that having a segmented collection, the GC will visit more often those objects with a short lifespan, than those with a long lifespan. Given the fact that most objects are short-lived, such as local variables and parameters (they'll go out of scope relatively faster), we can collect memory a lot faster and more efficiently if we keep collecting these kind of objects with a shorter frequency than objects with a long lifespan.

So how does the GC determine the lifespan of an object? Does it even do that? How does GC manage its generations? What exactly is a generation?

GC Roots and Generations

Generations are logical views of the Garbage Collector Heap (read Part2). The GC has 3 generations: gen-0, gen-1 and gen-2. Gen-0 and gen-1 are known as ephemeral generations because they store small objects that have a short lifespan. GC generations also have precedence, so gen-0 is said to be a younger generation than gen-1, and gen-1 is a younger generation than gen-2. When objects in a generation are collected, all younger generations are also collected:

  • Collect (gen-0) = Collect (gen-0)
  • Collect (gen-1) = Collect (gen-0) + Collect (gen-1)
  • Collect (gen-2) = Collect (gen-1) + Collect (gen-2)

Gen-0 and gen-1 collections are very fast since the heap segment is relatively small while gen-2 collections can be relatively slow. Gen-2 collection is also referred to as Full Garbage Collection because the whole heap is collected. When it’s time for gen-2 collection, the GC stops the program execution and finds all the roots into the GC heap. Departing from each root the GC will visit every single object and also track down every pointer and references to other objects contained in every one of the visited objects and marking all them as it moves through the heap. At the end of the process, the GC has found all reachable objects in the system, so everything else can be collected because it is unreachable. The golden rules GC collections never break are:

  1. GC collects all objects that are not roots.
  2. GC collects all objects that are not reachable from GC roots.

In Garbage Collection Part 2, I briefly talked about GC roots but not with much depth. There are many different categories or types of GC roots, but the more common and significant ones are static variables, global variables and objects living in the Stack that are pointing to the Heap. Here is kind of a messy idea of how things look when it comes to GC roots:

Now, in reality the heap doesn’t look that bad nor it’s organized in such a hectic manner. The picture is meant to illustrate the GC roots pointers to the heap. Later on this article I’ll cover the innards of the heap in more details.

Every time GC collects, objects that survive the generational collection just completed (because they can be reached from at least one of the GC roots) gets promoted to an older (higher) generation. This promotion mechanism ensures on each GC cycle that the younger the generation, the shorter the lifetime of the objects in it. The GC generational object promotion works is rather simple and works as follows:

  1. Objects that survive gen-0 collection will be considered gen-1 objects and GC will attempt collecting them when it runs gen-1 collection.
  2. Objects that survive gen-1 collection will be considered gen-2 objects and GC will attempt collecting them when it runs gen-2 collection.
  3. Objects that survive gen-2 collection will be still considered gen-2 objects until they are collected.
  4. Only GC can promote objects between generations. Developers are only allowed to allocate objects to gen-0 and the GC will take care of the rest.

GC Heap Structure

As mentioned in a previous article, the managed heap is logically divided into 2 heaps, the Small Objects Heap (SOH) and the Large Object Heap (LOH), where the memory is allocated in segments. The next figure is a more accurate view of the managed heap (contrasting the previous figure)

Because the objects collected during gen-0 and gen-1 have a short lifespan, these 2 generations are known as the ephemeral generations. All objects collected by gen-0 and gen-1 are also allocated in the ephemeral memory segment. The ephemeral segment is always the newest segment acquired by the GC. Every time the GC requests the OS for more memory and a new segment is allocated, the new segment becomes the ephemeral segment and the old ephemeral segment gets designated for gen-2 objects.

Facts about GC generations

  1. The GC has 3 generations gen-0, gen-1 and gen-2.
  2. Gen-0 and gen1 are known as ephemeral collections.
  3. Gen-2 collections are known as Full Garbage Collection.
  4. Objects that survive collections get promoted to higher generations.
  5. Gen-2 collections are a lot more expensive and happen less often than gen-0 and gen1 collections.
  6. The managed heap is logically divided into the SOH and the LOH.
  7. Memory is allocated in segments on the manged heap.
  8. Always the newest segment allocated is the ephemeral segment.

Side reading: Check out this great article by Maoni Stephens titled Large Object Heap Uncovered, where she talks about many of the topics in this article.

1 Comment


Garbage Collection - Pt 2: .NET Object Life-cycle

This article is a continuation of Garbage Collection - Pt 1: Introduction. Like everything else in this world, objects in object-oriented programming have a lifetime from when they are born (created) to when they die (destroyed or destructed). In the .Net Framework objects  have the following life cycle:

  1. Object creation (new keyword, dynamic instantiation or activation, etc).
  2. The first time around, all static object initializers are called.
  3. The runtime allocates memory for the object in the managed heap.
  4. The object is used by the application. Members (Properties/Methods/Fields) of the object type are called and used to change the object.
  5. If the developer decided to add disposing conditions, then the object is disposed. This happens by coding a using statement or manually calling to the object’s Dispose method for IDisposable objects.
  6. If the object has a finalizer, the GC puts the object in the finalization queue.
  7. If the object was put in the finalization queue, the GC will, at an arbitraty moment in time, call the object’s finalizer.
  8. Object is destroyed by marking its memory section in the heap segment as a Free Object.

The CLR Garbage Collector intervenes in the most critical steps in the object lifecycle; GC is almost completely managing steps 3, 6, 7 and 8 in the life of an object. This is one of the main reasons I believe the GC is one of the most critical parts of the .NET Framework that is often overlooked and not fully understood.

These are the key concepts and agents that participate in the life of a .NET object that should be entirely comprehended:

The Managed Heap and the Stack

The .Net Framework has a managed heap (aka GC Heap), which is nothing more than an advance data structure that the GC uses when allocating reference type objects (mostly). Each process has one heap that is shared between all .NET application domains running within the process.

Similarly, the thread Stack is another advance data structure that tracks the code execution of a thread. There is on stack per thread per process.

They are often compared to a messy pile of laundry (GC heap) and an organized shoe rack (Stack). Ultimately they are both used to store objects with some distinctions. There are 4 things that we store on the Stack and Heap as the code executes:

  1. Value Types: Go on heap or stack, depending on where they were declared.
    • If a Value Type is declared outside of a method, but inside a Reference Type it will be placed within the Reference Type on the Heap.
    • If a Value Type is boxed it’ll be placed in the Heap.
    • If the Value Type is within an iterator block, it’ll be placed in the Heap.
    • Else goes in the Stack.
  2. Reference Types: Always go on the Heap.
  3. Pointers: Go on heap or stack, depending on where they were declared.
  4. Instructions: Always go on the Stack.

Facts about the Heap

  1. The managed heap is a fast data structure that allows for fast access to any object in the heap regardless of when was inserted or where it was inserted within the heap.
  2. The managed heap is divided into heap segments. Heap segments are physical chunks of memory the GC reserves from the OS on behalf of CLR managed code.
  3. The 2 main segments of the heap are:
    1. The Small Objects Heap (SOH) segment: where small objects (less than 85 Kb) are stored. The SOH is also known as the ephemeral segment.
    2. The Large Objects Heap (LOH) segment: where large objects (more than 85 Kb) are stored.
  4. All reference types are stored on the heap.

Facts about the Stack

  1. The Stack is a block of pre-allocated memory (usually 1MB) that never changes in size.
  2. The Stack is an organized storage structure where grabbing the top item is O(1).
  3. Objects stored in the Stack are considered ROOTS of the GC.
  4. Objects stored in the Stack have a lifetime determined by their scope.
  5. Objects stored in the Stack are NEVER collected by the GC. The storage memory used by the Stack gets deallocated by popping items from the Stack, hence its scoping significance.

Now you start to see how it all comes together. The last magic rule that glues them in harmony is that the objects the garbage collector collects are those that:

  1. Are NOT GC roots.
  2. Are not accessible by references from GC roots.

Object Finalizers

Finalizers are special methods that are automatically called by the GC before the object is collected. They can only be called by the GC provided they exist. The .NET ultimate base class Object has a Finalize method that can be overridden by child objects (anyone basically). The purpose of finalizers is to ensure all unmanaged resources the object may be using are properly cleaned up prior to the end of the object lifetime.

If a type has an implemented (overridden) finalizer at the time of collection, the GC will first put the object in the finalization queue, then call the finalizer and then the object is destroyed.

Finalizers are not directly supported by C# compilers; instead you should use destructors using the ~ character, like so:

The CLR implicitly translates C# destructors to create Finalize calls.

Facts about Finalizers

  1. Finalizers execution during garbage collection is not guaranteed at any specific time, unless calling a Close or a Dispose method.
  2. Finalizers of 2 different objects are not guaranteed to run in any specific order, even if they are part of object composition.
  3. Finalizers (or destructors) should ONLY be implemented when your object type directly handles unmanaged resources. Only unmanaged resources should be freed up inside the finalizer.
  4. Finalizers must ALWAYS call the base.Finalize() method of the parent (this is not true for C# destructors, that do this automatically for you)
  5. C# does not support finalizers directly, but it supports them through C# destructors.
  6. Finalizers run in arbitrary threads selected or created by the GC.
  7. The CLR will only continue to Finalize objects in the finalization queue <=> the number of finalizable objects in the queue continues to decrease.
  8. All finalizer calls might not run to completion if one of the finalizers blocks indefinetly (in the code) or the process in which the app domain is running terminates without giving the CLR chance to clean up.

More at MSDN.

Disposable Objects

Objects in .NET can implement the IDisposable interface, whose only contract is to implement the Dispose method. Disposable objects are only created explicitly and called explicitly by the developer, and its main goal is to dispose managed and unmanaged resources on demand (when the developer wants to). The GC never calls the Dispose method on an object automatically. The Dispose() method on a disposable object can only get executed in one  two scenarios:

  1. The developer invokes a direct call to dispose the object.
  2. The developer created the object in the context of the using statement

Given the following disposable object type:

The next alternatives are equivalent:


Facts about Disposable Objects

  1. An object is a disposable object if implements the IDisposable interface.
  2. The only method of the IDisposable interface is the Dispose() method.
  3. On the Dispose method the developer can free up both managed and unmanaged resources.
  4. Disposable objects can be used by calling directly the object.Dispose() method or using the object within a using statement
  5. The Dispose() method on disposable objects will never get called automatically by the GC.

More at MSDN.


Ok, so now that we know the key players, let’s look again at the life of a .NET object. This time around we can understand a bit better what’s going on under the carpet.

  1. Object creation (new keyword, dynamic instantiation or activation, etc).
  2. The first time around, all static object initializers are called.
  3. The runtime allocates memory for the object in the managed heap.
  4. The object is used by the application. Members (Properties/Methods/Fields) of the object type are called and used to change the object.
  5. If the developer decided to add disposing conditions, then the object is disposed. This happens by coding a using statement or manually calling to the object’s Dispose method for IDisposable objects.
  6. If the object has a finalizer, the GC puts the object in the finalization queue.
  7. If the object was put in the finalization queue, the GC will, at an arbitraty moment in time, call the object’s finalizer.
  8. Object is destroyed by marking its memory section in the heap segment as a Free Object.


PS:   A good friend has pointed out in the comments a good article by Eric Lippert titled "The Truth About Value Types". Go check it out!



Garbage Collection - Pt 1: Introduction

Garbage collection is a practice in software engineering and computer science that aims at the automation of memory management in a software program. Its origins date all the way back to 1958 and the Lisp programming language implementation (“Recursive Functions of Symbolic Expressions and their Computation by Machine” by John McCarthy), the first to carry such mechanism. The basic problem it tries to solve is that of re-claiming unused memory in a computer running a program. Software programs rely on memory as the main storage agent for variables, network connections and all other data needed by the program to run; this memory needs to be claimed by the software application in order to be used. But memory in a computer is not infinite, thus we need a way to mark the pieces of memory we are not using anymore as “free” again, so other programs can use them after me. There are 2 main ways to do this, a manual resource cleanup, or an automatic resource cleanup. They both have advantages and disadvantages, but this article will focus on the later since it’s the one represented by the Garbage Collectors.

There are plenty of great articles and papers about garbage collection theory. Starting in this article and with other entries following up, I’ll talk about the garbage collection concepts applied to the .NET Garbage Collector specifically and will cover the following areas:

  1. What is the .NET Garbage Collector? (below in this page)
  2. Object’s lifecycle and the GC. Finalizers and diposable objects  (read Part 2)
  3. Generational Collection (read Part 3)
  4. Garbage collection performance implications (article coming soon)
  5. Best practices (article coming soon)

What is the .NET Garbage Collector?

The .NET Framework Garbage Collector (GC from now on) is one of the least understood areas of the framework, while being one of the most extremely sensitive and important parts of it. In a nutshell, the GC is an automatic memory management service that takes care of the resource cleanup for all managed objects in the managed heap.

It takes away from the developer the micro-management of resources required in C++, where you needed to manually delete your variables to free up memory. It is important to note that GC is NOT a language feature, but a framework feature. The Garbage Collector is the VIP superstar in the .NET CLR, with influence to all .NET languages.

I’m convinced those of you that worked before with C or C++ cannot count how many times you forgot to free memory when it is no longer needed or tried to use memory after you've already disposed it. These tedious and repetitive tasks are the cause of the worst bugs any developer can be faced with, since their consequences are typically unpredictable. They are the cause of resource leaks (memory leaks) and object corruption (destabilization), making your application and system perform in erratic ways at random times.

Some garbage collector facts:

  1. It empowers developers to write applications with less worries about having to free memory manually.
  2. The .NET Garbage Collector is a generational collector with 3 generations (article coming soon).
  3. GC reserves memory in segments. The 2 main GC segments are dedicated to maintain 2 heaps:
    • The Small Objects Heap (SOH), where small objects are stored. The first segment of the small object heap is known as the ephemeral segment and it is where Gen 0 and Gen 1 are maintained.
    • The Large Object Heap (LOH), where Gen 2 and large objects are maintained.
  4. When GC is triggered it reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations.
  5. Unmanaged resources are not maintained by the GC, instead they use the traditional Destructors, Dispose and Finalize methods.
  6. The GC also compresses the memory that is in use to reduce the working space needed to maintain the heap
  7. The GC is triggered when:
    • The system is running low on memory.
    • The managed heap allocation threshold is about to be reached.
    • The programmer directly calls the GC.Collect() method in the code. This is not recommended and should be avoided.
  8. The GC executes 1 thread per logical processor and can operate in two modes: Server and Workstation.
    • Server Garbage Collection: For server applications in need of high throughput and scalability.
      • Server mode can manage multiple heaps and run many GC processes in different threads to maximize the physical hardware capabilities of servers.
      • Server GC also supports collections notifications that allows a server farm infrastructure where the router can re-direct work to a different server when it’s notified that the GC is about to perform collections, and then resume in the main server when GC finishes.
      • The downside of this mode is that all managed code is paused while the GC collection is running.
    • Workstation Garbage Collection: For client applications. This is the default mode of the CLR and the GC always runs in the workstation mode unless otherwise specified.
      • On versions of the framework prior to 4.0, GC used 2 methods of workstation collection: Concurrent and Non-Concurrent where concurrent GC collections allowed other managed threads to keep running and executing code during the garbage collection.
      • Starting in 4.0 the concurrent collection method was replaced (upgraded?) by a Background Collection method. The background collection method has some exiting performance implications (good ones) by adding support to run generations of the GC (including gen2) in parallel, something not supported by the concurrent method (article coming soon).

Here is an interview made in 2007 to Patrick Dussud, the architect of the .NET Garbage Collector: Channel 9 Video

Next, I'll cover up the .NET object lifecycle and the role of the GC in the lifecycle of an object.