Garbage collection is a practice in software engineering and computer science that aims at the automation of memory management in a software program. Its origins date all the way back to 1958 and the Lisp programming language implementation (“Recursive Functions of Symbolic Expressions and their Computation by Machine” by John McCarthy), the first to carry such mechanism. The basic problem it tries to solve is that of re-claiming unused memory in a computer running a program. Software programs rely on memory as the main storage agent for variables, network connections and all other data needed by the program to run; this memory needs to be claimed by the software application in order to be used. But memory in a computer is not infinite, thus we need a way to mark the pieces of memory we are not using anymore as “free” again, so other programs can use them after me. There are 2 main ways to do this, a manual resource cleanup, or an automatic resource cleanup. They both have advantages and disadvantages, but this article will focus on the later since it’s the one represented by the Garbage Collectors.

There are plenty of great articles and papers about garbage collection theory. Starting in this article and with other entries following up, I’ll talk about the garbage collection concepts applied to the .NET Garbage Collector specifically and will cover the following areas:

  1. What is the .NET Garbage Collector? (below in this page)
  2. Object’s lifecycle and the GC. Finalizers and diposable objects  (read Part 2)
  3. Generational Collection (read Part 3)
  4. Garbage collection performance implications (article coming soon)
  5. Best practices (article coming soon)

What is the .NET Garbage Collector?

The .NET Framework Garbage Collector (GC from now on) is one of the least understood areas of the framework, while being one of the most extremely sensitive and important parts of it. In a nutshell, the GC is an automatic memory management service that takes care of the resource cleanup for all managed objects in the managed heap.

It takes away from the developer the micro-management of resources required in C++, where you needed to manually delete your variables to free up memory. It is important to note that GC is NOT a language feature, but a framework feature. The Garbage Collector is the VIP superstar in the .NET CLR, with influence to all .NET languages.

I’m convinced those of you that worked before with C or C++ cannot count how many times you forgot to free memory when it is no longer needed or tried to use memory after you've already disposed it. These tedious and repetitive tasks are the cause of the worst bugs any developer can be faced with, since their consequences are typically unpredictable. They are the cause of resource leaks (memory leaks) and object corruption (destabilization), making your application and system perform in erratic ways at random times.

Some garbage collector facts:

  1. It empowers developers to write applications with less worries about having to free memory manually.
  2. The .NET Garbage Collector is a generational collector with 3 generations (article coming soon).
  3. GC reserves memory in segments. The 2 main GC segments are dedicated to maintain 2 heaps:
    • The Small Objects Heap (SOH), where small objects are stored. The first segment of the small object heap is known as the ephemeral segment and it is where Gen 0 and Gen 1 are maintained.
    • The Large Object Heap (LOH), where Gen 2 and large objects are maintained.
  4. When GC is triggered it reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations.
  5. Unmanaged resources are not maintained by the GC, instead they use the traditional Destructors, Dispose and Finalize methods.
  6. The GC also compresses the memory that is in use to reduce the working space needed to maintain the heap
  7. The GC is triggered when:
    • The system is running low on memory.
    • The managed heap allocation threshold is about to be reached.
    • The programmer directly calls the GC.Collect() method in the code. This is not recommended and should be avoided.
  8. The GC executes 1 thread per logical processor and can operate in two modes: Server and Workstation.
    • Server Garbage Collection: For server applications in need of high throughput and scalability.
      • Server mode can manage multiple heaps and run many GC processes in different threads to maximize the physical hardware capabilities of servers.
      • Server GC also supports collections notifications that allows a server farm infrastructure where the router can re-direct work to a different server when it’s notified that the GC is about to perform collections, and then resume in the main server when GC finishes.
      • The downside of this mode is that all managed code is paused while the GC collection is running.
    • Workstation Garbage Collection: For client applications. This is the default mode of the CLR and the GC always runs in the workstation mode unless otherwise specified.
      • On versions of the framework prior to 4.0, GC used 2 methods of workstation collection: Concurrent and Non-Concurrent where concurrent GC collections allowed other managed threads to keep running and executing code during the garbage collection.
      • Starting in 4.0 the concurrent collection method was replaced (upgraded?) by a Background Collection method. The background collection method has some exiting performance implications (good ones) by adding support to run generations of the GC (including gen2) in parallel, something not supported by the concurrent method (article coming soon).

Here is an interview made in 2007 to Patrick Dussud, the architect of the .NET Garbage Collector: Channel 9 Video

Next, I'll cover up the .NET object lifecycle and the role of the GC in the lifecycle of an object.

4 Comments