Viewing entries tagged: programming languages


The Myth of the Genius Programmer

http://www.youtube.com/watch?v=0SARbwvhupQ&list=PLCB5CF9838389D7F6&feature=view_all

From Google I/O 2009, here are Brian Fitzpatrick and Ben Collins-Sussman talking about the fears of programmers and the fear of looking 'stupid'.


Remembering Dennis Ritchie (RIP)

I discovered just today (01/26/2012) that Dennis Ritchie, one of the most prominent figures in computing, died. How is it possible that the news of the passing of such an important personality in the computer science community went silent for almost 4 months? The media barely covered his death; they were consumed by something that stunned the world that very same day: Steve Jobs' death. The world is guilty of not showing Ritchie its respects the way it did with Steve Jobs. I wanted to write something to acknowledge Dennis Ritchie. Just as Steve changed my life by being a believer, a visionary and a leader, Ritchie changed it by creating the science that supports almost everything I do (and even what Steve did). That's a tall order to fill, the word everything is BIG, so let's check out some of his accomplishments:

  1. Creator of the C programming language. That alone earns him quite an array of accolades and influence. Here are some ripple effects of this creation:
    • Without C, there is no C++
    • Without C, there is no Objective C
    • Without C, there is no Unix OS or Windows OS. No Linux, no Symbian, no OS X, no iOS. Without these, there would be no Nokias, no Blackberries, no Androids, no iPhones or iPads.
    • Thanks to C, the words ‘programmer’ and ‘software developer’ became ubiquitous in society
  2. Because of C, software engineering and programming were made accessible to the masses, and became a ‘popular’ career path in university programs.
  3. Created the generic operating systems theory that supports the fundamental concepts of all modern operating systems.
  4. Implemented the UNIX OS. Again, the ripple effects of this accomplishment are too many to list here.
  5. Co-author of “The C Programming Language,” also known as “K&R” (Kernighan and Ritchie).
  6. Awarded the Turing Award in 1983 for his enormous contributions to the fields of computing and operating systems. The Turing Award is the highest distinction given to a human being for their contribution to the field of Computer Science.

His accomplishments were the foundation of the technology revolution we have experienced in the last decades. The gigantic footprint Dennis Ritchie left on the world is, in my modest opinion, more far-reaching, transformational and revolutionary than that of anyone else who left this world in 2011. He was certainly not the most charismatic or public person in the world, so the media didn’t like him as much as Steve.

This image is trending today… for some reason, 4 months after Ritchie died.

Here is another article making a similar side-by-side comparison.

I have nothing but great respect for both of these men. Without their dents in the universe, my life would have been a much different one. If you think Dennis Ritchie deserves your respect, share this with your friends, and let the world know that 2011 was the year when we lost two very important figures, and not just one.


Polymorphism... back to school

Almost anyone with some notion of programming or object-oriented techniques will be able to describe what polymorphism is and how to use it; I've discovered recently that very few can tell you how it actually works. While this may not be relevant to the daily work of many software developers, the consequences of not knowing, and the misuses that follow, affect everyone. We all know how to describe a car and even how to use it, but only mechanics know how to fix it, because they know how IT WORKS.


Polymorphism: What it is

Polymorphism is a characteristic or feature of programming languages: languages either support it or they don’t. Since programming languages fall under the umbrella of sometimes substantially different paradigms, in this article I’m going to concentrate on polymorphism within the scope of Object Oriented Programming.

In OOP, polymorphism is considered one of the basic principles and a very distinctive one. Most object-oriented languages have polymorphism among their many features. In a nutshell, polymorphism is best seen when the caller of an object with a polymorphic implementation is not aware of the exact type of the object. Polymorphism interacts very closely with other features like inheritance and abstraction.


Polymorphism: What it is NOT

Contrary to an overwhelming number of articles found online, things like overloading polymorphism or parametric polymorphism are NOT object-oriented expressions of polymorphism; they apply to functional (declarative) programming languages. Signature-based polymorphism, overloading polymorphism, parasitic polymorphism and polymorphism in closures are meaningful in functional languages only (or, at best, in multi-paradigm languages with functional expressions and/or duck-typing languages, like Python or JavaScript).

Polymorphism is not boxing or unboxing, and it is not the same as inheritance; instead, polymorphic manifestations usually occur as a consequence of inheritance, sub-typing and boxing.


Polymorphism: How it works

As its name suggests, polymorphism is the ability of an object to have more than one form or, more properly, to be treated as if it were of some other type. The most common manifestation can be seen with sub-typing; let’s see it through an example:

[sourcecode language="csharp"] public class Fruit { public virtual string Description() { return "This is a fruit"; } }

public class FreshApple: Fruit { public override string Description() { return "Fresh Apple"; } }

public class RottenBanana:Fruit { public override string Description() { return "Banana after 10 days"; } } [/sourcecode]

Now we can do something like this:

[sourcecode language="csharp"] Fruit[] fruits = { new FreshApple(), new RottenBanana(), new Fruit() }; foreach (var fruit in fruits) { Console.WriteLine("Fruit -> {0}", fruit.Description()); } Console.ReadLine(); [/sourcecode]

... and that would have the following output:
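Roughly, the console would show:

[sourcecode language="text"]
Fruit -> Fresh Apple
Fruit -> Banana after 10 days
Fruit -> This is a fruit
[/sourcecode]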

That’s all well and good, and almost anyone will get this far. The question is HOW this happens: HOW does the runtime realize that the method it must call is the one in the child classes instead of the one in the parent class? How deep does this rabbit hole go? There is no magic element here, and you’ll see exactly HOW this occurs and WHY it is important to know about it.

These mappings of function calls to their implementations happen through a mechanism called dispatch tables or virtual tables (vTables for short). Virtual tables are a feature of programming languages (again, they either support them or they don’t). Virtual tables are present mostly in languages that support dynamic dispatch (C++, C#, Objective C, Java), meaning they can bind to function pointers or objects at run-time and can abstract away the actual implementation of the method/function/object until the very moment it is going to be used.

The vTable itself is nothing more than a data structure, no different from a Stack or a Hashtable in that respect; it just happens to be the one used for the dynamic dispatch feature in programming languages. Some languages offer a different structure called the Binary Tree Dispatch as an alternative to vTables. As with any data structure, both have pros and cons when it comes to measuring performance and cost. The point is that whether it is a vTable or a bTree Dispatch, dynamic dispatching is bound using a data structure that is carried over with objects and functions to support this feature. Yes, I said “carried over”.

vTables are a hidden variable of objects. In .NET, for example, all classes and structs inherit from Object, and Object already has virtual functions like "ToString()" and "GetHashCode()", so every single object you create will have a hidden vTable structure that is always initialized behind the scenes in the constructor call of every object. The purpose of these vTables being created all over the place is exactly to map the correct functions for polymorphic objects and functions. You can use Reflector to peek at an object at runtime (IL) and you'll find its vTable in the first memory location of the object's memory segment. The IL instructions call and callvirt (polymorphism, yay!) are used to call non-virtual and virtual methods respectively. There is simply no way of accessing an object's vTable directly from C#; this was done on purpose to minimize the security implications of it, among other reasons. In C++, hmm...

[sourcecode language="cpp"] long *vptr = (long *) &obj; long *vtable = (long *)*vptr; [/sourcecode]

So the vTable of each object holds a reference to each one of its virtual functions. Think of it, structurally, as a list of function pointers sorted in the same order the functions are declared. This serves its purpose in inheritance and polymorphism: when a child object gets initialized, regardless of being assigned to a variable of the parent type, the vTable of that object will be the one corresponding to the actual inheritor. When virtual functions get called on the variable, the correct function in the child object (the actual object type) is found thanks to the function pointers on its vTable.
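To make the idea tangible, here is a tiny sketch (purely illustrative, with a made-up FakeVTable type; this is not how the CLR actually lays out memory) that simulates a vTable as an array of delegates, where the variable's static type only knows the slot index but the table it carries belongs to the actual type:

[sourcecode language="csharp"]
using System;

// Illustrative only: a "vTable" modeled as an array of function pointers (delegates),
// with slot order matching the declaration order of the virtual methods.
class FakeVTable
{
    public Func<string>[] Slots;
}

class Program
{
    static void Main()
    {
        // Parent table: slot 0 points at the base Description implementation.
        var fruitTable = new FakeVTable { Slots = new Func<string>[] { () => "This is a fruit" } };

        // Child table: same slot index, different implementation (the override).
        var appleTable = new FakeVTable { Slots = new Func<string>[] { () => "Fresh Apple" } };

        Console.WriteLine(fruitTable.Slots[0]());   // prints "This is a fruit"

        // The caller only knows "call slot 0"; the table carried by the object
        // belongs to the actual (child) type, so the override is what runs.
        FakeVTable carriedByTheObject = appleTable;
        Console.WriteLine(carriedByTheObject.Slots[0]());   // prints "Fresh Apple"
    }
}
[/sourcecode]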

Virtual tables, inheritance and polymorphism in general have been criticized for their overhead in program execution. That is why one of the object-oriented programming principles is "Favor Object Composition Over Inheritance". Non-virtual functions never require the overhead of vTables, making them naturally faster than virtual functions. Applications and systems with a deep inheritance architecture tend to spend a considerably large amount of their execution time just trying to resolve the destination of their virtual function calls. This can become quite a performance tax if used without care, especially on older CPU architectures. Most modern compilers have a trick or two up their sleeves to attempt to resolve virtual function calls at compile time without the need for vTables, but dynamic binding will always need these creatures to resolve its destination paths.

And now you know how polymorphism works. Happy coding!


Garbage Collection – Pt 3: Generations

The .NET Garbage Collector is a generational collector, meaning it collects objects in different generations. The main purpose of generations is performance. Simply put, having the GC perform a full garbage collection over every object reference tree on every collection is way too expensive. The idea behind generational collection is that, by segmenting the collection, the GC will visit those objects with a short lifespan more often than those with a long lifespan. Given the fact that most objects are short-lived, such as local variables and parameters (they'll go out of scope relatively fast), we can collect memory a lot faster and more efficiently if we keep collecting these kinds of objects more frequently than objects with a long lifespan.

So how does the GC determine the lifespan of an object? Does it even do that? How does GC manage its generations? What exactly is a generation?

GC Roots and Generations

Generations are logical views of the Garbage Collector heap (read Part 2). The GC has 3 generations: gen-0, gen-1 and gen-2. Gen-0 and gen-1 are known as the ephemeral generations because they store small objects that have a short lifespan. GC generations also have precedence, so gen-0 is said to be a younger generation than gen-1, and gen-1 a younger generation than gen-2. When objects in a generation are collected, all younger generations are also collected:

  • Collect (gen-0) = Collect (gen-0)
  • Collect (gen-1) = Collect (gen-0) + Collect (gen-1)
  • Collect (gen-2) = Collect (gen-0) + Collect (gen-1) + Collect (gen-2)

Gen-0 and gen-1 collections are very fast since the heap segment is relatively small, while gen-2 collections can be relatively slow. A gen-2 collection is also referred to as a Full Garbage Collection because the whole heap is collected. When it’s time for a gen-2 collection, the GC stops the program execution and finds all the roots into the GC heap. Starting from each root, the GC visits every single object, tracks down every pointer and reference to other objects contained in each visited object, and marks them all as it moves through the heap. At the end of the process, the GC has found all reachable objects in the system, so everything else can be collected because it is unreachable. The golden rules GC collections never break are:

  1. GC collects all objects that are not roots.
  2. GC collects all objects that are not reachable from GC roots.

In Garbage Collection Part 2, I briefly talked about GC roots, but not in much depth. There are many different categories or types of GC roots, but the more common and significant ones are static variables, global variables and objects living in the Stack that point to the Heap. Here is a rough (and admittedly messy) idea of how things look when it comes to GC roots:

Now, in reality the heap doesn’t look that bad, nor is it organized in such a hectic manner. The picture is meant to illustrate the GC root pointers into the heap. Later in this article I’ll cover the innards of the heap in more detail.

Every time the GC collects, objects that survive the generational collection just completed (because they can be reached from at least one of the GC roots) get promoted to an older (higher) generation. This promotion mechanism ensures on each GC cycle that the younger the generation, the shorter the lifetime of the objects in it. The generational object promotion is rather simple and works as follows (see the sketch after the list):

  1. Objects that survive gen-0 collection will be considered gen-1 objects and GC will attempt collecting them when it runs gen-1 collection.
  2. Objects that survive gen-1 collection will be considered gen-2 objects and GC will attempt collecting them when it runs gen-2 collection.
  3. Objects that survive gen-2 collection will be still considered gen-2 objects until they are collected.
  4. Only GC can promote objects between generations. Developers are only allowed to allocate objects to gen-0 and the GC will take care of the rest.
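A quick way to watch this promotion happen is to ask the GC which generation an object currently lives in. Here is a small sketch using the GC.GetGeneration/GC.Collect APIs; the exact numbers you see can vary with the runtime and GC mode:

[sourcecode language="csharp"]
using System;

class Program
{
    static void Main()
    {
        var data = new byte[1024];                  // a small object starts its life in gen-0
        Console.WriteLine(GC.GetGeneration(data));  // typically prints 0

        GC.Collect(0);                              // gen-0 collection: survivors get promoted
        Console.WriteLine(GC.GetGeneration(data));  // typically prints 1

        GC.Collect();                               // full collection (gen-2)
        Console.WriteLine(GC.GetGeneration(data));  // typically prints 2, and it stays there

        Console.WriteLine(GC.MaxGeneration);        // 2: the oldest generation the GC manages
        GC.KeepAlive(data);                         // keep 'data' reachable until this point
    }
}
[/sourcecode]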

GC Heap Structure

As mentioned in a previous article, the managed heap is logically divided into 2 heaps, the Small Objects Heap (SOH) and the Large Objects Heap (LOH), where memory is allocated in segments. The next figure is a more accurate view of the managed heap (contrasting with the previous figure).

Because the objects collected during gen-0 and gen-1 collections have a short lifespan, these 2 generations are known as the ephemeral generations. All gen-0 and gen-1 objects are allocated in the ephemeral memory segment. The ephemeral segment is always the newest segment acquired by the GC. Every time the GC requests more memory from the OS and a new segment is allocated, the new segment becomes the ephemeral segment and the old ephemeral segment gets designated for gen-2 objects.

Facts about GC generations

  1. The GC has 3 generations: gen-0, gen-1 and gen-2.
  2. Gen-0 and gen-1 are known as the ephemeral generations.
  3. Gen-2 collections are known as Full Garbage Collections.
  4. Objects that survive collections get promoted to higher generations.
  5. Gen-2 collections are a lot more expensive and happen less often than gen-0 and gen-1 collections.
  6. The managed heap is logically divided into the SOH and the LOH.
  7. Memory is allocated in segments on the managed heap.
  8. The newest allocated segment is always the ephemeral segment.

Side reading: Check out this great article by Maoni Stephens titled Large Object Heap Uncovered, where she talks about many of the topics in this article.


Garbage Collection - Pt 2: .NET Object Life-cycle

This article is a continuation of Garbage Collection - Pt 1: Introduction. Like everything else in this world, objects in object-oriented programming have a lifetime, from when they are born (created) to when they die (destroyed or destructed). In the .NET Framework, objects have the following life cycle:

  1. Object creation (new keyword, dynamic instantiation or activation, etc).
  2. The first time around, all static object initializers are called.
  3. The runtime allocates memory for the object in the managed heap.
  4. The object is used by the application. Members (Properties/Methods/Fields) of the object type are called and used to change the object.
  5. If the developer decided to add disposing conditions, then the object is disposed. This happens by coding a using statement or by manually calling the object’s Dispose method for IDisposable objects.
  6. If the object has a finalizer, the GC puts the object in the finalization queue.
  7. If the object was put in the finalization queue, the GC will, at an arbitrary moment in time, call the object’s finalizer.
  8. Object is destroyed by marking its memory section in the heap segment as a Free Object.

The CLR Garbage Collector intervenes in the most critical steps of the object lifecycle; the GC almost completely manages steps 3, 6, 7 and 8 in the life of an object. This is one of the main reasons I believe the GC is one of the most critical parts of the .NET Framework, yet it is often overlooked and not fully understood.

These are the key concepts and agents that participate in the life of a .NET object and that should be fully understood:

The Managed Heap and the Stack

The .NET Framework has a managed heap (aka GC heap), which is nothing more than an advanced data structure that the GC uses when allocating (mostly) reference type objects. Each process has one heap that is shared between all .NET application domains running within the process.

Similarly, the thread Stack is another advanced data structure, one that tracks the code execution of a thread. There is one stack per thread per process.

They are often compared to a messy pile of laundry (the GC heap) and an organized shoe rack (the Stack). Ultimately they are both used to store data, with some distinctions. There are 4 things that we store on the Stack and the Heap as the code executes (a short sketch follows the list):

  1. Value Types: Go on heap or stack, depending on where they were declared.
    • If a Value Type is declared outside of a method, but inside a Reference Type it will be placed within the Reference Type on the Heap.
    • If a Value Type is boxed it’ll be placed in the Heap.
    • If the Value Type is within an iterator block, it’ll be placed in the Heap.
    • Else goes in the Stack.
  2. Reference Types: Always go on the Heap.
  3. Pointers: Go on heap or stack, depending on where they were declared.
  4. Instructions: Always go on the Stack.
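A small sketch of those rules in code (where each value actually ends up is a runtime detail, but this is the typical behavior):

[sourcecode language="csharp"]
public class Person                 // reference type: its instances always live on the Heap
{
    public int Age;                 // value type declared inside a reference type:
                                    // stored inline within the Person object, on the Heap
}

public class Program
{
    public static void Main()
    {
        int counter = 42;           // value type local: typically lives on the Stack
        object boxed = counter;     // boxing: a copy of the value is placed on the Heap
        var person = new Person();  // the reference lives on the Stack,
                                    // the Person object itself lives on the Heap
        person.Age = counter;
        System.Console.WriteLine("{0} {1}", boxed, person.Age);
    }
}
[/sourcecode]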

Facts about the Heap

  1. The managed heap is a fast data structure that allows for fast access to any object in the heap, regardless of when or where it was inserted within the heap.
  2. The managed heap is divided into heap segments. Heap segments are physical chunks of memory the GC reserves from the OS on behalf of CLR managed code.
  3. The 2 main segments of the heap are:
    1. The Small Objects Heap (SOH) segment: where small objects (less than 85 KB) are stored. The SOH is also known as the ephemeral segment.
    2. The Large Objects Heap (LOH) segment: where large objects (more than 85 KB) are stored.
  4. All reference types are stored on the heap.

Facts about the Stack

  1. The Stack is a block of pre-allocated memory (usually 1MB) that never changes in size.
  2. The Stack is an organized storage structure where grabbing the top item is O(1).
  3. Objects stored in the Stack are considered ROOTS of the GC.
  4. Objects stored in the Stack have a lifetime determined by their scope.
  5. Objects stored in the Stack are NEVER collected by the GC. The storage memory used by the Stack gets deallocated by popping items from the Stack, hence its scoping significance.

Now you start to see how it all comes together. The last magic rule that glues them together in harmony is that the objects the garbage collector collects are those that:

  1. Are NOT GC roots.
  2. Are not accessible by references from GC roots.

Object Finalizers

Finalizers are special methods that are automatically called by the GC before the object is collected. They can only be called by the GC, provided they exist. The .NET ultimate base class, Object, has a Finalize method that can be overridden by child objects (basically anyone). The purpose of finalizers is to ensure all unmanaged resources the object may be using are properly cleaned up prior to the end of the object’s lifetime.

If a type has an implemented (overridden) finalizer at the time of collection, the GC will first put the object in the finalization queue, then call the finalizer, and only then destroy the object.

Finalizers are not directly supported by C# compilers; instead you should use destructors using the ~ character, like so:
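A minimal example (using a hypothetical FileWrapper class) would be:

[sourcecode language="csharp"]
public class FileWrapper
{
    // C# destructor: note the ~ character and the absence of modifiers and parameters
    ~FileWrapper()
    {
        // clean up unmanaged resources held by this instance here
    }
}
[/sourcecode]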

The CLR implicitly translates C# destructors into Finalize calls.

Facts about Finalizers

  1. Finalizer execution during garbage collection is not guaranteed to happen at any specific time, unless you explicitly call a Close or a Dispose method.
  2. Finalizers of 2 different objects are not guaranteed to run in any specific order, even if they are part of object composition.
  3. Finalizers (or destructors) should ONLY be implemented when your object type directly handles unmanaged resources. Only unmanaged resources should be freed up inside the finalizer.
  4. Finalizers must ALWAYS call the base.Finalize() method of the parent (this is not true for C# destructors, which do this automatically for you).
  5. C# does not support finalizers directly, but it supports them through C# destructors.
  6. Finalizers run in arbitrary threads selected or created by the GC.
  7. The CLR will only continue to finalize objects in the finalization queue as long as the number of finalizable objects in the queue continues to decrease.
  8. Finalizer calls might not all run to completion if one of the finalizers blocks indefinitely (in the code) or if the process in which the app domain is running terminates without giving the CLR a chance to clean up.

More at MSDN.

Disposable Objects

Objects in .NET can implement the IDisposable interface, whose only contract is to implement the Dispose method. Disposable objects are created and disposed explicitly by the developer, and their main goal is to dispose managed and unmanaged resources on demand (when the developer wants to). The GC never calls the Dispose method on an object automatically. The Dispose() method on a disposable object can only get executed in one of two scenarios:

  1. The developer invokes a direct call to dispose the object.
  2. The developer created the object in the context of a using statement.

Given the following disposable object type:
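As a minimal stand-in, assume a hypothetical ResourceHolder type like this one:

[sourcecode language="csharp"]
using System;

public class ResourceHolder : IDisposable
{
    public void Dispose()
    {
        // release managed and unmanaged resources on demand
        Console.WriteLine("ResourceHolder disposed");
    }
}
[/sourcecode]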

The next alternatives are equivalent:
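Shown here one after the other instead of side by side, and using the hypothetical ResourceHolder from above:

[sourcecode language="csharp"]
// Alternative 1: the using statement
using (var holder = new ResourceHolder())
{
    // work with holder
}

// Alternative 2: the (approximate) code the compiler generates for the using statement
ResourceHolder holder2 = new ResourceHolder();
try
{
    // work with holder2
}
finally
{
    if (holder2 != null)
    {
        holder2.Dispose();
    }
}
[/sourcecode]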


Facts about Disposable Objects

  1. An object is a disposable object if it implements the IDisposable interface.
  2. The only method of the IDisposable interface is the Dispose() method.
  3. On the Dispose method the developer can free up both managed and unmanaged resources.
  4. Disposable objects can be used by directly calling the object’s Dispose() method or by wrapping the object in a using statement.
  5. The Dispose() method on disposable objects will never get called automatically by the GC.

More at MSDN.

----------------------------------------------------------------------------------------------------------------------------------------------

Ok, so now that we know the key players, let’s look again at the life of a .NET object. This time around we can understand a bit better what’s going on under the hood.

  1. Object creation (new keyword, dynamic instantiation or activation, etc).
  2. The first time around, all static object initializers are called.
  3. The runtime allocates memory for the object in the managed heap.
  4. The object is used by the application. Members (Properties/Methods/Fields) of the object type are called and used to change the object.
  5. If the developer decided to add disposing conditions, then the object is disposed. This happens by coding a using statement or by manually calling the object’s Dispose method for IDisposable objects.
  6. If the object has a finalizer, the GC puts the object in the finalization queue.
  7. If the object was put in the finalization queue, the GC will, at an arbitrary moment in time, call the object’s finalizer.
  8. Object is destroyed by marking its memory section in the heap segment as a Free Object.

 

PS:   A good friend has pointed out in the comments a good article by Eric Lippert titled "The Truth About Value Types". Go check it out!


Garbage Collection - Pt 1: Introduction

Garbage collection is a practice in software engineering and computer science that aims at automating memory management in a software program. Its origins date all the way back to 1958 and the Lisp programming language implementation (“Recursive Functions of Symbolic Expressions and their Computation by Machine” by John McCarthy), the first to carry such a mechanism. The basic problem it tries to solve is that of reclaiming unused memory in a computer running a program. Software programs rely on memory as the main storage for variables, network connections and all other data needed by the program to run; this memory needs to be claimed by the software application in order to be used. But memory in a computer is not infinite, so we need a way to mark the pieces of memory we are not using anymore as “free” again, so other programs can use them after we are done. There are 2 main ways to do this: manual resource cleanup or automatic resource cleanup. Both have advantages and disadvantages, but this article will focus on the latter, since it is the approach garbage collectors represent.

There are plenty of great articles and papers about garbage collection theory. Starting with this article, and with other entries following up, I’ll talk about garbage collection concepts applied specifically to the .NET Garbage Collector and will cover the following areas:

  1. What is the .NET Garbage Collector? (below in this page)
  2. The object lifecycle and the GC. Finalizers and disposable objects (read Part 2)
  3. Generational Collection (read Part 3)
  4. Garbage collection performance implications (article coming soon)
  5. Best practices (article coming soon)

What is the .NET Garbage Collector?

The .NET Framework Garbage Collector (GC from now on) is one of the least understood areas of the framework, while also being one of its most sensitive and important parts. In a nutshell, the GC is an automatic memory management service that takes care of resource cleanup for all managed objects in the managed heap.

It takes away from the developer the micro-management of resources required in C++, where you need to manually delete your variables to free up memory. It is important to note that the GC is NOT a language feature, but a framework feature. The Garbage Collector is the VIP superstar of the .NET CLR, with influence over all .NET languages.

I’m convinced those of you who have worked with C or C++ before cannot count how many times you forgot to free memory when it was no longer needed, or tried to use memory after you had already freed it. These tedious and repetitive tasks are the cause of the worst bugs any developer can be faced with, since their consequences are typically unpredictable. They are the cause of resource leaks (memory leaks) and object corruption (destabilization), making your application and system behave in erratic ways at random times.

Some garbage collector facts:

  1. It empowers developers to write applications with less worries about having to free memory manually.
  2. The .NET Garbage Collector is a generational collector with 3 generations (article coming soon).
  3. GC reserves memory in segments. The 2 main GC segments are dedicated to maintain 2 heaps:
    • The Small Objects Heap (SOH), where small objects are stored. The first segment of the small object heap is known as the ephemeral segment and it is where Gen 0 and Gen 1 are maintained.
    • The Large Object Heap (LOH), where large objects are maintained (the LOH is collected together with Gen 2).
  4. When GC is triggered it reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations.
  5. Unmanaged resources are not maintained by the GC; instead they rely on the traditional Destructors, Dispose and Finalize methods.
  6. The GC also compacts the memory that is in use to reduce the working space needed to maintain the heap.
  7. The GC is triggered when:
    • The system is running low on memory.
    • The managed heap allocation threshold is about to be reached.
    • The programmer directly calls the GC.Collect() method in the code. This is not recommended and should be avoided.
  8. The GC executes 1 thread per logical processor and can operate in two modes: Server and Workstation.
    • Server Garbage Collection: For server applications in need of high throughput and scalability.
      • Server mode can manage multiple heaps and run many GC processes in different threads to maximize the physical hardware capabilities of servers.
      • Server GC also supports collection notifications, which allow a server farm infrastructure where the router can re-direct work to a different server when it’s notified that the GC is about to perform a collection, and then resume on the main server when the GC finishes.
      • The downside of this mode is that all managed code is paused while the GC collection is running.
    • Workstation Garbage Collection: For client applications. This is the default mode of the CLR and the GC always runs in the workstation mode unless otherwise specified.
      • On versions of the framework prior to 4.0, the GC used 2 methods of workstation collection: Concurrent and Non-Concurrent, where concurrent GC collections allowed other managed threads to keep running and executing code during the garbage collection.
      • Starting in 4.0 the concurrent collection method was replaced (upgraded?) by a Background Collection method. The background collection method has some exciting performance implications (good ones), adding support to run GC generations (including gen-2) in parallel, something not supported by the concurrent method (article coming soon).
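As a side note, you can check at run-time which mode your process ended up in. This is a small sketch assuming a framework version where System.Runtime.GCSettings exposes IsServerGC; the mode itself is switched in the application's config file (e.g. with the gcServer element):

[sourcecode language="csharp"]
using System;
using System.Runtime;

class Program
{
    static void Main()
    {
        // True when the CLR was started with Server GC
        // (typically enabled via <gcServer enabled="true"/> in the app config).
        Console.WriteLine("Server GC: {0}", GCSettings.IsServerGC);

        // The current latency mode; Interactive corresponds to the default
        // concurrent/background workstation collection.
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
    }
}
[/sourcecode]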

Here is a 2007 interview with Patrick Dussud, the architect of the .NET Garbage Collector: Channel 9 Video

Next, I'll cover the .NET object lifecycle and the role of the GC in the lifecycle of an object.


Writing Elegant Code and the Maintainability Index.

Every time I come across a procedure or code file with 1000 lines of code I just want to sentence the creator to permanent programming abstinence. To write elegant and maintainable code there are a couple of things you'll want to consider (and learn, if you don't know them already). This article will cover some of them, with a focus on Object Oriented Programming. First we shall find a good motive to do this. Why would we want to write elegant code? What is our motivation for code metrics?

  • Improve software quality: Well-known practices on your code will (with high probability) make software more stable and usable.
  • Software Readability: Most software applications are not the sole creation of an individual. Working with teams is always a challenge; code metrics allow teams to standardize the way they work together and read each other’s code more easily.
  • Code flexibility: Applications with a good balance of cyclomatic complexity and design patterns are more malleable and adaptable to small and big changes in requirements and business rules.
  • Reduce future maintenance needs: Most applications need some sort of review or improvement during their lifetime. Coming back to code written a year ago will be easier if the code has good metrics.

To have a good maintainability index in your code you should have a couple of metrics measured up. Ultimately the maintainability index is whatever works for your specific case. This is not a fixed industry set of rules, but rather a combination of them that works for the requirements of your organization, application and team. Let’s take a look at what I PERSONALLY CONSIDER good metrics for the ultimate maintainability index of software applications:

Cyclomatic Complexity (CC)

  • Very popular code metric in software engineering.
  • Measures the structural complexity of a function.
  • It is calculated by counting the number of decision statements and different code paths in the flow of a function.
  • Often correlated to code size.
  • A function with no decisions has a CC = 1, being the best case scenario.
  • A function with CC >= 8 should raise red flags and you should certainly review that code with a critical eye. Always remember: the closer to 1 the better (a short illustration follows the list).
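As a rough illustration (the exact number depends on how your tool counts branches), a method along these lines lands around CC = 4:

[sourcecode language="csharp"]
// 1 (the method itself) + 3 decision points (two ifs and a loop) = CC of roughly 4.
// Some tools also count the ternary at the end as an extra path.
public static string Grade(int score, bool extraCredit)
{
    if (extraCredit)              // +1
    {
        score += 5;
    }

    if (score >= 90)              // +1
    {
        return "A";
    }

    for (int i = 0; i < 3; i++)   // +1
    {
        score += i;
    }

    return score >= 60 ? "Pass" : "Fail";
}
[/sourcecode]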

Lines of Code (LOC)

  • LOC is the oldest and easiest-to-measure metric.
  • Measures the size of a program by counting the lines of code.
  • Some recommended values per entity (for .NET languages):
    • Code Files: LOC <=600
    • Classes: LOC <=1000 (after excluding auto-generated code and combining all partial classes)
    • Procedures/Methods/Functions: LOC<=100
    • Properties/Attributes: LOC <=30
  • If the numbers in your application are larger than the previous values, you should check your code again. A very high count indicates that a type or a procedure is trying to do too much work and you should try to split it up.
  • The higher the LOC numbers the harder the code will be to maintain.

Depth of Nesting

  • Measures the nesting levels in a procedure.
  • The deeper the nesting, the more complex the procedure is. Deep nesting levels often lead to errors and oversights in the program logic.
  • Avoid having too many nested logic cases; look for alternate solutions to deep if/else, for, foreach and switch statements; they lose logical sense in the context of a method when they run too deep.
  • Reading: Vern A. French on Software Quality

Depth of Inheritance (DIT)

  • Measures the number of class levels between a class and the root of the class hierarchy.
  • Classes that are located too deep in the inheritance tree are relatively complex to develop and maintain. The more of those you have the more complex and hard to maintain your code will be.
  • Avoid too many classes at deeper levels of inheritance. Avoid deep inheritance trees in general.
  • Maintain your DIT <= 4. A value greater than 4 will compromise encapsulation and increase code density.

Coupling and Cohesion Index (Corey Ladas on Coupling)

  • To keep it short, you always should: MINIMIZE COUPLING, MAXIMIZE COHESION
  • Coupling: aka “dependency” is the degree of reliance a functional unit of software has on another of the same category (Types to Types and DLLs to DLLs).
    • The 2 most important types of coupling are Class Coupling (between object types) and Library Coupling (between DLLs).
    • High coupling (BAD) indicates a design that is difficult to reuse and maintain because of its many interdependencies on other modules.
  • Cohesion: Expresses how coherent a functional unit of software is. Cohesion measures the SEMANTIC strength between components within a functional unit. (Classes and Types within a DLL; properties and methods within a Class)
    • If the members of your class (properties and methods) are strongly-related, then the class is said to have HIGH COHESION.
    • Cohesion has a strong influence on coupling. Systems with poor (low) cohesion usually have poor (high) coupling.
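A tiny sketch of the difference (the ReportService/IReportStore names are made up for the example):

[sourcecode language="csharp"]
// HIGH coupling: the service news up a concrete SqlReportStore,
// so it cannot be reused or tested without that exact class.
public class TightlyCoupledReportService
{
    private readonly SqlReportStore store = new SqlReportStore();
    public string Render(int id) { return store.Load(id).ToUpper(); }
}

// LOWER coupling: the dependency is an abstraction supplied from the outside.
public interface IReportStore
{
    string Load(int id);
}

public class ReportService
{
    private readonly IReportStore store;
    public ReportService(IReportStore store) { this.store = store; }
    public string Render(int id) { return store.Load(id).ToUpper(); }
}

public class SqlReportStore : IReportStore
{
    public string Load(int id) { return "report " + id; }
}
[/sourcecode]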

Design Patterns

  • Design Patterns play an extremely important role in the architecture of complex software systems. Patterns are proven solutions to problems that repeat over and over again in software engineering. They provide the general arrangement of elements and communication to solve such problems without doing the same twice.
  • I can only recommend one of the greatest books in this subject called: “Design Patterns: Elements of Reusable Object-Oriented Software” by the Gang of Four. Look it up online, buy it, read it and I’ll guarantee you’ll never look at software architecture and design the same way after.
  • Depending on the needs of the problem, applying proven design patterns like Observer, Command and Singleton patterns will greatly improve the quality, readability, scalability and maintainability of your code.

Triex Index (TI - Performance Execution)

  • Triex Index (TI) measures the algorithmic time complexity (Big O) of object types.
  • TI is more important to consider for object types that carry a heavy processing load, like collections and worker objects.
  • This metric is strongly influenced by Cohesion and should only be applied to classes and types, not to individual members. Cohesion also heavily influences how consistent the time complexity is across a type's members.
  • This is a personal metric. I haven't found this type of metrics elsewhere, but I think it's singular and significant to the overall elegance and performance of your code.
  • The Triex Index is calculated by dividing the product of the execution orders (Big O) of all members of a class by n to the power of c-1, where c is the number of members in the class/type.
    • TI > n² - Bad. Needs algorithm revision.
    • n < TI < n² - Regular.
    • TI <= n - Good.
  • If a member of a class is hurting the overall TI of the class, try splitting its logic into one or more methods with less costly execution orders. Be careful not to harm the coupling and cohesion metrics of the type while doing this step.

In the same way I have my preferences, I have my disagreements with some colleagues when it comes to other known code metrics. I consider some of these code metrics a waste of time when measuring code elegance and maintainability:

  • Law of Demeter (for Functions and Methods)
    • The Principle of Least Knowledge is a design guideline I’ll always encourage to use. What I consider unnecessary is the “use only one dot” rule enforcement for functions and methods.
    • “a.b.Method()” breaks the law, where “a.Method()” does not. That's just ridiculous.
  • Weighted Methods Per Class (WMC)
    • Counts the total number of methods in a type.
  • Number of Children
    • Counts the number of immediate children in a hierarchy.
  • Constructors defined by class
    • Counts the number of constructors a class has.
  • Kolmogorov complexity
    • Measures the computational resources needed to represent an object.
  • Number of interfaces implemented by class
    • Counts the number of interfaces a class implements.

Elegant Code

There are 2 main reasons I do not like to consider these metrics when looking at maintainability and code elegance:

  1. Many of them are obsolete metrics when we look at modern software engineering techniques and languages like C# and LINQ. Things like methods per class or number of children do not apply very well to the core concepts of these modern techniques. Just imagine measuring “Weighted Methods Per Class” in a world full of extension methods that ultimately do not depend on the original creator of the object type.
  2. The second reason is that the complexity of the problem never changes. If we have to solve a problem that is by nature a complex one, it doesn’t matter how many constructors or methods we have, or whether or not the functions abide by the Law of Demeter. That is irrelevant if the solution does not solve the problem. The complexity of any given problem is a constant; the only way out is to change the perspective on the problem and abstract away as much complexity as possible. When you abstract a complex problem, you end up with a large number of abstractions. Counting them is meaningless, BECAUSE THE PROBLEM IS A COMPLEX ONE and ITS COMPLEXITY WILL NOT CHANGE.

It’ll be extremely hard to come up with good numbers for all of these metrics. Yet you should know about them and the reason for their existence. Then, when you have a developer saying “Holy cow, I don’t understand the logic of this method, it has too many if/then/else branches, I’m lost,” you’ll say “Aha! That method may have a high CC” and go straight to the problem and solve it. Applying metrics to your source code is not magic, you still own what you write, and ultimately you have to know your business well to write elegant code. But hopefully with the help of these metrics and a couple of tools you'll make your code work like a charm and look neat as a pin.


Installing and using FxCop

This article supports the article “Definition of DONE” with regard to code analysis and best practices. FxCop is an application that analyzes managed code assemblies (code that targets the .NET Framework CLR) and reports information about whether the assemblies abide by good design guidelines and best practices. Things like architectural design, localization, performance and security improvements are among the many things the tool will check automatically for you, giving you a nice, detailed report of its findings. Many of the issues are related to violations of the Microsoft programming guidelines for writing robust and easily maintainable code using the .NET Framework.

The home page of the tool says that "FxCop is intended for class library developers". Wait, what? Class Library Developers? WTF, whatever... the fact is that the tool is good for any type of managed library, including service libraries, WinForms and WPF projects. If it looks like a DLL, smells like a DLL, and has the extension "*.dll" or "*.exe" => FxCop will screen the hell out of it.

I also find FxCop a very useful learning tool for developers who are new to the .NET Framework or who are unfamiliar with the .NET Framework Design Guidelines. There is plenty of documentation online (MSDN) about the Design Guidelines and the different rule sets of best practices Microsoft recommends.

Think of FxCop as a bunch of unit tests that examine how well your libraries conform to a set of best practices.

FxCop is one of the tools I use to follow my Scrum definition of "Done" when writing code in .NET. I also wrote about ReSharper's Code Analysis features, how great they are and how we use them too. So the singular question becomes: WHAT IS THE DIFFERENCE BETWEEN FxCop and ReSharper Code Analysis?

The answer is simple:

  • ReSharper Code Analysis analyzes your .NET language (C#, VB.NET) source code.
  • FxCop analyzes the binaries produced by your source code. FxCop looks at the Common Intermediate Language (CIL) generated by the compiler of your .NET language.

In a sense, FxCop analysis should be done after ReSharper analysis, and it will trap everything R# missed. FxCop was initially developed as an internal Microsoft solution for optimizing new software being produced in house, and ultimately it made its way to the public.

Installing FxCop

  1. Verify whether or not you already have the Windows SDK 7.1. If you already have the folder "C:\Program Files\Microsoft SDKs\Windows\v7.1" on your file system, that means you have it; otherwise you need to install it. If you have it, skip the next steps.
  2. Download the Microsoft Windows SDK for Windows 7 and .NET Framework 4. You can download the web installer from HERE or search Google for "windows sdk windows 7 web installer .net 4.0". Make sure you are downloading the one for .NET 4.0 and no other.
  3. Install the SDK with the following settings
  4. Install FxCop 10 by going to "C:\Program Files\Microsoft SDKs\Windows\v7.1\Bin\FxCop" and running the setup file.

Using FxCop

There are 2 ways of using FxCop: standalone or within Visual Studio.

Using FxCop standalone could not be simpler. It works similarly to VS in the sense that you create projects, where each project is nothing more than a collection of DLLs that are analyzed together (called targets in FxCop... why not, Microsoft).

To use FxCop as a Visual Studio tool:

  • Open VS and go to Tools->External Tools
  • Add a new tool and call it FxCop, with the command line tool pointing to "C:\Program Files (x86)\Microsoft Fxcop 10.0\FxCopCmd.exe"

Now, without leaving VS, you can either run it as a command-line tool by going to Tools->FxCop, OR you can configure your projects to enable code analysis when you build. Doing it the second way will let you see the errors and warnings in the same build output window where you get compilation errors.
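For reference, a standalone command-line run looks roughly like this (the /file, /out and /console switches are the ones I recall; check FxCopCmd.exe /? on your install to confirm):

[sourcecode language="text"]
"C:\Program Files (x86)\Microsoft Fxcop 10.0\FxCopCmd.exe" /file:MyLibrary.dll /out:FxCopReport.xml /console
[/sourcecode]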

Wrap Up

FxCop is a very solid tool for enforcing good practices on your projects and assemblies, and it offers a wide array of configurable features beyond the scope of this article. Ultimately you can write your own custom rules for FxCop and enforce them as part of your project's code analysis. Visual Studio integration is also a great way to maintain your code as you write it.

For more resources on FxCop, check out the FxCop Official Blog.
