Almost anyone with some notion about programming or object-oriented techniques will be able to describe what polymorphism is and how to use it; I've discovered recently that very few can tell you how it actually works. While this may not be relevant to the daily work of many software developers, the consequences of not knowing and thus its misuses, affect everyone. We all know how to describe a car and even how to use it, but only mechanics know how to fix it, because they know how IT WORKS.



Polymorphism: What it is

Polymorphism is a characteristic or feature of programming languages. Programming languages either support it or they don’t. Since programming languages fall under the umbrella of sometimes substantially different paradigms, in this article I’m going to concentrate in polymorphism within the scope of Object Oriented Programming.

In OOP polymorphism is considered one of the basic principles and a very distinctive one. Most of the object-oriented languages have polymorphism among its many features. In a nutshell, polymorphism is best seen when the caller of an object with polymorphic implementation is not aware of the exact type the object is. Polymorphism interacts very closely with other features like inheritance and abstraction.


Polymorphism: What it is NOT

Contrary to an overwhelming number of articles found online, things like overloading polymorphism or parametric polymorphism are NOT object-oriented expressions of polymorphism, they are applicable to functional (declarative) programming languages. Signature-based polymorphism, overloading polymorphism, parasitic polymorphism and polymorphism in closures are meaningful on functional languages only (or at best, multi-paradigm languages with functional expressions and/or duck-typing languages, like Python or JavaScript).

Polymorphism is not boxing or unboxing, and is not the same as inheritance; instead and usually polymorphic manifestations occur are a consequence of inheritance, sub-typing and boxing.


Polymorphism: How it works

As its name suggests, polymorphism is the ability of an object to have more than one form, or more properly to look as it is of some other type. The most common manifestation can be seen with sub-typing, let’s see it through an example:

[sourcecode language="csharp"] public class Fruit { public virtual string Description() { return "This is a fruit"; } }

public class FreshApple: Fruit { public override string Description() { return "Fresh Apple"; } }

public class RottenBanana:Fruit { public override string Description() { return "Banana after 10 days"; } } [/sourcecode]

Now we can do something like this:

[sourcecode language="csharp"] Fruit[] fruits = { new FreshApple(), new RottenBanana(), new Fruit() }; foreach (var fruit in fruits) { Console.WriteLine("Fruit -> {0}", fruit.Description()); } Console.ReadLine(); [/sourcecode]

... and that would have the following output:

That’s all well and good and almost anyone will get this far. The question is HOW does this happens, HOW the runtime realizes that the method is must call is the one in the child classes instead of that one in the parent class? How deep this rabbit hole goes? There is no magic element here and you’ll see exactly HOW this occurs and WHY it is important to know about it.

These mappings of function calls to their implementations happen through a mechanism called dispatch tables or virtual tables (vTables for short). Virtual Tables is a feature of programming languages (again, they either support it or they don’t). Virtual Tables are present mostly on languages that support dynamic dispatch (C++, C#, Objective C, Java), meaning they can bound to function pointers or objects at run-time, and they can abstract from the actual implementation of the method/function/object until the very moment they are going to use it.

The vTable itself is nothing more than a data structure, no different from a Stack or a Hashtable for their purpose; it just happen to be the one used for the dynamic dispatch feature in programming languages. Some languages offer a different structure called the Binary Tree Dispatch as an alternative to vTables. As any data structure, they both have pros and cons when it comes to measure performance and cost. The point is that whether it is a vTable or bTree Dispatch, dynamic dispatching is bound using a data structure that is carried over with objects and functions to support this feature. Yes, I said “carried over”.

vTables are a hidden variable of objects. In .NET for example, all classes and structs inherit from Object and Object already has virtual functions like "ToString()" and "GetHashCode()", so every single object you create will have a hidden vTable private structure that is always initialized behind the scenes in the constructor call of every object. The purpose of these vTables being created all over the place is exactly to map the correct functions for polymorphic objects and functions. You can use Reflector to peek over an object at runtime (IL) and you'll find its vTable in the first memory location of each object's memory segment. The IL functions call and callvirt (polymorphism, yay!) will be used to call non-virtual and virtual methods respectively. There is simply no way of accessing an object's vTable directly from C#; this was done on purpose to minimize the security implications of it among other reasons. In C++, hmm...

[sourcecode language="cpp"] long *vptr = (long *) &obj; long *vtable = (long *)*vptr; [/sourcecode]

So the vTable of each object will hold a reference to each one of its virtual functions. Think of it, structurally, as a list of function pointers sorted in the same order they are declared. This serves its purpose on inheritance and polymorphism when a child object gets initialized regardless being assigned to a variable of type parent, the vTable of that object will be the one corresponding the actual inheritor. When virtual functions get called in the variable it will find the correct function in the child objects (actual object type) thanks to the function pointers on their respective vTables.

Virtual tables, inheritance and polymorphism in general have been criticized for their overhead in program execution. That is why one of the object-oriented programming principles is "Favor Object Composition Over Inheritance". Non-virtual functions never require the overhead of vTables making them naturally faster than virtual functions. Applications and systems with a deep inheritance architecture tend to spend a considerably large amount of their execution time just trying to figure out the root path of their virtual functions. This can become quite a performance tax if used without care, specially in older CPU architectures. Most modern compilers will have a trick or two under their sleeves to attempt to resolve virtual function calls at compile time without the need of vTables, but dynamic binding will always need of these creatures to resolve their destination paths.

And now you know how polymorphism works. Happy coding!

1 Comment