
Library Oriented Architecture


Library Oriented Architecture may sound like yet another buzzword in the software arena, but it is one that has not been properly documented yet. It is not a common term, and it is certainly far less widespread than SOA, or Service Oriented Architecture. Since there is no formal definition of the term LOA, I'm going to take a stab at it:

“Library Oriented Architecture defines the methodology for creating software components in the form of reusable libraries exclusively constrained to a specific domain ontology.”

What does it mean? I'm not going to drill too deeply into the ontology part; in a nutshell, "don't confuse a contact with a user, they belong to different domain ontologies" (I wrote a separate article about it HERE). In this piece we are going to drill down into the software side: the separation of concerns, and how to define a practical framework for creating things in the right places.

I caught the term LOA for the first time at SuperConf 2012 in Miami. Richard Crowley came to the stage, threw the new term at the crowd, and got back a few long faces in return. In Richard's own words, referring to the Library-Oriented approach:

Package logical components of your application independently – literally as separate gems, eggs, RPMs, or whatever – and maintain them as internal open-source projects… This approach combats the tightly-coupled spaghetti so often lurking in big codebases by giving everything the Right Place in which to exist.

His talk was very solid and I recommend that everyone with a hard-core techie heart spare a few minutes on it. You can find his reflections about developing interoperability HERE.

It caught my attention just by the name, because I had been saying "it's like SOA, but with libraries" for some time, whenever I tried to explain an architectural pattern for building solid systems and frameworks. In general, LOA is just a way of thinking about software engineering. Library Oriented Architecture defines the structuring of libraries around domain ontologies, and it has 3 basic principles:

  1. A software library implementation and subject area expertise must be constrained to only 1 ontology domain.
  2. A software library that needs to use concepts and artifacts from an ontology domain other than its own must interface with and reuse the library corresponding to that specific ontology domain (see the sketch after this list).
  3. All domain specific software libraries must be maintained and supported with separate lifecycles.
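
As a minimal sketch of the second principle, suppose a hypothetical Contacts library needs a normalized address. It should call the Geo library's public API rather than reach into Geo's data access layer or data store. All of the names here (YourCompany.Geo, Addresses.Normalize, ContactManager) are illustrative assumptions, not part of any prescribed API:

[sourcecode language="csharp"]
using System;

namespace YourCompany.Geo
{
    // Geo.dll owns the GEO ontology domain and exposes its public API.
    public static class Addresses
    {
        public static string Normalize(string rawAddress)
        {
            // Domain-specific logic, GeoDal calls, 3rd party providers, etc.
            throw new NotImplementedException("TODO");
        }
    }
}

namespace YourCompany.Contacts
{
    // Contacts.dll belongs to a different ontology domain.
    public static class ContactManager
    {
        public static void UpdateAddress(int contactId, string rawAddress)
        {
            // Principle 2: reuse the GEO library's API...
            string normalized = Geo.Addresses.Normalize(rawAddress);

            // ...and never talk to Geo's DAL or its data store directly.
            // Persisting the contact happens through the Contacts domain's own DAL.
        }
    }
}
[/sourcecode]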

Before we get into the weeds here, we ought to ask ourselves: Why in the world do we need a new term, or a new architecture, or a new anything in software engineering? Well, we don’t, but if you care to write badass apps and software systems that can evolve gracefully with time, this can turn out to be a very good road to take. For those who enjoy bullet points, here are some of the motivations to explore LOA a bit further:

  1. Simplify configuration management of distributed systems.
  2. Build highly reliable software systems because of the inherent properties of the LOA principles.
  3. Increase the Maintainability Index of your distributed systems and integration repositories.
  4. Minimize the risk of high coupling, especially for large systems (read Writing Elegant Code and the Maintainability Index).
  5. Bring developers up to speed orders of magnitude more quickly than a traditional system. Move developers and teams across libraries and domain ontologies and collaborate seamlessly.
  6. Spot bugs and zero-in on the problem almost instantly. There is something to be said about the amount of time a developer spends debugging.
  7. Maximization of the Bus Factor of the software engineering team.
  8. Information systems built using LOA are technology-independent and have the ability to swap entire libraries and domain implementations with localized impact and minimal upstream ripple effect.

Ok, enough reading, let’s see how this materializes in a diagram.

[Diagram: Library Oriented Architecture for compiled libraries]

Note that this is a specific implementation of Library Oriented Architecture for compiled libraries. You can adapt this to your own needs for scripted languages and even mix it around however you want. For the sake of simplicity, we’ll stick to this sample for now.

The second thing I want to note here is that the diagram does not describe how to implement LOA. It simply lays the foundations for a software engineering practice that happens to follow LOA principles. I'm sharing it because I think it is useful, and maybe someone will like it enough to offer some suggestions to improve it further.

I want you to notice a few things illustrated in the diagram:

  1. All 3 principles mentioned above are followed.
  2. The framework favors convention over configuration. Lib names, namespace naming and schema conventions are noted in the last column.
  3. You can clearly dissect the domains vertically and they span all the way from the data storage layer to the actual library implementing the domain specific logic.
  4. A library representing an ontology domain never interfaces with the data-sources, or even data access layer, from any other domain; instead it interfaces directly with the library representing that domain.
  5. Services are merely wrappers around libraries, with minimal or no business logic beyond the orchestration of the libraries they need in order to fulfill their function (a sketch follows this list).
    • This is important because services are always tightly coupled to their technology implementations and serialization mechanisms (WCF, ASMX, SOAP, REST, XML, etc.)
    • Part of the service implementation concern is usually dealing with this technology-specific fuzz that is unrelated to the actual business functionality the service is providing.
  6. Exception handling is bubbled up to the lib layer, so that we always get meaningful stack traces when debugging.
  7. Logging, as a cross cutting concern, should be manageable at all levels of the framework, however the domain deems necessary.
  8. If the implementations of the domain-specific libraries share a common framework, such as .NET or Java, they will most likely have a shared library set that extends each framework. For the example illustrated in the diagram, we called these framework infrastructure libraries, or Common Libs for short.
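
To make point 5 concrete, here is a minimal sketch of a service that does nothing but delegate to the lib layer. It assumes the hypothetical YourCompany.Geo library from the earlier sketch and uses WCF only as an example of a technology-specific wrapper; the contract and names are not prescribed by LOA:

[sourcecode language="csharp"]
using System.ServiceModel;
using YourCompany.Geo;

// The service layer only deals with technology-specific concerns
// (contracts, serialization, faults). All business logic stays in the lib layer.
[ServiceContract]
public interface IGeoService
{
    [OperationContract]
    string NormalizeAddress(string rawAddress);
}

public class GeoService : IGeoService
{
    public string NormalizeAddress(string rawAddress)
    {
        // Pure orchestration: delegate straight to the domain library.
        return Addresses.Normalize(rawAddress);
    }
}
[/sourcecode]

If you later move from WCF to a REST stack, only this thin shim has to be rewritten; the Geo library and its DAL stay untouched.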

So, now that we have a framework for engineering our software needs, let’s see how to materialize it.

Suppose you are working on the next Foursquare, and it comes to the point where you need services that help you normalize addresses, and work with GIS and coordinates, and a bunch of other geo-location functions that your next-Foursquare needs.

It is hard sometimes to resist the temptation of the 'just-do-it' approach, where you 'just' create a static class in the same web app, change your Visual Studio web project to make API calls to 3rd party services, and start integrating directly with Google Maps, Bing Maps, etc. Then you 'just' add 5 or 6 app settings to your config file for those 3rd party services and boom, you are up and running. This approach is excellent for a POC, but it will not take you very far, and your app will not scale the way it could with a Library Oriented approach.

Let’s see how we do it in LOA. In this world, it takes you maybe a couple of extra clicks, but once you get the hang of it, you can almost do it with your eyes closed.

  1. The Lib Layer
    1. Create a class library for the GEO domain ontology. Call it something like Geo.dll or YourCompany.Geo.dll. This library becomes part of your lib layer.
      • Deciding the boundaries of domain ontology is not an easy task. I recommend you just wing it at first and you’ll get better with time.
      • You need to read a lot about ontology to get an idea of the existential issues and mind-bending philosophical arguments that come out of it. If you feel so adventurous you can read about ontology HERE and HERE. It will help you understand the philosophical nature of reality and being, but this is certainly not necessary to move on. Common sense will do for now.
      • Just don’t go crazy with academia here and follow common sense. If you do, you may find later that you want to split your domain in two, and that is OK. Embrace the chaos and the entropy that comes out of engineering for scalability, it is part of the game.
    2. Define your APIs as methods of a static class, and stub each one with a simple throw new NotImplementedException("TODO"); body.
    3. Write your unit tests against your APIs with your assertions (Test Driven Development comes in handy here; a sketch of steps 2 and 3 follows after this walkthrough).
  2. The DAL Layer
    1. Sometimes your ontology domain does not need to store any data. If that is the case, skip to step 3, else continue reading.
    2. Create a new library for the GEO domain data access layer. Name it according to the convention you previously set up in your company and dev environment. For this example we'll call it GeoDal.dll.
    3. Using your favorite technique, set up the data access classes, mappings and caching strategy.
      • If your persistent data store and your app require caching, this is the place to put it. I say if, because if you choose something like AWS Dynamo DB where 1 MB reads take between 1 and 10 milliseconds, maybe you want to skip cache altogether for your ‘Barbie Closet’ app :)
      • Memcached, APC, redis, AppFabric, your custom solution, whatever works for you here.
      • You can also use your favorite ORM (NHibernate, Entity Framework, etc.) and they already come with some level of caching on them.
      • Bottom line, LOA does not have any principle preventing you from going wild here, therefore your imagination and experience are the limit.
  3. The Data Layer
    1. For this exercise suppose we need to persist Addresses, Coordinates and Google Maps URLs.
    2. I suggest you scope your data entities by your domain ontology. A way we've found to work quite nicely is to use named schemas on RDBMS and to set up namespace conventions for your NoSQL databases.
    3. For the GEO domain schema, we used SQL Server and created a named security schema called [Geo]. The use of named schemas makes it easy to avoid long table names, provides nice visual grouping of entities and a more granular security for your entities.
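
Expanding on steps 2 and 3 of the lib layer, here is a hedged sketch of what the stubbed API and a first unit test might look like. The method names, the Coordinate type, and the choice of NUnit are assumptions for illustration only:

[sourcecode language="csharp"]
using System;
using NUnit.Framework;

namespace YourCompany.Geo
{
    public class Coordinate
    {
        public double Latitude { get; set; }
        public double Longitude { get; set; }
    }

    // Step 2: define the API surface as methods of a static class,
    // each stubbed with NotImplementedException until it is built out.
    public static class Addresses
    {
        public static string Normalize(string rawAddress)
        {
            throw new NotImplementedException("TODO");
        }

        public static Coordinate Geocode(string normalizedAddress)
        {
            throw new NotImplementedException("TODO");
        }
    }
}

namespace YourCompany.Geo.Tests
{
    // Step 3: write unit tests against the stubbed API.
    // They fail (red) until the methods are implemented - the TDD starting point.
    [TestFixture]
    public class AddressesTests
    {
        [Test]
        public void Normalize_TrimsAndTitleCasesStreetNames()
        {
            string result = Addresses.Normalize("  123 main st  ");
            Assert.AreEqual("123 Main St", result);
        }
    }
}
[/sourcecode]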

When it comes to data modeling, another technique I like to use is that of unaltered historical event data. Any ontology domain can be dissected into 3 purpose-specific data models: Configuration Data, Event Data, and Audit Data. They all serve very different purposes, and in general we like to keep them in separate schemas with separate security, so that we are not commingling concerns. Each concern has a different DAL library, and potentially they all interface with the library representing the domain at the lib level. This post is already way too long, so I'll try to cover more data modeling strategies in future posts.
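
As one possible way to express this separation in code, here is a hedged sketch using Entity Framework's Table attribute to pin each concern to its own named schema. The [Geo], [GeoEvent], and [GeoAudit] schema names and the entity shapes are assumptions following the conventions above (depending on your EF version the attribute may live in a slightly different namespace):

[sourcecode language="csharp"]
using System;
using System.ComponentModel.DataAnnotations.Schema;

// Configuration data for the GEO domain, kept in the [Geo] schema.
[Table("Address", Schema = "Geo")]
public class Address
{
    public int Id { get; set; }
    public string Line1 { get; set; }
    public string City { get; set; }
    public string PostalCode { get; set; }
}

// Unaltered historical event data, kept in its own schema with its own security.
[Table("AddressLookup", Schema = "GeoEvent")]
public class AddressLookupEvent
{
    public long Id { get; set; }
    public int AddressId { get; set; }
    public DateTime OccurredAtUtc { get; set; }
}

// Audit data, again in a separate schema and ideally behind a separate DAL library.
[Table("AddressChange", Schema = "GeoAudit")]
public class AddressChangeAudit
{
    public long Id { get; set; }
    public int AddressId { get; set; }
    public string ChangedBy { get; set; }
    public DateTime ChangedAtUtc { get; set; }
}
[/sourcecode]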

Now that we have a clearly separated domain library for our GEO domain, we can wrap it with whatever technology-specific services we need. This is very convenient because when you want to move your SOA stack to a different technology, you don't have to re-write your entire domain infrastructure, only the service layer. More importantly, it allows for greater scalability, since it degrades gracefully and plays nicely with different frameworks and technologies. A well implemented Library Oriented Architecture can be said to be technology-agnostic, and that makes it a great SOA enabler.

That’s it for this episode folks. Send me your comments or emails if you are using Library Oriented Architecture, or if you have any suggestions on how to improve the methodology or framework.

Happy coding!


Garbage Collection - Pt 1: Introduction

Garbage collection is a practice in software engineering and computer science that aims to automate memory management in a software program. Its origins date all the way back to 1958 and the Lisp programming language implementation (“Recursive Functions of Symbolic Expressions and their Computation by Machine” by John McCarthy), the first to carry such a mechanism. The basic problem it tries to solve is that of reclaiming unused memory in a computer running a program. Software programs rely on memory as the main storage for variables, network connections and all other data needed by the program to run; this memory needs to be claimed by the application in order to be used. But memory in a computer is not infinite, so we need a way to mark the pieces of memory we are no longer using as “free” again, so that other programs can use them afterwards. There are two main ways to do this: manual resource cleanup or automatic resource cleanup. Both have advantages and disadvantages, but this article will focus on the latter, since it is the one represented by garbage collectors.
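
To make the manual-versus-automatic distinction concrete, here is a small illustrative C# sketch (the file name is made up): plain managed objects are reclaimed by the garbage collector once they become unreachable, while unmanaged resources such as file handles are still released deterministically, typically through IDisposable and a using block:

[sourcecode language="csharp"]
using System;
using System.IO;

class MemoryCleanupDemo
{
    static void Main()
    {
        // Automatic cleanup: the byte[] is reclaimed by the GC some time
        // after it becomes unreachable; we never free it explicitly.
        byte[] buffer = new byte[1024 * 1024];
        buffer = null; // the array is now eligible for collection

        // Deterministic cleanup: the file handle is an unmanaged resource,
        // so we release it explicitly via using (which calls Dispose).
        using (var stream = new FileStream("data.bin", FileMode.OpenOrCreate))
        {
            stream.WriteByte(42);
        } // Dispose runs here and the handle is released immediately
    }
}
[/sourcecode]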

There are plenty of great articles and papers about garbage collection theory. Starting with this article, and continuing in follow-up entries, I'll talk about garbage collection concepts as they apply to the .NET Garbage Collector specifically, covering the following areas:

  1. What is the .NET Garbage Collector? (below in this page)
  2. Object lifecycle and the GC: finalizers and disposable objects (read Part 2)
  3. Generational Collection (read Part 3)
  4. Garbage collection performance implications (article coming soon)
  5. Best practices (article coming soon)

What is the .NET Garbage Collector?

The .NET Framework Garbage Collector (GC from now on) is one of the least understood areas of the framework, while also being one of its most sensitive and important parts. In a nutshell, the GC is an automatic memory management service that takes care of the resource cleanup for all managed objects in the managed heap.

It takes away from the developer the micro-management of resources required in C++, where you need to manually delete your objects to free up memory. It is important to note that the GC is NOT a language feature, but a framework feature. The Garbage Collector is the VIP superstar of the .NET CLR, with influence over all .NET languages.

I'm convinced that those of you who have worked with C or C++ cannot count how many times you forgot to free memory that was no longer needed, or tried to use memory after you had already freed it. These tedious and repetitive tasks are the cause of some of the worst bugs a developer can face, since their consequences are typically unpredictable. They are the cause of resource leaks (memory leaks) and object corruption (destabilization), making your application and system behave in erratic ways at random times.

Some garbage collector facts:

  1. It empowers developers to write applications with fewer worries about having to free memory manually.
  2. The .NET Garbage Collector is a generational collector with 3 generations (article coming soon).
  3. The GC reserves memory in segments. The 2 main GC segments are dedicated to maintaining 2 heaps:
    • The Small Objects Heap (SOH), where small objects are stored. The first segment of the small object heap is known as the ephemeral segment and it is where Gen 0 and Gen 1 are maintained.
    • The Large Object Heap (LOH), where large objects (roughly 85,000 bytes and larger) are maintained; it is collected together with Gen 2.
  4. When GC is triggered it reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations.
  5. Unmanaged resources are not maintained by the GC; instead they rely on the traditional destructors, Dispose and Finalize methods.
  6. The GC also compacts the memory that is in use, to reduce the working space needed to maintain the heap.
  7. The GC is triggered when:
    • The system is running low on memory.
    • The managed heap allocation threshold is about to be reached.
    • The programmer directly calls the GC.Collect() method in the code. This is not recommended and should be avoided.
  8. The GC runs on dedicated threads (in server mode, one per logical processor) and can operate in two modes, Server and Workstation (a small sketch after this list shows how to check which mode is active):
    • Server Garbage Collection: For server applications in need of high throughput and scalability.
      • Server mode maintains multiple heaps and runs collections on multiple dedicated threads to maximize the physical hardware capabilities of servers.
      • Server GC also supports collection notifications, which allow a server-farm setup where the router can redirect work to a different server when it is notified that the GC is about to perform a collection, and then resume sending work to the original server when the GC finishes.
      • The downside of this mode is that all managed code is paused while the GC collection is running.
    • Workstation Garbage Collection: For client applications. This is the default mode of the CLR and the GC always runs in the workstation mode unless otherwise specified.
      • On versions of the framework prior to 4.0, the GC used 2 methods of workstation collection, Concurrent and Non-Concurrent, where concurrent collections allowed other managed threads to keep running and executing code during the garbage collection.
      • Starting in 4.0, the concurrent collection method was replaced (upgraded?) by a Background Collection method. Background collection has some exciting performance implications (good ones), since it adds support for collecting the ephemeral generations while a gen 2 collection is in progress, something the concurrent method did not support (article coming soon).
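
The mode is chosen through configuration rather than code. As a hedged sketch, here is how a process could inspect the active GC settings at runtime (GCSettings.IsServerGC requires .NET 4.5 or later; the config element is shown in a comment):

[sourcecode language="csharp"]
using System;
using System.Runtime;

class GcModeInfo
{
    static void Main()
    {
        // Server GC is typically enabled in app.config:
        //   <configuration>
        //     <runtime>
        //       <gcServer enabled="true" />
        //     </runtime>
        //   </configuration>

        Console.WriteLine("Server GC:    {0}", GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: {0}", GCSettings.LatencyMode);
        Console.WriteLine("Generations:  {0}", GC.MaxGeneration + 1);

        // An explicit collection is possible but, as noted above, best avoided:
        // GC.Collect();
    }
}
[/sourcecode]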

Here is an interview from 2007 with Patrick Dussud, the architect of the .NET Garbage Collector: Channel 9 Video

Next, I'll cover the .NET object lifecycle and the role the GC plays in it.
