Understanding different kinds of Data Reporting: why and how it is done.

In business, the words reporting, data reporting, audit reports, big data, and analytics are used very loosely sometimes. So, I thought I'd put in writing some thoughts about reporting as a reference point for anyone scratching their heads.

We've all heard the parrots say "DATA is King", but how do we get "DATA"?

Data for almost ANY application can be categorized into one of 3 categories:

  1. Operational Data (aka. Configuration Data)
  2. Event Data (aka. Business Event Data)
  3. Audit Data (aka. Tech Data)

The patterns to display and materialize these data sources generally follow a very ordered and logical process where:

  1. Operational Data is shown in small digestible chunks so a human being can easily analyze and modify what is presented to him/her at the moment.
    • More often than not, it takes the form of a UI in a webpage or a native app.
    • Other times, the configuration data is shown in a printable format. These usually take the form of counters and aggregated live data. A common use case is to keep counters of live metrics, KPI alerts, and application heartbeats.
    • Operational data is modeled to support full CRUD operations.
    • Regardless of the format, this kind of data is never presented with thousands of rows or configuration entries, and the reason is simple: it is not human readable. Just imagine your reaction if every time you right-click on a file in Windows you get a menu with 2 thousand lines; not very useful.
    • Last but not least, configuration data is ALWAYS live data; it NEVER holds historical significance.
  2. Event Data exists with the sole purpose of fulfilling business intelligence and analytics.
    • Event Data is always decentralized, meaning there is no magic bullet that can fulfill all applications, since the nature of the events being logged is usually tightly coupled with the actions that occur within applications, and those have some semantic meaning around the business of the application.
    • Event Data usually takes the form of a separate database, or a different schema within the same database, where events are logged asynchronously so they don't interrupt the main flow of the application in question.
    • Event Data is WRITE-ONLY (append-only) data. There are NO UPDATES and NO DELETES, only writes to drop the events when they occur and reads to generate the reports. This usually makes event databases grow very rapidly, depending on the number of events being reported.
    • Event data generally follows a star data model with multiple correlated tables holding DUPLICATED data from the operational data, with the additional time stamp on each event dropped.
    • Depending on the needs and performance boundaries allowed by the reporting tool, reports can run directly from the Event database, or from a Data Warehouse (DW). The DW is nothing more than a centralized database or collection of databases that are highly optimized to run reports as fast and efficiently as possible, so business folks can run analytics and complex searches on the data from different perspectives.
    • Reports running from a Data Warehouse repository usually perform better and faster, since the tables are flattened to match exactly the dimensions of the reports. The data is said to be 'curated' when it is transformed and loaded into the DW, since the schema of the data after the ETLs are run can be totally different from the way it was designed for the operation of the application (the Operational Data).
    • If you run your reports from the Event database, your queries will likely have a lot of joins; whereas if you run them from the DW, very few joins will be necessary (because the data is curated to match those reporting needs, and to perform faster).
    • Last but not least, event data ALWAYS holds historical significance and ALWAYS reports on BUSINESS data.
  3. Audit Data exists with the sole purpose of reporting on Technology events. That is everything that is NOT BUSINESS data, but relates to the performance, security, functionality, etc., of the application.
    • Audit Data usually follows a very similar path to the Event data in terms of implementation and consumption (like having an Audit database with Star schema, and moving to the DW, blah, blah, blah).
    • The main differentiator between Audit Data and Event Data is that Event Data relates to business events, while Audit Data relates to technology events.
    • Just like events, audit data ALWAYS holds historical significance and ALWAYS reports TECHNOLOGY-DRIVEN data.
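
To make the "write-only" idea a bit more concrete, here is a minimal sketch of an event store using Python's built-in sqlite3 module. The table name `order_events`, its columns, and the helper functions are all made up for illustration; they don't come from any real system. Notice that the customer name is DUPLICATED into each event row together with a timestamp, and that the code only ever inserts and reads.

```python
import sqlite3
from datetime import datetime, timezone


def open_event_store(path=":memory:"):
    """Open (or create) a hypothetical event database with one event table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS order_events (
            event_id      INTEGER PRIMARY KEY AUTOINCREMENT,
            occurred_at   TEXT    NOT NULL,  -- timestamp added to every event
            event_type    TEXT    NOT NULL,
            customer_name TEXT    NOT NULL,  -- duplicated from operational data
            amount_cents  INTEGER NOT NULL
        )
        """
    )
    return conn


def record_event(conn, event_type, customer_name, amount_cents):
    """Append one event. There is deliberately no update or delete helper."""
    conn.execute(
        "INSERT INTO order_events (occurred_at, event_type, customer_name, amount_cents) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), event_type, customer_name, amount_cents),
    )
    conn.commit()


def revenue_by_customer(conn):
    """A report is just a read over the accumulated events."""
    rows = conn.execute(
        "SELECT customer_name, SUM(amount_cents) FROM order_events "
        "WHERE event_type = 'order_placed' "
        "GROUP BY customer_name ORDER BY customer_name"
    )
    return dict(rows.fetchall())
```

In a real application the insert would probably be queued and written asynchronously, as described above; the sketch keeps it synchronous to stay short.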

When asked to generate reports and such, ask yourself what type of report it is, and don't be afraid to ask questions and challenge the true business need for the 'report'. If someone is looking for operational (live) data, and there are UI/Screens already available that show such data for configuration or anything else, then ask yourself if you want to go out on a limb and create a lot of overhead duplicating the effort, or whether simply adding a "Save as PDF" button or link to the screen will do the trick.

The last piece is about BIG DATA. That's the new hot word around IT these days. Big Data is nothing more than a group of very large data sets (on the order of exabytes) whose relationships and business semantics are so broad and messy that it is very difficult to drive analytics using the traditional DBA tools found in a DW environment. You can think of it as the next level up from data warehousing.

When a DW gets big enough, or when massive amounts of data from different DWs need business analysis, then it becomes BIG DATA. The challenges are different and harder when you are trying to put order to this kind of mess, but once you do, a lot of goodies can come out of it, including predictive analytics based on past performance and events. Predictive business analytic engines are one of the hardest things to build, and they are a valued asset, especially for large corporations that have been hoarding data for decades and now want to use it as a competitive advantage to plan ahead.

Good luck with your reporting adventures!


My Science Tattoo: Why I did it, and what it means?

I got my tattoo yesterday; here is what it looks like.

When a close friend, co-worker, or family member looks at it for a minute, I get two immediate reactions:

  1. That's a badass tattoo! But what is it? What does it mean?
  2. Why did you do it? If you change your mind later on what you inked then [blablabla]...

I've always liked tattoos and for some time wanted to get something done that was truly meaningful to me. I was inspired by, and got my last bit of courage from, Discover Magazine's Science Tattoo Emporium, and decided to materialize it.

Here is what I incorporated in the tattoo and why:

Pi (π)

Why?: There are three math constants that blow everybody's mind once they realize how closely connected they are to the universe. They are Pi (π), Euler's number (e), and Phi (φ). For my tattoo I decided to go with Pi, since it is a universally known constant that doesn't require much introduction, yet is a totally badass one. Here's why:

  1. It's one of the oldest known constants in mathematics. Its history is often traced back to 2589–2566 BC, when the Egyptians built the Great Pyramid of Giza, whose ratio of base perimeter to height is approximately 2π.
  2. Pi is found everywhere we see a circle shape, since you calculate Pi by dividing the circle's circumference by its diameter (π=C/d). Many people memorize this in school, but never truly get to understand what it means, or even how to interpret what π represents in the context of a circle. The circumference of a circle is slightly more than 3 times its diameter, and that number is what π represents. Wikipedia has a good animation of Pi 'unrolled'.
  3. Pi is both irrational and transcendental, and its known digits pass statistical tests for randomness, although nobody has proven that Pi is a normal number. Being irrational, it cannot be written as a fraction of two integers; being transcendental, it is not the root of any polynomial with rational coefficients.
  4. Pi is part of the most beautiful mathematical equation of all time: Euler's identity.
  5. Pi is heavily used in formulas from other branches of science, such as cosmology, number theory, statistics, fractals, thermodynamics, mechanics and electromagnetism.
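
For the curious, Euler's identity says that e^(iπ) + 1 = 0. Here is a tiny sketch that checks it numerically with Python's standard cmath module. The function name is mine, and because floating point is only an approximation, the result is a number extremely close to zero rather than exactly zero.

```python
import cmath
import math


def euler_identity_residual():
    """Return e^(i*pi) + 1, which Euler's identity says is exactly 0.

    In floating point the answer is only approximately zero, so callers
    should compare its magnitude against a small tolerance.
    """
    return cmath.exp(1j * math.pi) + 1


if __name__ == "__main__":
    residual = euler_identity_residual()
    print(abs(residual) < 1e-12)
```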

Carbon Atom

Why?: The element carbon is present in all known forms of life. It is the second most abundant element by mass in the human body, after oxygen. Every organic compound takes its form, mass and strength thanks to carbon bonds, making carbon the chemical foundation of all known life.

Fibonacci Tiling

Why?: Fibonacci tiling is the composition of squares whose sides are successive Fibonacci numbers in length (1,1,2,3,5,8, etc). The sequence F(n) = F(n-1) + F(n-2) is a mysterious one in the world of mathematics. Its exemplary simplicity makes it easy to understand for almost everybody, yet it's found in the most complex settings in science. Here are a few that will blow your mind:

  1. The Fibonacci numbers are also an example of a complete sequence. This means that every positive integer can be written as a sum of Fibonacci numbers, where any one number is used once at most.
  2. They are found in the sums of the shallow diagonals of Pascal's Triangle.
  3. In numbers represented in base 2, or binary, they are spotted quite often. The samples below, and the fact that computer science and electrical engineering use the binary system more times than I can count every second, make this sequence far more interesting and mysterious to me. Check this out:
    • The number of binary strings of length n without an even number of consecutive zeroes or ones is 2F(n).
    • The number of binary strings of length n without an odd number of consecutive ones is F(n+1).
    • Also, the number of binary strings of length n without consecutive ones (1) is F(n+2).
  4. The Fibonacci cube is an undirected graph with a Fibonacci number of nodes that has been proposed as a network topology for parallel computing.
  5. In music, Fibonacci numbers are used to determine tunings, and, as in visual art, also to determine the length or size of content in the formal elements.
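
If you don't want to take two of those claims on faith, they are easy to brute-force. The sketch below uses helper names I made up, assumes F(0) = 0 and F(1) = 1, and needs Python 3.8+ for `math.comb`. It checks the "no consecutive ones" count and the Pascal's Triangle diagonal sums for small n.

```python
from itertools import product
from math import comb


def fib(n):
    """Return F(n) with F(0) = 0 and F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def count_without_consecutive_ones(n):
    """Brute force: count binary strings of length n that never contain '11'."""
    return sum(1 for bits in product("01", repeat=n) if "11" not in "".join(bits))


def shallow_diagonal_sum(n):
    """Sum the n-th shallow diagonal of Pascal's Triangle (n >= 1)."""
    return sum(comb(n - 1 - k, k) for k in range((n - 1) // 2 + 1))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, count_without_consecutive_ones(n) == fib(n + 2),
              shallow_diagonal_sum(n) == fib(n))
```

The brute force enumerates all 2^n strings, so it is only meant for small n; the point is to see the pattern, not to be fast.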

Golden Ratio (φ)

Why?: The Golden Ratio is yet another mystical mathematical rarity. Two numbers {a,b} are in the golden ratio when the sum of the quantities divided by the larger of the two is equal to the larger divided by the smaller.

  1. For {a,b} with a > b > 0: {a,b} are in the golden ratio <=> ((a+b)/a) = (a/b)
  2. To make it even cooler, it turns out that it doesn't matter what numbers a and b are: if they are in the golden ratio, then a/b = φ ("Phi"), a mathematical constant equal to (1+√5)/2 that has been studied since Plato's time.
  3. Mathematicians and artists throughout history have been obsessed with this constant, including: Phidias, Plato, Euclid, Fibonacci, Pacioli (coining it the "divine proportion"), Da Vinci, Kepler and Ohm.
  4. The Fibonacci sequence has a close relation to the Golden Ratio, so much so that Kepler observed how the ratio of consecutive numbers in the Fibonacci sequence converges to the Golden Ratio. This is both sick and awesome, like some sort of mind game the universe is playing on us.
  5. In art, it is a proportion said to be aesthetically pleasing to the human eye. It is reported in masterpieces such as Leonardo Da Vinci's "Mona Lisa" and Salvador Dali's "Leda Atomica".
  6. In Nature, the Golden Spiral (the one at the center of the tattoo) is found in the shapes of many living organisms, such as:
    1. The arrangement of branches along the stems of plants and the veins in leaves;
    2. The skeletons and adjacent bone lengths in animals;
    3. The proportions of chemical compounds and the geometry of crystals;
    4. Almost all human body proportions between adjacent organs, organ systems, bones, and body parts exist in complete harmony following the Golden Ratio.
    5. The grooves of the human DNA's double helix.
    6. The human perception of beauty is based on how closely our facial features align with the Golden Ratio.
  7. From cloud formation, to river networks, to craters, lightning bolts and coastlines, the Golden Ratio has a recurring presence.
  8. There are too many manifestations of the Golden Ratio in science to mention here; but one of the most interesting ones is in chaos theory and the fractal sets.
  9. With all the above, you feel there is something magical, mysterious and universal about Phi and the Golden Ratio, that makes it a special science jewel.
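
Kepler's observation is easy to watch in action. This little sketch (the names are my own) prints nothing fancy: it just computes the ratios of consecutive Fibonacci numbers so you can compare them with φ = (1+√5)/2, and it also checks the defining property (a+b)/a = a/b for a = φ and b = 1.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio, roughly 1.618


def fibonacci_ratios(count):
    """Return the first `count` ratios F(n+1)/F(n), starting from 1/1."""
    a, b = 1, 1
    ratios = []
    for _ in range(count):
        ratios.append(b / a)
        a, b = b, a + b
    return ratios


def in_golden_ratio(a, b, tolerance=1e-9):
    """Check the definition: (a + b) / a equals a / b, for a > b > 0."""
    return math.isclose((a + b) / a, a / b, rel_tol=tolerance)


if __name__ == "__main__":
    for ratio in fibonacci_ratios(15):
        print(f"{ratio:.10f}  (distance from phi: {abs(ratio - PHI):.2e})")
    print(in_golden_ratio(PHI, 1.0))
```

The ratios alternate above and below φ while the distance shrinks with every step, which is what makes the convergence so pleasant to look at.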

Silicon

Why?: Silicon is a chemical element with atomic number 14 that makes up about 28% of Earth's crust by mass. It is also one of the most used materials in the production of semiconductors and integrated circuits used in computers every day. That's where the "Silicon Valley" name comes from. The reasons for choosing this element as part of the tattoo are obvious to those that have known me over the years. I'm the ultimate nerd, and I'll be wearing those circuits on my arm with pride.

I think it is pretty safe to say these science elements will not change anytime soon, so my tattoo will remain meaningful to me and the world in the foreseeable future ;)

I'm super happy with how it turned out... and I'll be sporting my science tattoo for the rest of my life with pride.

There is strong, and then, there is nerd-strong! :P


A Message to "Enterprise Architects"

Dear Enterprise Architects: The Web was built on a RESTful architecture. Stop trying to invent fancier systems for distributed computing (wink to the herds of ESBs out there).

There isn't a bigger, more proven, more performant, more scalable specimen of distributed computing than the Interweb.

That's right, less is more. Fuck everything else for distributed computing and messaging, even for the "Enterprise".  You need to un-train yourself, stop being a baby and embrace it already, it is the year 2012.

And relax, your Enterprise applications will scale just fine. Now have some cheez,

Best,

Michel Triana



Mercurial's code stairway to heaven

Joe (@JoeWurz) and I (@MichelTriana) were doing a lot of crazy shit with code and database scripts today and keeping everything in our neat Mercurial repos, when I came across something I've never seen before in the Mercurial electrified DAG.

My theory is that we both changed the same code and the same number of bytes, then I committed and pushed, then he continued to work on his local repo adding more stuff. When it was time to push, he pulled first, got a conflict and then resolved it by choosing his local copy, which had a part with the same change as the one I had previously pushed... WTF, that's a crazy case scenario. If my theory is true though, then little Mercurial is kinda smart in realizing (and reporting in the graph) that some code went to heaven.


Driving and your health [INFOGRAPHIC]

Visual.ly published this infographic called 'The Killer Commute', and a few things popped out from the data about driving and your health:

  1. You are 40% more likely to divorce your spouse if you commute 45 minutes or more per day.
  2. Your risk of heart attack TRIPLES if you commute regularly for over 20 minutes daily over the same route.

Here is the full infographic from Visual.ly


A Class on "Employee Equity" [Video]

This is a class given by Fred Wilson from Union Square Ventures. Very insightful as to how the VC world works and how to structure employee equity for your startup. Fred's blog is http://www.avc.com. There is a section titled MBA Mondays where he usually gives quite interesting talks and publishes posts to help in the money matters and transparency for startups and VCs.

I found this particular class on SkillShare's LiveStream. You can watch the full video by clicking on the image below.


Library Oriented Architecture

Library Oriented Architecture may sound like yet another buzzword in the software arena, but it is one that is not properly documented as of yet. It is not a common term and certainly far from the widely popular SOA or Service Oriented Architecture. Since there is no formal definition of the term LOA, I’m going to take a stab at it:

“Library Oriented Architecture defines the methodology for creating software components in the form of reusable libraries exclusively constrained to a specific domain ontology.”

What does it mean? Well, I’m not going to drill too deeply into the part about ontology; in a nutshell, “don’t confuse a contact with a user, they belong to different domain ontologies” (I wrote a different article about it HERE). In this piece we are going to drill down into the software piece, the separation of concerns, and how to define a practical framework to create things in the right places.

I caught the term LOA for the first time at SuperConf 2012 in Miami. Richard Crowley came to the stage and threw the new term at the crowd and got back a few long faces in return. In Richard’s own words, when referring to the Library-Oriented approach:

Package logical components of your application independently – literally as separate gems, eggs, RPMs, or whatever- and maintain them as internal open-source projects… This approach combats the tightly-coupled spaghetti so often lurking in big codebases by giving everything the Right Place in which to exist.

His talk was very solid and I recommend everyone with a hard-core-techie-heart to spare a few minutes on it. You can find his reflections about developing interoperability HERE.

It caught my attention just by the name, because I’ve been saying, “It’s like SOA, but with libraries” for some time now. “It’s like SOA, but with libraries” always came up when I was trying to explain an architectural pattern for building solid systems and frameworks. In general, LOA is just a way of thinking about software engineering. Library Oriented Architecture defines the structuring of libraries for domain ontologies and it has 3 basic principles:

  1. A software library implementation and subject area expertise must be constrained to only 1 ontology domain.
  2. A software library that needs to use concepts and artifacts from a different ontology domain than the one it belongs to, must interface and reuse the library corresponding to that specific ontology domain.
  3. All domain specific software libraries must be maintained and supported with separate lifecycles.

Before we get into the weeds here, we ought to ask ourselves: Why in the world do we need a new term, or a new architecture, or a new anything in software engineering? Well, we don’t, but if you care to write badass apps and software systems that can evolve gracefully with time, this can turn out to be a very good road to take. For those who enjoy bullet points, here are some of the motivations to explore LOA a bit further:

  1. Simplify configuration management of distributed systems.
  2. Build highly reliable software systems because of the inherent properties of the LOA principles.
  3. Increase the Maintainability Index of your distributed systems and integration repositories.
  4. Minimize the risk of high coupling, especially for large systems (read Writing Elegant Code and the Maintainability Index).
  5. Bring developers up to speed orders of magnitude more quickly than a traditional system. Move developers and teams across libraries and domain ontologies and collaborate seamlessly.
  6. Spot bugs and zero-in on the problem almost instantly. There is something to be said about the amount of time a developer spends debugging.
  7. Maximize the Bus Factor of the software engineering team.
  8. Information Systems built using LOA are technology-independent, and have the ability to replace entire libraries and domain implementations with localized impact and minimal upstream ripple effect.

Ok, enough reading, let’s see how this materializes in a diagram.

Library Oriented Architecture

Note that this is a specific implementation of Library Oriented Architecture for compiled libraries. You can adapt this to your own needs for scripted languages and even mix it around however you want. For the sake of simplicity, we’ll stick to this sample for now.

The second thing I want to note here is that the diagram is not describing how to implement LOA. It simply lays the foundations for a software engineering practice that happens to follow LOA principles. I’m sharing this because I think it is useful and maybe someone will like it enough to offer some suggestions to improve it further.

I want you to notice a couple of things that are illustrated on the diagram:

  1. All 3 principles mentioned above are followed.
  2. The framework favors convention over configuration. Lib names, namespace naming and schema conventions are noted in the last column.
  3. You can clearly dissect the domains vertically and they span all the way from the data storage layer to the actual library implementing the domain specific logic.
  4. A library representing an ontology domain never interfaces with the data-sources, or even data access layer, from any other domain; instead it interfaces directly with the library representing that domain.
  5. Services are merely wrappers of libraries, with minimal or no business logic other than the orchestration of the libraries it needs in order to fulfill its function.
    • This is important because services are always tightly coupled to their technology implementations and serialization mechanisms (WCF, ASMX, SOAP, REST, XML, etc.)
    • Part of the service implementation concern is usually dealing with this technology-specific fuss that is unrelated to the actual business functionality the service is providing.
  6. Exception handling is bubbled up to the lib layer, such that we always get meaningful stack traces when debugging.
  7. Logging, as a cross cutting concern, should be manageable at all levels of the framework, however the domain deems necessary.
  8. If the implementations of the domain-specific libraries share a common framework, such as .NET or Java, they most likely share a set of libraries that extends that framework. For the example illustrated in the diagram, we called them framework infrastructure libraries, or Common Libs for short.
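
To give point 5 some shape, here is a minimal sketch of a service that is nothing more than a wrapper around two domain libraries. Everything in it is hypothetical: `GeoLib`, `ContactsLib`, the hard-coded address and the JSON shape are stand-ins I invented, and the original framework targets compiled .NET-style libraries rather than Python. The only thing the service does is orchestrate the libraries and deal with the serialization fuss.

```python
import json


class GeoLib:
    """Stand-in for the Geo domain library (hypothetical)."""

    @staticmethod
    def normalize_address(raw):
        # Collapse whitespace and title-case the address line.
        return " ".join(raw.split()).title()


class ContactsLib:
    """Stand-in for the Contacts domain library (hypothetical)."""

    _addresses = {42: "  1 main   st, miami  "}

    @staticmethod
    def address_for(contact_id):
        # Raises KeyError for unknown contacts; the service decides what that means.
        return ContactsLib._addresses[contact_id]


def handle_contact_address_request(contact_id):
    """Service layer: orchestration plus serialization, no business logic.

    Returns a (status_code, json_body) pair, standing in for whatever
    transport the service technology of the day requires.
    """
    try:
        raw = ContactsLib.address_for(contact_id)
    except KeyError:
        return 404, json.dumps({"error": "unknown contact"})
    body = {"contactId": contact_id, "address": GeoLib.normalize_address(raw)}
    return 200, json.dumps(body)
```

If the transport changes tomorrow, only `handle_contact_address_request` needs to be rewritten; the two libraries stay untouched.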

So, now that we have a framework for engineering our software needs, let’s see how to materialize it.

Suppose you are working on the next Foursquare, and it comes to the point where you need services that help you normalize addresses, and work with GIS and coordinates, and a bunch of other geo-location functions that your next-Foursquare needs.

It is hard sometimes to resist the temptation of the ‘just-do-it’ approach, where you ‘just’ create a static class in the same web app, change your Visual Studio web project to make an API call to 3rd party services, and start integrating directly to Google Maps, Bing Maps, etc. Then you ‘just’ add 5 or 6 app settings to your config file for those 3rd party services and boom, you are up and running. This approach is excellent for a POC, but it will not take you too far, and your app is not scalable to the point it could be with a Library Oriented approach.

Let’s see how we do it in LOA. In this world, it takes you maybe a couple of extra clicks, but once you get the hang of it, you can almost do it with your eyes closed.

  1. The Lib Layer
    1. Create a class library for the GEO domain ontology. Call it something like Geo.dll or YourCompany.Geo.dll. This library becomes part of your lib layer.
      • Deciding the boundaries of domain ontology is not an easy task. I recommend you just wing it at first and you’ll get better with time.
      • You need to read a lot about ontology to get an idea of the existential issues and mind-bending philosophical arguments that come out of it. If you feel so adventurous you can read about ontology HERE and HERE. It will help you understand the philosophical nature of reality and being, but this is certainly not necessary to move on. Common sense will do for now.
      • Just don’t go crazy with academia here and follow common sense. If you do, you may find later that you want to split your domain in two, and that is OK. Embrace the chaos and the entropy that comes out of engineering for scalability, it is part of the game.
    2. Define your APIs as methods of a static class, and add a simple placeholder body to each one: [sourcecode language="csharp"]throw new NotImplementedException("TODO");[/sourcecode]
    3. Write your Unit Tests towards your APIs with your assertions (Test Driven Development practice comes in handy here).
  2. The DAL Layer
    1. Sometimes your ontology domain does not need to store any data. If that is the case, skip to step 3, else continue reading.
    2. Create a new library for the GEO domain data access layer. Name it according to the convention you previously set up in your company and dev environment. For this example we’ll call it GeoDal.dll.
    3. Using your favorite technique, set up the data access classes, mappings and caching strategy.
      • If your persistent data store and your app require caching, this is the place to put it. I say if, because if you choose something like AWS DynamoDB where 1 MB reads take between 1 and 10 milliseconds, maybe you want to skip cache altogether for your ‘Barbie Closet’ app :)
      • Memcached, APC, redis, AppFabric, your custom solution, whatever works for you here.
      • You can also use your favorite ORM (NHibernate, Entity Framework, etc.), and they already come with some level of caching built in.
      • Bottom line, LOA does not have any principle preventing you from going wild here, therefore your imagination and experience are the limit.
  3. The Data Layer
    1. For this exercise suppose we need to persist Addresses, Coordinates and Google Maps URLs.
    2. I suggest you scope your data entities by your domain ontology. A way we’ve found to work quite nicely is to use named schemas on RDBMS and set up namespace conventions for your NoSQL databases.
    3. For the GEO domain schema, we used SQL Server and created a named security schema called [Geo]. The use of named schemas makes it easy to avoid long table names, provides nice visual grouping of entities and a more granular security for your entities.
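
The three layers above can be sketched in a few lines. This is only an illustration under several assumptions of mine: it is in Python rather than a .NET class library, it uses an instance with an injected DAL instead of a static class, it uses SQLite (which has no named schemas, so a `geo_` table prefix stands in for the [Geo] schema), and all class, method and column names are invented.

```python
import sqlite3


class GeoDal:
    """DAL layer: the only code that touches the Geo domain's storage."""

    def __init__(self, conn):
        self._conn = conn
        # Data layer: the 'geo_' prefix stands in for a named [Geo] schema.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS geo_address ("
            "id INTEGER PRIMARY KEY, line TEXT NOT NULL, lat REAL, lon REAL)"
        )

    def save_address(self, line, lat, lon):
        cur = self._conn.execute(
            "INSERT INTO geo_address (line, lat, lon) VALUES (?, ?, ?)",
            (line, lat, lon),
        )
        self._conn.commit()
        return cur.lastrowid

    def find_address(self, address_id):
        return self._conn.execute(
            "SELECT line, lat, lon FROM geo_address WHERE id = ?", (address_id,)
        ).fetchone()


class Geo:
    """Lib layer: domain logic for the Geo ontology, and nothing else."""

    def __init__(self, dal):
        self._dal = dal

    def register_address(self, raw_line, lat, lon):
        if not -90 <= lat <= 90 or not -180 <= lon <= 180:
            raise ValueError("coordinates out of range")
        return self._dal.save_address(" ".join(raw_line.split()), lat, lon)

    def lookup(self, address_id):
        return self._dal.find_address(address_id)

    def maps_url(self, address_id):
        # API defined up front, implementation pending (step 2 of the Lib Layer).
        raise NotImplementedError("TODO")
```

Usage would look something like `geo = Geo(GeoDal(sqlite3.connect(":memory:")))`, after which services and other domain libraries only ever talk to `geo`, never to `GeoDal` or the tables.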

When it comes to data modeling, another technique I like to use is that of unaltered historical event data. Any ontology domain can be dissected into 3 purpose-specific data models: Configuration Data, Event Data, and Audit Data. They all serve very different purposes, and in general we like to keep them in separate schemas with separate security; this way we’re not commingling concerns. Each concern has a different DAL library, and potentially they all interface with the library representing the domain at the Lib Level. This post is already way too long, so I’ll try to cover some more data modeling strategies in future posts.

Now that we have a clearly separated domain library for our GEO domain, we can decide to wrap it with whatever technology-specific services we need. This is very convenient because when you want to move your SOA stack to a different technology, you don’t have to re-write your entire domain infrastructure, only the service layer. More importantly, it allows for greater scalability, since it degrades gracefully and plays nicely with different frameworks and technologies. A well implemented Library Oriented Architecture can be said to be technology-agnostic, and that makes it a great SOA enabler.

That’s it for this episode folks. Send me your comments or emails if you are using Library Oriented Architecture, or if you have any suggestions on how to improve the methodology or framework.

Happy coding!


The Myth of the Genius Programmer

http://www.youtube.com/watch?v=0SARbwvhupQ&list=PLCB5CF9838389D7F6&feature=view_all

From Google I/O 2009, here are Brian Fitzpatrick and Ben Collins-Sussman talking about the fears of programmers, including the fear of looking 'stupid'.
