In business, the words reporting, data reporting, audit reports, big data, and analytics are used very loosely sometimes. So, I thought I'd put in writing some thoughts about reporting as a reference point for anyone scratching their heads.

We've all heard the parrots say "DATA is King", but how do we get "DATA"?

Data for almost ANY application can be categorized into one of 3 categories:

  1. Operational Data (aka. Configuration Data)
  2. Event Data (aka. Business Event Data)
  3. Audit Data (aka. Tech Data)

The patterns to display and materialize these data sources generally follow a very ordered and logical process where:

  1. Operational Data is shown in small digestible chunks so a human being can easily analyze and modify what is presented to him/her at the moment.
    • Most often than not, it takes the form of a UI in a webpage or a native app.
    • Some other times, the configuration data is shown in printable format. More often than not, these take the form of counters and aggregate live data. A common use case is to keep counters of live metrics,  KPI alerts, and application heartbeats.
    • Operational data is modeled to support full CRUD operations.
    • Regardless of the format, this kind of data is never presented with thousands of rows or configuration entries... and the reason for that is: is not human readable. Just imagine your reaction if every time you do right-click on a file in Windows you get a menu with 2 thousand lines; not very useful.
    • Last but not least, configuration data is ALWAYS live data, it NEVER holds historical significance.
  1. Event Data exists with the sole purpose of fulfilling business intelligenceand analytics.
    • Event Data is always decentralized, meaning there is no magic bullet that can fulfill all applications, since the nature of the events being logged is usually tightly coupled with the actions that occur within applications, and those have some semantic meaning around the business of the application.
    • Event Data usually take the form of a separate database, or a different schema within the same database where events are logged in an asynchronous way so they don't interrupt the main flow of the application in question.
    • Event Data is WRITE-ONLY data. There are NO UPDATES, and NO DELETES, only writes to drop the events when they occur and reads to generate the reports. This usually makes event databases grow very rapidly depending on the number of events being reported.
    • Event data generally follows a star data model with multiple correlated tables holding DUPLICATED data from the operational data, with the additional time stamp on each event dropped.
    • Depending on the needs and performance boundaries allowed by the reporting tool, reports can run directly from the Event database, or from a Data Warehouse (DW). The DW is nothing more than a centralized database or collections of databases that are highly optimized to run reports as fast and efficient as possible, so business folks can run analytics and complex searches on the data from different perspectives.
    • Reports running from a Data Warehouse repository usually perform better and faster, since the tables are flattened to match exactly the dimensions of the reports. The data is said to be 'curated' when is transformed and loaded into the DW, since the schema of the data after ETLs are run, can be totally different from the way it was designed on for the operation of the application (the Operational Data)
    • If you run your reports from Event database, your queries will likely have a lot of joins; whereas if you run them from the DW, very few joins will be necessary (because the data is curated to match those reporting needs, and perform faster).
    • Last but not least, event data ALWAYS holds historical significance and ALWAYS report on BUSINESS data.
  1. Audit Dataexists with the sole purpose of reporting on Technology events. That is everything that is NOT BUSINESS data, but relates to the performance, security, functionality, etc, from the application.
    • Audit Data usually follows a very similar path to the Event data in terms of implementation and consumption (like having an Audit database with Star schema, and moving to the DW, blah, blah, blah).
    • The main differentiator between Audit Data and Business Data is that one relates to the business events, while the other relates to technology events.
    • Just like events, audit data ALWAYS holds historical significance and ALWAYS reports TECHNOLOGY-DRIVEN data.

When asked to generate reports and such, ask yourself what type of report it is and don't be afraid to ask questions and challenge the true business need for the 'report'. If someone is looking for operational (live) data, and there are UI/Screens already available that show such data for configuration or anything else, then ask yourself if you want to go on a limb and create a lot of overhead duplicating the effort, or simply adding a "Save as PDF" button or link in the screen will do the trick.

The last piece is about BIG DATA. That's the new hot word around IT these days. Big Data is nothing more than a group of very large data sets (in the order or exabytes) whose relationships and business semantics are so broad and messy, that is very difficult to drive analytics using the traditional DBA tools found in a DW environment. You can think of it like the next level to data warehousing.

When a DW gets big enough, or when massive amounts of data from different DW needs business analysis, then it becomes BIG DATA. The challenges are different and harder when you are trying to put order to this kind of mess, but once you do, a lot of goodies can come out of it, including predictive analytics based on past performance and events. Predictive business analytic engines are one of the hardest things to do, and it is a valued asset, especially for large corporations that have been hoarding data for decades, and now want to use it as a competitive advantage to plan ahead.

Good luck with your reporting adventures!

Comment