November 10
Moore’s law tells us that computational power doubles roughly every two years, and hence computing and storage costs are expected to keep falling - all the time.
Big Data is a computing paradigm created specifically to deal with the challenges of the very large amounts of sprawling data being created; parallel computing is utilised to handle the disparate volume, velocity and variety of data, at the expense of certain aspects such as atomic integrity. (This isn’t a blog about Big Data, so we’ll move on quickly from there.)
This perception of ever-growing compute power, together with cheaper and faster storage and access, has led some corporations to be careless with how they treat their data. (At a 30,000ft level, this can be summed up as “why sort, just search” - the old practice of categorising data can’t keep up.)
However, as human interaction behaviours and organisational value chains transition more towards the digital realm, the magnitude of the data challenge (i.e. how to make use of it and extract value from it) continues to grow.
A paper by IDC (International Data Corporation) estimates that global data usage will grow from 33 zettabytes to 175 zettabytes between 2018 and 2025. This represents a ~27% per annum growth rate (not quite doubling, but close).
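As a quick sanity check on that ~27% figure, the compound annual growth rate implied by the IDC forecast can be computed directly (the function below is our own illustration, not anything from the IDC paper):

```python
# Sketch: verify the per-annum growth rate implied by 33 ZB (2018)
# growing to 175 ZB (2025), i.e. over 7 years.

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1

growth = cagr(33, 175, 2025 - 2018)
print(f"{growth:.1%}")  # prints "26.9%" - roughly 27% per annum
```

At 27% a year, data volumes multiply by about 5.3x over the seven-year window rather than doubling every two years, but the direction of travel is the same.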
So in short, organisations are not, and never will be, free of the need to apply some information and data management discipline to their business intelligence and decision-making capabilities.
Large organisations contain a multitude of source operational / transactional systems whose results need to be rolled up into group-level summaries, both for statutory / regulatory reporting purposes and for business performance reporting. Although they use the same underlying data, reports for different purposes may roll up through different hierarchies.
For example, business performance management (BPM) may want a product profitability view with product categorisation definitions that differ from regulatory definitions.
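To make the "same data, different hierarchies" point concrete, here is a hypothetical sketch (all product names and categories are invented) of the same transaction rows rolled up through a management hierarchy and a regulatory one:

```python
# Hypothetical illustration: one set of transactions, two roll-up hierarchies.
from collections import defaultdict

transactions = [
    {"product": "home_loan", "profit": 120.0},
    {"product": "personal_loan", "profit": 45.0},
    {"product": "credit_card", "profit": 80.0},
]

# Two different categorisations of the same products.
management_view = {"home_loan": "Secured", "personal_loan": "Unsecured",
                   "credit_card": "Unsecured"}
regulatory_view = {"home_loan": "Mortgages", "personal_loan": "Retail",
                   "credit_card": "Retail"}

def roll_up(rows, hierarchy):
    """Sum profit by whichever category each hierarchy assigns."""
    totals = defaultdict(float)
    for row in rows:
        totals[hierarchy[row["product"]]] += row["profit"]
    return dict(totals)

print(roll_up(transactions, management_view))  # {'Secured': 120.0, 'Unsecured': 125.0}
print(roll_up(transactions, regulatory_view))  # {'Mortgages': 120.0, 'Retail': 125.0}
```

The totals reconcile (245.0 either way), but the shape of each report is driven by its hierarchy, which is exactly why the hierarchies need managing alongside the data itself.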
To date, there have been two predominant approaches to data warehousing: Bill Inmon’s top-down approach and Ralph Kimball’s bottom-up approach. These approaches have the same goal: find out what’s going on in your organisation.
You can think of Kimball’s approach as asking divisions to submit a set of metrics to the head office and leaving it up to them to collate that information. The key benefit of this approach is that individual data marts can be designed and delivered quickly, each tailored to its own business area.
The key downside is the potential, and often the occurrence, of overlapping data marts holding duplicate data under different definitions, causing data integrity issues. Furthermore, it is sometimes difficult to prescribe consistent metric and data definitions across federated data warehouse and integration teams; the difficulty of determining the number of unique customers an organisation has is a case in point.
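The unique-customer problem is easy to reproduce. In this hypothetical sketch (invented records and definitions), two data marts read the same source records but key "customer" differently, and so report different counts:

```python
# Hypothetical illustration of the "how many unique customers?" problem.
source_records = [
    {"customer_id": "C1", "email": "ann@example.com"},
    {"customer_id": "C1", "email": "ann@work.example.com"},
    {"customer_id": "C2", "email": "bob@example.com"},
]

# Mart A defines a customer by customer_id.
mart_a_count = len({r["customer_id"] for r in source_records})

# Mart B defines a customer by email address.
mart_b_count = len({r["email"] for r in source_records})

print(mart_a_count, mart_b_count)  # prints "2 3" - same data, two answers
```

Neither mart is "wrong" by its own definition; the problem is that the definitions were never reconciled, which is precisely the federation risk described above.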
The Inmon approach, by contrast, seeks to take direct feeds that are subject-oriented; that is, to take all customer and product data from each data source and integrate these subject areas together.
Furthermore, the Enterprise Data Warehouse is time-variant and non-volatile; it takes periodic snapshots so point-in-time states can be compared, and the data is typically read-only.
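A minimal sketch of what time-variant and non-volatile mean in practice (the table layout and figures here are invented for illustration): each load appends a dated snapshot, nothing is updated in place, and point-in-time states can then be compared without going back to the source system:

```python
# Hypothetical append-only snapshot store: (snapshot_date, customer_id, balance).
from datetime import date

snapshots = []  # non-volatile: rows are only ever appended, never updated

def load_snapshot(snapshot_date, rows):
    for customer_id, balance in rows:
        snapshots.append((snapshot_date, customer_id, balance))

load_snapshot(date(2023, 1, 31), [("C1", 1000.0), ("C2", 250.0)])
load_snapshot(date(2023, 2, 28), [("C1", 1100.0), ("C2", 200.0)])

def balance_as_at(customer_id, as_at):
    """Time-variant query: the balance recorded in the snapshot for `as_at`."""
    return next(b for d, c, b in snapshots if d == as_at and c == customer_id)

# Compare two points in time without touching the transactional system.
change = balance_as_at("C1", date(2023, 2, 28)) - balance_as_at("C1", date(2023, 1, 31))
print(change)  # prints "100.0"
```

A real EDW would of course do this with partitioned tables rather than a Python list, but the append-only, dated-snapshot pattern is the same.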
The primary benefit of this approach is that large amounts of data can be brought into an enterprise repository (an Enterprise Data Warehouse / EDW), and the downstream users (typically Finance / Risk / Strategy folk) can create a broad variety of consistent, broadly applicable metrics without having to go back to the transactional systems.
The main downsides of this approach are the significant up-front time and cost of designing and building the enterprise data model before any value is delivered, and the reliance on a central team that can become a delivery bottleneck.
These two approaches have essentially one key difference; to pre-aggregate or not to pre-aggregate…that is the question.
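That one difference can be shown side by side. In this hypothetical sketch (invented figures), the Kimball-style mart stores the answer pre-aggregated at load time, while the Inmon-style EDW keeps granular rows and derives any metric on demand:

```python
# Hypothetical contrast: pre-aggregate at load time vs aggregate on read.
sales = [("North", 10.0), ("North", 15.0), ("South", 7.0)]

# Kimball-style mart: the aggregate is computed once, when the mart is loaded.
mart = {}
for region, amount in sales:
    mart[region] = mart.get(region, 0.0) + amount

# Inmon-style EDW: granular rows are retained; metrics are derived on demand.
def total_for(region):
    return sum(amount for r, amount in sales if r == region)

print(mart["North"], total_for("North"))  # prints "25.0 25.0"
```

Both give the same number for the metric that was anticipated; the difference shows up when a new metric or a new hierarchy is needed - the granular store can answer it, while the pre-aggregated mart has to be rebuilt.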
Unfortunately, I would say that both of these approaches, and indeed the notion of a standalone traditional data warehouse, are inadequate for today’s data landscape.
In the next edition we will cover new data integration patterns and platform architectures.
Check back here shortly for the next edition or get in touch with our team to find out more.