Some of the techniques that have made Business Intelligence so successful over the past couple of decades have become its Achilles' heel. This post gives a few examples of how techniques we hold dear may now be holding us back.
Earlier this year I wrote a post discussing how the perspective of IT needs to shift from a foundation in scarcity to one based on abundance. Exponential growth (so prevalent in technology) is something I've thought about for a while, but I ran into a discussion this week that really brought home how deep a change in thinking is required. I worked extensively with business intelligence at the start of this century. While we were handling what seemed like large quantities of data at the time, the goal was always to have a controlled amount of very clean data, which allowed our limited algorithms and computing capabilities to extract the most meaning from it. We gathered the raw data in an operational data store, loaded it into our star schema and started our analysis, dumping the raw data before the next load. It was effective for its time but clearly based on a 'scarcity perspective.'
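To make that 'scarcity perspective' concrete, here is a minimal sketch of such a load cycle in Python. The table names and cleansing rules (staging_orders, fact_sales, the date-format check) are hypothetical, not from any particular system I worked on; what matters is the shape of the process: cleanse a controlled slice of data, load it into the star schema, then throw the raw rows away.

```python
# Hypothetical sketch of a scarcity-era ETL cycle: cleanse, load, discard.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_orders (order_id INTEGER, order_date TEXT, amount REAL);
    CREATE TABLE fact_sales    (order_id INTEGER, date_key TEXT, amount REAL);
""")

# 1. Extract: raw rows land in the staging area of the operational data store.
conn.executemany(
    "INSERT INTO staging_orders VALUES (?, ?, ?)",
    [(1, "2004-03-01", 19.99), (2, "2004-03-01", None), (3, "bad-date", 5.00)],
)

# 2. Transform and load: only rows passing the cleansing rules reach the star schema.
conn.execute("""
    INSERT INTO fact_sales (order_id, date_key, amount)
    SELECT order_id, order_date, amount
    FROM staging_orders
    WHERE amount IS NOT NULL
      AND order_date GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]'
""")

# 3. Dump the raw data before the next load -- the scarcity step.
conn.execute("DELETE FROM staging_orders")
conn.commit()
```

After step 3 nothing survives outside the star schema; the anomalous rows are gone for good, and the next load starts from scratch.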
With big data, you want to start with as much wild and untamed data as you can get. There is an abundance of storage, computing and algorithms that can be brought to bear on it. With additional data, you have more observations and a greater understanding of the context in the real world, not some purified fantasy world. Sure, it can be messy, but you also have the opportunity to see those anomalous clusters that used to be sanitized out of existence. The fact that your 'nice' normal distribution actually appears to be bimodal when you look at the raw, unfiltered data points to something unexpected. That second little bump in the road may be the most important element of your contextual understanding. After all, people don't make decisions off data; they make decisions off the context the data describes.
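As a small illustration of that second bump, here is a sketch using synthetic data; the outlier threshold and the kernel-density mode count are my own assumptions, not a prescription. It shows how an aggressive cleansing rule can erase a bimodal signal that the raw data plainly contains.

```python
# Sketch: an anomalous cluster visible in raw data, sanitized away by cleansing.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
# Main population plus a small anomalous cluster -- the 'second little bump'.
raw = np.concatenate([rng.normal(100, 10, 9_000), rng.normal(150, 5, 1_000)])
# A scarcity-era cleansing rule: discard anything more than 2 std from the mean.
clean = raw[np.abs(raw - raw.mean()) < 2 * raw.std()]

def count_modes(sample: np.ndarray) -> int:
    """Count local maxima of a kernel density estimate over the sample."""
    xs = np.linspace(sample.min(), sample.max(), 500)
    density = gaussian_kde(sample)(xs)
    peaks = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
    return int(peaks.sum())

print("modes in raw data:    ", count_modes(raw))    # expect 2
print("modes in cleaned data:", count_modes(clean))  # the bump likely vanishes
```

The cleansing rule does exactly what it was designed to do, and in doing so it deletes the most interesting part of the context.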