Faced with the ongoing confusion over the term ‘Big Data,’ here’s a handy – and somewhat cynical – guide to some of the key definitions that you might see out there.
The first thing to note is that – despite what Wikipedia says – everybody in the industry generally agrees that Big Data isn’t just about having more data (since that’s just inevitable, and boring).
Big Data as the three Vs: Volume, Velocity, and Variety. This is the most venerable and well-known definition, coined by Doug Laney of Gartner over twelve years ago. Since then, many others have tried to take it to 11 with additional Vs including Validity, Veracity, Value, and Visibility.
Why did a 12-year-old term suddenly zoom into the spotlight? It wasn’t simply because we do indeed now have a lot more volume, velocity, and variety than a decade ago. Instead, it was fueled by new technology, in particular the fast rise of open source technologies such as Hadoop and other NoSQL ways of storing and manipulating data.
The users of these new tools needed a term that differentiated them from previous technologies, and – somehow – ended up settling on the woefully inadequate term Big Data. If you go to a big data conference, you can be assured that sessions featuring relational databases – no matter how many Vs they boast – will be in the minority.
The problem with big-data-as-technology is that (a) it’s vague enough that every vendor in the industry jumped in to claim it for themselves and (b) everybody ‘knew’ that they were supposed to elevate the debate and talk about something more business-y and useful.
Here are two good attempts to help organizations understand why Big Data now is different from mere big data in the past:
This is another business-y approach that divides the world by intent and timing rather than the type of data, courtesy of SAP’s Steve Lucas.