Big data has been making a big splash for almost a decade now, but most people are still uncertain about what “big data” actually is, how it works, and which technologies lie behind its magic. To answer these questions, I am publishing an extensive, 6-part blog series on big data technologies and what they mean to you.
To illustrate the power of big data in business, let’s look at a popular example from the retail giant Walmart. According to WalmartLabs’ Stephen O’Sullivan, back in 2013 Walmart made a pivotal move by consolidating all of its data resources onto a single, 250-node Hadoop cluster that acts as a big data mega store. Using several types of traditional and big data technologies, this centralized big data hub processes new data on a scale of terabytes per day, as well as historical data on a scale of petabytes. In addition, as part of its efforts to advance big data analytics, WalmartLabs developed a new semantic search engine called Polaris. Regarding the search engine’s effectiveness, Walmart’s corporate website reported that “Walmart.com has already seen an approximate 10-15 percent increase in shoppers completing a purchase after searching for a product using the new search engine.”
While this example may seem dated, we continue to learn about big data the same way: through trial and error, followed by production implementation of what works. This first post in the series will cover how “big data” is defined and some of the technologies that are commonly used for handling it.
Any introduction to big data would be incomplete without discussing the three Vs most commonly associated with it: volume, variety, and velocity. Big data is data whose volume, variety, or velocity is such that it can’t be handled and processed using traditional data technologies. These are called the “3-V criteria”. If your data meets any of the 3-V criteria, then it meets the big data threshold, and it would be reasonable to classify your project as a “big data project”.
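To make the “any one of the three” rule concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the function names, and especially the numeric limits are hypothetical stand-ins for whatever your own traditional data stack can actually handle. There is no industry-standard cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraditionalLimits:
    """Hypothetical ceiling of a traditional data stack (illustrative values only)."""
    max_volume_tb: float = 10.0              # assumed storage ceiling, in terabytes
    max_events_per_second: float = 5_000.0   # assumed ingest-rate ceiling
    handles_unstructured: bool = False       # assumed: only structured, tabular data


@dataclass(frozen=True)
class DatasetProfile:
    """Rough description of the data a project has to handle."""
    volume_tb: float
    events_per_second: float
    has_unstructured_data: bool


def exceeded_vs(profile: DatasetProfile,
                limits: TraditionalLimits = TraditionalLimits()) -> list[str]:
    """Return the names of the Vs on which the dataset exceeds the assumed limits."""
    exceeded = []
    if profile.volume_tb > limits.max_volume_tb:
        exceeded.append("volume")
    if profile.events_per_second > limits.max_events_per_second:
        exceeded.append("velocity")
    if profile.has_unstructured_data and not limits.handles_unstructured:
        exceeded.append("variety")
    return exceeded


def is_big_data(profile: DatasetProfile,
                limits: TraditionalLimits = TraditionalLimits()) -> bool:
    """Meeting any single V is enough to cross the big data threshold."""
    return len(exceeded_vs(profile, limits)) > 0


if __name__ == "__main__":
    # A small but fast-moving, structured feed: only velocity exceeds the limits.
    clickstream = DatasetProfile(volume_tb=2.0,
                                 events_per_second=50_000.0,
                                 has_unstructured_data=False)
    print(exceeded_vs(clickstream), is_big_data(clickstream))
```

Note that the check is a logical OR, not an AND: a dataset that is modest in size can still qualify on velocity or variety alone, which is exactly what the definition above says.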