Advances in cloud computing, data processing speeds, and the huge amount of data input from sources like IoT mean that companies are now collecting previously unseen amounts of data. Big data is now bigger than ever. But organizing, processing, and understanding the data is still a major challenge for many organizations.
Is your company still struggling to understand what big data is, and how to manage it? Here are 6 myths about big data, from the experts, to help you separate truth from fiction in the realm of big data.
Big data is a buzzword these days, but what it really means is still often unclear. Some people refer to big data as, simply, a large amount of data. But that's not quite correct; it's a little more complex than that. Big data refers to how data sets, either structured (like Excel sheets) or unstructured (like metadata from email), combine with data like social media analytics or IoT data to form a bigger story. That story reveals trends about what is happening within an organization, which traditional analytic techniques struggle to capture.
Jim Adler, head of data at Toyota Research Institute, also makes a good point: data has mass. "It's like water: When it's in a glass, it's very manageable. But when it's in a flood, it's overwhelming," he said. "Data analysis systems that work on a single machine's worth of data will be washed away when data scales grow 100 or 1,000 times. So, sure, prototype in the small, but architect for the large."
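One way to picture "architect for the large" is the following minimal Python sketch. It assumes the data arrives as CSV rows with hypothetical `region` and `sales` columns. Rather than loading the whole data set into memory, it streams rows one at a time and keeps only running totals, so memory use grows with the number of distinct keys rather than the number of rows. This is an illustrative pattern only. Data at true flood scale would typically call for a distributed processing engine.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def read_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield CSV rows one at a time instead of loading the whole file."""
    yield from csv.DictReader(lines)


def total_by_key(rows: Iterable[Dict[str, str]], key: str, value: str) -> Dict[str, float]:
    """Aggregate incrementally: memory depends on distinct keys, not row count."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# Hypothetical sample data; a real file would be opened with open(path, newline="").
sample = io.StringIO("region,sales\neast,10\nwest,5\neast,2.5\n")
print(total_by_key(read_rows(sample), "region", "sales"))  # {'east': 12.5, 'west': 5.0}
```

The same functions work unchanged whether the input is a three-line sample or a multi-gigabyte file, which is the point of prototyping small while designing for scale.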
"The biggest myth is you have to have clean data to do analysis," said Arijit Sengupta, CEO of BeyondCore. "Nobody has clean data. This whole crazy idea that I have to clean it to analyze doesn't work. What you do is, you do a 'good enough' analysis. You take your data, despite all the dirtiness, and you analyze it. This shows where you have data quality problems. I can show you some patterns that are perfectly fine despite the data quality problems. Now, you can do focused data quality work to just improve the data to get a slightly better insight."
Megan Beauchemin, director of business intelligence and analytics for InOutsource, agreed. "Often times, organizations will put these efforts on the back burner, because their data is not clean. This is not necessary. Deploying an analytic application will illuminate, visually, areas of weakness in data," she said. "Once these shortfalls have been identified, a cleanup plan can be put into place. The analytic application can then utilize a mechanism to highlight clean-up efforts and monitor progress."
"If your data is not clean, I think that is all the more reason to jump in," Beauchemin said. "Once you tie that data together, and you're bringing it to life visually in an application where you're seeing those associations and you're seeing the data come together, you're going to very quickly see shortfalls in your data." Then, she said, you can see where the data issues lie, which gives you a benchmark as you clean the data up.
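The "analyze first, then clean" workflow Sengupta and Beauchemin describe can be sketched in Python. This is a hypothetical illustration, not BeyondCore's or InOutsource's method, and it assumes records are dictionaries with made-up `region` and `sales` fields. The function runs a "good enough" aggregation on whatever rows are usable, counts the rows it had to skip and why, and reports the share of usable rows. Tracking that share over time is one simple way to get the benchmark for cleanup progress mentioned above.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def analyze_with_quality_report(
    rows: List[Dict[str, Optional[str]]], key: str, value: str
) -> Tuple[Dict[str, float], Counter, float]:
    """Aggregate usable rows and tally the data quality problems found."""
    totals: Dict[str, float] = {}
    issues: Counter = Counter()
    for row in rows:
        k, v = row.get(key), row.get(value)
        if not k:  # missing or empty grouping field
            issues[f"missing {key}"] += 1
            continue
        try:
            amount = float(v)  # raises TypeError for None, ValueError for text like "n/a"
        except (TypeError, ValueError):
            issues[f"bad {value}"] += 1
            continue
        totals[k] = totals.get(k, 0.0) + amount
    usable_share = 1 - sum(issues.values()) / len(rows) if rows else 0.0
    return totals, issues, usable_share


# Hypothetical dirty records: one lacks a region, one has a non-numeric sales value.
rows = [
    {"region": "east", "sales": "10"},
    {"region": "", "sales": "4"},
    {"region": "west", "sales": "n/a"},
    {"region": "east", "sales": "2"},
]
totals, issues, share = analyze_with_quality_report(rows, "region", "sales")
print(totals, dict(issues), share)
```

The analysis still produces a result despite the dirty rows, while the issue counts point to exactly where focused cleanup would help most.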
Here's another reason you shouldn't wait to clean up your data: "By the time you've cleaned your data, it's three months old, so you have stale data," said Sengupta. By then, the information may no longer be relevant.
Sengupta recalled a conference where Josh Bartman of First Interstate Bank made an important point. "Josh showed how he was running an analysis, finding a problem, changing the analysis, rerunning the analysis. He said, 'Look, my analyses are only about four to five minutes apart.