More than one recent big data study has labeled 2016 as the year of action, a time to actually act on the insights that data analytics can provide. That theme was echoed at the recent Strata & Hadoop World conference in San Jose, Calif.
David Weldon, editor of Information Management (a sister publication to Health Data Management), spoke with Ash Parikh, vice president of data integration, data security and big data at Informatica, for his take on what this new focus means.
What are the most common themes that you are hearing and how do those themes align with what you expected?
The Big Data space has been evolving gradually over the last few years, reflecting the level of maturity in real customer projects. A few years ago, if you were to attend any of the major Big Data events, road shows or conferences, you would walk away extremely excited about the buzz, the giveaways, the myriad technologies mushrooming by the minute, and the newness of the space in general.
We are starting to see a shift: there is more general awareness that it is a nightmare to keep up with all the new technologies being introduced, and with the ones that become outdated in such a short time.
Additionally, there is more discussion about how to deliver value from all the investments around Big Data: how can I increase campaign effectiveness, how can I ensure improved healthcare outcomes or how can I reduce the risk of fraud? The fact that there are now more sessions, articles, blogs and Tweets about how not to turn a data lake into a data swamp is evidence in itself that companies are starting to ask some hard questions.
What are the most common challenges that organizations are facing in data management and data analytics?
First, the audience is not even fully aware that there is a problem. According to Gartner and other leading industry analyst firms, over 70 percent of Big Data projects either fail entirely or struggle to go beyond experimentation because of a lack of upfront due diligence on data management.
It is generally felt that it is enough to simply spin up a Hadoop cluster, dump all types of data into it at scale, create a sandbox, and experiment, and then almost magically, those golden needles in the haystack (read that as new and unique insights) will reveal themselves. Typically, all this is done by bringing together a host of open-source technologies and throwing hand-coding at the problem.
If this effort needs to scale and, more importantly, deliver trusted and timely insights, customers typically translate this to more hand-coding.