Think you had a tough day? Spare a thought for recruiters who compete with each other to hire one of the country’s most sought-after specialists: the data scientist. The demand for data experts is so high that there could still be a 60 percent hiring gap for data scientists three years from now.
It’s all part of the huge effort to tap into the promise of big data. Many businesses are struggling to realize gains from their big data investments, and the reasons extend beyond a lack of qualified data scientists.
In my experience, the lack of skills to properly manage data is a very real problem, but another challenge to big data insight is the way enterprises organize data – many companies silo or fragment their data. It’s gotten to the point that some organizations are wondering what benefits big data held for them in the first place.
One immediate benefit of big data is automation – the ability to automatically identify and preemptively resolve symptoms before they become a problem, as well as to eliminate time-wasting processes. This one-two punch frees up time and resources, enabling organizations to focus on better understanding what the end user wants and needs.
To realize the benefits of automation, we must consider how data ought to be stored today. We need to discuss data lakes.
Data lakes are repositories for storing relevant data requiring analysis. The data stored in these lakes usually comes in three forms: structured, unstructured, and semi-structured. It is stored in raw form, which allows for deep and complex analysis without the loss of fidelity that comes with aggregation. The more data that organizations pool into their data lakes, the more opportunity they have to discover previously unseen correlations and insights.
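To make the point about raw storage concrete, here is a minimal sketch in Python. It assumes a toy in-memory list standing in for a data lake; the `ingest` helper, the source names, and the sample records are all hypothetical and exist only for illustration. It shows one record of each form being kept as-is, and how an aggregate computed later can hide a detail that the raw records still hold.

```python
import json
from statistics import mean

# Hypothetical in-memory stand-in for a data lake: every record is kept raw.
lake = []

def ingest(source, kind, payload):
    """Store a record untouched, tagged with where it came from and its form."""
    lake.append({"source": source, "kind": kind, "payload": payload})

# Structured: fixed fields, as they might come from a database table.
ingest("crm", "structured", {"user": "alice", "login_ms": 120})
ingest("crm", "structured", {"user": "bob", "login_ms": 4800})
# Semi-structured: self-describing JSON text with a flexible shape.
ingest("api", "semi-structured", json.dumps({"event": "login", "tags": ["sso"]}))
# Unstructured: free text, such as a help desk note.
ingest("helpdesk", "unstructured", "User reports the login page hangs after SSO redirect.")

login_times = [r["payload"]["login_ms"] for r in lake if r["kind"] == "structured"]

# An aggregate alone hides the outlier; the raw records still expose it.
average = mean(login_times)
slowest = max(login_times)
print(f"average login: {average} ms, slowest login: {slowest} ms")
```

Had only the average been stored, the single very slow login would no longer be recoverable. Keeping the raw records leaves that question open for later analysis.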
The ease and flexibility of using data contained in data lakes help to identify repeatable tasks and processes. In fact, because data lakes act as a central repository for automated systems, they can be used to build a system capable of recognizing trends, learning, and acting of its own accord.
Let’s use the process of resetting a password as an example. The system monitors the actions of an administrator helping an end user reset his or her password. It observes the steps involved in resetting the password and stores this information in its data lake. Then, the next time a user submits a password reset request, a software robot, as we call it, can walk the end user through the password reset process without the need for admin intervention.
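The observe-store-replay loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular product: the `WorkflowLake` class, the `handle_request` function, the task names, and the step wording are all hypothetical, and a real system would need identity checks and error handling that are omitted here.

```python
from collections import defaultdict

class WorkflowLake:
    """Hypothetical store of the steps administrators were observed taking."""

    def __init__(self):
        self._observations = defaultdict(list)

    def record(self, task, steps):
        # Keep every observed run so later analysis can compare them.
        self._observations[task].append(list(steps))

    def known_steps(self, task):
        # Return the most recently observed run, or None if never observed.
        runs = self._observations.get(task)
        return runs[-1] if runs else None

def handle_request(lake, task, user):
    """Replay the observed steps for a user, or escalate to an administrator."""
    steps = lake.known_steps(task)
    if steps is None:
        return [f"escalate '{task}' for {user} to an administrator"]
    return [f"{user}: {step}" for step in steps]

# 1. Observe an administrator resetting a password and store the steps.
lake = WorkflowLake()
lake.record("password_reset", ["verify identity", "send reset link", "confirm new password"])

# 2. The next request is walked through without admin intervention.
replayed = handle_request(lake, "password_reset", "dana")
# 3. A task that has never been observed still goes to a person.
escalated = handle_request(lake, "unlock_account", "dana")
print(replayed)
print(escalated)
```

The fallback matters as much as the replay: the robot only acts on tasks it has actually seen performed, and hands everything else back to an administrator.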