It may be a stretch to call data science commonplace, but the question “what’s next” is often heard with regard to analytics. And then the conversation often turns straight to Artificial Intelligence and deep learning. Instead, a tough love review of the current reality may be in order.
The simple truth is that, as currently configured, data-centric companies will struggle to cross the divide between what is currently considered effective data science and a modality where analytics is an inherent part of the fundamental fabric of business operations that benefits from continuous improvement. Today data science is all too often a process where new insights and models get developed as a one-time effort or deployed to production on an ad hoc basis, and require regular babysitting for monitoring and updating.
This is not to imply that companies are not on the right path with their data science initiatives, but merely acknowledgement that the steps they have taken thus far have brought them to the edge of a chasm they will have to cross. To the credit of more progressive organizations, creating an industrial-caliber data lake to store a lot of data of varying forms is an essential, foundational step. Building on that, the development of systems of data democratization that provide ready access to data for those seeking insight is critical. There’s no doubt that companies that have achieved those two steps already reap benefits.
Nevertheless, that’s as far as most have come and, more significant for the future, that’s all they have prepared themselves to accomplish. Today many companies have the data and data scientists who are equipped to do analysis and build models that can be carefully engineered to plug into some usable business application. But every deployment of a model is a custom, fragile, one-off job and ensuring quality of models is done as a fragile, manual effort. Change the model and the whole thing needs to be rebuilt. Often useful analyses are often performed once but can’t be reproduced or even worse get recreated periodically but inconsistently. And if a new version model doesn’t work well it can be a painful struggle to restore a previous version, let alone have systematic testing of models to continuously improve them.
It’s not enough to know how to wrestle with raw data. Companies need an infrastructure capable of continuously testing and improving models, starting with governed understood analytic data sets as input. This is an environment in which normalized data lets one do any kind of data science at any time.
It’s been done before.