Data science is fast becoming a critical skill for developers and managers across industries, and it looks like a lot of fun as well. But it’s pretty complicated: there are a lot of engineering and analytical options to navigate, and it’s hard to know if you’re doing it right or where the bear traps lie. In this series we explore ways into making sense of data science, understanding where it’s needed and where it’s not, and how to make it an asset for you, from people who’ve been there and done it.
Organisations are increasingly adopting data science and advanced analytics, which influence their decision making, products and services to a growing extent. That regularly raises the question of which set of tools is best for data science. On the surface, this subject appears to be about technology comparisons. You could end up reviewing a lengthy list of pros and cons of R, Spark ML, and related technologies like Jupyter or Zeppelin. In fact, we could write a whole series of technology comparisons. However, for the organisation, this is first and foremost a question of what capabilities will support its future business goals. Focusing on those capabilities makes the technology choices easier and reduces the risk of wasting time and effort.
How can we arrive at a framework for having this conversation about selecting technologies in a pragmatic and productive manner? In this article, we explore a suitable framework with a real-world example. A typical starting point for organisations is a paralysing number of silos and a plethora of adopted technologies. You don’t want to add more technologies and silos merely because stakeholders ask for them. New technologies and infrastructure should displace existing technologies, and break down or replace silos. But this is not trivial. Traditional analytics and business intelligence vendors claim to have the answer to the new challenges, while a flood of new technologies, many of them open source, adds further choices. The newcomers often claim to replace the traditional tools and bring capabilities beyond their reach. The incumbents counter that they offer better enterprise qualities like security and support.
The real-world customer we discuss here approached my employer over a year ago with a tremendous challenge that combined immediate and long-term strategic requirements. This FTSE100 company was at a transformational moment. It was changing significantly as an organisation and needed to reinvent parts of its current platform because of past fragmentation and dependencies that were not maintainable and did not deliver business value. The urgent request to us was to address immediate business needs for advanced reporting and basic analytics on a new platform, blending in historical data in a fully transparent fashion, all to a tight deadline. The existing data warehouse, based on an appliance technology, was costly and limiting. New reports and advanced analytics were prohibitively slow or impossible to execute without investing large sums, and even that investment would not have added future-proof analytics capabilities.
The cost and limitations were a grave concern. The customer recognised that, in the long term, the value derived from its core business activities will inevitably shrink as the market becomes increasingly competitive, with disruptive technology changes on the horizon. The leaders in the organisation realised that, once the urgent requirement had been addressed, novel capabilities would be needed straight away to prepare the business for the future.
We worked with the key stakeholders and developed a plan to bring the main datasets together in a central place, with full flexibility to process and analyse them in the future for the next evolution of the business.