We’ve discussed it before, but it’s worth mentioning again: developing data systems is hard. Really hard. Companies spend a lot of time and effort talking to vendors and carefully selecting the software components for their enterprise data infrastructure. However, they often overlook the tools and processes that will enable them to build and operate the applications that make those component choices valuable. The consequences are dire: unmaintainable systems, unmet goals, unhappy users.
Paralleling the rise of data in the collective consciousness of businesses has been another trend: the emergence in the software development world of a number of interrelated approaches and practices (continuous delivery, continuous integration, and DevOps) meant to improve the quality of software delivery. In particular, DevOps has emerged as a way to put into practice some of the same organizational principles espoused by agile methodology (collaboration, enablement, and a focus on iterative development cycles). The goal is to have operations, development, and QA capabilities work closely together (often using the same tools) throughout the entire software lifecycle.
In this post, I will explain why anyone transforming their company into a data-driven organization should care about software development best practices, even if they don’t consider themselves a software company.
In future posts, I’ll delve into the impact of data and distributed systems on development and operations, and the capabilities and practices that will help your data systems development succeed.
The open source projects that power many of the data systems built today were originally created as infrastructure: software that provided generalized functionality for multiple use cases. These technologies were typically created to reap the benefits that detailed insights and expanded processing capabilities provided to specific organizations. These companies understood that to effectively build such infrastructure you have to enable both the developers creating the software and the consumers of the infrastructure. As both creators and consumers of these systems and their dependent applications, they were able to both build (and extend) the general capabilities of their data infrastructure and provide feedback on necessary functionality as the goals of the organization evolved.
As data has become increasingly relevant across many industries, an ever-expanding number of companies (which otherwise would not consider themselves “software” companies) have begun building new data systems in order to transform themselves into data-driven organizations. These companies will sensibly make their first move toward this goal by using packaged software or commercially supported platforms built around open source data infrastructure projects. And, yes, many of those platforms are great pieces of software and will provide a great deal of needed functionality.
However, it pays to be blunt here: If you are on the path to being a data-driven company, you have to be on the path to being a development-enabled company.
At some point, the specifics of your business will demand software development. The features you want to provide to your users, be they internal business users, developers, or external customers, will outmatch the pre-built functionality of packaged software solutions. Simply integrating a number of components or disparate systems together will not fulfill your ambitions.
Like the companies that created the original data infrastructure software, you will need to create processes within your organization to effectively build, and provide feedback on, the software you are creating.