Data science is an exciting, changing field. Curious minds and enthusiastic investigators can often get bogged down by algorithms, models, and new technology. If we’re not careful, we forget what we’re actually here to do: solve real problems. And if what we do is just theory, what’s the point?
To be relevant and useful, the day-to-day activities of data scientists must
In short, data science should result in real applications. Getting there is a multifaceted challenge, and one important part of it is managing data teams so that they reach that real-world result. Agile data science methods help data teams work quickly and with direction while managing the inherent uncertainty of data science and application development.
In this post, I’ll look at the practical ingredients of managing agile data science.
Data science results are probabilistic and unpredictable. At the start of a project, it can often look like there’s an obvious route from A to B. Once you get started, it’s never that simple. Agile teams do away with strict planning and go into projects with a creative mindset; they embrace uncertainty instead of shying away from it.
This comes in handy when a roadblock pops up. Traditionally run data science teams can get stuck deciding among their options, while flexible agile teams are more likely to find a new solution. Unpredictability and the need to adapt quickly to problems don’t scare them; they excite them.
At the same time, the agile planning method focuses hard on application to the customer’s problem. Without that focus, it’s easy for us to get lost down the rabbit hole of stringent rules about hypotheses, models, and results. When that happens, we end up producing things that work, in the sense that they validate our hypotheses, but that have little application to the real-world scenario we’re producing them for. Wasting time is not good for us or our customers.
There are some key concepts that underpin the agile method we employ at SVDS. Collectively, they provide us with the goals for a project, the top-level strategy for investigation, and day-to-day action plans.
It’s great to have a method, but it helps to see how it’s used to solve a real problem. At SVDS, we used this method to create a system that tells train riders when the Caltrain is running late to a stop and what its approximate time of arrival is. Let’s dive into how that worked.
I’ll give a brief overview of our Caltrain work, but if you want to learn more, check out our project page. The point of this project, its charter, was to create an app that would tell the user when the Caltrain was running late and how long it would be until it arrived at a designated stop. The Caltrain system has its own app, but it was inaccurate and didn’t tell riders whether a train was late or how late it was. No one likes being late for work, so we wanted to create a solution for riders.