Summary: If you’re responsible for a data science team of more than three or four people, it’s time to start thinking about productivity and efficiency.
Efficiency is not something we often think about in managing our data science teams, but increasingly we should. Suppose you are a data scientist who has now been asked to lead your group or, even more difficult, suppose you are not a data scientist and have the DS group reporting to you. How do you ensure you’re getting an appropriate return on your investment?
Especially if you are an executive who is not a data scientist but has overall responsibility for a DS group, even asking the right questions may seem daunting. After all, it took a lot of effort to get funding, then to find those rare hires, and finally to get them up and running. And they speak that arcane DS dialect that not even the techies in IT can understand. It looks like they’re doing OK. They’re producing some useful models and bringing new business insights. But could they be doing better? Here are three tips to consider to get the most out of your group.
First, there’s no longer a place for the lone wolf data scientist in an advanced analytics shop. That almost assuredly means you need to drive the team toward a common advanced analytics platform. It could be R or Python, or it could be SAS, SPSS, or one of the other proprietary packages, but you can’t have everybody doing their own thing.
When you prepare predictive models using different platforms or languages, the overall intent may be the same, but unless everybody’s speaking the same language communication will suffer. Fewer minds can share a problem, and there’s less supervision and collaboration. Both of these are worrisome.
By the way, a common platform doesn’t mean just the one that runs the data science algorithms. Importantly, since most of the time in any new project is spent blending, cleansing, and preparing the data, your common platform should be capable in this area also.
Here’s my personal bias. Having everyone write original R or Python code may look cool, but it’s not reliably repeatable by the same data scientist or between data scientists. Platforms that incorporate drag-and-drop interfaces (aka visual IDEs) are typically designed with repeatability in mind, and to my way of thinking that’s a big advantage.
Second, there should be an agreed process or methodology. All data scientists are originally trained in certain principles, and these are most commonly embodied in the CRISP-DM methodology (Cross-Industry Standard Process for Data Mining). I had the pleasure of helping to develop this back in the 1990s and there’s nothing magic here, just good common sense. But unless you have an agreed methodology and enforce it, you won’t know who is cutting corners and with what consequences.
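To make "enforcing" a methodology a little more concrete, here is a minimal sketch in Python of the kind of check a team lead might run against a project log. The six phase names are the ones in the CRISP-DM reference model; the function name `skipped_phases` and the sample `project_log` are hypothetical, invented purely for illustration, and a real shop would more likely track sign-offs in whatever project tool it already uses.

```python
# The six phases of the CRISP-DM reference model, in process order.
CRISP_DM_PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
]


def skipped_phases(signed_off):
    """Return the CRISP-DM phases with no recorded sign-off, in process order.

    `signed_off` is any iterable of phase names; matching ignores case and
    surrounding whitespace. (Hypothetical helper, not part of any library.)
    """
    done = {name.strip().lower() for name in signed_off}
    return [phase for phase in CRISP_DM_PHASES if phase not in done]


# Hypothetical project log: the team jumped from preparation to modeling
# and on to deployment without documenting two of the phases.
project_log = ["Business Understanding", "Data Preparation", "Modeling", "Deployment"]

print(skipped_phases(project_log))  # ['data understanding', 'evaluation']
```

A project that reaches deployment with no recorded evaluation is exactly the sort of cut corner that stays invisible until someone asks for it.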
There are, however, two other areas you need to look at that may not be as obvious as these.