How do Data Scientists fit into an organization? How is the role of a Data Scientist different from the role of a Data Engineer? How should the Data Science team be organized to create the most value for the business?
Those are questions companies are struggling with as they begin to incorporate Big Data, Analytics, and Data Science into their technology organizations. The Enterprise Data World 2016 Conference provided insight into the topic at a panel discussion, Organizing for Enterprise Data Science. The panel included Chris Bergh from DataKitchen, John Akred from Silicon Valley Data Science, and Tim Berglund from DataStax, and was moderated by April Reeve from Reeve Consulting LLC.
The panel began with the most basic questions: What does a Data Scientist do? What skills are needed? Where can you find people with those skills? Reeve started the discussion by presenting the common Venn diagram of Data Scientist skills, which shows that they need hacking ability, subject matter expertise, and statistical skills. These skills support the roles of Data Scientists in researching the possibility of predicting behavior, introducing new data sources, performing analysis, proposing and validating predictive models, and developing prototypes of predictive solutions.
While performing those functions requires development skills, math and statistical knowledge, and business expertise, Bergh pointed out that it’s difficult to find people with all these skills. On the other hand, there is plenty of incentive for people to develop these skills now: “It’s the sexiest job of the 21st century. The alpha nerds nowadays are Data Scientists.”
When Reeve suggested that the definition was too broad and that these skills couldn’t usually be found in a single person, Bergh agreed, adding, “I don’t think it’s too broad; I think it’s aspirational. I think your team should have all these skills as opposed to one person.”
Another role that’s common on Data Science teams is a Data Engineer. The panelists were in agreement that the roles of Data Scientists and Data Engineers are distinct.
“They are fundamentally different sides of the same coin. One is a mindset about improvisation, agility, and responding to current inputs and the other is about spending a lot of time very thoughtfully interpreting something,” Akred said.
Akred pointed out that the Venn diagram description of the Data Scientist job is missing an important skill that Data Engineers bring: familiarity with enterprise data systems. The challenges of “how do I find out what the data is; how do I find out what matters to this business” are issues that data engineers can address, he said. These are crucial skills, as understanding enterprise and operational systems is necessary to surface the data so the Data Scientists can do their analysis. Since Data Scientists often come from academic backgrounds where they haven’t interacted with enterprise systems, they lack those skills.
In fact, Data Engineers and Data Scientists have complementary skills. “Those two groups need to get along and work well together,” Akred added. If they don’t, the analytics team can lose a lot of productivity.
When the Data Scientists and Data Engineers work well together, the result is ultimately an insight that needs to be shared or a model that needs to be migrated into production. Typically the Data Scientists hand off that task to Data Engineers, Solution Architects, or some kind of software development team.
Bergh pointed out that deploying the work of Data Scientists doesn’t always result in implementing an application. Sometimes the result of the model is “deployed” in a PowerPoint shared with senior business management, and a software team isn’t required.
When it comes to developing a production application and building data pipelines, Reeve said:
“If the question is, ‘Are data engineers different from my regular enterprise programming group?’ maybe, maybe not.