I still think that Hacking Skills, Math & Statistics Knowledge and Substantive Expertise (shortened to “Programming”, “Statistics” and “Business” for legibility) are important… but I think that Communication is important, too. All the insights you derive by leveraging your hacking, stats and business expertise won’t make a bit of difference unless you can communicate them to people who may not have that unique blend of knowledge. You may need to explain your statistical insights to a business manager who has to be convinced to spend money or change processes. Or to a programmer who doesn’t think statistically.
So here is the new data science Venn diagram, which also includes communication as one indispensable ingredient.
Davenport and Patil describe data scientists as curious, self-directed and innovative, i.e., they are not limited by the tools available, and when needed they fashion their own tools and even conduct academic-style research. Not surprisingly, people with this combination of skills and characteristics are rare, as rare and as much in demand as computer programmers were in the 1990s.
The rarity of, and high demand for, data science skills has meant that statisticians, machine learners, data miners, data analysts, DBAs and quantitative analysts, i.e., people with any data or analytics skills, have re-badged themselves as data scientists to make themselves more marketable. This is not unlike the pre-Y2K hype, when computer operators and PC users re-badged themselves as computer programmers.
The term “data scientist” itself has become so diffuse that it now covers anyone from database administrators, to analysts producing simple summaries in Excel spreadsheets, to data engineers setting up Hadoop infrastructure, to advanced analytics practitioners who discover valuable insights from data using existing tools, and finally to people like the data scientists at Google and Facebook, who derive insights from data using their own enhanced toolkits.