Getting More Insights from Data: Nine Facts about the Practice of Data Science
- by 7wData
The value of data is measured by what you do with it, and organizations are relying on data scientists to extract that value. I recently conducted a survey of data professionals to better understand what it means to be a data scientist. I discovered a few things that can help organizations optimize the value of their data. While I wrote about these findings in prior posts, I want to summarize the major points here, in a more concise way.
While some of the points below may seem mundane or obvious, it's important to note that these ideas are no longer just opinions; they are backed by empirical data. This is how data science really works.
1. There are a handful of different skills that make up the field of data science. While we measured five distinct skill types, a factor analysis of proficiency ratings of these five skills resulted in three distinct skill types: Business; Technology and Programming; and Statistics and Math.
2. There are different kinds of data scientists. Our study examined four distinct job roles among these data professionals: Business Management, Developer, Creative, and Researcher.
Respondents were asked to select which of these job roles best described their work. They could choose one or any combination of job roles. The correlations between job role selections (1 = selected; 0 = not selected) were quite low (average r was -.07; the largest in magnitude was -.30), suggesting that these four job roles are distinct from each other.
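The kind of calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the survey's actual analysis: the `responses` data are made up, and the helper name `pearson` is an assumption introduced here. Each role is coded as a 0/1 indicator per respondent, and the Pearson correlation is computed for every pair of roles.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data: one 0/1 indicator per respondent for each job role
# (1 = selected; 0 = not selected). These are not the survey responses.
responses = {
    "Business Management": [1, 0, 0, 1, 0, 0, 1, 0],
    "Developer":           [0, 1, 0, 0, 1, 0, 0, 1],
    "Creative":            [0, 0, 1, 0, 0, 1, 1, 0],
    "Researcher":          [1, 0, 0, 0, 1, 1, 0, 0],
}

# Correlation for every pair of roles (4 roles give 6 pairs).
pairwise = {
    (a, b): pearson(responses[a], responses[b])
    for a, b in combinations(responses, 2)
}

average_r = sum(pairwise.values()) / len(pairwise)
strongest_pair = max(pairwise, key=lambda pair: abs(pairwise[pair]))

for (a, b), r in pairwise.items():
    print(f"{a} vs. {b}: r = {r:.2f}")
print(f"average r = {average_r:.2f}")
print(f"strongest pair: {strongest_pair}, r = {pairwise[strongest_pair]:.2f}")
```

Correlations near zero, or negative, mean that selecting one role says little about whether a respondent also selected another. That is the sense in which the roles are treated as distinct.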
3. Different job roles require different skill sets. Not surprisingly, data professionals who identified as Developers reported the highest levels of proficiency in Technology and Programming skills compared to their counterparts. Additionally, Researchers reported the highest levels of proficiency in Statistics and Math, while data professionals who identified as Business Management reported the highest levels of proficiency in Business. Finally, data professionals who identified as Creative reported moderate ratings across all skill sets, suggesting they are indeed jacks-of-all-trades.
4. The scientific method is an effective way to approach data-intensive projects. Scientists have been getting insight from data for centuries using the scientific method. Formally defined, the scientific method is a body of techniques for objectively investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. The scientific method includes the collection of empirical evidence, subject to specific principles of reasoning. The application of the scientific method helps us be honest with ourselves and minimizes the chances of us arriving at the wrong conclusion. The scientific method plays a critical role in understanding any data, irrespective of their size or speed or variety.
5. Statistics skills, compared to other data skills, are good predictors of success of analytics projects. We found that, of the 25 data skills studied, proficiency in Data Mining and Visualization Tools was among the top 4 skills that were correlated with satisfaction with project success across the four different job roles; no matter what your job role is, a solid understanding of data mining and visualization tools will improve your success and satisfaction in analytics projects. Additionally, for data professionals in Business Management roles, their proficiency in business skills was the weakest predictor of their project success, while their proficiency in Statistics skills (e.g., statistics and statistical thinking, data mining and visualization tools, science/scientific method) was among the strongest predictors of project success.
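To make the idea of ranking skills by their relationship with project satisfaction concrete, here is a small Python sketch. All numbers are invented for illustration, and the rating scales, the `pearson` helper, and the "Business Development" skill label are assumptions rather than details from the survey. It correlates each skill's proficiency ratings with satisfaction ratings and sorts the skills from strongest to weakest correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL proficiency ratings (assumed 0-100 scale), one per respondent.
# "Business Development" is an illustrative label, not a skill named in the text.
proficiency = {
    "Data Mining and Visualization Tools": [20, 40, 55, 70, 85, 90],
    "Statistics and Statistical Thinking": [30, 35, 60, 65, 80, 95],
    "Business Development":                [50, 45, 60, 40, 55, 50],
}

# HYPOTHETICAL satisfaction with project outcomes (assumed 0-10 scale),
# in the same respondent order as the ratings above.
satisfaction = [3, 4, 6, 7, 8, 9]

# Rank skills by how strongly proficiency tracks satisfaction.
ranked = sorted(
    ((skill, pearson(ratings, satisfaction)) for skill, ratings in proficiency.items()),
    key=lambda item: item[1],
    reverse=True,
)

for skill, r in ranked:
    print(f"{skill}: r = {r:.2f}")
```

A ranking like this shows association only. A high correlation suggests that a skill tends to go along with satisfaction, but it does not establish that improving the skill causes better outcomes.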
6. Finding a data professional who is proficient in all data science skill areas is extremely difficult. Data professionals rarely possess proficiency in all five skill areas at the level needed to be successful at work. In fact, finding a data professional with expert-level proficiency in all five skill areas is akin to finding a unicorn; they just don't exist.