Hacking the Data Science Radar with Data Science

04 Jul 16
This post was first published by its original author, Duncan Garmonsway, and is reproduced with his kind permission.
This post reverse-engineers the Mango Solutions Data Science Radar using R.

Why hack? Because getting at the innards also reveals:

- What a good score is in each category
- Which statements are most important
- Whether scores are comparable across people
- Whether you should strongly agree with the statement “On average, I spend at least 25% of my time manipulating data into analysis-ready formats”
The radar
Based on Likert-style responses to 24 provocative statements, the Data Science Radar visualises your skills along six axes, the “core attributes of a contemporary ‘Data Scientist’.” It looks like this.
Mango Solutions Data Science Radar
First attempt: Multivariate multiple regression
How can we score better? Hacking the URL would be cheating, so instead let’s use science: hypothesise → test → improve. Here are some initial guesses.
- Each of the 24 statements relates to exactly one attribute, i.e. four statements per attribute.
- The Likert values (strongly agree, agree, somewhat agree, etc.) are coded from 1 to 7 (since there are seven points on each axis).
- There is a linear relationship between the coded agreement with the statements and the attributes.
So something like
$$\text{score}_{\text{attribute}} = \frac{1}{4} \sum_{i = 1}^{4} \text{answer}_i$$
where answer_i = 1, 2, …, 7, encoding “Strongly disagree” as 1 up to “Strongly agree” as 7, and including only the four answers relevant to each attribute. The best possible set of answers would score 7 on every axis, and the worst set would score 1.
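Under these hypotheses the scoring rule is just an average of four coded answers. A minimal sketch (in Python for illustration; the 1–7 coding and the four-answers-per-attribute mapping are the hypotheses above, not confirmed internals of the Radar):

```python
# Hypothesised scoring rule: an attribute's score is the mean of the
# four Likert-coded answers assigned to it, where
# 1 = "Strongly disagree" ... 7 = "Strongly agree".
def attribute_score(answers):
    if len(answers) != 4 or not all(1 <= a <= 7 for a in answers):
        raise ValueError("expected four answers coded 1-7")
    return sum(answers) / 4

print(attribute_score([7, 7, 7, 7]))  # best possible score: 7.0
print(attribute_score([1, 1, 1, 1]))  # worst possible score: 1.0
```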
If the hypotheses are correct, then all we need to do to confirm them is to record 24 sets of random answers and the resulting scores, and fit a multivariate linear model. We’d expect each score (outcome variable) to have four non-zero coefficients (out of the 24 input variables). Let’s try it.
```r
# The first two aren't random, but they're still linearly independent of the
# others, which is what matters.
library(readr)  # for read_csv()
random_data <- read_csv("./data/radar-random.csv")
lm1 <- lm(cbind(Communicator, `Data Wrangler`, Modeller, Programmer,
                Technologist, Visualiser) ~ ., data = random_data)
lm1
##
## Call:
## lm(formula = cbind(Communicator, `Data Wrangler`, Modeller, Programmer,
##     Technologist, Visualiser) ~ ., data = random_data)
##
## Coefficients:
##              Communicator  Data Wrangler  Modeller    Programmer  Technologist  Visualiser
## (Intercept)   2.060e+00     2.422e+00     3.247e+00   6.658e-01  -1.331e+00    1.456e+00
## q01           1.997e-01    -2.507e-02     2.602e-01  -1.103e-01  -5.866e-02   -7.103e-02
## q02          -2.571e-01     2.729e-02    -4.514e-01   2.090e-01   1.554e-01    1.281e-01
## q03           3.087e-01     1.744e-02    -3.471e-01  -1.303e-03   5.611e-02    1.978e-01
## q04           4.356e-01     8.534e-04    -8.676e-03  -2.346e-02  -7.130e-02   -4.193e-02
## q05          -2.524e-01     2.267e-01     8.732e-01  -1.559e-01  -1.907e-01   -3.885e-01
## q06          -1.948e-01     1.545e-01     7.016e-01  -7.626e-02  -1.271e-01   -3.897e-01
## q07          -7.925e-03     2.075e-01     4.423e-01  -1.089e-01  -2.015e-01   -2.247e-01
## q08           8.902e-02    -4.810e-01    -1.246e-02   8.111e-02  -5.556e-02   -4.572e-02
## q09           1.901e-01     5.174e-02    -5.260e-01  -9.428e-02   5.506e-02    2.620e-01
## q10           9.750e-02    -1.248e-02    -2.365e-01   3.181e-02   1.557e-01    3.267e-01
## q11          -2.099e-01    -5.220e-02     2.943e-01   2.032e-01   6.801e-02   -1.775e-01
## q12          -1.000e-01     1.813e-15     7.000e-01  -1.333e-01   9.653e-16   -1.000e-01
## q13           5.164e-02     2.647e-02    -3.386e-01   2.881e-01  -4.010e-03    1.428e-01
## q14           1.211e-01    -8.162e-02    -3.835e-02  -2.508e-01  -4.963e-02    7.972e-02
## q15           4.971e-03     5.
```
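The recovery logic can also be checked synthetically. The sketch below (Python with NumPy, purely illustrative: the weights and the question-to-attribute mapping are made up) simulates one attribute whose score is the mean of its first four questions, fits least squares as lm() would, and recovers the sparse coefficient vector:

```python
import numpy as np

rng = np.random.default_rng(42)

# 30 simulated respondents x 24 questions, answers coded 1-7.
X = rng.integers(1, 8, size=(30, 24)).astype(float)

# Hypothetical truth: the attribute is the mean of questions 1-4,
# so the true coefficient vector is sparse (0.25 on four entries).
beta = np.zeros(24)
beta[:4] = 0.25
y = X @ beta

# Least-squares fit with an intercept column, as lm(score ~ .) does.
A = np.column_stack([np.ones(len(y)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = coef[0], coef[1:]

# The four true coefficients come back; the other twenty are ~0.
print(np.round(weights[:4], 3))
print(np.abs(weights[4:]).max() < 1e-6)
```

Noise-free scores make the recovery exact; with the rounding the real Radar might apply, we would instead look for coefficients that are merely close to zero.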
