Hacking the Data Science Radar with Data Science

Hacking the Data Science Radar with Data Science

Hacking the Data Science Radar with Data Science

Hacking the Data Science Radar with Data Science
04 Jul 16
This post was first published by original author Duncan Garmonsway and reproduced with his kind permission.
This post reverse-engineers the Mango Solutions Data Science Radar using
Programming (R)
Why hack? Because getting at the innards also reveals
What a good score is in each category
Which statements are most important
Whether scores are comparable across people
Whether you should strongly agree with the statement “On average, I spend at least 25% of my time manipulating data into analysis-ready formats”
The radar
Based on Likert-style responses to 24 provocative statements, the Data Science Radar visualises your skills along six axes, the “core attributes of a contemporary ‘Data Scientist’.” It looks like this.
Mango Solutions Data Science Radar
First attempt: Multivariate multiple regression
How can we score better? Hacking the url would be cheating , so instead, let’s use science: hypothesise -> test -> improve. Here are some initial guesses.
Each of the 24 statements relates to exactly one attribute, i.e. four statements per attribute.
The Likert values (strongly agree, agree, somewhat agree, etc.) are coded from 1 to 7 (since there are seven points on each axis).
There is a linear relationship between the coded agreement with the statements, and the attributes.
So something like
$$text{score}_{text{attribute}} = frac{1}{4} sum_{i = 1}^{4} text{answer}_i$$
where answeri = 1, 2, ⋯, 7 by encoding “Strongly disagree” as 1, up to “Strongly agree” as 7, including only four relevant answers per attribute. The best-possible set of answers would score 7 on every axis, and the worst set would score 1.
If the hypotheses are correct, then all we need to do to prove them is to record 24 sets of random answers, the resulting scores, and fit a multivariate linear model. We’d expect each score (outcome variable) to have four non-zero coefficients (out of the 24 input variables). Let’s try it.
# The first two aren't random, but they're still linearly independent of the # others, which is what matters. random_data <- read_csv("./data/radar-random.csv") lm1 <- lm(cbind(Communicator, `Data Wrangler`, Modeller, Programmer, Technologist, Visualiser) ~ ., data = random_data) lm1 ## ## Call: ## lm(formula = cbind(Communicator, `Data Wrangler`, Modeller, Programmer, ## Technologist, Visualiser) ~ ., data = random_data) ## ## Coefficients: ## Communicator Data Wrangler Modeller Programmer Technologist Visualiser ## (Intercept) 2.060e+00 2.422e+00 3.247e+00 6.658e-01 -1.331e+00 1.456e+00 ## q01 1.997e-01 -2.507e-02 2.602e-01 -1.103e-01 -5.866e-02 -7.103e-02 ## q02 -2.571e-01 2.729e-02 -4.514e-01 2.090e-01 1.554e-01 1.281e-01 ## q03 3.087e-01 1.744e-02 -3.471e-01 -1.303e-03 5.611e-02 1.978e-01 ## q04 4.356e-01 8.534e-04 -8.676e-03 -2.346e-02 -7.130e-02 -4.193e-02 ## q05 -2.524e-01 2.267e-01 8.732e-01 -1.559e-01 -1.907e-01 -3.885e-01 ## q06 -1.948e-01 1.545e-01 7.016e-01 -7.626e-02 -1.271e-01 -3.897e-01 ## q07 -7.925e-03 2.075e-01 4.423e-01 -1.089e-01 -2.015e-01 -2.247e-01 ## q08 8.902e-02 -4.810e-01 -1.246e-02 8.111e-02 -5.556e-02 -4.572e-02 ## q09 1.901e-01 5.174e-02 -5.260e-01 -9.428e-02 5.506e-02 2.620e-01 ## q10 9.750e-02 -1.248e-02 -2.365e-01 3.181e-02 1.557e-01 3.267e-01 ## q11 -2.099e-01 -5.220e-02 2.943e-01 2.032e-01 6.801e-02 -1.775e-01 ## q12 -1.000e-01 1.813e-15 7.000e-01 -1.333e-01 9.653e-16 -1.000e-01 ## q13 5.164e-02 2.647e-02 -3.386e-01 2.881e-01 -4.010e-03 1.428e-01 ## q14 1.211e-01 -8.162e-02 -3.835e-02 -2.508e-01 -4.963e-02 7.972e-02 ## q15 4.971e-03 5.

Read Also:
Predictive Analytics Can Help Coordinate Care

 



Chief Analytics Officer Europe

25
Apr
2017
Chief Analytics Officer Europe

15% off with code 7WDCAO17

Read Also:
5 features every product development road map should have

Chief Analytics Officer Spring 2017

2
May
2017
Chief Analytics Officer Spring 2017

15% off with code MP15

Read Also:
Predictive Analytics Can Help Coordinate Care

Big Data and Analytics for Healthcare Philadelphia

17
May
2017
Big Data and Analytics for Healthcare Philadelphia

$200 off with code DATA200

Read Also:
5 Steps for Creating a Scalable Data Security Plan

SMX London

23
May
2017
SMX London

10% off with code 7WDATASMX

Read Also:
​Why the CIO should care about Cyber Security

Data Science Congress 2017

5
Jun
2017
Data Science Congress 2017

20% off with code 7wdata_DSC2017

Read Also:
Clever computers can 'see' with radar and tell objects apart

Leave a Reply

Your email address will not be published. Required fields are marked *