An in-depth, multifaceted, and all-around very helpful roadmap for switching from 'science' to 'data science', which is also useful for data science beginners and anyone looking to get into the field.
After posting "What I do, or: science to data science", I got a lot of emails asking how to make this transition.
In this post I try to summarize my advice. I don’t intend to write a complete walkthrough, but rather to provide a starting point, with links to further materials. I target it at people with an academic, quantitative background (e.g. physics, mathematics, statistics), regardless of whether they are undergraduate students, PhDs, or a few postdocs in. Some points may be valid for other backgrounds (but then, use them at your own risk).
Here and everywhere else: please don’t take the approach of "first learn from book[s], then play". Start with playing!
Every project required me to learn something new, be it a library, a machine learning model, or a software tool. The work means analyzing real, and often dirty, data using a mixture of programming and statistics. Or, as Josh Wills put it:
From my perspective, the whole process looks like this:
And everything needs to be done in a reproducible way, so that others can interact with your code or even run it on a server. Depending on the job, there may be more emphasis on one part or another. Also, take a look at this tweet. While humorous, it shows a balanced list of typical skills and activities of a data scientist:
If you want to learn more about what data science is, look at the following links:
When you have an academic title, no one will question your intelligence. But people are justified in questioning your practical skills. In my experience, you need to fulfill two requirements:
Most data science tasks are simple, and once you are able to use R or Python, you can start working while gradually increasing your knowledge and experience. That is, after a few months you should be ready for an entry-level job.
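To give a rough sense of what "simple" can mean in practice, here is a minimal Python sketch of the everyday loop of analyzing dirty data with a mixture of programming and statistics: read a small messy dataset, clean it, and summarize it. The data, the column names, and the helpers `parse_height` and `clean_rows` are hypothetical, made up purely for illustration; only the standard library (`csv`, `io`, `statistics`) is used.

```python
import csv
import io
import statistics

# Hypothetical messy CSV (illustration only): stray spaces,
# a missing value, and a non-numeric entry.
RAW = """name,height_cm
Alice, 170
Bob,
 Carol ,165.5
Dave,n/a
Eve,180
"""


def parse_height(value):
    """Return the height as a float, or None if it is missing or not a number."""
    value = value.strip()
    try:
        return float(value)
    except ValueError:
        return None


def clean_rows(text):
    """Parse CSV text and keep only rows with a usable height, with names trimmed."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        height = parse_height(record["height_cm"])
        if height is not None:
            rows.append({"name": record["name"].strip(), "height_cm": height})
    return rows


rows = clean_rows(RAW)
heights = [r["height_cm"] for r in rows]

print(len(rows))  # 3 (Bob and Dave are dropped)
print(statistics.mean(heights))
print(statistics.stdev(heights))
```

In day-to-day work you would probably do the same steps with pandas or R data frames; the point of the sketch is only that the logic is short and readable once you know a bit of Python.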
Initially, I was afraid that my lack of 10+ years of experience with C++ and Java would be a problem.