There’s a lot of literature on learning the technical aspects of data science: statistics, machine learning, data munging, big data. This material will serve you well when starting out or working under a lead. But what about when you are ready to spread your wings and lead a project yourself, or embark on a project independently? Here you need a different sort of skill: storytelling, the type that communicates why you are working on a project, what the value is, and what you have accomplished. Without this skill, you run the risk of aimlessly seeking a solution without much to show for it. The last thing you want is to be a deer in the headlights when someone asks you what the business value of your work is. Pair the 3 Vs of big data with the 3 Ps of model development to increase the success rate of your project. Read on to learn how to detail the problem, the process, and your progress on any data science project.
In the real world, problems are often not well defined. It is up to the practitioner to define the problem. Compare this to many classroom settings and entry-level positions, which spell out every last detail of the work. This is equivalent to a color-by-number coloring book. You are given the problem and the method. Your job is strictly execution. This can be an effective approach for learning a subject, but less so for solving actual problems, where things are more open-ended.
At some point in your drawing career you graduate from this detailed instruction and move to coloring books without numbers. The problem is still given to you, but now you choose the method. Hence, you have to decide what colors to use. More importantly, a “successful” drawing is now contingent upon whether you choose good color combinations.
Finally, you outgrow coloring books altogether. What happens now? Instead of a line drawing, you are given a blank sheet of paper. It is up to you to define the problem. Here you have the greatest freedom but also the highest risk of failure.
This progression from color-by-number to an empty piece of paper isn’t so different from the maturation of a (data) scientist. First you learn the techniques. Then you learn how to apply the techniques to problems given to you. Finally, you define the problems. As your career advances, your success will be contingent on transforming blank sheets of paper into something valuable, i.e., identifying opportunities from data. Hence, your first challenge is defining the problem.
There are a number of ways to ask this question. Equally valid are:
The answers need to be specific. Often they will sound like a use case or user story, which takes the form of “I want to do X because Y”. This will also help you identify who the beneficiary of the project is. If you don’t know what you are solving or who benefits, you will almost certainly fail, as your project becomes indistinguishable from entertainment.