Someone once said “if you can’t measure something, you can’t understand it.” Another version of this belief says: “If you can’t measure it, it doesn’t exist.” This is a false way of thinking, a fallacy; in fact, it is sometimes called the McNamara fallacy. This mindset can have dire consequences in national affairs as well as in personal medical treatment (such as the application of “progression-free survival” metrics in cancer patients, where the reduction in tumors is lauded as a victory while the corresponding reduction in quality of life is ignored).
Similarly, in the world of data science and analytics, we are often drawn into this same way of thinking. Quantitative data are the ready-made inputs to our mathematical models. The siren call of quantifiable predictive and prescriptive models is difficult to resist. If the outputs from our models are quantitative (e.g., accuracy, precision, recall, or some other validation metric), then why not also the inputs to our models? Isn’t that the essence of being data-driven?
What we overlook when we say “data-driven” is that we really mean to say “evidence-based”. Evidence is not only quantitative. Similarly, data are not only quantitative. Consequently, what we miss in the rush to be more quantitative is the enormous value of qualitative data sets. The value of qualitative data comes in several ways, including:
We will explore these ideas by responding to four basic questions related to qualitative data:
1. What are some ways that we encounter qualitative data?
Qualitative data can come from surveys, customer response forms, documents, and even social media. These are invaluable sources of information that organizations already collect and exploit for important insights. Historically, the analysis of qualitative data tended to be very human-intensive, since we could not simply submit a database query against a document and get back numbers to feed into a visualization. Consequently, historical qualitative data analyses were typically limited in scope. However, that situation is now rapidly changing. Qualitative data are increasingly being transformed into quantitative data in clever ways, thereby unleashing the full power of quantitative analytics on qualitative data as well. Some transformation methods include scoring (assigning a numerical rank or score to specific qualitative responses or comments), sentiment analysis (assigning a positive or negative value to the sentiment being expressed in the qualitative data, and then assigning a numerical value to the strength of that sentiment), text analytics (summarizing the content of textual information in quantitative ways, such as topic models and heat maps), and natural language and semantic processing (extracting meaning from the language, whether written or spoken). Consequently, qualitative data are already first-class citizens in the world of big data, and they should be allowed equal opportunity to deliver business insights and value.
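To make the sentiment analysis idea concrete, here is a minimal sketch of one simple approach: a lexicon-based scorer that assigns each comment a sign (positive or negative) and a strength (the size of the number). The word list, its weights, and the function names are all hypothetical and chosen only for illustration; a production system would more likely rely on a trained model or an established lexicon.

```python
import re

# Hypothetical, hand-picked lexicon (an assumption for illustration only):
# each word maps to a signed weight expressing the strength of its sentiment.
LEXICON = {
    "great": 2, "love": 2, "good": 1,
    "slow": -1, "bad": -1, "terrible": -2,
}

def sentiment_score(comment: str) -> int:
    """Sum the lexicon weights of every word found in the comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(LEXICON.get(word, 0) for word in words)

def sentiment_label(score: int) -> str:
    """Turn a numeric score into a coarse qualitative label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

comments = [
    "I love the new app, great work",
    "The checkout was slow and support was terrible",
    "Delivery arrived on Tuesday",
]

for text in comments:
    score = sentiment_score(text)
    print(f"{score:+d} {sentiment_label(score):8s} {text}")
```

Once every comment carries a number, the usual quantitative tools (averages, trends over time, comparisons across customer segments) can be applied to what started out as free text.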
2. What are some of the similarities and differences between qualitative data and quantitative data when it comes to deriving insights?
Qualitative data are, broadly speaking, data that are not quantitative, which means that they are typically unstructured and usually textual. They might be from customer surveys, response forms, online forums, feedback comment blocks on web forms, written comments, phone calls to call centers, anecdotal evidence (e.g., gathered by our sales force or marketing team), news reports, and so on. Consequently, the extraction of structure and objective insights from such data requires a model: How do we model the words, or the comments, or the survey responses that we are collecting? How much weight do we assign to different content? How do we combine and integrate multiple sources? The answers to these questions are not much different from the answers we give to the same questions when we handle quantitative data.
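As one possible answer to the weighting and combining questions, the sketch below merges scores from several qualitative sources into a single number using a weighted average. The source names, the weights, and the function name are assumptions made up for this example; in practice the weights would come from your own judgment about how reliable or representative each source is.

```python
from statistics import mean

# Hypothetical weights (assumed values): how much we trust each source.
SOURCE_WEIGHTS = {"survey": 0.5, "call_center": 0.3, "social_media": 0.2}

def combined_score(scores_by_source: dict) -> float:
    """Weighted average of the mean score of each source that has data.

    Sources with no scores are skipped, and the remaining weights are
    renormalized so that they still sum to one.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for source, scores in scores_by_source.items():
        if not scores:
            continue  # nothing collected from this source yet
        weight = SOURCE_WEIGHTS[source]
        weighted_sum += weight * mean(scores)
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no scores to combine")
    return weighted_sum / weight_total

# Example: per-comment scores already derived from each source.
scores = {
    "survey": [1.0, 3.0],
    "call_center": [-1.0],
    "social_media": [],
}
print(round(combined_score(scores), 3))
```

This is the same kind of modeling decision we make with quantitative inputs: choosing which measurements to trust more and how to blend them.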