The intersection of big data and business is growing daily. Although enterprises have been studying analytics for decades, data science is a relatively new capability. And interacting in a new data-driven culture can be difficult, particularly for those who aren’t data experts.
One particular challenge that many of these individuals face is how to request new data or analytics from data scientists. They don’t know the right questions to ask, the correct terms to use, or the range of factors to consider to get the information they need. In the end, analysts are left uncertain about how to proceed, and managers are frustrated when the information they get isn’t what they intended.
At The Data Incubator, we work with hundreds of companies looking to hire data scientists and data engineers or enroll their employees in our corporate training programs. We often field questions from our hiring and training clients about how to interact with their data experts. While it’s impossible to give an exhaustive account, here are some important factors to think about when communicating with data scientists, particularly as you begin a data search.
What question should we ask? As you begin working with your data analysts, be clear about what you hope to achieve. Think about the business impact you want the data to have and the company’s ability to act on that information. By hearing what you hope to gain from their assistance, the data scientist can collaborate with you to define the right set of questions to answer and better understand exactly what information to seek.
Even the subtlest ambiguity can have major implications. For example, advertising managers may ask analysts, “What is the most efficient way to use ads to increase sales?” Though this seems reasonable, it may not be the right question since the ultimate objective of most firms isn’t to increase sales, but to maximize profit. Research from the Institute of Practitioners in Advertising shows that using ads to reduce price sensitivity is typically twice as profitable as trying to increase sales. The value of the insight obtained will depend heavily on the question asked. Be as specific and actionable as possible.
What data do we need? As you define the right question and objectives for analysis, you and your data scientist should assess the availability of the data. Ask if someone has already collected the relevant data and performed analysis. The ever-growing breadth of public data often provides easily accessible answers to common questions. Cerner, a supplier of health care IT solutions, uses data sets from the U.S. Department of Health and Human Services to supplement its own data. iMedicare uses information from the Centers for Medicare and Medicaid Services to select policies. Consider whether public data could help with your problem as well. You can also work with other analysts in the organization to find out whether someone internally has already analyzed the data for a similar purpose.
Then, assess whether the available data is sufficient. Data may not contain all the relevant information needed to answer your questions. It may also be influenced by latent factors that can be difficult to recognize. Consider the vintage effect in private lending data: Even seemingly identical loans typically perform very differently depending on when they were issued, even though their recorded data at issuance may have been identical. The effect comes from fluctuations in the underwriting standards in force at issuance, information that is not typically represented in loan data.
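To make the vintage effect concrete, here is a minimal sketch in Python. The loan records, years, credit scores, and default outcomes are all invented for illustration, and the function name is hypothetical. The point is only that loans with identical recorded attributes can show very different default rates once they are grouped by issuance year.

```python
from collections import defaultdict

# Hypothetical loan records: (issue_year, credit_score, defaulted).
# Every value here is invented; the recorded attributes are identical
# across loans except for the year of issuance.
loans = [
    (2005, 700, False), (2005, 700, False), (2005, 700, True), (2005, 700, False),
    (2007, 700, True), (2007, 700, True), (2007, 700, False), (2007, 700, True),
]

def default_rate_by_vintage(records):
    """Return the share of defaulted loans for each issuance year."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for issue_year, _score, defaulted in records:
        totals[issue_year] += 1
        if defaulted:
            defaults[issue_year] += 1
    return {year: defaults[year] / totals[year] for year in sorted(totals)}

rates = default_rate_by_vintage(loans)
for year, rate in rates.items():
    print(f"{year}: {rate:.0%} default rate")
```

In this made-up data the credit score cannot explain the gap between the two years, which is the kind of signal that should prompt a question about what the data set leaves out.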
You should also inquire if the data is unbiased, since sample size alone is not sufficient to guarantee its validity. Finally, ask if the data scientist has enough data to answer the question.
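The point about sample size and bias can be illustrated with a small simulation. This is a sketch under invented assumptions: a hypothetical population of 10,000 customers, half of whom spend 0 and half of whom spend 100, with a large sample that over-represents high spenders compared against a much smaller random sample.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 5,000 customers who spend 0 and 5,000 who spend 100.
population = [0] * 5000 + [100] * 5000
true_mean = sum(population) / len(population)

# Large but biased sample: high spenders are four times as likely to be included.
high = [v for v in population if v == 100]
low = [v for v in population if v == 0]
biased_sample = rng.sample(high, 4000) + rng.sample(low, 1000)

# Small but unbiased sample: every customer is equally likely to be drawn.
random_sample = rng.sample(population, 100)

biased_mean = sum(biased_sample) / len(biased_sample)
random_mean = sum(random_sample) / len(random_sample)

print(f"true mean:                   {true_mean:.1f}")
print(f"biased sample (n=5000) mean: {biased_mean:.1f}")
print(f"random sample (n=100) mean:  {random_mean:.1f}")
```

Under these assumptions the 5,000-record biased sample misses the true average by a wide margin, while the 100-record random sample lands much closer. Collecting more records the same biased way would not close that gap.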