“AI,” “big data,” and “machine learning” are all trending buzzwords, and you might be curious about how they apply to your domain. You might even have startups beating down your door, pitching you their new “AI-powered” product. So how can you know which problems in your business are amenable to machine learning? To decide, you need to think about the problem to be solved and the available data, and ask questions about feasibility, intuition, and expectations.
Start by distinguishing between automation problems and learning problems. Machine learning can help automate your processes, but not all automation problems require learning.
Automation without learning is appropriate when the problem is relatively straightforward. These are tasks where a clear, predefined sequence of steps is currently being executed by a human but could conceivably be handed over to a machine. This sort of automation has been happening in businesses for decades. Screening incoming data from an outside data provider for well-defined potential errors is an example of a problem ready for automation. (For example, a hedge fund can automatically filter out records that report a negative trading volume, since volume can’t be negative.) On the other hand, encoding human language into a structured dataset is just a tad too ambitious for a straightforward set of rules.
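To make the distinction concrete, here is a minimal sketch of rule-based screening with no learning involved. The `Quote` record, its field names, and the sample tickers are hypothetical, chosen only for illustration; a real data feed would have its own schema and more rules.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical record from an outside data provider."""
    ticker: str
    volume: int


def is_valid(quote: Quote) -> bool:
    # The rule is fixed in advance: trading volume can never be negative.
    return quote.volume >= 0


def screen(quotes: list[Quote]) -> tuple[list[Quote], list[Quote]]:
    """Split incoming records into clean and rejected lists."""
    clean = [q for q in quotes if is_valid(q)]
    rejected = [q for q in quotes if not is_valid(q)]
    return clean, rejected


# Made-up sample data for illustration.
incoming = [Quote("AAA", 1200), Quote("BBB", -50), Quote("CCC", 0)]
clean, rejected = screen(incoming)
```

Nothing here is estimated from data: a person wrote the rule, and the machine simply applies it to every record.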
For the second type of problem, standard automation is not enough: these problems require learning from data, and here we venture into the arena of machine learning. Machine learning, at its core, is a set of statistical methods meant to find patterns of predictability in datasets. These methods are great at determining how certain features of the data are related to the outcomes you are interested in. What these methods cannot do is access any knowledge outside of the data you provide. For example, researchers at the University of Pittsburgh in the late 1990s evaluated machine learning algorithms for predicting mortality rates from pneumonia. The algorithms recommended that hospitals send home pneumonia patients who were also asthma sufferers, estimating their risk of death from pneumonia to be lower. It turned out that the dataset fed into the algorithms did not account for the fact that asthma sufferers had been immediately sent to intensive care, and had fared better only because of the additional attention.
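The pneumonia example can be mimicked with a toy calculation. The sketch below uses made-up counts, not figures from the Pittsburgh study, and a deliberately naive "model" that estimates risk as the observed death rate in each group. Because the records carry no field for intensive care, the estimate cannot reflect it.

```python
# Made-up counts for illustration only; not data from the actual study.
# Each record has just two fields. Whether the patient received
# intensive care is NOT recorded, so no method can take it into account.
records = (
    [{"asthma": True, "died": True}] * 5
    + [{"asthma": True, "died": False}] * 95
    + [{"asthma": False, "died": True}] * 99
    + [{"asthma": False, "died": False}] * 801
)


def observed_death_rate(rows, has_asthma):
    """Naive risk estimate: share of patients in the group who died."""
    group = [r for r in rows if r["asthma"] == has_asthma]
    return sum(r["died"] for r in group) / len(group)


risk_asthma = observed_death_rate(records, True)
risk_no_asthma = observed_death_rate(records, False)

print(f"Estimated risk with asthma:    {risk_asthma:.2f}")
print(f"Estimated risk without asthma: {risk_no_asthma:.2f}")
```

In this toy data the asthma group shows the lower death rate, so a method that sees only these two columns would rank those patients as lower risk. The pattern is real in the data, but the reason for it lies outside the data.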
So what are good business problems for machine learning methods? Essentially, any problems that: (1) require prediction rather than causal inference; and (2) are sufficiently self-contained, or relatively insulated from outside influences. The first means that you are interested in understanding how, on average, certain aspects of the data relate to each other, and not in the causal channels of their relationship. Keep in mind that the statistical methods do not bring to the table the intuition, theory, or domain knowledge of human analysts.