Some of the earliest applications of artificial intelligence in healthcare were in diagnosis. It was a major focus of expert systems, for example, where the aim is to build up a knowledge base that lets software be as good as a human clinician. Expert systems hit their peak in the late 1980s, but they required a lot of knowledge to be encoded by people who had lots of other things to do. The hardware of the 1980s was also a limiting factor for AI.
The promise of AI in diagnostics is that you can help people in locations where there aren’t enough doctors. Computers are not as creative as human pattern matchers, but that fact also means they can be more consistent than people. In addition to access and affordability, then, there’s the possibility that AI doctors could actually promote better outcomes than the ones with stethoscopes around their necks.
But how do you send a computer to medical school? And where do they go for their Continuing Medical Education credits?
Let’s start with an example of how statistical models could come to conclusions earlier than clinicians. Preeclampsia is a leading cause of death among pregnant women in the Western world and the main cause of fetal complications. Fifteen percent of first-time pregnancies involve women who have high blood pressure, and half of those end up with preeclampsia. To resolve it, you have to deliver the baby, even if that means a premature birth.
The problem is: does a patient have preeclampsia, or are they developing it? If it’s not actually preeclampsia, you want to start anti-hypertensive treatment. Velikova and Lucas (2014) is an example of the promise of personalized statistical healthcare. If their models hold up beyond their small sample, they’d have a system that would diagnose preeclampsia a median of 4 weeks earlier than human clinicians.
In work like this, choosing the data carefully is important, since it’s easy to accidentally lump people into the non-preeclampsia group who weren’t actually recorded that way. There are ways to be robust to noisy training data, but being able to say “the stuff in Training Category A really belongs there and the stuff in Training Category B really belongs there” is best. Similarly, teachers avoid peppering their lectures to human medical students with errors, falsehoods, and noise.
A basic rule of thumb is that if you can’t get human beings to agree on what to call something, you’re going to have a hard time using machine learning to do it automatically. So an important part of the design of any machine learning project is piloting the project with people.
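One common way to put a number on "can people agree?" during a pilot is Cohen's kappa, which compares how often two annotators agree with how often they would agree by chance. The sketch below is only an illustration: the function name, the two clinicians, and the "pre"/"htn" labels are made up for this example and don't come from any of the studies mentioned here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pilot data: two clinicians label the same eight patients
# as preeclampsia ("pre") or plain hypertension ("htn").
clinician_1 = ["pre", "pre", "htn", "htn", "pre", "htn", "htn", "pre"]
clinician_2 = ["pre", "htn", "htn", "htn", "pre", "htn", "pre", "pre"]
kappa = cohen_kappa(clinician_1, clinician_2)
print(f"kappa = {kappa:.2f}")
```

A kappa near 1 means the annotators agree far more than chance would predict, and a value near 0 means their agreement is about what you'd get from guessing. If a pilot comes back with a low value, that's a hint the categories or the instructions need work before you collect training data at scale.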
Let’s take a look at a healthcare project done by researchers at Beth Israel using CrowdFlower. You can read all the details in their published paper.
For a variety of conditions, it’s necessary for pathologists to identify sections of images that are problematic. This can help with diagnosis directly, and it can also be used as training data for machine learning. In the image below, the crowd is being asked to draw circles around cell nuclei.
The following table compares research fellows and the crowd to expert pathologists. The takeaway is that the crowd can be pretty good at this task. Or rather, they are when the task is well designed.
Designing a task for humans gets you data that you can train models from.
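As a rough sketch of how crowd judgments might be turned into training labels and checked against an expert, here is a simple majority vote. This is a simplification: the task described above has people drawing circles around nuclei, while this example assumes each image region has already been reduced to a "nucleus" or "background" label. The function names and the data are hypothetical and are not taken from the Beth Israel paper.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among one item's crowd votes."""
    # With an even number of voters a tie is possible; Counter breaks it
    # in favour of the label it saw first.
    return Counter(votes).most_common(1)[0][0]


def agreement_with_expert(crowd_votes, expert_labels):
    """Consensus labels plus the fraction that match the expert's labels."""
    consensus = [majority_vote(votes) for votes in crowd_votes]
    matches = sum(c == e for c, e in zip(consensus, expert_labels))
    return consensus, matches / len(expert_labels)


# Hypothetical data: three crowd judgments per image region.
crowd_votes = [
    ["nucleus", "nucleus", "background"],
    ["background", "background", "background"],
    ["nucleus", "background", "background"],
    ["nucleus", "nucleus", "nucleus"],
]
expert_labels = ["nucleus", "background", "nucleus", "nucleus"]

consensus, accuracy = agreement_with_expert(crowd_votes, expert_labels)
print(consensus)
print(f"agreement with expert: {accuracy:.2f}")
```

The consensus list is what you would feed to a model as training labels, and the agreement score gives a quick read on whether the task design is producing labels an expert would accept.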