AI has a big data problem. Here’s how to fix it
- by 7wData
Artificial intelligence has, quite literally, a big data problem, and the COVID-19 crisis has made it impossible to ignore.
For businesses, governments, and individuals alike, the global pandemic has effectively redefined "normal" life; but while most of us have now adjusted to the change, the same cannot be said of AI systems, which base their predictions on what the past used to look like.
Speaking at the CogX 2020 conference, British mathematician David Barber said: "The deployment of AI systems is currently clunky. Typically, you go out there, collect your data set, label it, train the system and then deploy it. And that's it – you don't revisit the deployed system. But that's not good if the environment is changing."
Barber was referring to supervised machine learning, which he called today's "classical paradigm" in AI, and which consists of teaching algorithms by example. In a supervised model, an AI system is fed a large dataset that has been previously labeled by humans, and which is used to train the technology into recognizing patterns and making predictions.
You could, for example, train an algorithm to automate lending decisions at a bank, based on individuals' incomes or credit scores. Cue COVID-19, along with a whole new set of banking patterns, and the AI system is likely to be at a loss to decide who gets the cash.
Similarly, a few months into the COVID-19 crisis, a US researcher pointed out that algorithms, despite all the training data they have been fed, wouldn't be all that helpful in understanding the nature of the outbreak or its spread across the globe.
Because of the lack of training data about past coronaviruses, the researcher explained, most of the predictions generated by AI tools were found to be unreliable, and results often failed to reflect the severity of the crisis.
Meanwhile, in healthtech, the makers of AI health tools struggled to update their algorithms due to a lack of relevant data about the virus, resulting in many "symptom finder" chatbots being a little off the mark.
With data from a pre-COVID environment not matching the real world anymore, supervised algorithms are running out of examples to base their predictions on. And to make matters worse, AI systems don't flag their uncertainties to their human operator.
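As a rough illustration of what flagging uncertainty could look like, here is a minimal Python sketch in which a prediction is handed to a human whenever the model's top class probability falls below a cut-off. The function name `predict_or_defer`, the 0.8 threshold and the example probabilities are assumptions made for illustration; they do not come from any system mentioned in this article.

```python
from typing import Dict, Tuple


def predict_or_defer(
    class_probs: Dict[str, float], threshold: float = 0.8
) -> Tuple[str, float, bool]:
    """Return (label, confidence, needs_human) for one prediction.

    class_probs maps each candidate label to the model's probability for it.
    The threshold is an assumed, tunable cut-off, not a recommended value.
    """
    # Pick the label with the highest probability.
    label, confidence = max(class_probs.items(), key=lambda kv: kv[1])
    # Flag the case for human review when the model is not confident enough.
    needs_human = confidence < threshold
    return label, confidence, needs_human


# Hypothetical lending decisions: one clear-cut, one borderline.
clear_case = predict_or_defer({"approve": 0.95, "reject": 0.05})
borderline_case = predict_or_defer({"approve": 0.55, "reject": 0.45})
```

In this sketch the borderline case comes back with `needs_human` set to `True`, so it would be routed to a person instead of being decided automatically.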
"The AI won't tell you when it actually isn't confident about the accuracy of its prediction and needs a human to come in," said Barber. "There are many uncertainties in these systems. So it is important that the AI can alert the human when it is not confident about its decision."
This is what Barber described as an "AI co-worker situation", where humans and machines would interact to make sure that gaps aren't left unfilled. That kind of interaction is at the heart of a method within artificial intelligence that is slowly emerging as a particularly efficient one.
Dubbed "active learning", it consists of establishing a teacher-learner relationship between AI systems and human operators. Instead of feeding the algorithm a huge labeled dataset, and letting it draw conclusions – often in a less-than-transparent way – active learning lets the AI system do the bulk of data labeling on its own, and crucially, ask questions when it has a doubt.
The process involves a small pool of human-labeled data, called the seed, which is used to train the algorithm. The AI system is then presented with a larger set of unlabeled data, which the algorithm annotates by itself, based on its training – before integrating the newly labeled data back into the seed.
When the tool isn't confident about a particular label, it can ask for help from a human operator in the form of a query. The choices made by human experts are then fed back into the system, to improve the overall learning process.
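The loop described above can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the method used by any particular product: the data points are single numbers, the "model" is a simple nearest-centroid classifier, and the names `fit_centroids`, `predict_with_confidence`, `active_learning` and `ask_human`, along with the 0.8 threshold, are all hypothetical. The seed must contain at least two different labels.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Labeled = List[Tuple[float, str]]


def fit_centroids(labeled: Labeled) -> Dict[str, float]:
    """Train the toy model: one average value (centroid) per label."""
    by_label: Dict[str, List[float]] = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict_with_confidence(centroids: Dict[str, float], x: float) -> Tuple[str, float]:
    """Return the nearest label and a confidence score between 0.5 and 1."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    nearest, runner_up = ranked[0], ranked[1]
    d1 = abs(x - centroids[nearest])
    d2 = abs(x - centroids[runner_up])
    # Confidence is high when x is much closer to one centroid than the other.
    confidence = 0.5 if d1 + d2 == 0 else d2 / (d1 + d2)
    return nearest, confidence


def active_learning(
    seed: Labeled,
    pool: List[float],
    ask_human: Callable[[float], str],
    threshold: float = 0.8,
) -> Tuple[Labeled, int]:
    """Label the pool, querying the human only for low-confidence items."""
    labeled = list(seed)
    queries = 0
    for x in pool:
        centroids = fit_centroids(labeled)  # retrain on everything labeled so far
        label, confidence = predict_with_confidence(centroids, x)
        if confidence < threshold:
            label = ask_human(x)  # the query to the human operator
            queries += 1
        labeled.append((x, label))  # fold the new label back into the training set
    return labeled, queries


# Hypothetical seed, unlabeled pool and stand-in for the human expert.
seed = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
pool = [0.5, 9.5, 5.2, 1.5, 4.6]
labeled, queries = active_learning(seed, pool, lambda x: "high" if x >= 5 else "low")
```

With this data the system labels the three clear-cut points on its own and asks the human about the two points near the middle, which is the trade-off the approach is built around: the human only sees the cases the model is unsure about.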
The immediate appeal of active learning lies in the much smaller volume of labeled data that is needed to train the system.