When machines use predictions to make decisions that can impact the rest of our lives, we are entitled to explanations of how they work and how predictive models were created (including the data that was used).
Last week I went to the workshops at NIPS (the biggest ML conference in the world), and I also attended part of the ML and the Law symposium the day before. I found out about the symposia a little too late, but I was still able to attend two panels that brought together lawyers and computer scientists. They were very insightful and informative. Did you know that this spring, the European Union passed a regulation giving its citizens a “right to an explanation” for decisions made by machine-learning systems?
Below are the rest of my notes from the symposium…
The panel discussions were motivated by the problem of explaining ML-powered decisions which have an important impact on people’s lives:
We need to be able to test how systems get to their conclusions; if we can’t test, we can’t contest. Individuals are entitled to know which data about them is being processed, and to explanations of how predictions and decisions work, in terms they can understand. On this topic, panelists agreed that we need to be clearer about what a correct explanation is and what it should contain. (Maybe this notion changes from one domain to another. Do we need explanations in natural language, or with a certain form or structure?)
The promise of AI (and of ML, which is part of it) is to enable better decisions. We humans have many cognitive biases that can make us poor decision makers in many situations. But if the whole of society is to depend on ML, we all have to learn how it works (lawyers, but also the general public). Lawyers need to understand the methodology and how computer science works. In the context of machine learning, that means understanding the data, how it is collected, and the biases that may be lurking in it. Everyone should know what ML makes possible and what its limitations are. Models learnt from data may be biased because the data collection was biased (e.g. data on credit defaults that only contains credits approved by some earlier process, so the model never sees what would have happened with the rejected applications). ML can also be used in ways that are completely nonsensical: for instance, if you train a model on pictures of dogs and cats and then show it a plant, the model will see a dog or a cat!
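To make the dog/cat/plant point concrete, here is a minimal sketch of my own (not something shown at the symposium). It assumes a classifier that turns raw scores into probabilities with a softmax over a fixed list of labels, which is a common setup but by no means the only one. The names `LABELS`, `softmax`, `classify` and the score values are all made up for illustration.

```python
import math

# The only answers this hypothetical model knows about.
LABELS = ["dog", "cat"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores):
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]

# Made-up raw scores such a model might produce for a photo of a plant.
plant_scores = [0.3, -0.1]
label, confidence = classify(plant_scores)
print(label, round(confidence, 2))
```

Whatever scores go in, the answer is always one of the entries in `LABELS`, and the probabilities always sum to 1. There is simply no way for this kind of model to say “neither”, unless someone deliberately adds that option or a separate check for unfamiliar inputs.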
We should be able to scrutinize algorithms that learn models from data, the data itself (from its collection to its preparation in ML-ready form — a.k.a. “featurization”), and the algorithms that make predictions and decisions from a given model. (However, it may not be that clear how we can audit what happens behind Intellectual Property walls.)
If we are to use ML to improve many aspects of our lives, we need models to be interpretable so we can fully trust them.