The data scientists writing the algorithms that drive giants like Alphabet Inc. (Google) and Facebook Inc. are today's technology wizards, and companies and governments increasingly use their creations -- often in secret and with little oversight -- to do everything from hiring and firing employees to identifying likely suspects for police monitoring. But there's a dark side -- and computer scientists warn that we'll need a lot more transparency if the big-data revolution is really to work for all of us.
In her recent book "Weapons of Math Destruction," mathematician Cathy O'Neil tells the story of Sarah Wysocki, a teacher fired from her job at MacFarland Middle School in Washington after a computer algorithm churning through numbers on student performance judged her to be a poor teacher. Both students and parents consistently ranked Wysocki as an excellent teacher, yet she couldn't fairly challenge the decision because the company that developed the algorithm claimed a right to proprietary secrecy. Her firing stood despite near certainty that the algorithm, given the limited data it analyzed, couldn't have reached any statistically meaningful conclusion.
Wysocki was soon hired by a better-funded school system that relied on people to make decisions. Many others haven't been so fortunate. O'Neil's book presents an alarming picture of a race to profit from the explosion of data on human behavior, often taking place with little concern for basic norms of fairness. Companies sometimes deny credit to individuals if they've shopped in stores frequented by others with poor credit histories. Automated analysis of data is widely used to make decisions on university admissions, on hiring, and even on policing strategy, and the practice, as O'Neil shows, often reinforces racial discrimination, despite its seeming objectivity.
What can be done? That's far from clear, but computer scientists increasingly recognize the seriousness of the problem. In a new paper, computer scientist Bruno Lepri and colleagues explore some key ideas on how technology itself might help.
A first step is finding ways to control how data can be used. Researchers at the Massachusetts Institute of Technology Media Lab are developing a cloud-computing platform called Enigma to let individuals share their data while controlling how it can be used. Suppose an insurance company wants to use people's mobile phone data to assess risks more accurately, which could reduce client premiums.
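Enigma's actual protocol is far more sophisticated, but the underlying idea -- letting a party compute on people's data without ever seeing any individual's raw values -- can be illustrated with a toy example of additive secret sharing. In this sketch (the variable names and the three-server setup are illustrative assumptions, not Enigma's design), each user splits a sensitive score into random shares held by separate servers; the servers can jointly produce an aggregate, but no single server learns anyone's data.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n=3):
    """Split `value` into n random additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recover the original value by summing the shares mod PRIME."""
    return sum(shares) % PRIME

# Three users' hypothetical risk scores, each split across three servers
scores = [72, 58, 91]
all_shares = [share(s) for s in scores]

# Each server sums only the shares it holds -- individual scores stay hidden
server_totals = [sum(column) % PRIME for column in zip(*all_shares)]

# Combining the servers' totals reveals only the aggregate
aggregate = reconstruct(server_totals)
print(aggregate)  # prints 221, the sum of the scores
```

An insurer in this scenario could learn a pool-level risk statistic without any party holding the raw phone data, which is the kind of control over use, rather than mere possession, that the Enigma approach aims at.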