Since the dawn of big data, privacy concerns have overshadowed every advancement and every new algorithm. This is the same for machine learning, which learns from big data to essentially think for itself. This presents an entirely new threat to privacy, opening up volumes of data for analysis on a whole new scale. Many standard applications of machine learning and statistics will, by default, compromise the privacy of individuals represented in the data sets. They are also vulnerable to hackers, who would edit the training data, compromising both the data and the final goal of the algorithm.
A recent project that demonstrates how machine learning could directly be used in the invasion of privacy was carried out by researchers at Cornell Tech in New York. Ph.D. candidate, Richard McPherson, Professor Vitaly Shmatikov, and Reza Shokri applied basic machine learning algorithms – not even specifically written for the purpose – to identify people in blurred and pixelated images. In tests where humans had no chance of identifying the person (0.19%), they say the algorithm had 71% accuracy. This went up to 83% when the computer was given five opportunities.
Blurred and pixelated images have long been used to disguise people and objects. License plate numbers are routinely blurred on television, as well as the faces of underage criminals, victims of particularly horrific crimes and tragedies, and those wishing to remain anonymous when interviewed. YouTube even offers its own facial blurring tool, developed to mask protestors and prevent potential retribution.
What’s possibly the most shocking part of the research is the ease with which the researchers were able to make it work.