Nobody understands why deep neural networks are so good at solving complex problems. Now physicists say the secret is buried in the laws of physics.
In the last couple of years, deep learning techniques have taken the world of artificial intelligence by storm. One by one, the abilities and techniques that humans once imagined were uniquely our own have begun to fall to the onslaught of ever more powerful machines. Deep neural networks are now better than humans at tasks such as face recognition and object recognition. They’ve mastered the ancient game of Go and thrashed the best human players.
But there is a problem. There is no mathematical reason why networks arranged in layers should be so good at these challenges. Mathematicians are flummoxed. Despite the huge success of deep neural networks, nobody is quite sure how they achieve their success.
Today that changes thanks to the work of Henry Lin at Harvard University and Max Tegmark at MIT. These guys say the reason why mathematicians have been so embarrassed is that the answer depends on the nature of the universe. In other words, the answer lies in the realm of physics rather than mathematics.
First, let’s set up the problem using the example of classifying a megapixel grayscale image to determine whether it shows a cat or a dog.
Such an image consists of 1 million pixels that can each take one of 256 grayscale values. So in theory, there can be 256^1,000,000 possible images, and for each one it is necessary to compute whether it shows a cat or a dog.
That’s hard, not least because there are significantly more images than there are atoms in the universe. And yet neural networks, with merely thousands or millions of parameters, somehow manage this classification task with ease.
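To get a feel for the scale, here is a minimal Python sketch that counts the decimal digits in 256 raised to the power of one million, the number of distinct images described above. The figure of roughly 10^80 atoms in the observable universe is a commonly quoted estimate and is an assumption here, not a number from Lin and Tegmark.

```python
import math

PIXELS = 1_000_000      # pixels in the image
LEVELS = 256            # grayscale values per pixel
ATOMS_EXPONENT = 80     # assumption: about 10^80 atoms in the observable universe


def decimal_digits_of_power(base, exponent):
    """Number of decimal digits in base**exponent, without building the huge integer."""
    return math.floor(exponent * math.log10(base)) + 1


image_count_digits = decimal_digits_of_power(LEVELS, PIXELS)
atom_count_digits = ATOMS_EXPONENT + 1  # 10^80 is a 1 followed by 80 zeros

print("digits in the number of possible images:", image_count_digits)
print("digits in the assumed number of atoms:", atom_count_digits)
```

The logarithm is used so the sketch never has to print an integer with millions of digits; only the digit counts are compared.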
In the language of mathematics, neural networks work by approximating complex mathematical functions with simpler ones. When it comes to classifying images of cats and dogs, the neural network must implement a function that takes as an input a million grayscale pixels and outputs the probability distribution of what it might represent.
The problem is that there are orders of magnitude more mathematical functions than possible networks to approximate them. And yet deep neural networks somehow get the right answer.
Now Lin and Tegmark say they’ve worked out why. The answer is that the universe is governed by a tiny subset of all possible functions. In other words, when the laws of physics are written down mathematically, they can all be described by functions that have a remarkable set of simple properties.
So deep neural networks don’t have to approximate any possible mathematical function, only a tiny subset of them.
To put this in perspective, consider the order of a polynomial function, which is the size of its highest exponent. So a quadratic equation like y = x^2 has order 2, the equation y = x^24 has order 24, and so on.
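The definition of order can be written as a few lines of Python. This is only an illustrative sketch: the function name polynomial_order and the choice to represent a polynomial as a mapping from exponent to coefficient are assumptions made for the example.

```python
def polynomial_order(coefficients):
    """Return the highest exponent that has a nonzero coefficient.

    `coefficients` maps each exponent to its coefficient,
    so {2: 1} stands for y = x^2.
    """
    exponents = [exp for exp, coeff in coefficients.items() if coeff != 0]
    if not exponents:
        raise ValueError("the zero polynomial has no well-defined order")
    return max(exponents)


quadratic = {2: 1}               # y = x^2
high_order = {24: 1, 3: 5}       # y = x^24 + 5x^3

print(polynomial_order(quadratic))
print(polynomial_order(high_order))
```

Terms with a zero coefficient are ignored, so a padded representation such as {4: 0, 2: 1} still counts as order 2.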
Obviously, the number of orders is infinite, and yet only a tiny subset of polynomials appears in the laws of physics. “For reasons that are still not fully understood, our universe can be accurately described by polynomial Hamiltonians of low order,” say Lin and Tegmark. Typically, the polynomials that describe laws of physics have orders ranging from 2 to 4.
The laws of physics have other important properties.