The explanation for universality we've discussed is certainly
not a practical prescription for how to compute using neural
networks! In this, it's much like proofs of universality for NAND
gates and the like. For this reason, I've focused mostly on
trying to make the construction clear and easy to follow, and
not on optimizing its details. However, you may find it a fun
and instructive exercise to see if you can improve the
construction.
Although the result isn't directly useful in constructing
networks, it's important because it takes off the table the
question of whether any particular function is computable
using a neural network. The answer to that question is always
"yes". So the right question to ask is not whether any particular
function is computable, but rather what's a good way to
compute the function.
The universality construction we've developed uses just two
hidden layers to compute an arbitrary function. Furthermore,
as we've discussed, it's possible to get the same result with just
a single hidden layer. Given this, you might wonder why we
would ever be interested in deep networks, i.e., networks with
many hidden layers. Can't we simply replace those networks
with shallow, single hidden layer networks?
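To make the "in principle" part of the answer concrete, here's a minimal sketch of the single-hidden-layer idea in NumPy. It approximates a target function on [0, 1] by pairing steep sigmoid neurons into bumps, in the spirit of the chapter's construction. Note the particulars are choices of this sketch, not details fixed by the text: the example function f is arbitrary, the number of bumps N and steepness w are tunable, and the output unit is a plain weighted sum rather than composing with a final sigmoid as the chapter's construction does.

```python
import numpy as np

def sigmoid(z):
    # Numerically safe logistic function: clip to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500.0, 500.0)))

# An arbitrary continuous target function on [0, 1] (example only).
def f(x):
    return 0.2 + 0.4 * x**2 + 0.3 * x * np.sin(15 * x)

N = 100                           # number of bumps; more bumps, better fit
edges = np.linspace(0.0, 1.0, N + 1)
w = 1000.0                        # large weight makes each sigmoid a near-step

def network(x):
    # A single hidden layer of 2*N steep sigmoid neurons, paired into bumps:
    # each pair steps up at the left edge of an interval and back down at
    # the right edge, and the output is a weighted (linear) sum of the pairs.
    total = np.zeros_like(x)
    for left, right in zip(edges[:-1], edges[1:]):
        height = f((left + right) / 2)   # bump height: f at the midpoint
        bump = sigmoid(w * (x - left)) - sigmoid(w * (x - right))
        total += height * bump
    return total

# Evaluate slightly inside the endpoints, where the first and last
# steps are only half formed.
x = np.linspace(0.01, 0.99, 1000)
print("max |network(x) - f(x)|:", np.max(np.abs(network(x) - f(x))))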
While in principle that's possible, there are good practical
reasons to use deep networks. As argued in Chapter 1, deep
networks have a hierarchical structure which makes them
particularly well adapted to learn the hierarchies of knowledge
that seem to be useful in solving real-world problems. Put more
concretely, when attacking problems such as image
recognition, it helps to use a system that understands not just
individual pixels, but also increasingly complex concepts:
from edges to simple geometric shapes, all the way up through
complex, multi-object scenes. In later chapters, we'll see
evidence suggesting that deep networks do a better job than
shallow networks at learning such hierarchies of knowledge. To
sum up: universality tells us that neural networks can compute
any function; and empirical evidence suggests that deep
networks are the networks best adapted to learn the functions
useful in solving many real-world problems.
Chapter acknowledgments: Thanks to Jen Dodd and Chris Olah for many
discussions about universality in neural networks. My thanks, in particular, to
Chris for suggesting the use of a lookup table to prove universality. The
interactive visual form of the chapter is inspired by the work of people such as
Mike Bostock, Amit Patel, Bret Victor, and Steven Wittens.