In particular, for each pixel in the input image, we encoded the
pixel's intensity as the value for a corresponding neuron in the
input layer. For the 28 × 28 pixel images we've been using, this
means our network has 784 (= 28 × 28) input neurons. We then
trained the network's weights and biases so that the network's
output would - we hope! - correctly identify the input image:
'0', '1', '2', ..., '8', or '9'.
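As a concrete sketch of this input encoding, here is how a 28 × 28 grayscale image might be flattened into a vector of input-neuron activations. This is a minimal NumPy illustration, not the book's own code; the random array stands in for an actual MNIST digit, and the scaling to [0, 1] follows the convention used in earlier chapters.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in 28x28 grayscale "digit", pixel intensities in 0..255.
image = rng.integers(0, 256, size=(28, 28))

# Flatten to a 784-dimensional column vector of input activations,
# scaling intensities into [0.0, 1.0].
input_activations = image.reshape(784, 1) / 255.0

print(input_activations.shape)  # (784, 1)
```

Each entry of `input_activations` becomes the activation of one neuron in the 784-neuron input layer.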
Our earlier networks work pretty well: we've obtained a
classification accuracy better than 98 percent, using training
and test data from the MNIST handwritten digit data set. But
upon reflection, it's strange to use networks with fully-
connected layers to classify images. The reason is that such a
network architecture does not take into account the spatial
structure of the images. For instance, it treats input pixels
which are far apart on exactly the same footing as pixels which
are close together. Such concepts of spatial structure must instead be
inferred from the training data. But what if, instead of starting
with a network architecture which is tabula rasa, we used an
architecture which tries to take advantage of the spatial
structure? In this section I describe convolutional neural
networks*.
*The origins of convolutional neural networks go back to the 1970s. But the
seminal paper establishing the modern subject of convolutional networks was
a 1998 paper, "Gradient-based learning applied to document recognition", by
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. LeCun has
since made an interesting remark on the terminology for convolutional nets:
"The [biological] neural inspiration in models like convolutional nets is very
tenuous. That's why I call them 'convolutional nets' not 'convolutional neural
nets', and why we call the nodes 'units' and not 'neurons'". Despite this
remark, convolutional nets use many of the same ideas as the neural networks
we've studied up to now: ideas such as backpropagation, gradient descent,
regularization, non-linear activation functions, and so on. And so we will
follow common practice, and consider them a type of neural network. I will use
the terms "convolutional neural network" and "convolutional net(work)"
interchangeably. I will also use the terms "[artificial] neuron" and "unit"
interchangeably.
These networks use a special architecture which is particularly
well-adapted to classify images. Using this architecture makes
convolutional networks fast to train. This, in turn, helps us
train deep, many-layer networks, which are very good at
classifying images. Today, deep convolutional networks or
some close variant are used in most neural networks for image
recognition.
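To make this lack of spatial awareness concrete, here is a small NumPy sketch (mine, not the book's code; the 30-neuron layer size echoes the networks of earlier chapters). If we scramble the pixels with a fixed permutation and permute the weight columns the same way, a fully-connected layer produces exactly the same output. Nothing in the architecture distinguishes neighbouring pixels from distant ones:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy flattened "image" of 784 pixels, and one fully-connected layer
# of 30 sigmoid neurons.
x = rng.random(784)
W = rng.standard_normal((30, 784))
b = rng.standard_normal(30)

def layer(W, x, b):
    # Sigmoid activation of a fully-connected layer: sigma(Wx + b).
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

# Destroy the spatial structure with a fixed pixel permutation.
perm = rng.permutation(784)
x_scrambled = x[perm]

# Permuting the weight columns identically recovers the same output:
# the layer never "sees" which pixels were adjacent.
assert np.allclose(layer(W, x, b), layer(W[:, perm], x_scrambled, b))
```

So a fully-connected network trained on consistently scrambled images can, in principle, do exactly as well as one trained on the originals; any use it makes of spatial structure has to be learned from the data rather than built into the architecture.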
Convolutional neural networks use three basic ideas: local
receptive fields, shared weights, and pooling. Let's look at each