>>> net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28),
                      filter_shape=(20, 1, 5, 5),
                      poolsize=(2, 2),
                      activation_fn=ReLU),
        ConvPoolLayer(image_shape=(mini_batch_size, 20, 12, 12),
                      filter_shape=(40, 20, 5, 5),
                      poolsize=(2, 2),
                      activation_fn=ReLU),
        FullyConnectedLayer(n_in=40*4*4, n_out=100, activation_fn=ReLU),
        FullyConnectedLayer(n_in=100, n_out=100, activation_fn=ReLU),
        SoftmaxLayer(n_in=100, n_out=10)], mini_batch_size)
>>> net.SGD(expanded_training_data, 60, mini_batch_size, 0.03,
            validation_data, test_data, lmbda=0.1)
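As an aside, the n_in=40*4*4 for the first fully-connected layer comes straight from the shape arithmetic of the two convolutional-pooling layers. Here's a quick sanity check, assuming (as above) unpadded 5x5 local receptive fields and 2x2 max-pooling; the helper function is made up purely for illustration:

def conv_pool_output_size(input_size, filter_size=5, pool_size=2):
    # An unpadded ("valid") convolution shrinks each dimension by
    # filter_size - 1, and 2x2 max-pooling then halves the result.
    return (input_size - filter_size + 1) // pool_size

size = conv_pool_output_size(28)    # first ConvPoolLayer:  (28-5+1)/2 = 12
size = conv_pool_output_size(size)  # second ConvPoolLayer: (12-5+1)/2 = 4
print(40 * size * size)             # 40 feature maps of size 4x4 -> 640

That 640 is exactly 40*4*4, the number of inputs the first fully-connected layer expects.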
Training this network, I obtained a test accuracy of percent. Again,
the expanded net isn't helping so much. Running similar
experiments with fully-connected layers containing and
neurons yields results of and percent. That's
encouraging, but still falls short of a really decisive win.
What's going on here? Is it that the expanded or extra fully-
connected layers really don't help with MNIST? Or might it be
that our network has the capacity to do better, but we're going
about learning the wrong way? For instance, maybe we could
use stronger regularization techniques to reduce the tendency
to overfit. One possibility is the dropout technique introduced
back in Chapter 3. Recall that the basic idea of dropout is to
remove individual activations at random while training the
network. This makes the model more robust to the loss of
individual pieces of evidence, and thus less likely to rely on
particular idiosyncrasies of the training data. Before we apply it,
a short sketch of the mechanism may help.
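Here's a minimal NumPy sketch of dropout applied to a layer's activations. This is illustrative only, not the Theano implementation in network3.py, and the helper name dropout_activations is made up for this example:

import numpy as np

def dropout_activations(activations, p_dropout):
    # Zero each activation independently with probability p_dropout,
    # and rescale the survivors by 1/(1 - p_dropout) so the expected
    # total activation is unchanged.  (The rescaling can equivalently
    # be done once at test time instead.)
    mask = np.random.binomial(1, 1.0 - p_dropout, size=activations.shape)
    return activations * mask / (1.0 - p_dropout)

hidden = np.random.randn(100)                         # activations of a hidden layer
thinned = dropout_activations(hidden, p_dropout=0.5)  # roughly half set to zero

With p_dropout=0.5, each training step effectively trains a different randomly thinned sub-network, while at test time the full network is used. Let's try applying dropout to the final fully-connected layers: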
>>> net = Network([
        ConvPoolLayer(image_shape=(mini_batch_size, 1, 28, 28),
                      filter_shape=(20, 1, 5, 5),
                      poolsize=(2, 2),
                      activation_fn=ReLU),
        ConvPoolLayer(image_shape=(mini_batch_size, 20, 12, 12),
                      filter_shape=(40, 20, 5, 5),
                      poolsize=(2, 2),
                      activation_fn=ReLU),
        FullyConnectedLayer(
            n_in=40*4*4, n_out=1000, activation_fn=ReLU, p_dropout=0.5),
        FullyConnectedLayer(
            n_in=1000, n_out=1000, activation_fn=ReLU, p_dropout=0.5),
        SoftmaxLayer(n_in=1000, n_out=10, p_dropout=0.5)],
        mini_batch_size)
>>> net.SGD(expanded_training_data, 40, mini_batch_size, 0.03,
            validation_data, test_data)
Using this, we obtain an accuracy of percent, which is a
substantial improvement over our earlier results, especially our