translation of the algorithm described above. In particular, the
update_mini_batch method updates the Network's weights and
biases by computing the gradient for the current mini_batch of
training examples:
class Network(object):
...
    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The "mini_batch" is a list of tuples "(x, y)", and "eta"
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]
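In other words, for each training example in the mini batch the loop accumulates the gradient returned by backprop, and the final two assignments apply the averaged update. With $m$ denoting len(mini_batch) and $\eta$ the learning rate eta, these lines implement

\begin{eqnarray}
  w & \rightarrow & w' = w - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial w}, \nonumber \\
  b & \rightarrow & b' = b - \frac{\eta}{m} \sum_x \frac{\partial C_x}{\partial b}, \nonumber
\end{eqnarray}

where the sums run over the training examples $x$ in the mini batch.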
Most of the work is done by the line delta_nabla_b,
delta_nabla_w = self.backprop(x, y), which uses the backprop
method to figure out the partial derivatives $\partial C_x / \partial b^l_j$
and $\partial C_x / \partial w^l_{jk}$. The backprop method follows the algorithm in the last
section closely. There is one small change: we use a slightly
different approach to indexing the layers. This change is made
to take advantage of a feature of Python, namely the use of
negative list indices to count backward from the end of a list,
so, e.g., l[-3] is the third last entry in a list l. The code for
backprop is below, together with a few helper functions, which
are used to compute the $\sigma$ function, the derivative $\sigma'$, and the
derivative of the cost function. With these inclusions you
should be able to understand the code in a self-contained way.
If something's tripping you up, you may find it helpful to
consult the original description (and complete listing) of the
code.
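As a reminder of what those helpers look like, here is a minimal sketch, written as standalone functions. The names sigmoid, sigmoid_prime, and cost_derivative, and the use of the quadratic cost $C_x = \frac{1}{2}\|y-a\|^2$, are assumptions for illustration; only their roles (computing the $\sigma$ function, its derivative, and the derivative of the cost) come from the text above.

import numpy as np

def sigmoid(z):
    """The sigmoid function, applied elementwise to the numpy array z."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1.0-sigmoid(z))

def cost_derivative(output_activations, y):
    """Vector of partial derivatives dC_x/da_j for the (assumed)
    quadratic cost, evaluated at the output activations."""
    return (output_activations-y)

Because these are written with numpy operations they apply elementwise to whole vectors of weighted inputs, which is how the feedforward loop in backprop uses them.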
class Network(object):
...
    def backprop(self, x, y):
        """Return a tuple "(nabla_b, nabla_w)" representing the
        gradient for the cost function C_x.  "nabla_b" and
        "nabla_w" are layer-by-layer lists of numpy arrays, similar
        to "self.biases" and "self.weights"."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b