Note 1: every $C$ in the formulas below denotes $C_x$, the cost for a single training example $x$.
Equation 1: the error in the output layer
\begin{eqnarray}
\delta^L_j = \frac{\partial C}{\partial a^L_j} \sigma'(z^L_j).
\tag{BP1}\end{eqnarray}
Explanation:
$\frac{\partial C}{\partial a^L_j}$ measures how fast the cost $C$ changes as the activation $a^L_j$ changes.
$\sigma'(z^L_j)$ measures how fast the activation function $\sigma$ changes at $z^L_j$.
In matrix form, equation (BP1) becomes:
\begin{eqnarray}
\delta^L = \nabla_a C \odot \sigma'(z^L).
\tag{BP1a}\end{eqnarray}
For example, with the quadratic cost function $C_x = \frac{1}{2} \|y - a^L\|^2$ used in the previous chapter, the corresponding error is:
\begin{eqnarray}
\delta^L = (a^L-y) \odot \sigma'(z^L).
\tag{30}\end{eqnarray}
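As a sanity check, equation (30) can be evaluated directly with NumPy. The vectors `z_L`, `a_L`, and `y` below are a hypothetical forward-pass result for a 3-neuron output layer, not values from the book:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))
    return sigmoid(z) * (1 - sigmoid(z))

# Hypothetical forward-pass results for a 3-neuron output layer
z_L = np.array([0.5, -1.0, 2.0])   # weighted inputs z^L
a_L = sigmoid(z_L)                 # activations a^L
y = np.array([1.0, 0.0, 0.0])      # target output

# (30): delta^L = (a^L - y) ⊙ sigma'(z^L)
delta_L = (a_L - y) * sigmoid_prime(z_L)
```

Note that the Hadamard product $\odot$ is just NumPy's elementwise `*`, so (BP1a) and (30) are one-liners.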
Equation 2: the error $\delta^l$ in terms of the error in the next layer, $\delta^{l+1}$
\begin{eqnarray}
\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l),
\tag{BP2}\end{eqnarray}
we can think intuitively of this as moving the error backward through the network, giving us some sort of measure of the error at the output of the lth layer. We then take the Hadamard product ⊙σ′(zl). This moves the error backward through the activation function in layer l, giving us the error δl in the weighted input to layer l.
[?] I don't fully understand this paragraph. (Intuition: multiplying by $(w^{l+1})^T$ pushes the error $\delta^{l+1}$ backward through the weights to the output of layer $l$; the Hadamard product with $\sigma'(z^l)$ then pushes it backward through the activation function, giving the error $\delta^l$ in the weighted input $z^l$.)
Combining (BP1) and (BP2), we can compute the error of every neuron in every layer of the network.
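The backward step (BP2) is a single matrix-vector product followed by a Hadamard product. A minimal sketch, assuming a hypothetical 2-neuron layer $l$ feeding a 3-neuron layer $l+1$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid
    return sigmoid(z) * (1 - sigmoid(z))

# Hypothetical shapes: layer l has 2 neurons, layer l+1 has 3,
# so w^{l+1} has shape (3, 2)
w_next = np.array([[ 0.1, -0.2],
                   [ 0.4,  0.3],
                   [-0.5,  0.6]])          # w^{l+1}
delta_next = np.array([0.2, -0.1, 0.05])   # delta^{l+1}
z_l = np.array([0.3, -0.7])                # weighted input z^l

# (BP2): delta^l = ((w^{l+1})^T delta^{l+1}) ⊙ sigma'(z^l)
delta_l = (w_next.T @ delta_next) * sigmoid_prime(z_l)
```

Transposing $w^{l+1}$ is what routes each component of $\delta^{l+1}$ back along the connection that produced it.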
Equation 3: the bias formula
\begin{eqnarray} \frac{\partial C}{\partial b^l_j} =
\delta^l_j.
\tag{BP3}\end{eqnarray}
or
\begin{eqnarray}
\frac{\partial C}{\partial b} = \delta,
\tag{31}\end{eqnarray}
Equation 4: the weights formula
\begin{eqnarray}
\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j.
\tag{BP4}\end{eqnarray}
or
\begin{eqnarray} \frac{\partial
C}{\partial w} = a_{\rm in} \delta_{\rm out},
\tag{32}\end{eqnarray}
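Given an error vector $\delta^l$ and the previous layer's activations, (BP3) and (BP4) read off the gradients directly; the outer product produces the whole weight-gradient matrix at once. A sketch with hypothetical values (layer $l$ has 2 neurons, layer $l-1$ has 3):

```python
import numpy as np

# Hypothetical backprop results
delta_l = np.array([0.05, -0.02])     # delta^l, layer l (2 neurons)
a_prev = np.array([0.9, 0.1, 0.4])    # a^{l-1}, layer l-1 (3 neurons)

# (BP3): dC/db^l_j = delta^l_j  -> the bias gradient IS the error
grad_b = delta_l

# (BP4): dC/dw^l_{jk} = a^{l-1}_k * delta^l_j  -> outer product,
# shape (2, 3), matching the weight matrix w^l
grad_w = np.outer(delta_l, a_prev)
```

In the notation of (32), row $j$ of `grad_w` is $a_{\rm in} \, \delta_{\rm out}$ for neuron $j$: a weight learns slowly when either the input activation or the error is small.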