# 6.5.4 Back-Propagation Computation in a Fully Connected MLP

| Quantity | Symbol | Notes |
| -------- | ------ | ----- |
| Loss function | $L(\hat y, y)$ | |
| Total cost | $J$ | $J = L(\hat y, y) + \lambda \Omega(\theta)$ |
| Regularization term | $\Omega(\theta)$ | $\theta$ contains all parameters (weights and biases) |
| Weight matrices of the model | $W^l$ | |
| Bias parameters of the model | $b^l$ | |
| Input to the network | $x$ | |
| Target output | $y$ | |
| Actual output | $\hat y$ | |
| Depth of the network | $L$ | |
| Output of a layer | $h^l$ | <p>Also the input to the next layer.<br>The book uses h; I sometimes write it as a.</p> |
| Weighted input of a layer | $z^l$ | The book uses a; I prefer z. |

To clarify the above definition of back-propagation, let us consider the specific graph associated with a fully connected multi-layer MLP.

> **\[success]**\
> This subsection is a concrete application of the back-propagation algorithm to an MLP.\
> It covers only the main flow; the detailed computations rely on some mathematical background.

Algorithm 6.3 first gives the forward propagation, which maps the parameters to the supervised loss $L(\hat{y}, y)$ associated with a single training example (input, target) $(x,y)$, where $\hat{y}$ is the output of the neural network when $x$ is provided as input.

> **\[success]**\
> ![](https://3570579331-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M19WlNgQPCBzRz321ZJ%2F-M49IP9Ah_VjwrS9Pj_Q%2F-M49IQgP0OKf4o4laxaj%2F9.png?generation=1586089545422648\&alt=media)\
> Algorithm 6.3 computes, by forward propagation, the weighted input z and the activation output h of every unit in each layer.
>
> ```python
> h = {0: x}                               # h^0 is the input
> z = {}
> for l in range(1, L + 1):                # layers 1, ..., L
>     z[l] = b[l] + W[l].dot(h[l - 1])     # weighted input z^l
>     h[l] = f(z[l])                       # activation output h^l
> y_hat = h[L]                             # the network output
> J = loss(y_hat, y) + lam * omega(theta)  # lambda is a Python keyword, so lam
> ```
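
As a concrete instance of the loop above, a minimal runnable NumPy forward pass for a two-layer network might look like the following. The shapes, the `sigmoid` activation, the squared-error loss, and the L2 penalty standing in for $\Omega(\theta)$ are all illustrative assumptions, not choices fixed by the algorithm:

```python
import numpy as np

def sigmoid(z):                      # example activation f
    return 1.0 / (1.0 + np.exp(-z))

# made-up sizes: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
b = [np.zeros(4), np.zeros(2)]
x, y = rng.standard_normal(3), np.array([1.0, 0.0])

h = x
for l in range(2):                   # forward pass, layers 1..L
    z = b[l] + W[l] @ h              # weighted input z^l
    h = sigmoid(z)                   # activation output h^l
y_hat = h

lam = 0.01                           # example: L2 penalty as Omega(theta)
J = 0.5 * np.sum((y_hat - y) ** 2) + lam * sum(np.sum(Wl ** 2) for Wl in W)
print(J)
```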

Algorithm 6.4 then shows the corresponding computation needed to apply back-propagation to this graph.

> **\[success]**\
> ![](https://3570579331-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-M19WlNgQPCBzRz321ZJ%2F-M49IP9Ah_VjwrS9Pj_Q%2F-M49IQ88K9GITOa-LoQc%2F10.png?generation=1586089544512282\&alt=media)\
> Algorithm 6.4 uses back-propagation to compute, for each layer's units, the partial derivatives with respect to h, z, W, and b.\
> *The book uses g for two different purposes; to tell them apart, I write the two uses as gh and gz.*\
> gh is the partial derivative of the loss $L(\hat y, y)$ with respect to the output $h^l$.\
> gz is the partial derivative of the loss $L(\hat y, y)$ with respect to the weighted input $z^l$.\
> The derivative for w is the partial derivative of the total cost $J$ with respect to the weight matrix $W^l$.\
> The derivative for b is the partial derivative of the total cost $J$ with respect to the bias parameters $b^l$.\
> **Computing layer $L$ from the definitions**\
> What makes layer $L$ special is that $h^L = \hat y$.
>
> $$
> \begin{aligned}
> gh^L & = \frac{\partial L(\hat y, y)}{\partial h^L} = \nabla_{\hat y}L(\hat y, y)\\
> gz^L & = \frac{\partial L(\hat y, y)}{\partial z^L}
> = \frac{\partial L(\hat y, y)}{\partial h^L}\frac{\partial h^L}{\partial z^L}
> = gh^L \odot f'(z^L) \\
> \nabla_{W^L}J & = \frac{\partial L(\hat y, y)}{\partial W^L} + \frac{\partial \lambda \Omega(\theta)}{\partial W^L}
> = \frac{\partial L(\hat y, y)}{\partial z^L} \frac{\partial z^L}{\partial W^L} + \frac{\partial \lambda \Omega(\theta)}{\partial W^L} \\
> & = gz^L (h^{L-1})^T + \lambda \nabla_{W^L}\Omega(\theta) \\
> \nabla_{b^L}J & = \frac{\partial L(\hat y, y)}{\partial b^L} + \frac{\partial \lambda \Omega(\theta)}{\partial b^L}
> = \frac{\partial L(\hat y, y)}{\partial z^L} \frac{\partial z^L}{\partial b^L} + \frac{\partial \lambda \Omega(\theta)}{\partial b^L} \\
> & = gz^L + \lambda \nabla_{b^L}\Omega(\theta)
> \end{aligned}
> $$
>
> **Computing layer $l$ from the definitions** `for l = L-1, ..., 1 do:`
>
> $$
> \begin{aligned}
> gh^l & = \frac{\partial L(\hat y, y)}{\partial h^l}
> = \sum_i \frac{\partial L(\hat y, y)}{\partial z^{l+1}_i}\frac{\partial z^{l+1}_i}{\partial h^l}
> = (W^{l+1})^T gz^{l+1} \\
> gz^l & = \frac{\partial L(\hat y, y)}{\partial z^l}
> = \frac{\partial L(\hat y, y)}{\partial h^l}\frac{\partial h^l}{\partial z^l}
> = gh^l \odot f'(z^l) \\
> \nabla_{W^l}J & = gz^l (h^{l-1})^T + \lambda \nabla_{W^l}\Omega(\theta) \\
> \nabla_{b^l}J & = gz^l + \lambda \nabla_{b^l}\Omega(\theta)
> \end{aligned}
> $$
>
> The main computational work in these formulas is evaluating $f'(z^l)$.\
> What it produces is a matrix, known as the [Jacobian matrix](https://windmising.gitbook.io/mathematics-basic-for-ml/xian-xing-dai-shu/special_matrix). A runnable NumPy sketch of the whole procedure follows this callout.
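
To make the gh/gz recursion concrete, here is a minimal NumPy sketch of Algorithms 6.3 and 6.4 combined. The `relu` activation, the squared-error loss, and the penalty $\Omega(\theta)=\frac{1}{2}\sum_l \lVert W^l\rVert_F^2$ are illustrative assumptions, not choices fixed by the book:

```python
import numpy as np

def relu(z):                        # example activation f
    return np.maximum(0.0, z)

def relu_prime(z):                  # its elementwise derivative f'
    return (z > 0).astype(z.dtype)

def mlp_grads(x, y, W, b, lam):
    """Forward pass (Algorithm 6.3) then backward pass (Algorithm 6.4).

    W and b are lists of length L (index 0 holds layer 1). Assumes
    squared-error loss and Omega(theta) = 0.5 * sum_l ||W^l||_F^2.
    """
    depth = len(W)
    h, z = [x], [None]                       # h[0] = x; z[l] filled below
    for l in range(1, depth + 1):            # forward propagation
        z.append(b[l - 1] + W[l - 1] @ h[l - 1])
        h.append(relu(z[l]))
    y_hat = h[depth]

    grads_W, grads_b = [None] * depth, [None] * depth
    gh = y_hat - y                           # gh^L for squared-error loss
    for l in range(depth, 0, -1):            # backward propagation
        gz = gh * relu_prime(z[l])           # gz^l = gh^l ⊙ f'(z^l)
        grads_W[l - 1] = np.outer(gz, h[l - 1]) + lam * W[l - 1]
        grads_b[l - 1] = gz                  # gradient of J w.r.t. b^l
        gh = W[l - 1].T @ gz                 # gh^{l-1} = (W^l)^T gz^l
    return grads_W, grads_b

# tiny usage example with made-up shapes: 3 -> 4 -> 2
rng = np.random.default_rng(0)
W = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
b = [np.zeros(4), np.zeros(2)]
gW, gb = mlp_grads(rng.standard_normal(3), np.array([1.0, 0.0]), W, b, lam=0.01)
```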

Algorithms 6.3 and 6.4 are simple, intuitive demonstrations. However, they are specialized to one particular problem.

Modern software implementations are based on the generalized form of back-propagation described later in Section 6.5.6, which can accommodate any computational graph by explicitly manipulating a data structure that represents symbolic computation.
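
As a hint of what "manipulating a data structure that represents symbolic computation" looks like, here is a minimal, hypothetical sketch of a computational-graph node carrying a local gradient rule. It is neither the book's algorithm nor any real framework's API, just an illustration of the idea:

```python
class Node:
    """One node in a tiny computational graph: stores its value,
    its parents, and the local gradient rule used by backward()."""
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # upstream Nodes
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0

    def backward(self, upstream=1.0):
        self.grad += upstream
        for parent, local in zip(self.parents, self.local_grads):
            parent.backward(upstream * local)  # chain rule

def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))

def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))

# z = w*x + b, then dz/dw, dz/dx, dz/db via graph traversal
w, x, b = Node(2.0), Node(3.0), Node(0.5)
z = add(mul(w, x), b)
z.backward()
print(w.grad, x.grad, b.grad)   # 3.0 2.0 1.0
```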

