Merge pull request #98 from YueZhengMeng/master
Added the formula derivation for Exercise 3.4.1, Question 2, Part 2: matching the variance of the softmax distribution with the second derivative
KMnO4-zx authored Jun 26, 2024
2 parents a638f73 + f982aed commit 9dfc91c
Showing 6 changed files with 7,091 additions and 39 deletions.
29 changes: 26 additions & 3 deletions docs/ch03/ch03.md
@@ -691,12 +691,35 @@ $$
$$
\begin{aligned}
\mathrm{Var}_{\mathrm{softmax}(\mathbf{o})}
- &= \sum_{j=1}^q (\mathrm{softmax}(\mathbf{o})_j - E[\mathrm{softmax}(\mathbf{o})_j])^2 \\
- &= \sum_{j=1}^q (\mathrm{softmax}(\mathbf{o})_j - \frac{1}{q}\sum_{k=1}^q \mathrm{softmax}(\mathbf{o})_k)^2 \\
- &= \sum_{j=1}^q (\mathrm{softmax}(\mathbf{o})_j - \frac{1}{q})^2 \\
+ &= \frac{1}{q} \sum_{j=1}^q (\mathrm{softmax}(\mathbf{o})_j - E[\mathrm{softmax}(\mathbf{o})_j])^2 \\
+ &= \frac{1}{q} \sum_{j=1}^q (\mathrm{softmax}(\mathbf{o})_j - \frac{1}{q}\sum_{k=1}^q \mathrm{softmax}(\mathbf{o})_k)^2 \\
+ &= \frac{1}{q} \sum_{j=1}^q (\mathrm{softmax}(\mathbf{o})_j - \frac{1}{q})^2 \\
\end{aligned}
$$

  Expanding, and using $\sum_{j=1}^q \mathrm{softmax}(\mathbf{o})_j = 1$:
$$
\begin{aligned}
\mathrm{Var}_{\mathrm{softmax}(\mathbf{o})}
&= \frac{1}{q} \sum_{j=1}^q (\mathrm{softmax}(\mathbf{o})_j - \frac{1}{q})^2 \\
&=\frac{1}{q}\left[(\mathrm{softmax}(\boldsymbol{o})_1-\frac{1}{q})^2+(\mathrm{softmax}(\boldsymbol{o})_2-\frac{1}{q})^2+\dots+(\mathrm{softmax}(\boldsymbol{o})_q-\frac{1}{q})^2\right]\\
&=\frac{1}{q}(\sum^q_{j=1}\mathrm{softmax}^2(\boldsymbol{o})_j-\frac{2}{q}\sum^q_{j=1}\mathrm{softmax}(\boldsymbol{o})_j+\sum_{j=1}^q \frac{1}{q^2})\\
&=\frac{1}{q}(\sum^q_{j=1}\mathrm{softmax}^2(\boldsymbol{o})_j -\frac{2}{q} +\frac{1}{q})\\
&=\frac{1}{q}\sum^q_{j=1}\mathrm{softmax}^2(\boldsymbol{o})_j -\frac{1}{q^2}
\end{aligned}
$$

  Matching with the second derivative, where $\partial_{o_j}^2 l(\mathbf{y}, \hat{\mathbf{y}}) = \mathrm{softmax}(\mathbf{o})_j - \mathrm{softmax}^2(\mathbf{o})_j$:
$$
\begin{aligned}
\mathrm{Var}_{\mathrm{softmax}(\mathbf{o})}&=\frac{1}{q}\sum^q_{j=1}\mathrm{softmax}^2(\boldsymbol{o})_j -\frac{1}{q^2}\\
&=-\frac{1}{q}(1-\sum^q_{j=1}\mathrm{softmax}^2(\boldsymbol{o})_j)+\frac{1}{q} -\frac{1}{q^2}\\
&=-\frac{1}{q}(\sum^q_{j=1}\mathrm{softmax}(\boldsymbol{o})_j-\sum^q_{j=1}\mathrm{softmax}^2(\boldsymbol{o})_j)+\frac{1}{q} -\frac{1}{q^2}\\
&=-\frac{1}{q}\sum^q_{j=1}(\mathrm{softmax}(\boldsymbol{o})_j-\mathrm{softmax}^2(\boldsymbol{o})_j)+\frac{1}{q} -\frac{1}{q^2}\\
&=-\frac{1}{q}\sum^q_{j=1}\partial_{o_j}^2 l(\mathbf{y}, \hat{\mathbf{y}}) +\frac{q-1}{q^2}\\
\end{aligned}
$$
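
  A quick numerical check of this identity — not part of the original derivation, just a minimal sketch using PyTorch with an arbitrary logit vector:

```python
import torch

q = 5
o = torch.randn(q)
s = torch.softmax(o, dim=0)

# Left-hand side: variance of the softmax outputs around their mean 1/q
var = ((s - 1.0 / q) ** 2).mean()

# Right-hand side: -1/q * sum_j d^2 l / d o_j^2 + (q-1)/q^2,
# using d^2 l / d o_j^2 = s_j * (1 - s_j) for the cross-entropy loss
matched = -(s * (1 - s)).sum() / q + (q - 1) / q ** 2

print(torch.allclose(var, matched))  # True
```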

### Exercise 3.4.2

Suppose we have three classes that occur with equal probability, i.e., the probability vector is $\displaystyle (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$.
6 changes: 3 additions & 3 deletions docs/ch04/ch04.md
@@ -1605,12 +1605,12 @@ $$
    By the chain rule, the latter two expressions evaluate to:
$$
\frac{\partial J}{\partial \mathbf{b}^{(1)}}
- = \text{prod}\left(\frac{\partial J}{\partial \mathbf{h}}, \frac{\partial \mathbf{h}}{\partial \mathbf{b}^{(2)}}\right)
- = \frac{\partial J}{\partial \mathbf{h}}.
+ = \text{prod}\left(\frac{\partial J}{\partial \mathbf{z}}, \frac{\partial \mathbf{z}}{\partial \mathbf{b}^{(1)}}\right)
+ = \frac{\partial J}{\partial \mathbf{z}}.
$$

$$
- \frac{\partial J}{\partial \mathbf{b}^{(1)}}
+ \frac{\partial J}{\partial \mathbf{b}^{(2)}}
= \text{prod}\left(\frac{\partial J}{\partial \mathbf{o}}, \frac{\partial \mathbf{o}}{\partial \mathbf{b}^{(2)}}\right)
= \frac{\partial J}{\partial \mathbf{o}}.
$$
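
Both bias-gradient identities can be sanity-checked with autograd. Below is a minimal sketch, assuming a one-hidden-layer network with made-up sizes and an arbitrary scalar loss (not from the original text):

```python
import torch

x = torch.randn(4)
W1, b1 = torch.randn(3, 4), torch.zeros(3, requires_grad=True)
W2, b2 = torch.randn(2, 3), torch.zeros(2, requires_grad=True)

z = W1 @ x + b1          # pre-activation
z.retain_grad()
h = torch.relu(z)        # hidden activation
o = W2 @ h + b2          # output
o.retain_grad()
J = (o ** 2).sum()       # any scalar loss works for this check

J.backward()
print(torch.allclose(b1.grad, z.grad))  # dJ/db1 == dJ/dz
print(torch.allclose(b2.grad, o.grad))  # dJ/db2 == dJ/do
```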
120 changes: 119 additions & 1 deletion docs/ch08/ch08.md
@@ -699,7 +699,125 @@ d2l.plot([zipf_one, zipf_two, zip_three],

![svg](output_69_0.svg)

**Alternative solution:**
TODO: Since the figures added by this solution would disrupt the ordering of the existing figures, only the code is given here.
For the linear equation
$$\log n_i = -\alpha \log i + c $$
we can estimate $\alpha$ and $c$ by least squares.

```python
from d2l import torch as d2l
tokens = d2l.tokenize(d2l.read_time_machine())
# Each text line is not necessarily a sentence or paragraph, so concatenate all lines into one corpus
corpus = [token for line in tokens for token in line]
vocab = d2l.Vocab(corpus)

freqs = [freq for token, freq in vocab.token_freqs]

bigram_tokens = [pair for pair in zip(corpus[:-1], corpus[1:])]
bigram_vocab = d2l.Vocab(bigram_tokens)
trigram_tokens = [triple for triple in zip(corpus[:-2], corpus[1:-1], corpus[2:])]
trigram_vocab = d2l.Vocab(trigram_tokens)
bigram_freqs = [freq for token, freq in bigram_vocab.token_freqs]
trigram_freqs = [freq for token, freq in trigram_vocab.token_freqs]
```

```python
import numpy as np

def estimate_coefficients(x, y):
    # Compute the means of x and y
    mean_x = np.mean(x)
    mean_y = np.mean(y)

    # Numerator and denominator of the least-squares slope
    numerator = np.sum((x - mean_x) * (y - mean_y))
    denominator = np.sum((x - mean_x)**2)

    # Slope a
    a = numerator / denominator

    # Intercept c
    c = mean_y - a * mean_x

    return a, c

def compute_zipf(freqs):
    freqs = np.array(freqs)
    index = np.array(range(1, len(freqs) + 1))
    a, c = estimate_coefficients(np.log(index), np.log(freqs))
    return a, c
```
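
For reference, the same degree-1 least-squares fit can be obtained with `np.polyfit`; this equivalent shortcut is not in the original code:

```python
def compute_zipf_polyfit(freqs):
    # polyfit with deg=1 returns [slope, intercept] of the log-log fit
    index = np.arange(1, len(freqs) + 1)
    a, c = np.polyfit(np.log(index), np.log(np.array(freqs)), deg=1)
    return a, c
```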

```python
# Zipf's law exponent for unigrams
# Note: alpha here corresponds to the negative alpha in the formula above
zipf_one_alpha, zipf_one_const = compute_zipf(freqs)
zipf_one_alpha, zipf_one_const
```

```python
# Validate the fitted unigram Zipf exponent
_freqs = np.exp(zipf_one_alpha * np.log(np.array(range(1, len(freqs) + 1))) + zipf_one_const)
d2l.plot([freqs,_freqs], xlabel='token: x',
ylabel='frequency: n(x)', xscale='log', yscale='log', legend=['real', 'fit'])
```

```python
# Zipf's law exponent for bigrams
zipf_two_alpha, zipf_two_const = compute_zipf(bigram_freqs)
zipf_two_alpha, zipf_two_const
```

```python
# Validate the fitted bigram Zipf exponent
_bigram_freqs = np.exp(zipf_two_alpha * np.log(np.array(range(1, len(bigram_freqs) + 1))) + zipf_two_const)
d2l.plot([bigram_freqs,_bigram_freqs], xlabel='token: x',
ylabel='frequency: n(x)', xscale='log', yscale='log', legend=['real', 'fit'])
```

```python
# Zipf's law exponent for trigrams
zipf_three_alpha, zipf_three_const = compute_zipf(trigram_freqs)
zipf_three_alpha, zipf_three_const
```

```python
# Validate the fitted trigram Zipf exponent
_trigram_freqs = np.exp(zipf_three_alpha * np.log(np.array(range(1, len(trigram_freqs) + 1))) + zipf_three_const)
d2l.plot([trigram_freqs,_trigram_freqs], xlabel='token: x',
ylabel='frequency: n(x)', xscale='log', yscale='log', legend=['real', 'fit'])
```
The parameters fitted for bigrams and trigrams match the data poorly because of the many low-frequency tokens.
Keeping only the top half of the high-frequency bigrams and the top quarter of the high-frequency trigrams, then refitting the Zipf exponent, gives a much better fit.

```python
# Drop the lower-frequency half and refit the bigram Zipf exponent
fit_count = len(bigram_freqs) // 2
zipf_two_alpha, zipf_two_const = compute_zipf(bigram_freqs[:fit_count])
zipf_two_alpha, zipf_two_const
```

```python
# Validate the refitted bigram Zipf exponent
_bigram_freqs = np.exp(zipf_two_alpha * np.log(np.array(range(1, len(bigram_freqs) + 1))) + zipf_two_const)
d2l.plot([bigram_freqs,_bigram_freqs], xlabel='token: x',
ylabel='frequency: n(x)', xscale='log', yscale='log', legend=['real', 'fit'])
```

```python
# Drop the bottom three quarters of low-frequency tokens and refit the trigram Zipf exponent
fit_count = len(trigram_freqs) // 4
zipf_three_alpha, zipf_three_const = compute_zipf(trigram_freqs[:fit_count])
zipf_three_alpha, zipf_three_const
```

```python
# Validate the refitted trigram Zipf exponent
_trigram_freqs = np.exp(zipf_three_alpha * np.log(np.array(range(1, len(trigram_freqs) + 1))) + zipf_three_const)
d2l.plot([trigram_freqs,_trigram_freqs], xlabel='token: x',
ylabel='frequency: n(x)', xscale='log', yscale='log', legend=['real', 'fit'])
```
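
To quantify "better fit" beyond inspecting the plots, one could compute the R² of each log-log fit on the points actually used for fitting. `zipf_r2` is a hypothetical helper, not part of the original solution:

```python
def zipf_r2(freqs, alpha, const):
    # R^2 of the linear fit log n_i = alpha * log i + const
    log_n = np.log(np.array(freqs))
    log_i = np.log(np.arange(1, len(freqs) + 1))
    residual = log_n - (alpha * log_i + const)
    return 1 - np.sum(residual ** 2) / np.sum((log_n - log_n.mean()) ** 2)

# e.g. fit quality on the truncated trigram frequencies
print(zipf_r2(trigram_freqs[:fit_count], zipf_three_alpha, zipf_three_const))
```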

### Exercise 8.3.4

@@ -2362,7 +2480,7 @@
$$\begin{align}
M^k \cdot x
&= M^k \sum_{i=1}^n \alpha_i v_i \\
&= \sum_{i=1}^n \alpha_i M^k v_i \\
&= \sum_{i=1}^n \lambda_i^k \alpha_i v_i
\end{align}$$
-   Moreover, since the eigenvalues $\lambda_i$ of $M$ satisfy $|\lambda_i| \geq |\lambda_{i+1}|$, $lambda_1^k >> lambda_i$, i.e., $\lambda_1^k$ has the largest weight.
+   Moreover, since the eigenvalues $\lambda_i$ of $M$ satisfy $|\lambda_i| \geq |\lambda_{i+1}|$, we have $\lambda_1^k \gg \lambda_i^k$ for $i > 1$, i.e., the $\lambda_1^k$ term carries the largest weight.
  Therefore $M^k \cdot x \approx \lambda_1^k \alpha_1 v_1$; that is, with high probability the result lies close to the line spanned by the eigenvector $v_1$.
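
A small numerical illustration of this dominance — not in the original text; it uses a symmetric matrix so that the eigenvalues and eigenvectors are real:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = (A + A.T) / 2                            # symmetric => real eigen-decomposition
eigvals, eigvecs = np.linalg.eigh(M)
v1 = eigvecs[:, np.argmax(np.abs(eigvals))]  # dominant eigenvector

x = rng.normal(size=4)
for _ in range(100):                         # x <- Mx, normalized to avoid overflow
    x = M @ x
    x /= np.linalg.norm(x)

print(abs(x @ v1))                           # close to 1: x aligns with v1
```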

3. What does the above result imply for the gradients in recurrent neural networks?
43 changes: 40 additions & 3 deletions notebooks/ch03/ch03.ipynb
@@ -2331,13 +2331,50 @@
"$$ \n",
"\\begin{aligned} \n",
"\\mathrm{Var}_{\\mathrm{softmax}(\\mathbf{o})} \n",
"&= \\sum_{j=1}^q (\\mathrm{softmax}(\\mathbf{o})_j - E[\\mathrm{softmax}(\\mathbf{o})_j])^2 \\\\ \n",
"&= \\sum_{j=1}^q (\\mathrm{softmax}(\\mathbf{o})_j - \\frac{1}{q}\\sum_{k=1}^q \\mathrm{softmax}(\\mathbf{o})_k)^2 \\\\ \n",
"&= \\sum_{j=1}^q (\\mathrm{softmax}(\\mathbf{o})_j - \\frac{1}{q})^2 \\\\\n",
"&= \\frac{1}{q} \\sum_{j=1}^q (\\mathrm{softmax}(\\mathbf{o})_j - E[\\mathrm{softmax}(\\mathbf{o})_j])^2 \\\\ \n",
"&= \\frac{1}{q} \\sum_{j=1}^q (\\mathrm{softmax}(\\mathbf{o})_j - \\frac{1}{q}\\sum_{k=1}^q \\mathrm{softmax}(\\mathbf{o})_k)^2 \\\\ \n",
"&= \\frac{1}{q} \\sum_{j=1}^q (\\mathrm{softmax}(\\mathbf{o})_j - \\frac{1}{q})^2 \\\\\n",
"\\end{aligned} \n",
"$$"
]
},
{
"cell_type": "markdown",
"source": [
"  展开为: \n",
"$$\n",
"\\begin{aligned}\n",
"\\mathrm{Var}_{\\mathrm{softmax}(\\mathbf{o})} \n",
"&= \\frac{1}{q} \\sum_{j=1}^q (\\mathrm{softmax}(\\mathbf{o})_j - \\frac{1}{q})^2 \\\\\n",
"&=\\frac{1}{q}\\left[(\\mathrm{softmax}(\\boldsymbol{o})_1-\\frac{1}{q})^2+(\\mathrm{softmax}(\\boldsymbol{o})_2-\\frac{1}{q})^2+\\dots+(\\mathrm{softmax}(\\boldsymbol{o})_q-\\frac{1}{q})^2\\right]\\\\\n",
"&=\\frac{1}{q}(\\sum^q_{j=1}\\mathrm{softmax}^2(\\boldsymbol{o})_j-\\frac{2}{q}\\sum^q_{j=1}\\mathrm{softmax}(\\boldsymbol{o})_j+\\sum_{j=1}^q \\frac{1}{q^2})\\\\\n",
"&=\\frac{1}{q}(\\sum^q_{j=1}\\mathrm{softmax}^2(\\boldsymbol{o})_j -\\frac{2}{q} +\\frac{1}{q})\\\\\n",
"&=\\frac{1}{q}\\sum^q_{j=1}\\mathrm{softmax}^2(\\boldsymbol{o})_j -\\frac{1}{q^2}\n",
"\\end{aligned}\n",
"$$"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"  与二阶导数匹配为: \n",
"$$\n",
"\\begin{aligned}\n",
"\\mathrm{V\\ ar}(o)&=\\frac{1}{q}\\sum^q_{j=1}\\mathrm{softmax}^2(\\boldsymbol{o})_j -\\frac{1}{q^2}\\\\\n",
"&=-\\frac{1}{q}(1-\\sum^q_{j=1}\\mathrm{softmax}^2(\\boldsymbol{o})_j)+\\frac{1}{q} -\\frac{1}{q^2}\\\\\n",
"&=-\\frac{1}{q}(\\sum^q_{j=1}\\mathrm{softmax}(\\boldsymbol{o})_j-\\sum^q_{j=1}\\mathrm{softmax}^2(\\boldsymbol{o})_j)+\\frac{1}{q} -\\frac{1}{q^2}\\\\\n",
"&=-\\frac{1}{q}\\sum^q_{j=1}(\\mathrm{softmax}(\\boldsymbol{o})_j-\\mathrm{softmax}^2(\\boldsymbol{o})_j)+\\frac{1}{q} -\\frac{1}{q^2}\\\\\n",
"&=-\\frac{1}{q}\\sum^q_{j=1}\\partial_{o_j}^2 l(\\mathbf{y}, \\hat{\\mathbf{y}}) +\\frac{q-1}{q^2}\\\\\n",
"\\end{aligned}\n",
"$$"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"metadata": {},
11 changes: 4 additions & 7 deletions notebooks/ch04/ch04.ipynb
@@ -3,10 +3,7 @@
{
"cell_type": "markdown",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
"collapsed": true
},
"source": [
"# 第4章 多层感知机"
@@ -44529,12 +44526,12 @@
"    根据链式子法则,后面两个式子的结果为:\n",
"$$\n",
"\\frac{\\partial J}{\\partial \\mathbf{b}^{(1)}}\n",
"= \\text{prod}\\left(\\frac{\\partial J}{\\partial \\mathbf{h}}, \\frac{\\partial \\mathbf{h}}{\\partial \\mathbf{b}^{(2)}}\\right) \n",
"= \\frac{\\partial J}{\\partial \\mathbf{h}}.\n",
"= \\text{prod}\\left(\\frac{\\partial J}{\\partial \\mathbf{z}}, \\frac{\\partial \\mathbf{z}}{\\partial \\mathbf{b}^{(1)}}\\right) \n",
"= \\frac{\\partial J}{\\partial \\mathbf{z}}.\n",
"$$\n",
"\n",
"$$\n",
"\\frac{\\partial J}{\\partial \\mathbf{b}^{(1)}}\n",
"\\frac{\\partial J}{\\partial \\mathbf{b}^{(2)}}\n",
"= \\text{prod}\\left(\\frac{\\partial J}{\\partial \\mathbf{o}}, \\frac{\\partial \\mathbf{o}}{\\partial \\mathbf{b}^{(2)}}\\right) \n",
"= \\frac{\\partial J}{\\partial \\mathbf{o}}.\n",
"$$\n"