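diff --git a/docs/chapter4/chapter4.md b/docs/chapter4/chapter4.md
index 53e66a9..4ede787 100644
--- a/docs/chapter4/chapter4.md
+++ b/docs/chapter4/chapter4.md
@@ -246,7 +246,6 @@ $$
 
 Hence we can further derive Eq. (4.5)
 $$
-
 \operatorname{Gini}(D)=\sum_{k=1}^{|\mathcal{Y}|} \sum_{k^{\prime} \neq k} p_k p_{k^{\prime}}=1-\sum_{k=1}^{|\mathcal{Y}|} p_k^2
 $$
 
@@ -412,7 +411,7 @@ $$
 
 ### 4.4.1 Explanation of Eq. (4.7)
 
-The idea behind this formula is simple: take the midpoint of every pair of adjacent values as a candidate split point. The watermelon dataset 3.0 in Table 4.3 of the "西瓜书" (the "Watermelon Book") illustrates how it is used. For the continuous attribute "density", the observed values are $\{0.243,0.245,0.343,\linebreak0.360,0.403,0.437,0.481,0.556,0.593,0.608,0.634,0.639,0.657,0.666,0.697,0.719,0.774\}$, 17 values in total. By Eq. (4.7), $i$ runs from 1 to 16, so the set of candidate split points for "density" is
+The idea behind this formula is simple: take the midpoint of every pair of adjacent values as a candidate split point. The watermelon dataset 3.0 in Table 4.3 of the "西瓜书" (the "Watermelon Book") illustrates how it is used. For the continuous attribute "density", the observed values are $\{0.243,0.245,0.343,0.360,0.403,0.437,0.481,0.556,0.593,0.608,0.634,0.639,0.657,0.666,0.697,0.719,0.774\}$, 17 values in total. By Eq. (4.7), $i$ runs from 1 to 16, so the set of candidate split points for "density" is
 
 $$
 \begin{aligned}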