# ID3决策树的生成算法

## ID3算法

在决策树各个结点上应该信息增益准则选择特征，递归地构建决策树

### 输入

训练数据集D\
特征集A\
阈值$$\epsilon$$

### 输出

决策树T

### 过程

1. 若D中所有实例属于同一类$$C\_k$$，则T为单结点树，并将类$$C\_k$$作为该结点的类标记，返回T &#x20;
2. 若$$A=\emptyset$$，则T为单结点，并将D中实例数最大的类$$C\_k$$作为该结点的类标记，返回T &#x20;

   *决策树的深度每增加一层，这一层结点的特征就少一个，到了第n层的结点就没有任凭特征用于分类了。但此时结点的数据的标记可能仍不属于同一类。* &#x20;
3. 否则，按[信息增益的算法](https://windmising.gitbook.io/lihang-tongjixuexifangfa/decisiontree/2)计算A中各个特征对D的信息增益，选择信息特征最大的Ag &#x20;
4. 如果Ag的信息增益小于阈值$$\epsilon$$，则置T为单结点树，并将D中实例数最大的类$$C\_k$$作为该结点的类标记，返回T

   *阈值*$$\epsilon$$*是为了防止过拟合* &#x20;
5. 否则，对Ag的每一个可能的值ai，依Ag=ai将D分割为若干非空子集Di，将Di中实例数最大的类标记，构建子结点，由结点及其子结点构成树T，返回T &#x20;
6. 对第i个子结点，以Di为训练集，以A-{Ag}为特征集，递归地调用步（1）-步（5），得到子树Ti，返回Ti &#x20;

   *A-{Ag}表示两个集合相减* &#x20;

   *训练集Di中不包含特征Ag* &#x20;

## 代码

```python
def multi(y):
    ySet = set(y)
    bestCount = 0
    for yi in ySet:
        count = y.count(yi)
        if count > bestCount:
            bestCount = count
            bestyi = yi
    return bestyi

def ID3(X, y, epsilon):
    # 若D中所有实例属于同一类
    if len(set(y))==1:
        # 将类$$C_k$$作为该结点的类标记
        return y[0]
    # 若$$A=\emptyset$$
    if X.shape[1] == 0:
        # 实例数最大的类$$C_k$$作为该结点的类标记
        return multi(y)
    bestInfo = 0
    # 计算A中各个特征对D的信息增益
    for feature in range(X.shape[1]):
        info = svm(X, y, feature)
        # 选择信息特征最大的Ag
        if svm(X, y, feature) > bestInfo:
            bestInfo = info
            bestfeature = feature
    # 如果Ag的信息增益小于阈值$$\epsilon$$
    if bestInfo < epsilon:
        # 将D中实例数最大的类$$C_k$$作为该结点的类标记
        return multi(y)
    feature = bestfeature
    ret = {'feature':feature}
    # 对Ag的每一个可能的值ai
    a = set(X[:, feature])
    for ai in a:
        yai = y[X[:,feature] == ai]
        Xai = X[X[:,feature] == ai]
        ret[ai] = ID3(Xai, yai, epsilon)
    return ret
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://windmising.gitbook.io/lihang-tongjixuexifangfa/decisiontree/3.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
