# 4-1

2019-10-19

## KNN - K近邻算法 - K-Nearest Neighbors

![](http://windmissing.github.io/images_for_gitbook/liu_yu_bo_play_with_machine_learning/17.png)

本质：如果两个样本足够相似，它们有更高的概率属于同一个类别

## 代码实现KNN算法

假设原始训练数据如下：

```python
raw_data_X = [[3.39, 2.33],
              [3.11, 1.78],
              [1.34, 3.36],
              [3.58, 4.67],
              [2.28, 2.86],
              [7.42, 4.69],
              [5.74, 3.53],
              [9.17, 2.51],
              [7.79, 3.42],
              [7.93, 0.79]
             ]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

待求数据如下：

```python
x = np.array([8.09, 3.36])
```

### 数据准备

```python
import numpy as np
import matplotlib.pyplot as plt

X_train = np.array(raw_data_X)
y_train = np.array(raw_data_y)

plt.scatter(X_train[y_train==0, 0], X_train[y_train==0, 1], color = 'g')
plt.scatter(X_train[y_train==1, 0], X_train[y_train==1, 1], color = 'r')
plt.scatter(x[0], x[1], color = 'b')
plt.show()
```

效果：\
![](http://windmissing.github.io/images_for_gitbook/liu_yu_bo_play_with_machine_learning/18.png)

### KNN过程

#### 欧拉距离

假设有a, b两个点，平面中两个点之间的欧拉距离为:

$$
\sqrt {(x^{(a)}-x^{(b)})^2+(y^{(a)}-y^{(b)})^2}
$$

立体中两个点的欧拉距离为:

$$
\sqrt {(x^{(a)}-x^{(b)})^2+(y^{(a)}-y^{(b)})^2+(z^{(a)}-z^{(b)})^2}
$$

任意维度中两个点的欧拉距离为:

$$
\sqrt {(X^{(a)}\_1-X^{(b)}\_1)^2+ (X^{(a)}\_2-X^{(b)}\_2)^2+...+(X^{(a)}\_n-X^{(b)}\_n)^2}
$$

或

$$
\sqrt {\sum^n\_{i=1} (X^{(a)}\_i-X^{(b)}\_i)^2}
$$

其中上标a, b代码第a, b个数据。下标1, 2代码数据的第1, 2个特征

代码如下：

```python
distances = [np.sum((x_train - x) ** 2) for x_train in X_train]
nearest = np.argsort(distances)
topK_y = [y_train[i] for i in nearest[:k]]
from collections import Counter
votes = Counter(topK_y)
predict_y = votes.most_common(1)[0][0]
```

运行结果：predict\_y = 1


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://windmising.gitbook.io/liu-yu-bo-play-with-machine-learning/src/chapter4-1/4-1.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
