# 5-4 参数计算向量化

5-3中计算a, b的实现方法性能较低，使用向量化运算能提高性能\
即把以下公式向量化： ![](http://windmissing.github.io/images/2019/48.png)

向量化的依据：

![](http://windmissing.github.io/images/2019/51.png)

## 向量化计算a, b

```python
import numpy as np

class SimpleLinearRegression2:
    def __init__(self):
        """初始化Single Linear Regression模型"""
        self.a_ = None
        self.b_ = None

    def fit(self, x_train, y_train):
        """根据训练数据集X_train, y_train训练Single Linear Regression模型"""
        assert x_train.ndim == 1, "Simple Linear Regressor can only solve single feature training data"
        assert len(x_train) == len(y_train), "the size of x_train must be equal to the size of y_train"

        x_mean = np.mean(x_train)
        y_mean = np.mean(y_train)

        num = (x_train - x_mean).dot(y_train - y_mean)
        d = (x_train - x_mean).dot(x_train - x_mean)

        self.a_ = num / d
        self.b_ = y_mean - self.a_ * x_mean

    def predict(self, x_predict):
        """给定待测数据集X_predict，返回表示x_predict的结果向量"""
        assert x_predict.ndim == 1, "Simple Linear Regressor can only solve single feature training data"
        assert self.a_ is not None and self.b_ is not None, "must fit before predict"
        return [self._predict(x) for x in x_predict]

    def _predict(self, x_single):
        """给定单个待预测数据s_single，返回x_single的预测结果"""
        return self.a_ * x_single + self.b_

    def __repr__(self):
        return "SimpleLinearRegression2()"
```

### 绘制结果

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1., 2., 3., 4., 5.])
y = np.array([1., 3., 2., 3., 5.])

reg2 = SimpleLinearRegression2()
reg2.fit(x, y)

y_hat2 = reg2.predict(x)

plt.scatter(x, y)
plt.plot(x, y_hat2, color='r')
plt.axis([0, 6, 0, 6])
plt.show()
```

输出结果： ![](http://windmissing.github.io/images/2019/50.png)

## 向量化实现的性能测试

```python
m = 1000000
big_x = np.random.random(size = m)
big_y = big_x * 3.0 + 2.0 + np.random.normal(size = m)

%timeit reg1.fit(big_x, big_y)    # reg1见5-4
%timeit reg2.fit(big_x, big_y)
```

输出结果：\
1.15 s ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\
25.2 ms ± 2.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

可见向量化计算能大幅度地提高性能，因此能用向量化计算的地方尽量用向量化计算


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://windmising.gitbook.io/liu-yu-bo-play-with-machine-learning/src/chapter5-5/5-4.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
