# 7-5 Mapping High-Dimensional Data to Low-Dimensional Data

Definition: X is the sample data. Each row is one sample; there are m samples, and each sample has n features.

$$
X =
\begin{bmatrix}
X_1^{(1)} & X_1^{(2)} & \cdots & X_1^{(n)} \\
X_2^{(1)} & X_2^{(2)} & \cdots & X_2^{(n)} \\
\vdots & \vdots & \ddots & \vdots \\
X_m^{(1)} & X_m^{(2)} & \cdots & X_m^{(n)}
\end{bmatrix}
$$

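$W_k$ is the matrix of the first k principal components. Each row is the unit direction of one principal component; there are k principal component directions, and each direction has n dimensions.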

$$
W_k =
\begin{bmatrix}
W_1^{(1)} & W_1^{(2)} & \cdots & W_1^{(n)} \\
W_2^{(1)} & W_2^{(2)} & \cdots & W_2^{(n)} \\
\vdots & \vdots & \ddots & \vdots \\
W_k^{(1)} & W_k^{(2)} & \cdots & W_k^{(n)}
\end{bmatrix}
$$

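Q: How do we convert the samples X from n dimensions to k dimensions?\
A: Dimensionality reduction: project every sample onto the k principal components.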

$$
X \cdot W_k^T = X_k
$$
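
The shapes in this product can be checked with a minimal NumPy sketch. Both the data `X` (m = 4, n = 3) and the matrix `W_k` (k = 2, whose rows are simply the first two coordinate axes) are made up for illustration and are not taken from the text:

```python
import numpy as np

# Hypothetical data: m = 4 samples, n = 3 features
X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.],
              [10., 11., 12.]])

# Hypothetical W_k: k = 2 unit directions, one per row (here the x and y axes)
W_k = np.array([[1., 0., 0.],
                [0., 1., 0.]])

X_k = X.dot(W_k.T)   # (m, n) · (n, k) -> (m, k)
print(X_k.shape)     # (4, 2)
```

With these particular directions, projecting just keeps the first two features of every sample; real principal components are generally not aligned with the axes.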

Restoration: map the reduced data back to the original coordinate space.

$$
X_k \cdot W_k = X_m
$$

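The restored data $X_m$ is not the same as the original X: when k < n, the information along the discarded directions is lost during the reduction and cannot be recovered.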
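
A small sketch of why the round trip is lossy, assuming three made-up 2-D samples and a single hand-picked unit direction (0.6, 0.8) standing in for a principal component:

```python
import numpy as np

# Hypothetical 2-D samples and a single unit direction (k = 1)
X = np.array([[3., 4.],
              [4., 3.],
              [0., 5.]])
W_k = np.array([[0.6, 0.8]])      # unit length: 0.6**2 + 0.8**2 == 1

X_k = X.dot(W_k.T)                # reduce: shape (3, 1)
X_m = X_k.dot(W_k)                # restore: shape (3, 2)

# Only samples that already lie along W_k are restored exactly
print(np.allclose(X_m[0], X[0]))  # True: (3, 4) is parallel to (0.6, 0.8)
print(np.allclose(X_m, X))        # False: the other samples lose their off-axis part
```

Every restored sample lies on the line spanned by the chosen direction, so anything perpendicular to that line is gone.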

## Wrapping PCA in a Class

```python
import numpy as np

class PCA:
    def __init__(self, n_components):
        """Initialize PCA."""
        assert n_components >= 1, "n_components must be valid"
        self.n_components = n_components
        self.components_ = None

    def fit(self, X, eta=0.01, n_iters=1e4):
        """Compute the first n_components principal components of the data set X."""
        assert self.n_components <= X.shape[1], "n_components must not be greater than the feature number of X"

        def demean(X):
            return X - np.mean(X, axis=0)

        def f(w, X):
            return np.sum((X.dot(w)**2)) / len(X)

        def df(w, X):
            return X.T.dot(X.dot(w)) * 2. / len(X)

        # Normalize a vector to unit length
        def direction(w):
            return w / np.linalg.norm(w)

        def first_component(X, initial_w, eta, n_iters=1e4, epsilon=1e-8):
            w = direction(initial_w)
            cur_iter = 0
            while cur_iter < n_iters:
                gradient = df(w, X)
                last_w = w
                w = w + eta * gradient
                w = direction(w)
                # Stop once the objective changes by less than epsilon
                if abs(f(w, X) - f(last_w, X)) < epsilon:
                    break
                cur_iter += 1
            return w

        X_pca = demean(X)
        self.components_ = np.empty(shape = (self.n_components, X.shape[1]))
        for i in range(self.n_components):
            initial_w = np.random.random(X.shape[1])
            # Use the eta and n_iters passed to fit() instead of overriding them
            w = first_component(X_pca, initial_w, eta, n_iters)
            self.components_[i, :] = w
            X_pca = X_pca - X_pca.dot(w).reshape(-1, 1) * w
        return self

    def transform(self, X):
        """Project the given X onto the principal components."""
        assert X.shape[1] == self.components_.shape[1]
        return X.dot(self.components_.T)

    def inverse_transform(self, X):
        """Map the given X back to the original feature space."""
        assert X.shape[1] == self.components_.shape[0]
        return X.dot(self.components_)

    def __repr__(self):
        return "PCA(n_components=%d)" % self.n_components
```
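
One way to sanity-check the gradient-ascent search is to compare its result with the top eigenvector of $X^T X / m$ computed on the demeaned data. The sketch below re-implements the first-component search in a few lines so that it runs without the class, uses made-up data with an arbitrarily chosen seed, and runs a fixed number of steps instead of the early-stopping test. The eigendecomposition is a swapped-in reference technique, not something the class itself uses:

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed, chosen arbitrarily
X = np.empty((100, 2))
X[:, 0] = rng.uniform(0., 100., size=100)
X[:, 1] = 0.75 * X[:, 0] + 3. + rng.normal(0., 10., size=100)
X_demean = X - np.mean(X, axis=0)

# Gradient ascent on f(w) = sum((X·w)^2) / m, renormalizing w after every step
w = rng.random(2)
w /= np.linalg.norm(w)
for _ in range(1000):
    gradient = X_demean.T.dot(X_demean.dot(w)) * 2. / len(X_demean)
    w = w + 0.01 * gradient
    w /= np.linalg.norm(w)

# Reference: eigenvector of X^T X / m with the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(X_demean.T.dot(X_demean) / len(X_demean))
v = eigvecs[:, -1]                        # eigh returns eigenvalues in ascending order

# The two directions should agree up to sign
print(abs(w.dot(v)))                      # close to 1.0
```

The absolute value is needed because a principal direction is only defined up to sign: `w` and `-w` describe the same axis.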

## Reducing Dimensionality with PCA

### Preparing the Data

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.empty((100,2))
X[:,0] = np.random.uniform(0., 100., size=100)
X[:,1] = 0.75 * X[:, 0] + 3. + np.random.normal(0, 10., size=100)
```

### Training Model 1

```python
pca = PCA(n_components=2)
pca.fit(X)
```

Input: `pca.components_`\
Output: `array([[ 0.75366776, 0.65725559], [-0.65723751, 0.75368352]])` (the exact values vary from run to run because the data and the initial vector are random)
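
The two rows should be unit vectors that are perpendicular to each other. A quick check, using the values printed above (copied by hand, so the tolerance is loosened to allow for rounding and for the iterative search stopping early):

```python
import numpy as np

# Values copied from the output above (they will differ from run to run)
W = np.array([[ 0.75366776, 0.65725559],
              [-0.65723751, 0.75368352]])

# Rows are orthonormal if W · W^T is (approximately) the identity matrix
print(np.allclose(W.dot(W.T), np.eye(2), atol=1e-4))   # True
```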

### Training Model 2: Dimensionality Reduction

```python
pca = PCA(n_components=1)
pca.fit(X)

X_reduction = pca.transform(X)
X_restore = pca.inverse_transform(X_reduction)
```

Input: `X_reduction.shape`\
Output: `(100, 1)`

Input: `X_restore.shape`\
Output: `(100, 2)`

### Comparing the Original Data with the Reduced-and-Restored Data

```python
plt.scatter(X[:, 0], X[:, 1], color='b', alpha=0.5)
plt.scatter(X_restore[:, 0], X_restore[:, 1], color='r', alpha=0.5)
plt.show()
```

![](http://windmissing.github.io/images/2019/111.png)

