PCA Inverse Transform in scikit-learn: Converting Between New and Old Features

阿新 • Published: 2017-05-08


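PCA (Principal Component Analysis) is a widely used data analysis method. It applies a linear transformation that maps the original data to a representation whose dimensions are linearly uncorrelated. This makes it useful for extracting the principal feature components of the data, and it is often used to reduce the dimensionality of high-dimensional data.

Using PCA in scikit-learn is simple: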

import numpy as np
from sklearn import decomposition
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
y = iris.target

pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

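The code above uses PCA to compress data with 4 features into 3 features. This compression has the following characteristics:

The 3 new features are not what remains after arbitrarily dropping one feature; each is a linear combination of the 4 original features.
The 3 new features retain most of the information (variance) in the original 4 features; in other words, there is a slight loss.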
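Both characteristics can be checked on a fitted model. The following is a minimal sketch, assuming the `components_` and `explained_variance_ratio_` attributes of scikit-learn's `PCA`; the variable name `kept` is only illustrative.

```python
from sklearn import datasets, decomposition

X = datasets.load_iris().data
pca = decomposition.PCA(n_components=3).fit(X)

# Each row of components_ holds the weights that combine the
# 4 original features into one new feature.
print(pca.components_.shape)
print(pca.components_)

# Fraction of the total variance kept by each new feature, and in total.
print(pca.explained_variance_ratio_)
kept = pca.explained_variance_ratio_.sum()
print(kept)
```

A total below 1.0 is the "slight loss" mentioned above: the variance along the discarded fourth direction.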

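So what exactly does PCA lose? And can the new features be converted back to the old ones?

To answer that, we have to start from the PCA procedure. I have condensed it below, since the procedure itself is not the focus of this article: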

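The PCA procedure

1. Mean-center the matrix X (subtract each feature's mean).

2. Derive the feature matrix P through a series of matrix operations (its rows are the leading eigenvectors of the covariance matrix of X).

3. Compute Y = P * X.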
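The three steps can be sketched in plain NumPy. This is an illustration under stated assumptions, not scikit-learn's implementation (which uses an SVD): the data is random, samples are stored as columns to match the Y = P * X notation, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 150))   # 4 features (rows), 150 samples (columns)
X[1] += 0.8 * X[0]              # make two features correlated

# Step 1: mean-center each feature
mean = X.mean(axis=1, keepdims=True)
Xc = X - mean

# Step 2: eigenvectors of the covariance matrix, largest eigenvalue first
cov = Xc @ Xc.T / (Xc.shape[1] - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]
k = 3
P = eigvecs[:, order[:k]].T              # shape (k, 4), one component per row

# Step 3: project
Y = P @ Xc                               # shape (k, 150)

# The new features are uncorrelated: their covariance matrix is diagonal.
cov_Y = Y @ Y.T / (Y.shape[1] - 1)
print(np.round(cov_Y, 6))
```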

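Y is the dimension-reduced version of the original data. Once we have P, we can also use Y = P * X to work back toward X. When all components are kept, P is a square orthogonal matrix, so its inverse P^-1 exists (and equals its transpose P^T):

Y = P * X ==> P^-1 * Y = P^-1 * P * X, where P^-1 is the inverse of P, i.e. P^-1 * P = I

==> P^-1 * Y = X

When fewer components are kept, P is not square and has no true inverse. Its transpose P^T is used in place of P^-1, and P^T * Y only approximates X.

Note that the data was mean-centered at the very start, so the mean must be added back to P^-1 * Y to obtain values on the original scale.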
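The difference between the square and the truncated case can be seen with a small NumPy sketch. It assumes a hypothetical matrix with orthonormal rows built from a QR decomposition, standing in for a PCA feature matrix; no real data is involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Q is a 4x4 orthogonal matrix; its transpose plays the role of a full P.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
P_full = Q.T            # 4 components kept: square and orthogonal
P = Q[:, :3].T          # 3 components kept: shape (3, 4), not invertible

x = rng.normal(size=(4, 1))          # one mean-centered sample as a column

# Full case: the transpose is a true inverse, so x is recovered exactly.
x_full_back = P_full.T @ (P_full @ x)
print(np.allclose(x_full_back, x))        # True

# Truncated case: P * P^T is the identity, but P^T * P is not.
print(np.allclose(P @ P.T, np.eye(3)))    # True
print(np.allclose(P.T @ P, np.eye(4)))    # False

# So the round trip only recovers the part of x inside the kept subspace.
x_back = P.T @ (P @ x)
residual = np.linalg.norm(x - x_back)
print(residual)
```

The residual is the part of x that lies along the discarded direction, which is the information PCA loses.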

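scikit-learn stores samples as rows, so transform multiplies the centered data by pca.components_.T (line 33 of the source below, as it stood at the time of writing). pca.components_ itself therefore acts as the inverse matrix that maps the reduced data back: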

 1    def transform(self, X, y=None):
 2         """Apply dimensionality reduction to X.
 3 
 4         X is projected on the first principal components previously extracted
 5         from a training set.
 6 
 7         Parameters
 8         ----------
 9         X : array-like, shape (n_samples, n_features)
10             New data, where n_samples is the number of samples
11             and n_features is the number of features.
12 
13         Returns
14         -------
15         X_new : array-like, shape (n_samples, n_components)
16 
17         Examples
18         --------
19 
20         >>> import numpy as np
21         >>> from sklearn.decomposition import IncrementalPCA
22         >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
23         >>> ipca = IncrementalPCA(n_components=2, batch_size=3)
24         >>> ipca.fit(X)
25         IncrementalPCA(batch_size=3, copy=True, n_components=2, whiten=False)
26         >>> ipca.transform(X) # doctest: +SKIP
27         """
28         check_is_fitted(self, ['mean_', 'components_'], all_or_any=all)
29         print(self.mean_)  # debug print, not in the upstream scikit-learn source
30         X = check_array(X)
31         if self.mean_ is not None:
32             X = X - self.mean_
33         X_transformed = fast_dot(X, self.components_.T)
34         if self.whiten:
35             X_transformed /= np.sqrt(self.explained_variance_)
36         return X_transformed
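Lines 31 to 33 of the listing can be reproduced by hand. The following is a minimal sketch, assuming the default `whiten=False` and the public `mean_` and `components_` attributes; `manual` and `builtin` are illustrative names.

```python
import numpy as np
from sklearn import datasets, decomposition

X = datasets.load_iris().data
pca = decomposition.PCA(n_components=3).fit(X)

# Same two steps as the source: subtract the mean, then multiply by components_.T
manual = (X - pca.mean_) @ pca.components_.T
builtin = pca.transform(X)

print(manual.shape)
print(np.allclose(manual, builtin))
```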

Going back to the compression code from the beginning, add some output statements:

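import numpy as np
from sklearn import datasets, decomposition

iris = datasets.load_iris()
X = iris.data
y = iris.target

print(X[0])  # first row of the original data
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X_reduced = pca.transform(X)  # keep X itself untouched

a = np.matrix(X_reduced)
b = np.matrix(pca.components_)
c = a * b  # back in the 4-feature space, still mean-centered
# Per-feature mean removed by fit(); the author's run gave
# [5.84333333, 3.054, 3.75866667, 1.19866667]
mean_of_data = np.matrix(pca.mean_)

print(c[0])
print(c[0] + mean_of_data)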

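The program prints the first row of the original data, then multiplies the reduced data by pca.components_ (which acts as the inverse of the feature matrix) and adds the mean back to restore the original 4 features.

The output is: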

1 [ 5.1  3.5  1.4  0.2]
2 
3 [[-0.74365254  0.44632609 -2.35818399 -0.99942241]]
4 
5 [[ 5.09968079  3.50032609  1.40048268  0.19924426]]

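As you can see, the restored feature values (line 5) differ slightly from the originals (line 1): a small amount of information has been lost.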
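scikit-learn also provides `PCA.inverse_transform`, which performs the same multiplication and adds the mean back. The sketch below, assuming the default `whiten=False`, compares it with the manual computation and measures the loss over the whole dataset; the variable names are illustrative.

```python
import numpy as np
from sklearn import datasets, decomposition

X = datasets.load_iris().data
pca = decomposition.PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)

# Manual inverse: multiply by components_, then undo the mean-centering
manual_back = X_reduced @ pca.components_ + pca.mean_
# Built-in inverse
builtin_back = pca.inverse_transform(X_reduced)

print(np.allclose(manual_back, builtin_back))

# Largest absolute difference between any original and restored value
max_abs_error = np.abs(X - builtin_back).max()
print(max_abs_error)
```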

If we do not reduce the dimensionality, we can look at the result again:

# X is the original 4-feature iris.data here, not the reduced data
pca = decomposition.PCA(n_components=4)
pca.fit(X)
X_reduced = pca.transform(X)

a = np.matrix(X_reduced)
b = np.matrix(pca.components_)
c = a * b
mean_of_data = np.matrix(pca.mean_)

print(c[0])
print(c[0] + mean_of_data)

A perfect reconstruction:

1 [ 5.1  3.5  1.4  0.2]
2 
3 [[-0.74333333  0.446      -2.35866667 -0.99866667]]
4 
5 [[ 5.1  3.5  1.4  0.2]]
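To see how the loss depends on the number of components kept, the round trip can be repeated for each value from 1 to 4. This is a small sketch using `PCA.inverse_transform`; the root-mean-square error is just one reasonable choice of metric, and `errors` is an illustrative name.

```python
import numpy as np
from sklearn import datasets, decomposition

X = datasets.load_iris().data

errors = []
for k in range(1, 5):
    pca = decomposition.PCA(n_components=k).fit(X)
    X_back = pca.inverse_transform(pca.transform(X))
    # Root-mean-square reconstruction error over all values
    rmse = np.sqrt(np.mean((X - X_back) ** 2))
    errors.append(rmse)
    print(k, rmse)
```

The error shrinks as components are added and drops to floating-point noise once all 4 are kept.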
