Notes: Random Forest feature importance (ranking features with a random forest)

阿新 • Published: 2019-01-06

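Two methods:
1. Mean decrease impurity
Roughly: within each tree, score every feature by how much the splits on it reduce impurity (Gini, entropy / information gain), then average those scores over the whole forest.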
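
The "average over the forest" step can be checked directly in scikit-learn. The following is a minimal sketch, assuming synthetic data from make_regression (not a dataset used in these notes): it averages the per-tree impurity-based scores by hand and compares them with the forest's own feature_importances_ attribute, which should agree.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, chosen only for illustration (an assumption, not from the notes)
X, y = make_regression(n_samples=300, n_features=6, n_informative=3, random_state=0)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Each fitted tree exposes its own normalized impurity-based importances
per_tree = np.array([tree.feature_importances_ for tree in rf.estimators_])

# Average over the trees, then renormalize so the scores sum to 1
mdi = per_tree.mean(axis=0)
mdi /= mdi.sum()

print(np.round(mdi, 4))
print(np.allclose(mdi, rf.feature_importances_))
```

Because every tree's scores are normalized before averaging, the result is a relative ranking rather than an absolute measure of predictive value.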

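2. Mean decrease accuracy
Roughly: add noise to each feature in turn and measure how much the accuracy of the result changes. A small change means the feature is unimportant; a large change means it is important.
The importance of a feature X in a random forest is computed as follows:
1: For each decision tree in the forest, compute its error on the corresponding out-of-bag (OOB) data, denoted errOOB1.
2: Randomly perturb feature X across all OOB samples (that is, randomly change each sample's value of X) and compute the OOB error again, denoted errOOB2.
3: If the forest has Ntree trees, the importance of feature X = ∑(errOOB2-errOOB1)/Ntree. This expression works as an importance measure because if adding random noise to a feature sharply lowers the OOB accuracy, that feature has a large influence on the classification result, i.e. it is highly important.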
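
The three steps can be sketched from scratch as below. This is an illustration under stated assumptions, not the implementation from the referenced posts: the trees are bagged by hand with DecisionTreeClassifier so that each tree's OOB rows are known, the "noise" is implemented as a random permutation of the feature column, and oob_permutation_importance is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def oob_permutation_importance(X, y, n_trees=100, seed=0):
    """Mean increase in OOB error per feature (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    importance = np.zeros(n_features)
    for _ in range(n_trees):
        # Bootstrap sample for this tree; rows never drawn form its OOB set
        boot = rng.integers(0, n_samples, size=n_samples)
        oob = np.setdiff1d(np.arange(n_samples), boot)
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1 << 31))
        )
        tree.fit(X[boot], y[boot])
        # Step 1: OOB error on the untouched data (errOOB1)
        err1 = 1.0 - tree.score(X[oob], y[oob])
        for j in range(n_features):
            # Step 2: shuffle feature j among the OOB rows, recompute (errOOB2)
            X_perm = X[oob].copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            err2 = 1.0 - tree.score(X_perm, y[oob])
            importance[j] += err2 - err1
    # Step 3: average the error increase over all trees
    return importance / n_trees

# Synthetic data (an assumption): with shuffle=False the two informative
# features are columns 0 and 1, the remaining four are pure noise.
X, y = make_classification(n_samples=400, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
scores = oob_permutation_importance(X, y, n_trees=50)
print(np.round(scores, 4))
```

Permuting a column keeps its marginal distribution but breaks its link to the label, which is why it is a common choice for the perturbation in step 2.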
ref:

https://www.cnblogs.com/justcxtoworld/p/3447231.html
http://www.cnblogs.com/justcxtoworld/p/3434266.html

1. Mean decrease impurity
The sklearn implementation is used as follows:

# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor

# Load the Boston housing dataset as an example
boston = load_boston()
X = boston["data"]
print(type(X), X.shape)
Y = boston["target"]
names = boston["feature_names"]
print(names)

# Fit a forest with default settings; feature_importances_ holds the impurity-based scores
rf = RandomForestRegressor()
rf.fit(X, Y)
print("Features sorted by their score:")
print(sorted(zip((round(float(score), 4) for score in rf.feature_importances_), names), reverse=True))

Output:

Features sorted by their score:
[(0.5104, 'RM'), (0.2837, 'LSTAT'), (0.0812, 'DIS'), (0.0303, 'CRIM'), (0.0294, 'NOX'), (0.0176, 'PTRATIO'), (0.0134, 'AGE'), (0.0115, 'B'), (0.0089, 'TAX'), (0.0077, 'INDUS'), (0.0051, 'RAD'), (0.0006, 'ZN'), (0.0004, 'CHAS')]
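
Since load_boston is no longer available in recent scikit-learn releases (it was removed in version 1.2), the same ranking flow can be tried on a dataset that still ships with the library. The sketch below assumes load_diabetes as the stand-in dataset and fixes n_estimators and random_state for repeatability; neither choice comes from the original notes, and the scores will of course differ from the Boston output above.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Stand-in dataset bundled with scikit-learn (an assumption, not the notes' data)
data = load_diabetes()
X, y, names = data["data"], data["target"], data["feature_names"]

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Pair each rounded score with its feature name and sort from high to low
ranked = sorted(
    zip((round(float(s), 4) for s in rf.feature_importances_), names),
    reverse=True,
)
print("Features sorted by their score:")
for score, name in ranked:
    print(f"{name:>4}  {score:.4f}")
```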

2. Mean decrease accuracy
At the time of writing, sklearn had no ...... see the links for a concrete implementation.
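
Later scikit-learn releases (0.22 onward) do provide a permutation-based measure, sklearn.inspection.permutation_importance. The sketch below shows one way to call it, assuming synthetic data and a held-out test split; note that it scores on the data passed in rather than on each tree's OOB samples, so it is a close relative of the procedure described earlier, not the same computation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): with shuffle=False the three informative
# features are columns 0, 1 and 2.
X, y = make_regression(n_samples=400, n_features=6, n_informative=3,
                       shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each column n_repeats times and record the drop in the model's score
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)

for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.4f} "
          f"+/- {result.importances_std[idx]:.4f}")
```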

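3. Using feature importance in Spark
ml.classification.RandomForestClassificationModel has a featureImportances member that can be called directly.
Definition:
lazy val featureImportances: Vector
Usage (where model is a fitted RandomForestClassificationModel):
model.featureImportances.toArray.mkString(",")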
