Ensemble Learning (Random Forest): Practice

阿新 • Published: 2018-03-16


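In ensemble learning the prediction is the combined result of several base learners, so tuning involves two groups of parameters: those of the base learner and those of the ensemble itself.

In scikit-learn, the random forest (RF) classifier is RandomForestClassifier and the regressor is RandomForestRegressor.

Official documentation: http://scikit-learn.org/stable/modules/ensemble.html#ensemble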
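The regressor is used the same way as the classifier. Below is a minimal sketch; the synthetic dataset from make_regression and the variable names are assumptions made for illustration and are not part of the original post.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption; the post itself only uses iris)
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, random_state=0)

# Same fit/predict interface as RandomForestClassifier
reg = RandomForestRegressor(n_estimators=50, random_state=0)
reg.fit(Xr_train, yr_train)

r2 = reg.score(Xr_test, yr_test)  # score() returns R^2 for regressors
print(r2)
```

For a regressor, score() reports the coefficient of determination rather than accuracy, so the two classes cannot be compared on the same number.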

RandomForestClassifier ： http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier

RandomForestRegressor ：http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor


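1. RF framework parameters

Parameters:

1. n_estimators: the number of base learners (trees) in the random forest

2. oob_score: whether to use out-of-bag samples to evaluate the model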
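To make the two framework parameters concrete, here is a small sketch on the iris data used later in this post. The values n_estimators=100 and random_state=0 are assumptions chosen for illustration, not recommendations from the original text.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True scores each sample using only the trees whose bootstrap
# sample did not contain it (bootstrap=True is the default).
rf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf_oob.fit(X, y)

print(len(rf_oob.estimators_))  # number of fitted trees, equals n_estimators
print(rf_oob.oob_score_)        # out-of-bag accuracy estimate
```

The out-of-bag score gives a rough generalization estimate without holding out a separate validation set, which can be handy on small datasets.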

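2. Base learner parameters

In general many kinds of base learner are possible; in sklearn's random forest the base learner is a decision tree.

Parameters: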

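1. max_features: the most important one. It is the number of features considered when looking for the best split in a tree. None means all features are considered at each split; "log2" means at most $\log_2 N$ features are considered, where $N$ is the total number of features. Note that None is the default for a single decision tree; for RandomForestClassifier the default is "sqrt" (called "auto" in older versions), i.e. $\sqrt{N}$ features.

2. criterion: the measure of split quality. "gini" is the Gini impurity, used for classification; regression trees use the mean squared error ("mse", named "squared_error" in recent scikit-learn versions).

3. max_depth: the maximum depth of each decision tree

4. min_samples_split: the minimum number of samples a node must contain to be split; a node with fewer samples is not split further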
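The base learner parameters above are passed straight to the forest, which forwards them to every tree. The sketch below checks this on iris; the specific values (20 trees, depth 3, min_samples_split=4) are assumptions picked only to make the effect visible.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # iris has N = 4 features

rf_small = RandomForestClassifier(
    n_estimators=20,
    max_features="log2",   # at most log2(N) features per split
    criterion="gini",
    max_depth=3,
    min_samples_split=4,
    random_state=0,
)
rf_small.fit(X, y)

first_tree = rf_small.estimators_[0]
print(type(first_tree).__name__)  # class of the base learner
print(first_tree.max_features_)   # resolved number of features per split

# No tree in the forest may grow deeper than max_depth
deepest = max(tree.get_depth() for tree in rf_small.estimators_)
print(deepest)
```

Inspecting estimators_ like this is a quick way to confirm that a setting actually reached the individual trees.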

Example:


# Import libraries
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV
# Load the data
iris = load_iris()
x = iris.data
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)
# Use grid search to choose the number of base learners
clf = GridSearchCV(RandomForestClassifier(max_features='log2'), param_grid={'n_estimators': range(1, 101, 10)}, cv=10)
clf.fit(x_train, y_train)
print(clf.best_params_)
# Then use grid search to choose the depth of the decision trees
clf2 = GridSearchCV(RandomForestClassifier(n_estimators=11), param_grid={'max_depth': range(1, 10)})
clf2.fit(x_train, y_train)
print(clf2.best_params_)
# With the values found in this run (max depth 3, 11 trees), build the final random forest and predict.
# No random_state is set on the forests, so the best parameters may differ between runs.
rf = RandomForestClassifier(n_estimators=11, max_depth=3, max_features='log2')
rf.fit(x_train, y_train)
y_hat = rf.predict(x_test)
print(accuracy_score(y_test, y_hat))
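
The example above tunes n_estimators and max_depth one after the other. As a possible variation, both can be searched in a single grid. The sketch below is not the original author's method; the grid values, cv=5 and random_state=0 are assumptions chosen to keep the run short and repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Search the ensemble size and the tree depth together (hypothetical grid)
param_grid = {
    "n_estimators": [1, 11, 21, 31],
    "max_depth": [1, 2, 3, 4, 5],
}
search = GridSearchCV(
    RandomForestClassifier(max_features="log2", random_state=0),
    param_grid=param_grid,
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)

# refit=True (the default) retrains the best setting on the full training set,
# so the search object can predict directly.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(test_accuracy)
```

A joint search costs more fits than two separate searches, but it avoids fixing one parameter at a value that was chosen before the other was tuned.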
