【Machine Learning】【Python】Choosing Optimal Parameters (Decision Tree, Random Forest, AdaBoost, GBDT)
By 阿新 · Published 2019-02-07
Last time I trained an SVM with PSO, which was far too slow.
This time I was lucky to come across a blog post about parameter tuning.
It gave me a lot of inspiration.
I won't go into detail for any particular classification algorithm here; this really takes a lot of hands-on experience, which is exactly what I still lack.
Following that post, I ran some experiments. Accuracy did improve step by step, but because my feature engineering is weak, the improvement was not obvious.
One more thing: I didn't use GridSearchCV, the tuning interface the author used; I simply did it with a for loop.
My code below only contains the AdaBoost function; for the other methods, you can adapt it yourself based on my previous post.
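As a rough sketch of what those adaptations could look like, here are the analogous constructors for Random Forest and GBDT. The parameter values and the synthetic dataset are illustrative placeholders only, not the tuned values or features from my experiments:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Placeholder data standing in for the real cached features
X, y = make_classification(n_samples=200, n_features=10, random_state=10)

# Hypothetical starting parameters; sweep them the same way as n_estimators below
rf = RandomForestClassifier(n_estimators=100, max_depth=11,
                            min_samples_split=4, min_samples_leaf=2,
                            random_state=10)
gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                  max_depth=3, random_state=10)

for name, clf in [("RandomForest", rf), ("GBDT", gbdt)]:
    clf.fit(X, y)
    print("%s train score: %.4f" % (name, clf.score(X, y)))
```

Each classifier exposes the same `fit`/`score` interface, so the parameter-sweep loop in the code below works unchanged once you swap in the constructor.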
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 23 11:24:32 2018

@author: hans
"""
from sklearn.externals import joblib
from sklearn.ensemble import AdaBoostClassifier
from sklearn import tree

filename = '_feature_rgb.pkl'
train_list = "train_all.txt"
test_list = "test_all.txt"

def adaboost(n):
    # AdaBoost over a pre-tuned decision tree base estimator;
    # n controls the number of boosting rounds (n_estimators)
    clf = AdaBoostClassifier(tree.DecisionTreeClassifier(criterion='gini', max_depth=11,
                                                         min_samples_split=400,
                                                         min_samples_leaf=30,
                                                         max_features=30,
                                                         random_state=10),
                             algorithm="SAMME", n_estimators=n,
                             learning_rate=0.001, random_state=10)
    return clf

def findBestParam():
    # Load cached features and labels saved by the previous post's pipeline
    X_train = joblib.load(train_list.split('.')[0] + filename)
    y_train = joblib.load(train_list.split('.')[0] + '_label.pkl')
    X_test = joblib.load(test_list.split('.')[0] + filename)
    y_test = joblib.load(test_list.split('.')[0] + '_label.pkl')
    best_test_score = 0
    best_train_score = 0
    best_param = 0
    # Sweep n_estimators; widen the range to search a larger interval
    for n in range(1500, 1501, 10):
        clf = adaboost(n)
        clf = clf.fit(X_train, y_train)
        train_score = clf.score(X_train, y_train)
        test_score = clf.score(X_test, y_test)
        print("--------------------------------\nCurrent train score: %.4f" % train_score)
        print("Current test score: %.4f" % test_score)
        print("Current param: %d" % n)
        # Keep the parameter with the best held-out score
        if test_score > best_test_score:
            best_test_score = test_score
            best_train_score = train_score
            best_param = n
    print("--------------------------------\nBest train score: %.4f" % best_train_score)
    print("Best test score: %.4f" % best_test_score)
    print("Best param: %d" % best_param)

if __name__ == '__main__':
    findBestParam()
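For comparison, the GridSearchCV interface the original author used can replace the for loop. A minimal sketch on synthetic data (the dataset and parameter grid here are placeholders, not my real features or tuned values):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder data standing in for the real cached features
X, y = make_classification(n_samples=200, n_features=10, random_state=10)

# Hypothetical grid; GridSearchCV tries every combination with cross-validation
param_grid = {"n_estimators": [20, 50],
              "learning_rate": [0.5, 1.0]}

search = GridSearchCV(AdaBoostClassifier(random_state=10),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print("Best params:", search.best_params_)
print("Best CV score: %.4f" % search.best_score_)
```

Unlike the for loop above, which scores on a single held-out test set, GridSearchCV selects parameters by cross-validated score, which is less prone to overfitting the test split.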