
Filling Gaps in Machine Learning Knowledge (Random Forests and ExtraTrees)


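Random forest

A random forest randomly samples both the data rows and the features, then trains multiple decision trees on those samples. This guards against overfitting and improves generalization.

Typical characteristics of a random forest:
1. Sampling with replacement (so the dataset actually used to grow each tree contains duplicate rows).
2. Each node is split at the optimal split point.

Given a standard training set D of size n, bagging generates m new training sets D_i, each of size n′, by sampling from D uniformly and with replacement. This kind of sample is known as a bootstrap sample. The m models are fitted using the above m bootstrap samples and combined by averaging the output (for regression) or voting (for classification).

ExtraTrees

The ExtraTrees algorithm adds one more layer of randomness. When picking a split value for a continuous feature, it does not evaluate every candidate split value in order to choose the splitting feature. Instead, for each feature it generates one random split value within that feature's range of values, then compares the resulting splits to decide which feature to split on.

Two notes on the scikit-learn parameters:
1. Empirical good default values are max_features=n_features for regression problems, and max_features=sqrt(n_features) for classification tasks (where n_features is the number of features in the data).
2. In addition, note that in random forests, bootstrap samples are used by default (bootstrap=True) while the default strategy for extra-trees is to use the whole dataset (bootstrap=False).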
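The difference between the two split strategies can be sketched in plain Python. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: the function names (`bootstrap_sample`, `best_split`, `extra_trees_split`), the use of Gini impurity as the split criterion, and the toy data are all hypothetical choices made for the example, and the per-node feature subsampling controlled by max_features is left out for brevity.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly with replacement (a bootstrap sample)."""
    return [rng.choice(data) for _ in data]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity after splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(labels)


def best_split(rows, labels):
    """Optimal-split style: score every midpoint between sorted distinct values."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({x[f] for x in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            score = split_impurity(rows, labels, f, t)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best  # (impurity, feature index, threshold)


def extra_trees_split(rows, labels, rng):
    """ExtraTrees style: one random threshold per feature, keep the best of those."""
    best = None
    for f in range(len(rows[0])):
        lo = min(x[f] for x in rows)
        hi = max(x[f] for x in rows)
        if lo == hi:
            continue  # constant feature, nothing to split on
        t = rng.uniform(lo, hi)  # random split value inside the feature's range
        score = split_impurity(rows, labels, f, t)
        if best is None or score < best[0]:
            best = (score, f, t)
    return best  # (impurity, feature index, threshold)


# Toy data (assumed for illustration): two features, binary labels.
rows = [(1.0, 5.0), (2.0, 3.0), (3.0, 8.0), (4.0, 1.0)]
labels = [0, 0, 1, 1]
rng = random.Random(0)

print("bootstrap indices:", bootstrap_sample(list(range(len(rows))), rng))
print("optimal split:    ", best_split(rows, labels))
print("extra-trees split:", extra_trees_split(rows, labels, rng))
```

The optimal-split search scores every candidate threshold, while the ExtraTrees-style search scores only one threshold per feature, so its impurity can never be lower than the optimal one on the same data. That cheaper, noisier split is the extra randomness described above.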
