History of Pruning Algorithm Development and Python Implementation (Continuously Updated)

| Name of tree | Inventor | Name of article | Year |
|---|---|---|---|
| ID3 | Ross Quinlan | "Discovering rules by induction from large collections of examples" | 1979 |
| ID3 | Ross Quinlan | Another origin: "Learning efficient classification procedures and their application to chess end games" | 1983a |
| CART | Leo Breiman | "Classification and Regression Trees" | 1984 |
| C4.5 | Ross Quinlan | "C4.5: Programs for Machine Learning" | 1993 |
| C5.0 | Ross Quinlan | Commercial edition of C4.5, no relevant papers | - |

| Name of post-pruning algorithm | Name of article or book | Year | Inventor | Tree pruned | Remark |
|---|---|---|---|---|---|
| Pessimistic Pruning | "Simplifying Decision Trees", part 2.3 | 1986b (some sources say 1987b; the date printed on the paper is used here) | Quinlan | C4.5 | Ross Quinlan invented "Pessimistic Pruning"; John Mingers renamed it "Pessimistic Error Pruning" in his article "An Empirical Comparison of Pruning Methods for Decision Tree Induction" |
| Reduced Error Pruning | "Simplifying Decision Trees", part 2.2 | 1986b | Quinlan | C4.5 | Requires an extra validation set for pruning |
| Cost-Complexity Pruning | "Classification and Regression Trees", Section 3.3 | 1984 | L. Breiman | CART | Prunes classification trees |
| Error-Complexity Pruning | "Classification and Regression Trees", Section 8.5.1 | 1984 | L. Breiman | CART | Prunes regression trees; ECP was developed on the basis of CCP |
| Critical Value Pruning | "Expert System-Rule Induction with Statistical Data"; another claim is "An Empirical Comparison of Pruning Methods for Decision Tree Induction", but the author of that article says it cites a 1987 paper | 1987a | John Mingers | The paper does not say explicitly | Accounts of where CVP originated differ; the source given here follows p. 212 of "An Empirical Comparison of Pruning Methods for Ensemble Classifiers" |
| Minimum-Error Pruning | "Learning decision rules in noisy domains" | 1986 | Niblett and Bratko | | Cannot be downloaded from the Internet |
| Error-Based Pruning | "C4.5: Programs for Machine Learning", Section 4.2 | 1993 | Quinlan | C4.5 | EBP is an evolution of PEP |

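Goals of pruning a classification tree:
1. Simplify the decision tree (to make knowledge extraction easier) while keeping the loss of predictive accuracy acceptable;
2. Improve accuracy on the validation set (REP).
Goal of pruning a regression tree: reduce or eliminate overfitting.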

Summary of pruning code:
------------REP-finished--------
REP principle and a worked example:
https://blog.csdn.net/appleyuchi/article/details/83041047
REP pruning code implementation:
https://github.com/appleyuchi/Decision_Tree_Prune/tree/master/ID3-REP-post_prune-Python-draw
----------------PEP-finished-----------------
PEP pruning algorithm: development history, principle, and examples:
https://blog.csdn.net/appleyuchi/article/details/83795521
https://blog.csdn.net/appleyuchi/article/details/83902998
PEP Python implementation:
https://blog.csdn.net/appleyuchi/article/details/83961060
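
As a rough illustration of why PEP needs no separate validation data, the sketch below applies the pessimistic test as it is commonly stated (a 1/2 continuity correction per leaf and a one-standard-error margin) to training-set counts only. The `Node` class, its fields, and the function names are hypothetical and are not taken from the linked implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node that stores training-set counts only."""
    n: int                      # training samples reaching this node
    errors: int                 # training errors if this node were a leaf
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: Node) -> List[Node]:
    """Collect the leaves of the subtree rooted at node."""
    if node.is_leaf():
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def should_prune(node: Node) -> bool:
    """Pessimistic test: corrected leaf error <= corrected subtree error + 1 SE."""
    sub_leaves = leaves(node)
    # Continuity correction: add 1/2 per leaf of the subtree.
    subtree_err = sum(leaf.errors for leaf in sub_leaves) + 0.5 * len(sub_leaves)
    leaf_err = node.errors + 0.5
    # Guard against a negative radicand in degenerate cases.
    variance = max(0.0, subtree_err * (node.n - subtree_err) / node.n)
    return leaf_err <= subtree_err + math.sqrt(variance)


def pep_prune(node: Node) -> Node:
    """Top-down pass: test a node first and descend only if it is kept."""
    if node.is_leaf():
        return node
    if should_prune(node):
        node.children = []      # collapse the whole subtree into a leaf
        return node
    node.children = [pep_prune(child) for child in node.children]
    return node


# Assumed toy counts: 20 samples, 5 errors as a leaf, 3 leaves with 1 error each.
root = Node(20, 5, [Node(10, 1), Node(6, 1), Node(4, 1)])
pep_prune(root)
print(root.is_leaf())
```

With these toy counts the corrected leaf error is 5.5 and the corrected subtree error is 4.5 with a standard error of about 1.87, so the subtree is collapsed.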
-------------EBP-finished---------------------
EBP pruning: complete algorithm principle, C implementation, and a worked example:
https://blog.csdn.net/appleyuchi/article/details/83863469
Python implementation of EBP pruning (actually a Python interface built on Quinlan's EBP pruning code):
https://github.com/appleyuchi/Decision_Tree_Prune/tree/master/Quinlan-C4.5-Release8_and_python_interface_for_EBP/
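Some readers may ask why the Python interface to J48 in Weka is not used directly.
Note that Weka takes Quinlan's C code as its reference, yet on some datasets, for example the hypo dataset, Weka's results are very poor.
The purpose of a decision tree is to aid classification and generate knowledge, so a very large decision tree is clearly inconvenient to use.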
----------------------------------

Do we need a test set when pruning?
Note: the "test set" here actually means a validation dataset.

| Pruning algorithm | Needs an extra test dataset? | Pruning style |
|---|---|---|
| REP (Reduced Error Pruning) | yes | bottom-up |
| CCP (Cost Complexity Pruning) | | |
| ECP (Error Complexity Pruning) | | |
| CVP (Critical Value Pruning) | | |
| MEP (Minimum Error Pruning) | no | bottom-up |
| PEP (Pessimistic Error Pruning) | no | top-down |
| EBP (Error Based Pruning) | no | bottom-up |
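
To make the REP row concrete, here is a minimal sketch of bottom-up pruning against a held-out validation set. It assumes a tree over categorical features; the `Node` class, the `rep_prune` function, and the toy weather data are all hypothetical and are not the code from the linked repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Sample = Tuple[Dict[str, str], str]   # (feature values, class label)


@dataclass
class Node:
    """Hypothetical node of a decision tree over categorical features."""
    majority: str                     # majority training class at this node
    feature: Optional[str] = None     # None marks a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)

    def predict(self, x: Dict[str, str]) -> str:
        if self.feature is None:
            return self.majority
        child = self.children.get(x.get(self.feature))
        # Fall back to the majority class for unseen feature values.
        return child.predict(x) if child is not None else self.majority


def errors(node: Node, data: List[Sample]) -> int:
    """Count misclassified samples for the subtree rooted at node."""
    return sum(node.predict(x) != y for x, y in data)


def rep_prune(node: Node, val: List[Sample]) -> Node:
    """Bottom-up REP: prune the children first, then consider this node."""
    if node.feature is None:
        return node
    for value, child in node.children.items():
        subset = [(x, y) for x, y in val if x.get(node.feature) == value]
        rep_prune(child, subset)
    # Validation errors if this node were replaced by a majority-class leaf.
    as_leaf = sum(y != node.majority for _, y in val)
    # Collapse when the leaf is no worse; a node reached by no validation
    # samples is therefore collapsed as well.
    if as_leaf <= errors(node, val):
        node.feature, node.children = None, {}
    return node


# Assumed toy tree: split on "outlook", then on "windy" under "sunny".
tree = Node("yes", "outlook", {
    "sunny": Node("no", "windy", {"true": Node("yes"), "false": Node("no")}),
    "rainy": Node("yes"),
})
validation = [
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "rainy", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "yes"),
]
rep_prune(tree, validation)
print(tree.children["sunny"].feature)  # None: the "windy" split was pruned
```

In this toy run the "windy" split makes one validation error while a plain "no" leaf makes none, so it is collapsed; the root split is kept because replacing it with a leaf would add two validation errors.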

Markdown table generator:
https://tool.lu/tables