Pessimistic Error Pruning (PEP) illustrated with a Python implementation for C4.5

阿新 • Published: 2018-11-12

------------------get the datasets-----------------------------------
We use the following dataset:
https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data

Target:
predict the age of an abalone, which is its ring count + 1.5.

Of course, regression methods could be used for this target.
But to test our PEP pruning algorithm, we decided to use C4.5 to predict the ring count (the final column of the dataset above) as a class label.

There are 28 distinct values of "rings" in total:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,29

I have checked that no record has the value 28, which is why it is missing from the line above.

Then:
1st step: sort the records of the dataset by ring count.
2nd step: select the first 200 records as the final training set.
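
The two steps above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions: the function name `select_training_rows` is hypothetical, the sort is ascending, and ties between records with the same ring count keep their file order, which may differ from the ordering actually used for this post. The three sample rows are made up for illustration and only mimic the 9-column layout of abalone.data.

```python
import csv
import io


def select_training_rows(csv_text, n_rows=200):
    """Sort records by the last column (rings) and keep the first n_rows."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    # Stable ascending sort on the integer ring count in the final column.
    rows.sort(key=lambda row: int(row[-1]))
    return rows[:n_rows]


# Made-up rows in the abalone.data column layout (rings is the last field).
sample = """M,0.40,0.30,0.10,0.50,0.20,0.10,0.15,15
F,0.50,0.40,0.13,0.60,0.25,0.14,0.20,9
I,0.08,0.06,0.01,0.01,0.005,0.002,0.003,1
"""

train = select_training_rows(sample, n_rows=2)
print([row[-1] for row in train])
```

With the real file, the text read from the downloaded abalone.data would be passed in place of `sample`, with `n_rows=200`.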

------------------get the model-----------------------------------
Use the simplified tree (after EBP pruning) obtained by running C4.5 Release 8:
http://www.rulequest.com/Personal/c4.5r8.tar.gz

Instructions for running it are at:
https://github.com/appleyuchi/Decision_Tree_Prune

Then transform the following model:

unpruned Decision Tree C4.5 Release 8

to:

{'Viscera': {'>0.0145': {'Shell': {'<=0.0345': {'Viscera': {'<=0.0285': ' 5 (50.0/9.0)', '>0.0285': ' 4 (3.0)'}}, '>0.0345': {'Sex': {'=M': ' 6 (6.0/3.0)', '=F': ' 5 (3.0)', '=I': ' 5 (59.0/12.0)'}}}}, '<=0.0145': {'Shucked': {'>0.007': ' 4 (66.0/31.0)', '<=0.007': {'Shucked': {'>0.0045': {'Shucked': {'>0.005': {'Height': {'<=0.02': ' 4 (2.0)', '>0.02': ' 3 (4.0)'}}, '<=0.005': ' 4 (3.0)'}}, '<=0.0045': {'Height': {'<=0.025': ' 1 (2.0/1.0)', '>0.025': ' 3 (2.0)'}}}}}}}}

Replace every ' with " so that the model is valid JSON,
and then view the model with the JSON viewer at http://www.bejson.com/


{
    "Viscera": {
        ">0.0145": {
            "Shell": {
                "<=0.0345": {
                    "Viscera": {
                        "<=0.0285": " 5 (50.0/9.0)",
                        ">0.0285": " 4 (3.0)"
                    }
                },
                ">0.0345": {
                    "Sex": {
                        "=M": " 6 (6.0/3.0)",
                        "=F": " 5 (3.0)",
                        "=I": " 5 (59.0/12.0)"
                    }
                }
            }
        },
        "<=0.0145": {
            "Shucked": {
                ">0.007": " 4 (66.0/31.0)",
                "<=0.007": {
                    "Shucked": {
                        ">0.0045": {
                            "Shucked": {
                                ">0.005": {
                                    "Height": {
                                        "<=0.02": " 4 (2.0)",
                                        ">0.02": " 3 (4.0)"
                                    }
                                },
                                "<=0.005": " 4 (3.0)"
                            }
                        },
                        "<=0.0045": {
                            "Height": {
                                "<=0.025": " 1 (2.0/1.0)",
                                ">0.025": " 3 (2.0)"
                            }
                        }
                    }
                }
            }
        }
    }
}
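
The quote replacement and pretty-printing can also be done locally. The snippet below is a minimal sketch, not part of the original workflow: it parses the Python-style dict text with `ast.literal_eval` and re-serialises it with `json.dumps`, which avoids hand-editing quotes. Only a small sub-tree of the model is used here to keep the example short, and the variable names are illustrative.

```python
import ast
import json

# A small sub-tree of the model, written as Python dict text with single quotes.
model_text = "{'Height': {'<=0.025': ' 1 (2.0/1.0)', '>0.025': ' 3 (2.0)'}}"

# literal_eval safely parses Python literals (dicts, strings, numbers).
model = ast.literal_eval(model_text)

# json.dumps emits double-quoted JSON; indent=4 gives the nested view.
model_json = json.dumps(model, indent=4)
print(model_json)
```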

------------------Start to Prune------------------------------------
Now let's prune it with the PEP algorithm. Before pruning, the C4.5 decision tree is:
[Figure: the C4.5 decision tree before pruning]

After pruning, the C4.5 tree is:
[Figure: the C4.5 decision tree after pruning]
Each sub-tree under an orange "X" in the first picture is replaced (pruned) with a leaf labelled with the class that most of its items belong to,
which gives the second picture.
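
For readers who want to check a pruning decision by hand, here is a minimal sketch of the PEP test as it is usually stated: a sub-tree is replaced by a leaf when the leaf's corrected error count does not exceed the sub-tree's corrected error count plus one standard error, with a continuity correction of 0.5 per leaf. The function name `pep_should_prune` is hypothetical and the code in the linked repository may be organised differently. The (cases, errors) pairs are read from the leaf annotations of the two pruned sub-trees, and the replacement-leaf error counts 28 and 2 are taken from the pruned leaves '5(121/28)' and '3(4/2)'.

```python
import math


def pep_should_prune(leaf_stats, node_errors):
    """Return True if PEP would replace the sub-tree with a single leaf.

    leaf_stats  -- list of (cases, errors) pairs, one per leaf of the sub-tree
    node_errors -- training errors if the sub-tree became a majority-class leaf
    """
    n_cases = sum(cases for cases, _ in leaf_stats)
    # Corrected sub-tree errors: observed errors plus 0.5 per leaf.
    subtree_err = sum(errors for _, errors in leaf_stats) + 0.5 * len(leaf_stats)
    # Standard error of the corrected sub-tree error count.
    std_err = math.sqrt(subtree_err * (n_cases - subtree_err) / n_cases)
    # Corrected errors of the single replacement leaf.
    leaf_err = node_errors + 0.5
    return leaf_err <= subtree_err + std_err


# Sub-tree under Viscera > 0.0145 (root attribute 'Shell'), 121 cases.
shell_leaves = [(50, 9), (3, 0), (6, 3), (3, 0), (59, 12)]
print(pep_should_prune(shell_leaves, node_errors=28))   # True

# Sub-tree 'Height' under Shucked <= 0.0045, 4 cases.
height_leaves = [(2, 1), (2, 0)]
print(pep_should_prune(height_leaves, node_errors=2))   # True
```

For the first sub-tree the corrected sub-tree error is 24 + 2.5 = 26.5 with a standard error of about 4.55, so the corrected leaf error of 28.5 is within the limit and the sub-tree is pruned.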

Here are the two models, before and after pruning with PEP:

unpruned_model = {'Viscera': {'<=0.0145': {'Shucked': {'>0.007': ' 4 (66.0/31.0)', '<=0.007': {'Shucked': {'<=0.0045': {'Height': {'<=0.025': ' 1 (2.0/1.0)', '>0.025': ' 3 (2.0)'}}, '>0.0045': {'Shucked': {'>0.005': {'Height': {'<=0.02': ' 4 (2.0)', '>0.02': ' 3 (4.0)'}}, '<=0.005': ' 4 (3.0)'}}}}}}, '>0.0145': {'Shell': {'<=0.0345': {'Viscera': {'<=0.0285': ' 5 (50.0/9.0)', '>0.0285': ' 4 (3.0)'}}, '>0.0345': {'Sex': {'=M': ' 6 (6.0/3.0)', '=F': ' 5 (3.0)', '=I': ' 5 (59.0/12.0)'}}}}}}

pruned_model = {'Viscera': {'>0.0145': '5(121/28)', '<=0.0145': {'Shucked': {'>0.007': ' 4 (66.0/31.0)', '<=0.007': {'Shucked': {'>0.0045': {'Shucked': {'>0.005': {'Height': {'<=0.02': ' 4 (2.0)', '>0.02': ' 3 (4.0)'}}, '<=0.005': ' 4 (3.0)'}}, '<=0.0045': '3(4/2)'}}}}}}

unpruned_accuracy,pruned_accuracy=(0.72, 0.695)
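
The two accuracies can be reproduced from the leaf annotations alone, since each leaf string has the form "class (cases/errors)" counted on the 200 training items. The sketch below is an illustration rather than the repository's code: the helper names `leaf_counts` and `training_accuracy` are hypothetical, and the two models are the ones listed above, reflowed over several lines.

```python
import re

# Matches leaves such as ' 5 (50.0/9.0)', ' 4 (3.0)' or '5(121/28)'.
LEAF_PATTERN = re.compile(r"\s*([^\s(]+)\s*\(([\d.]+)(?:/([\d.]+))?\)")


def leaf_counts(tree):
    """Yield (cases, errors) for every leaf string in a nested-dict tree."""
    if isinstance(tree, str):
        match = LEAF_PATTERN.match(tree)
        # A leaf without '/errors' has no training errors.
        yield float(match.group(2)), float(match.group(3) or 0.0)
    else:
        for subtree in tree.values():
            yield from leaf_counts(subtree)


def training_accuracy(tree):
    """Fraction of training items classified correctly by the tree."""
    counts = list(leaf_counts(tree))
    cases = sum(n for n, _ in counts)
    errors = sum(e for _, e in counts)
    return (cases - errors) / cases


shucked_deep = {'Shucked': {
    '>0.005': {'Height': {'<=0.02': ' 4 (2.0)', '>0.02': ' 3 (4.0)'}},
    '<=0.005': ' 4 (3.0)'}}

unpruned_model = {'Viscera': {
    '<=0.0145': {'Shucked': {
        '>0.007': ' 4 (66.0/31.0)',
        '<=0.007': {'Shucked': {
            '<=0.0045': {'Height': {'<=0.025': ' 1 (2.0/1.0)',
                                    '>0.025': ' 3 (2.0)'}},
            '>0.0045': shucked_deep}}}},
    '>0.0145': {'Shell': {
        '<=0.0345': {'Viscera': {'<=0.0285': ' 5 (50.0/9.0)',
                                 '>0.0285': ' 4 (3.0)'}},
        '>0.0345': {'Sex': {'=M': ' 6 (6.0/3.0)', '=F': ' 5 (3.0)',
                            '=I': ' 5 (59.0/12.0)'}}}}}}

pruned_model = {'Viscera': {
    '>0.0145': '5(121/28)',
    '<=0.0145': {'Shucked': {
        '>0.007': ' 4 (66.0/31.0)',
        '<=0.007': {'Shucked': {
            '>0.0045': shucked_deep,
            '<=0.0045': '3(4/2)'}}}}}}

print(training_accuracy(unpruned_model), training_accuracy(pruned_model))
# 0.72 0.695
```

The unpruned tree makes 56 errors on 200 items and the pruned tree 61, which is where 0.72 and 0.695 come from; both figures are therefore accuracies on the training set.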

For comparison, running EBP (Error-Based Pruning) on the same 200 abalone items
(using Quinlan's implementation, http://www.rulequest.com/Personal/c4.5r8.tar.gz)
gives:
Evaluation on training data (200 items):

 Before Pruning           After Pruning
----------------   ---------------------------
Size      Errors   Size      Errors   Estimate

  20   56(28.0%)     17   57(28.5%)    (36.1%)   <<

Please note that EBP (Error-Based Pruning) and PEP (Pessimistic Error Pruning) aim to simplify C4.5 trees without losing too much accuracy, rather than to improve accuracy alone.
A simplified tree makes it much easier for users to extract classification rules (knowledge) from huge datasets.

The Python implementation of PEP is available at
https://github.com/appleyuchi/Decision_Tree_Prune

Note that EBP is an evolution of PEP; both were invented by Ross Quinlan.

You may also want to study the principles of PEP in detail, with examples:
https://blog.csdn.net/appleyuchi/article/details/83902998
https://blog.csdn.net/appleyuchi/article/details/83795521
