
Pessimistic Error Pruning example of C4.5

This example is from 《An Empirical Comparison of Pruning Methods
for Decision Tree Induction》
(Figure: the example decision tree from the paper, with per-node class counts and error counts)
How do we read these nodes and leaves?
For example, node 30:
15 examples are classified as “class1”;
2 examples are mis-classified as “class1”.
You can deduce the rest of the nodes and leaves from the figure above in the same way.

Criterion:

$n'(T_t) + SE(n'(T_t)) < n'(t)$　①

where

$SE(n'(T_t)) = \sqrt{\frac{n'(T_t)\cdot(N(t)-n'(T_t))}{N(t)}}$
In short:
pessimistic errors of the un-pruned subtree < errors after pruning

When ① is satisfied, the current subtree is kept;
otherwise, it is pruned.
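Here is a minimal sketch of criterion ① in Python (the function names and the example numbers are mine, for illustration only, not from the paper):

```python
import math

def standard_error(subtree_errors, n_examples):
    # SE(n'(T_t)) = sqrt( n'(T_t) * (N(t) - n'(T_t)) / N(t) )
    return math.sqrt(subtree_errors * (n_examples - subtree_errors) / n_examples)

def keep_subtree(subtree_errors, pruned_errors, n_examples):
    # Criterion ①: keep the subtree if its pessimistic error count
    # (corrected errors plus one standard error) is still below the
    # corrected error count of the pruned node.
    pessimistic = subtree_errors + standard_error(subtree_errors, n_examples)
    return pessimistic < pruned_errors

# e.g. subtree errors n'(T_t) = 12, pruned errors n'(t) = 15.5, N(t) = 200
print(keep_subtree(12, 15.5, 200))   # True -> keep the subtree
```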

The principle behind why the algorithm above works:
B(n,p)->N( np,np(1-p) )
(Figure: the normal approximation to the binomial distribution, with continuity correction)
Picture reference: https://stats.stackexchange.com/questions/213966/why-does-the-continuity-correction-say-the-normal-approximation-to-the-binomia/213995

Going in the reverse direction, we apply a continuity correction to the binomial distribution:
we use “x + 0.5” to make the two curves closer (of course, this is still not perfectly accurate); then you can apply the theory of the normal distribution with x + 0.5.
Of course, 0.5 is not rigorous; it is just an approximation.
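As a quick numerical check (the values n = 20, p = 0.3, k = 7 are just assumed for illustration), the “+0.5” correction brings the normal approximation much closer to the exact binomial value:

```python
from scipy.stats import binom, norm

n, p, k = 20, 0.3, 7                        # illustrative values, not from the paper
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

exact           = binom.cdf(k, n, p)                       # P(X <= k), exact binomial: ~0.772
no_correction   = norm.cdf(k, loc=mu, scale=sigma)         # normal approximation: ~0.687
with_correction = norm.cdf(k + 0.5, loc=mu, scale=sigma)   # with "x + 0.5": ~0.768, much closer

print(exact, no_correction, with_correction)
```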

Why does the standard error appear in the criterion?
$n'(T_t)+SE(n'(T_t)) < n'(t)$
$\Leftrightarrow n'(T_t)+\sqrt{\frac{n'(T_t)\cdot(N(t)-n'(T_t))}{N(t)}} < n'(t)$
Let’s see an example:
$Y = X_1 + X_2 + X_3 + X_4$
Each $X_i$ fluctuates, so $Y$ fluctuates too (they are all random variables, not constants).
Then, when does $Y$ reach its maximum?
Suppose we have 4 values that $Y$ has produced:
1, 2, 1, 1 ②
Then the average $\bar{Y} = \frac{1}{4}(1+2+1+1) = 1.25$
and the standard deviation $= \sqrt{\frac{1}{4}\{(1-1.25)^2+(2-1.25)^2+(1-1.25)^2+(1-1.25)^2\}} = 0.43$,
so
$\bar{Y}$ + standard deviation $= 1.25 + 0.43 = 1.68 \approx 2.0$
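The same small computation in Python, using the four values from ②:

```python
import statistics

ys = [1, 2, 1, 1]              # the observed values from ②
mean = statistics.mean(ys)     # 1.25
sd = statistics.pstdev(ys)     # population standard deviation ≈ 0.43
print(mean + sd)               # ≈ 1.68, close to the maximum value 2
```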

Conclusion 1:
All of the above means that $\bar{Y}$ + standard deviation gives a value closest to the maximum in ②.

------------------------------------------
Let’s come back to the errors we were focusing on just now:
regard $Y$ as the total number of errors of the un-pruned tree, and assume (such an assumption is of course not rigorous!):
$\bar{Y} = n'(T_t)$
$X_i$: the error count of the $i$-th leaf
standard deviation: $SE(n'(T_t))$

Just like Conclusion 1, $n'(T_t)+SE(n'(T_t))$ means that:
we get a value closest to the maximum among the possible values of “errors of the un-pruned tree”.
Note that we treat “errors of the un-pruned tree” as a variable, not a constant,
which is used to obtain the “maximum possible error count”.
The reason we call the method “pessimistic” comes precisely from the term $SE(n'(T_t))$:
it turns the error count into a “pessimistic error count”.

Note:
There is a complaint about PEP in part 2.2.5 of 《An Empirical Comparison of Pruning Methods for Decision Tree Induction》:
“The statistical justification of this method is somewhat dubious” ☺
So the principle of PEP is not rigorous.


After the principle, the computation:
For the pruned tree, the error count is $n'(t) = 15 + 0.5$.
For the un-pruned tree, the error count is
$n'(T_t)+SE(n'(T_t))$, where
$n'(T_t) = 2\ (\text{node }30) + 0\ (\text{node }31) + 6\ (\text{node }28) + 2\ (\text{node }29) +$ continuat