
U25%(1,16) and U25%(1,168) in《C4.5: Programs for Machine Learning》

When calculating $U_{CF}(e,N)$:
CF: confidence level (here 25%)
e: number of misclassified cases in the subtree we are looking at
N: number of training cases that reach the current subtree, which is being judged for pruning

“Pr” is short for “Pessimistic error rate”.

$U_{CF}(e,N)=Pr=\frac{e+0.5+\frac{Coeff^2}{2}+\sqrt{Coeff^2\cdot\{(e+0.5)[1-\frac{e+0.5}{N}]+\frac{Coeff^2}{4}\}}}{N+Coeff^2}$ ①

$Coeff$ is obtained as follows:

$\frac{Coeff-Deviation[i-1]}{Deviation[i]-Deviation[i-1]}=\frac{Confidence\ Level-Val[i-1]}{Val[i]-Val[i-1]}$ ②

Now let's look at what ② means and how to get "i" in formula ②.

What are Deviation and Val?
Val[] = { 0, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00},
Deviation[] = {4.0, 3.09, 2.58, 2.33, 1.65, 1.28, 0.84, 0.25, 0.00};

For the standard normal distribution:
P{X>Deviation[i]}≈Val[i],
for example:
P{X>2.58}=0.005
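
The relation between the two tables can be checked numerically. Below is a minimal sketch, assuming X is a standard normal variable and using Python's `math.erfc` for the upper tail. The helper name `upper_tail` is made up for this illustration. The first and last table entries are skipped because they look like end values rather than exact tail probabilities (for instance P{X>0} is 0.5, not 1.00).

```python
import math

# Tables exactly as listed in the text above.
VAL = [0, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00]
DEVIATION = [4.0, 3.09, 2.58, 2.33, 1.65, 1.28, 0.84, 0.25, 0.00]


def upper_tail(x: float) -> float:
    """P(X > x) for a standard normal X (hypothetical helper name)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


# Compare each tabulated Val[i] with the computed tail at Deviation[i],
# interior entries only (the two end entries are assumed to be bounds).
for dev, val in zip(DEVIATION[1:-1], VAL[1:-1]):
    print(f"Deviation={dev:<5} Val={val:<6} P(X>Deviation)={upper_tail(dev):.4f}")
```

The printed tails should agree with the Val column to about two significant digits, which is what one would expect from a table of rounded deviations.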
--------------------

Now let's analyze the meaning of formula ②.

The red line is $P(X>x)=\int_x^{+\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}dt$

The blue line is $f(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$
and the two points are:
(Deviation[i], Val[i])=(0.25, 0.4)
(Deviation[i-1], Val[i-1])=(0.84, 0.2)

Let's zoom in on the part of the figure between these two points:
[Figure: enlarged view]
so $\tan C=\frac{Val[i]-Val[i-1]}{Deviation[i-1]-Deviation[i]}=\frac{Confidence\ Level-Val[i-1]}{Deviation[i-1]-Coeff}$

Rearranging gives ②, from which we can compute $Coeff$ and $Coeff^2$.

But how do we get "i" in the formula above?
"i" is the index that satisfies
$Val[i-1]<Confidence\ Level\le Val[i]$
when Confidence Level=0.25
i=7
val[i-1]=0.2
val[i]=0.4
deviation[i-1]=0.84
deviation[i]=0.25
Substituting these values into ②,
we get Coeff=0.6925
and then
$Coeff^2=0.479556$
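
The index search and the linear interpolation in ② can be sketched in a few lines. This is only an illustration: the function name `interpolate_coeff` is an assumption of this sketch, and ② is used in its solved form, Coeff = Deviation[i-1] + (Deviation[i] - Deviation[i-1]) · (CF - Val[i-1]) / (Val[i] - Val[i-1]).

```python
# Tables exactly as listed in the text above.
VAL = [0, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00]
DEVIATION = [4.0, 3.09, 2.58, 2.33, 1.65, 1.28, 0.84, 0.25, 0.00]


def interpolate_coeff(cf: float) -> tuple:
    """Return (i, Coeff) for a confidence level cf (hypothetical helper).

    i is the index with Val[i-1] < cf <= Val[i]; Coeff is the linear
    interpolation between Deviation[i-1] and Deviation[i], i.e. formula (2)
    solved for Coeff.
    """
    if not 0.0 < cf <= 1.0:
        raise ValueError("cf must lie in (0, 1]")
    i = 1
    while cf > VAL[i]:
        i += 1
    fraction = (cf - VAL[i - 1]) / (VAL[i] - VAL[i - 1])
    coeff = DEVIATION[i - 1] + (DEVIATION[i] - DEVIATION[i - 1]) * fraction
    return i, coeff


i, coeff = interpolate_coeff(0.25)
print(f"i={i}, Coeff={coeff:.4f}, Coeff^2={coeff ** 2:.6f}")
```

For CF=0.25 this reproduces the values above: i=7, Coeff=0.6925 and Coeff^2≈0.479556.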
--------------------

**Now let's calculate** $U_{0.25}(1,16)$
CF=0.25
e=1
N=16
$Coeff^2=0.479556$
We get:
$Pr=\frac{1+0.5+\frac{0.479556}{2}+\sqrt{0.479556\cdot\{(1+0.5)[1-\frac{1+0.5}{16}]+\frac{0.479556}{4}\}}}{16+0.479556}$
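
To evaluate this expression without doing the arithmetic by hand, here is a minimal sketch of formula ① in Python. The function name `pessimistic_error_rate` is chosen only for this illustration, and Coeff^2 is hard-coded to the value 0.479556 derived above for CF=0.25. The same function is applied to the second case in the title, e=1 and N=168.

```python
import math


def pessimistic_error_rate(e: float, n: float, coeff_sq: float) -> float:
    """Formula (1): Pr = U_CF(e, N), given Coeff^2 (hypothetical helper)."""
    a = e + 0.5
    # Square-root term of formula (1).
    root = math.sqrt(coeff_sq * (a * (1.0 - a / n) + coeff_sq / 4.0))
    return (a + coeff_sq / 2.0 + root) / (n + coeff_sq)


COEFF_SQ = 0.479556  # Coeff^2 for CF = 0.25, taken from the text above

u_16 = pessimistic_error_rate(1, 16, COEFF_SQ)
u_168 = pessimistic_error_rate(1, 168, COEFF_SQ)
print(f"U_0.25(1, 16)  = {u_16:.3f}")
print(f"U_0.25(1, 168) = {u_168:.3f}")
```

With these inputs the two values come out at roughly 0.157 and 0.016.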