U25%(1,16) and U25%(1,168) in "C4.5: Programs for Machine Learning"
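When calculating U_CF(e,N):
CF: confidence level (here 25%)
e: the number of misclassified cases in the subtree under consideration
N: the number of training cases covered by the subtree that is being judged for pruning
“Pr” is short for “pessimistic error rate”.
For e ≥ 1 and e + 0.5 < N:
U_CF(e,N) = Pr =
$$\frac{e + 0.5 + \frac{Coeff^2}{2} + \sqrt{Coeff^2\left[(e + 0.5)\left(1 - \frac{e + 0.5}{N}\right) + \frac{Coeff^2}{4}\right]}}{N + Coeff^2} \qquad ①$$
Here Coeff is the interpolated deviation given by formula ② (the C4.5 source stores its square in the same variable).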
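The method to get Coeff is as follows:
$$Coeff = Deviation[i-1] + \left(Deviation[i] - Deviation[i-1]\right)\cdot\frac{CF - Val[i-1]}{Val[i] - Val[i-1]} \qquad ②$$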
Now let’s analyse the meaning of ② and how to get “i” in formula ②.
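What are Deviation and Val?
float Val[]       = {  0, 0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00},
      Deviation[] = {4.0,  3.09,  2.58, 2.33, 1.65, 1.28, 0.84, 0.25, 0.00};
For the standard normal distribution (approximately, for the interior entries; the first and last entries only bound the interpolation):
P{X > Deviation[i]} = Val[i],
for example:
P{X > 2.58} = 0.005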
--------------------
Now we analyse the whole meaning of formula ②.
the red line is
the blue line is
and the two points are:
(Deviation[i], Val[i]) = (0.25, 0.4)
(Deviation[i-1], Val[i-1]) = (0.84, 0.2)
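Let’s enlarge the figure above:
so tan C =
$$\frac{Val[i] - Val[i-1]}{Deviation[i-1] - Deviation[i]} = \frac{CF - Val[i-1]}{Deviation[i-1] - Coeff}$$
Solving for Coeff, we get ② and can then compute Coeff and Pr.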
But how do we get “i” in the formula above?
Scanning Val[] from index 0, when
Val[i-1] < CF ≤ Val[i]
is satisfied,
we get “i”.
When the confidence level is CF = 0.25:
i = 7
Val[i-1] = 0.2
Val[i] = 0.4
Deviation[i-1] = 0.84
Deviation[i] = 0.25
Substituting these values into ②:
Coeff = 0.84 + (0.25 − 0.84) × (0.25 − 0.2) / (0.4 − 0.2) = 0.6925
then,
Coeff² = 0.6925² ≈ 0.4796
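The index search and the interpolation can be sketched in C as follows. This is a minimal sketch, assuming 0 < CF ≤ 1 so that i is at least 1; the function names `find_index` and `interpolate_coeff` are hypothetical and only mirror the steps described above.

```c
static const double Val[]       = {0,   0.001, 0.005, 0.01, 0.05, 0.10, 0.20, 0.40, 1.00};
static const double Deviation[] = {4.0, 3.09,  2.58,  2.33, 1.65, 1.28, 0.84, 0.25, 0.00};

/* Smallest index i with cf <= Val[i].
   Assumption: 0 < cf <= 1, so the loop stops inside the table and i >= 1. */
int find_index(double cf)
{
    int i = 0;
    while (cf > Val[i]) {
        i++;
    }
    return i;
}

/* Linear interpolation between the points
   (Val[i-1], Deviation[i-1]) and (Val[i], Deviation[i]). */
double interpolate_coeff(double cf)
{
    int i = find_index(cf);
    return Deviation[i - 1]
         + (Deviation[i] - Deviation[i - 1]) * (cf - Val[i - 1])
           / (Val[i] - Val[i - 1]);
}
```

For CF = 0.25, `find_index` returns 7 and `interpolate_coeff` returns 0.6925 (up to floating-point rounding).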
--------------------
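**Now let’s calculate U25%(1,16)**
CF = 0.25
e = 1
N = 16
We’ll get:
Pr = (1.5 + 0.2398 + sqrt(0.4796 × (1.5 × (1 − 1.5/16) + 0.1199))) / (16 + 0.4796) ≈ 0.157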