Data Mining Methods and Practice with R (3): Decision Tree Analysis

A decision tree is built for two purposes: exploration and prediction. For exploration, the data used to grow the tree are the training data; once the tree has grown, it can be inspected for the information hidden in the data. For prediction, the rules derived from the tree can be applied to future data. Because the model's classification performance on future data has to be considered, after the tree is built on the training data, test data can be used to measure its robustness and classification performance. Through a series of such validation steps, the best classification rules are obtained and used to predict future data.

1. Decision Tree Construction Theory

Building a decision tree involves data preparation, tree growing, tree pruning, and rule extraction.

1.1 Data Preparation

The data for a decision tree analysis involve two kinds of variables: the target variable, determined by the problem, and the attribute variables chosen from the problem's background and context, which serve as splitting variables. How easy the splitting variables are to understand and interpret determines how useful the results of the analysis will be. Four types of attribute are distinguished:

(1) Binary attributes: the test condition produces two outcomes.

(2) Nominal attributes: can take several distinct values; for example, blood type can be A, B, AB, or O.

(3) Ordinal attributes: can produce binary or multiway splits; the values may be grouped, but the groups must respect the natural order of the attribute values. For example, age can be divided into young, middle-aged, and old.

(4) Continuous attributes: the test condition can be expressed as x < a or x >= a. The tree must consider every possible split point and choose the best one.

Once the data have been collected, they are divided into a training set and a test set. The split can be made as follows.

Data splitting partitions the data into a training set, a test set, and a validation set. The training set is used to build the model; the test set is used to check whether the model is overly complex and how well it generalises; the validation set is used to measure the model's quality, for example its classification error rate or mean squared error. A good model should still fit unseen data well; if the model keeps getting more complex while the error on the test data keeps growing, the model is overfitting.

Different split ratios are used in practice, but every subset should remain representative of the original data. One approach is to draw 80% of the data as the training set to build the model and use the remaining 20% to validate it. Another is k-fold cross-validation: the data are divided into k equal parts; each time, k-1 parts are used to train the model and the remaining part to test it; this is repeated k times so that every record serves in both the training and the test set, and the averaged result represents the model's performance. This method suits small samples because it covers the whole data set, but it is computationally expensive. A minimal sketch of both approaches is given after this paragraph.
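
The sketch below is not from the original article; it only illustrates the 80/20 split and k-fold cross-validation in base R, using the Pima.tr data and the rpart model that appear later in this article.

library(MASS); library(rpart)
data("Pima.tr")

# 80/20 split: 80% of the rows build the model, the held-out 20% validate it
set.seed(1)
idx   <- sample(nrow(Pima.tr), size = 0.8 * nrow(Pima.tr))
train <- Pima.tr[idx, ]
test  <- Pima.tr[-idx, ]

# k-fold cross-validation: every record is used once for testing and k-1 times for training
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(Pima.tr)))
err   <- numeric(k)
for (i in 1:k) {
  fit    <- rpart(type ~ ., data = Pima.tr[folds != i, ])
  pred   <- predict(fit, Pima.tr[folds == i, ], type = "class")
  err[i] <- mean(pred != Pima.tr$type[folds == i])
}
mean(err)  # average classification error over the k folds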

During tree construction, if a model has a very low error rate on the training data but a high error rate on the test data, the tree is overfitting and cannot be used to estimate other data. Therefore, after the tree has been built, it should be pruned appropriately, guided by its classification performance on the test data, to improve its classification and prediction accuracy and avoid overfitting.

1.2 Splitting Criteria

The splitting criterion determines the size of the tree, both its width and its depth. Common criteria include information gain, the Gini index, the chi-square statistic, and the gain ratio.

Suppose the training set has k classes C1, C2, …, Ck and attribute A has l distinct values A1, A2, …, Al. The counts can be arranged in the contingency table below, where xij is the number of records with attribute value Ai and class Cj, xi. is the row total, x.j is the column total, and N is the total number of records.

Attribute \ Class    C1      C2      …      Ck      Total
A1                   x11     x12     …      x1k     x1.
A2                   x21     x22     …      x2k     x2.
…                    …       …       …      …       …
Al                   xl1     xl2     …      xlk     xl.
Total                x.1     x.2     …      x.k     N

(1) Information gain

Information gain measures the amount of information obtained under different conditions, based on the likelihood or probability of the different outcomes.

If the number of records in class Cj is x.j and N is the total number of records in the data set, the probability of class Cj is pj = x.j/N. From information theory, the information carried by class Cj is -log2(pj), so the total information Info(D) contributed by the classes C1, C2, …, Ck is:

Info(D)= - (x.1/N)*log2(x.1/N) - (x.2/N)*log2(x.2/N) - … - (x.k/N)*log2(x.k/N)

Info(D) is also called entropy and is commonly used to measure how mixed (impure) the data are. When every class occurs with equal probability, the entropy is at its maximum (equal to 1 in the two-class case), meaning the classification is at its most uncertain.

Suppose data set D is split on attribute A into l partitions, where xi. is the total number of records with attribute value Ai and xij is the number of those records that belong to class Cj. The information within the partition for Ai, Info(Ai), is:

Info(Ai) = -(xi1/xi.)*log2(xi1/xi.) - (xi2/xi.)*log2(xi2/xi.) - … - (xik/xi.)*log2(xik/xi.)

The information of attribute A is then a weighted sum over its values, weighted by the number of records under each value:

InfoA(D)= (x1./N)*Info(A1) + (x2./N)*Info(A2) + … + (xl./N)*Info(Al)

Information gain is the total information of the original data minus the total information after the split, Gain(A) = Info(D) - InfoA(D); it expresses how much attribute A contributes as a splitting attribute. Computing this quantity for every candidate attribute and comparing the results identifies the attribute with the best information gain.
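
As an illustration (not part of the original text), the following R sketch computes Info(D), InfoA(D) and the information gain directly from a contingency table of attribute values against classes; the use of Pima.tr with a discretised glu variable is only an assumed example.

entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]                                   # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

info_gain <- function(tab) {                      # tab: rows = attribute values Ai, columns = classes Cj
  info_D <- entropy(colSums(tab))                                  # Info(D)
  info_A <- sum(rowSums(tab) / sum(tab) * apply(tab, 1, entropy))  # InfoA(D)
  info_D - info_A                                                  # Gain(A)
}

library(MASS); data("Pima.tr")
tab <- table(cut(Pima.tr$glu, 3), Pima.tr$type)   # assumed example: glu cut into 3 bins vs type
info_gain(tab)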

(2) Gini index

The Gini index measures the impurity of a data set with respect to all classes:

Gini(D) = 1 - p1^2 - p2^2 - … - pk^2,  where pj = x.j/N, j = 1, …, k

The impurity of the data under attribute value Ai, Gini(Ai), is:

Gini(Ai) = 1 - (xi1/xi.)^2 - (xi2/xi.)^2 - … - (xik/xi.)^2

The total impurity of the data under attribute A is:

GiniA(D)= (x1./N)*Gini(A1) + (x2./N)*Gini(A2) + … + (xl./N)*Gini(Al)

The reduction in impurity contributed by attribute A is:

deltaGini(A) = Gini(D) - GiniA(D)
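
By analogy with the information-gain sketch above, a minimal (assumed) implementation of the Gini criterion is:

gini <- function(counts) {
  p <- counts / sum(counts)
  1 - sum(p^2)
}

gini_gain <- function(tab) {                      # tab: rows = attribute values Ai, columns = classes Cj
  gini_D <- gini(colSums(tab))                                   # Gini(D)
  gini_A <- sum(rowSums(tab) / sum(tab) * apply(tab, 1, gini))   # GiniA(D)
  gini_D - gini_A                                                # deltaGini(A)
}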

(3) Chi-square statistic

The chi-square statistic uses a contingency table to measure the dependence between two categorical variables: the larger the sample chi-square statistic, the stronger the dependence between the splitting attribute and the class.
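
In R the statistic can be obtained with chisq.test() on the same kind of contingency table; the data used below are only an assumed example.

library(MASS); data("Pima.tr")
tab <- table(cut(Pima.tr$glu, 3), Pima.tr$type)   # attribute values vs classes
chisq.test(tab)$statistic                         # larger values indicate stronger dependence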

(4) Gain ratio

The gain ratio also takes into account the information carried by the candidate attribute itself: the information gain is divided by the attribute's own information (its split information), and the attribute with the best ratio is chosen as the splitting attribute.
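
A minimal sketch, reusing the entropy() and info_gain() helpers defined in the information-gain example above; the split information is the entropy of the attribute's own value distribution:

gain_ratio <- function(tab) {
  split_info <- entropy(rowSums(tab))   # information carried by attribute A itself
  info_gain(tab) / split_info
}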

(5) Variance reduction

When the target variable is continuous, variance reduction can be used as the splitting criterion: the chosen split is the one that most reduces the weighted variance of the target in the child nodes relative to the parent node.
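
A minimal sketch (an assumption, not from the original text) for a binary split of a continuous target y on a continuous predictor x at a candidate split point a:

variance_reduction <- function(y, x, a) {         # assumes a leaves both sides non-empty
  left  <- y[x <  a]
  right <- y[x >= a]
  n     <- length(y)
  var_parent <- mean((y - mean(y))^2)
  var_split  <- length(left)  / n * mean((left  - mean(left))^2) +
                length(right) / n * mean((right - mean(right))^2)
  var_parent - var_split                          # the larger the reduction, the better the split
}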

1.3 Pruning the Decision Tree

Decision trees can be pruned before or after growing (pre-pruning and post-pruning). Pre-pruning is applied while the tree is growing: thresholds for stopping growth are set in advance, and a branch stops growing when a split fails to reach the threshold, for example when the information gain does not exceed 0.1 or a node does not contain enough samples. Pre-pruning is efficient but may prune too aggressively. Post-pruning is slower, but it is very effective against overfitting. Both are sketched below.
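
With rpart (the package used in Section 4.1), pre-pruning corresponds to the stopping rules in rpart.control() and post-pruning to prune(); the parameter values here are only illustrative assumptions.

library(MASS); library(rpart)
data("Pima.tr")

# pre-pruning: stop growth early via thresholds
pre_pruned <- rpart(type ~ ., Pima.tr,
                    control = rpart.control(minsplit = 20,    # a node needs at least 20 samples to be split
                                            maxdepth = 4,     # limit the depth of the tree
                                            cp       = 0.02)) # a split must improve the fit by at least cp

# post-pruning: grow a full tree first, then cut it back
full_tree   <- rpart(type ~ ., Pima.tr, control = rpart.control(cp = 0))
post_pruned <- prune(full_tree, cp = 0.02)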

1.4 Rule Extraction

Once the tree has been grown and pruned, it can be used to extract the rules that describe the information hidden in the data.

2. Decision Tree Algorithms

Algorithm                                  CART                       C4.5/C5.0                  CHAID
Data types handled                         discrete, continuous       discrete, continuous       discrete
Branching on continuous data               binary splits only         unrestricted               cannot handle
Splitting criterion
  categorical dependent variable           Gini dispersion index      gain ratio                 chi-square test
  continuous dependent variable            variance reduction         variance reduction         chi-square or F test (must first be converted to a categorical variable)
Branching method
  categorical independent variable         binary splits              multiway splits            multiway splits
  continuous independent variable          binary splits              binary splits              multiway splits (must first be converted to a categorical variable)
Pruning method                             cost-complexity pruning    error-based pruning        —

3. Model Evaluation

A decision tree classifier can be evaluated from two angles: (1) objectively, using the results on the test data, such as the classification error rate, to identify the better model; (2) because the extracted rules depend on the problem, after the objective evaluation a domain expert usually selects the most suitable tree in light of the problem background.

4. Decision Tree Applications

4.1 CART Decision Tree

Load the packages and the data set:

> library(rpart)
> library(MASS)
> data("Pima.tr")
> str(Pima.tr)
'data.frame':	200 obs. of  8 variables:
 $ npreg: int  5 7 5 0 0 5 3 1 3 2 ...
 $ glu  : int  86 195 77 165 107 97 83 193 142 128 ...
 $ bp   : int  68 70 82 76 60 76 58 50 80 78 ...
 $ skin : int  28 33 41 43 25 27 31 16 15 37 ...
 $ bmi  : num  30.2 25.1 35.8 47.9 26.4 35.6 34.3 25.9 32.4 43.3 ...
 $ ped  : num  0.364 0.163 0.156 0.259 0.133 ...
 $ age  : int  24 55 35 26 23 52 25 24 63 31 ...
 $ type : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 1 1 2 ...

The Pima data come already split into two parts: Pima.tr is the training set and Pima.te is the test set.

> # First build the tree without pruning, so set the complexity parameter cp to 0
> cart_tree1 = rpart(type~., Pima.tr, control = rpart.control(cp = 0))
> summary(cart_tree1)
Call:
rpart(formula = type ~ ., data = Pima.tr, control = rpart.control(cp = 0))
  n= 200 

          CP nsplit rel error    xerror       xstd
1 0.22058824      0 1.0000000 1.0000000 0.09851844
2 0.16176471      1 0.7794118 0.9852941 0.09816108
3 0.07352941      2 0.6176471 0.8235294 0.09337946
4 0.05882353      3 0.5441176 0.7941176 0.09233140
5 0.01470588      4 0.4852941 0.6176471 0.08470895
6 0.00000000      7 0.4411765 0.7500000 0.09064718

Node number 1: 200 observations,    complexity param=0.2205882
  predicted class=No   expected loss=0.34  P(node) =1
    class counts:   132    68
   probabilities: 0.660 0.340 
  left son=2 (109 obs) right son=3 (91 obs)
  Primary splits:
      glu   < 123.5  to the left,  improve=19.624700, (0 missing)
      age   < 28.5   to the left,  improve=15.016410, (0 missing)
      npreg < 6.5    to the left,  improve=10.465630, (0 missing)
      bmi   < 27.35  to the left,  improve= 9.727105, (0 missing)
      skin  < 22.5   to the left,  improve= 8.201159, (0 missing)
  Surrogate splits:
      age   < 30.5   to the left,  agree=0.685, adj=0.308, (0 split)
      bp    < 77     to the left,  agree=0.650, adj=0.231, (0 split)
      npreg < 6.5    to the left,  agree=0.640, adj=0.209, (0 split)
      skin  < 32.5   to the left,  agree=0.635, adj=0.198, (0 split)
      bmi   < 30.85  to the left,  agree=0.575, adj=0.066, (0 split)

Node number 2: 109 observations,    complexity param=0.01470588
  predicted class=No   expected loss=0.1376147  P(node) =0.545
    class counts:    94    15
   probabilities: 0.862 0.138 
  left son=4 (74 obs) right son=5 (35 obs)
  Primary splits:
      age   < 28.5   to the left,  improve=3.2182780, (0 missing)
      npreg < 6.5    to the left,  improve=2.4578310, (0 missing)
      bmi   < 33.5   to the left,  improve=1.6403660, (0 missing)
      bp    < 59     to the left,  improve=0.9851960, (0 missing)
      skin  < 24     to the left,  improve=0.8342926, (0 missing)
  Surrogate splits:
      npreg < 4.5    to the left,  agree=0.798, adj=0.371, (0 split)
      bp    < 77     to the left,  agree=0.734, adj=0.171, (0 split)
      skin  < 36.5   to the left,  agree=0.725, adj=0.143, (0 split)
      bmi   < 38.85  to the left,  agree=0.716, adj=0.114, (0 split)
      glu   < 66     to the right, agree=0.688, adj=0.029, (0 split)

Node number 3: 91 observations,    complexity param=0.1617647
  predicted class=Yes  expected loss=0.4175824  P(node) =0.455
    class counts:    38    53
   probabilities: 0.418 0.582 
  left son=6 (35 obs) right son=7 (56 obs)
  Primary splits:
      ped  < 0.3095 to the left,  improve=6.528022, (0 missing)
      bmi  < 28.65  to the left,  improve=6.473260, (0 missing)
      skin < 19.5   to the left,  improve=4.778504, (0 missing)
      glu  < 166    to the left,  improve=4.104532, (0 missing)
      age  < 39.5   to the left,  improve=3.607390, (0 missing)
  Surrogate splits:
      glu   < 126.5  to the left,  agree=0.670, adj=0.143, (0 split)
      bp    < 93     to the right, agree=0.659, adj=0.114, (0 split)
      bmi   < 27.45  to the left,  agree=0.659, adj=0.114, (0 split)
      npreg < 9.5    to the right, agree=0.648, adj=0.086, (0 split)
      skin  < 20.5   to the left,  agree=0.637, adj=0.057, (0 split)

Node number 4: 74 observations
  predicted class=No   expected loss=0.05405405  P(node) =0.37
    class counts:    70     4
   probabilities: 0.946 0.054 

Node number 5: 35 observations,    complexity param=0.01470588
  predicted class=No   expected loss=0.3142857  P(node) =0.175
    class counts:    24    11
   probabilities: 0.686 0.314 
  left son=10 (9 obs) right son=11 (26 obs)
  Primary splits:
      glu  < 90     to the left,  improve=2.3934070, (0 missing)
      bmi  < 33.4   to the left,  improve=1.3714290, (0 missing)
      bp   < 68     to the right, improve=0.9657143, (0 missing)
      ped  < 0.334  to the left,  improve=0.9475564, (0 missing)
      skin < 39.5   to the right, improve=0.7958592, (0 missing)
  Surrogate splits:
      ped < 0.1795 to the left,  agree=0.8, adj=0.222, (0 split)

Node number 6: 35 observations,    complexity param=0.05882353
  predicted class=No   expected loss=0.3428571  P(node) =0.175
    class counts:    23    12
   probabilities: 0.657 0.343 
  left son=12 (27 obs) right son=13 (8 obs)
  Primary splits:
      glu   < 166    to the left,  improve=3.438095, (0 missing)
      ped   < 0.2545 to the right, improve=1.651429, (0 missing)
      skin  < 25.5   to the left,  improve=1.651429, (0 missing)
      npreg < 3.5    to the left,  improve=1.078618, (0 missing)
      bp    < 73     to the right, improve=1.078618, (0 missing)
  Surrogate splits:
      bp < 94.5   to the left,  agree=0.8, adj=0.125, (0 split)

Node number 7: 56 observations,    complexity param=0.07352941
  predicted class=Yes  expected loss=0.2678571  P(node) =0.28
    class counts:    15    41
   probabilities: 0.268 0.732 
  left son=14 (11 obs) right son=15 (45 obs)
  Primary splits:
      bmi   < 28.65  to the left,  improve=5.778427, (0 missing)
      age   < 39.5   to the left,  improve=3.259524, (0 missing)
      npreg < 6.5    to the left,  improve=2.133215, (0 missing)
      ped   < 0.8295 to the left,  improve=1.746894, (0 missing)
      skin  < 22     to the left,  improve=1.474490, (0 missing)
  Surrogate splits:
      skin < 19.5   to the left,  agree=0.839, adj=0.182, (0 split)

Node number 10: 9 observations
  predicted class=No   expected loss=0  P(node) =0.045
    class counts:     9     0
   probabilities: 1.000 0.000 

Node number 11: 26 observations,    complexity param=0.01470588
  predicted class=No   expected loss=0.4230769  P(node) =0.13
    class counts:    15    11
   probabilities: 0.577 0.423 
  left son=22 (19 obs) right son=23 (7 obs)
  Primary splits:
      bp    < 68     to the right, improve=1.6246390, (0 missing)
      bmi   < 33.4   to the left,  improve=1.6173080, (0 missing)
      npreg < 6.5    to the left,  improve=0.9423077, (0 missing)
      skin  < 39.5   to the right, improve=0.6923077, (0 missing)
      ped   < 0.334  to the left,  improve=0.4923077, (0 missing)
  Surrogate splits:
      glu < 94.5   to the right, agree=0.808, adj=0.286, (0 split)
      ped < 0.2105 to the right, agree=0.808, adj=0.286, (0 split)

Node number 12: 27 observations
  predicted class=No   expected loss=0.2222222  P(node) =0.135
    class counts:    21     6
   probabilities: 0.778 0.222 

Node number 13: 8 observations
  predicted class=Yes  expected loss=0.25  P(node) =0.04
    class counts:     2     6
   probabilities: 0.250 0.750 

Node number 14: 11 observations
  predicted class=No   expected loss=0.2727273  P(node) =0.055
    class counts:     8     3
   probabilities: 0.727 0.273 

Node number 15: 45 observations
  predicted class=Yes  expected loss=0.1555556  P(node) =0.225
    class counts:     7    38
   probabilities: 0.156 0.844 

Node number 22: 19 observations
  predicted class=No   expected loss=0.3157895  P(node) =0.095
    class counts:    13     6
   probabilities: 0.684 0.316 

Node number 23: 7 observations
  predicted class=Yes  expected loss=0.2857143  P(node) =0.035
    class counts:     2     5
   probabilities: 0.286 0.714 
> par(xpd = TRUE); plot(cart_tree1); text(cart_tree1)


> # Predict on the test set and compute the prediction accuracy
> pre_cart_tree1 = predict(cart_tree1, Pima.te, type = "class")
> matrix1 = table(Type = Pima.te$type, predict = pre_cart_tree1)
> matrix1
     predict
Type   No Yes
  No  223   0
  Yes 109   0
> accuracy_tree1 = sum(diag(matrix1))/sum(matrix1)
> accuracy_tree1
[1] 0.6716867
> # Prune the fitted tree, setting cp to 0.03
> cart_tree2 = prune(cart_tree1, cp = 0.03)
> par(xpd = TRUE); plot(cart_tree2); text(cart_tree2)

> # Predict on the test set with the pruned model and compute the accuracy
> pre_cart_tree2 = predict(cart_tree2, Pima.te, type = "class")
> matrix2 = table(Type = Pima.te$type, predict = pre_cart_tree2)
> matrix2
     predict
Type   No Yes
  No  223   0
  Yes 109   0
> accuracy_tree2 = sum(diag(matrix2))/sum(matrix2)
> accuracy_tree2
[1] 0.6716867
> # Prune the tree further, setting cp to 0.1
> cart_tree3 = prune(cart_tree2, cp = 0.1)
> par(xpd = TRUE); plot(cart_tree3); text(cart_tree3)

> # Predict on the test set with the pruned model and compute the accuracy
> pre_cart_tree3 = predict(cart_tree3, Pima.te, type = "class")
> matrix3 = table(Type = Pima.te$type, predict = pre_cart_tree3)
> matrix3
     predict
Type   No Yes
  No  223   0
  Yes 109   0
> accuracy_tree3 = sum(diag(matrix3))/sum(matrix3)
> accuracy_tree3
[1] 0.6716867

On this test set the accuracy is identical (0.6717) for all three settings of cp, so pruning at cp = 0.1 simplifies the model drastically without any loss of accuracy.
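
As a hedged follow-up (not in the original text): rpart cross-validates internally, so the xerror column of the CP table shown in the summary above can guide the choice of cp. printcp() and plotcp() display it, and a common rule is to take the cp whose cross-validated error is smallest (or the largest cp within one standard error of that minimum).

printcp(cart_tree1)    # CP table with the cross-validated error (xerror)
plotcp(cart_tree1)     # plot xerror against cp
best_cp <- cart_tree1$cptable[which.min(cart_tree1$cptable[, "xerror"]), "CP"]
cart_tree_best <- prune(cart_tree1, cp = best_cp)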

4.2 C5.0 Decision Tree

> # C5.0 decision tree analysis
> library(C50)
> library(MASS)
> data("Pima.tr")
> str(Pima.tr)
'data.frame':	200 obs. of  8 variables:
 $ npreg: int  5 7 5 0 0 5 3 1 3 2 ...
 $ glu  : int  86 195 77 165 107 97 83 193 142 128 ...
 $ bp   : int  68 70 82 76 60 76 58 50 80 78 ...
 $ skin : int  28 33 41 43 25 27 31 16 15 37 ...
 $ bmi  : num  30.2 25.1 35.8 47.9 26.4 35.6 34.3 25.9 32.4 43.3 ...
 $ ped  : num  0.364 0.163 0.156 0.259 0.133 ...
 $ age  : int  24 55 35 26 23 52 25 24 63 31 ...
 $ type : Factor w/ 2 levels "No","Yes": 1 2 1 1 1 2 1 1 1 2 ...
> C50_tree2 = C5.0(type~., Pima.tr, control=C5.0Control(noGlobalPruning = TRUE)) # do not prune the tree
> summary(C50_tree2)

Call:
C5.0.formula(formula = type ~ ., data = Pima.tr, control = C5.0Control(noGlobalPruning = TRUE))


C5.0 [Release 2.07 GPL Edition]  	Sat Sep 16 12:12:54 2017
-------------------------------

Class specified by attribute `outcome'

Read 200 cases (8 attributes) from undefined.data

Decision tree:

glu <= 123: No (109/15)
glu > 123:
:...bmi > 28.6:
    :...ped <= 0.344: No (29/12)
    :   ped > 0.344: Yes (41/5)
    bmi <= 28.6:
    :...age <= 32: No (11)
        age > 32:
        :...bp > 80: No (3)
            bp <= 80:
            :...ped <= 0.162: No (2)
                ped > 0.162: Yes (5)


Evaluation on training data (200 cases):

	    Decision Tree   
	  ----------------  
	  Size      Errors  

	     7   32(16.0%)   <<


	   (a)   (b)    <-classified as
	  ----  ----
	   127     5    (a): class No
	    27    41    (b): class Yes


	Attribute usage:

	100.00%	glu
	 45.50%	bmi
	 38.50%	ped
	 10.50%	age
	  5.00%	bp


Time: 0.0 secs
> plot(C50_tree2)


> C50_tree3 = C5.0(type~., Pima.tr, control=C5.0Control(noGlobalPruning = FALSE)) # prune the tree
> summary(C50_tree3)

Call:
C5.0.formula(formula = type ~ ., data = Pima.tr, control = C5.0Control(noGlobalPruning = FALSE))


C5.0 [Release 2.07 GPL Edition]  	Sat Sep 16 12:14:14 2017
-------------------------------

Class specified by attribute `outcome'

Read 200 cases (8 attributes) from undefined.data

Decision tree:

glu <= 123: No (109/15)
glu > 123:
:...bmi <= 28.6: No (21/5)
    bmi > 28.6:
    :...ped <= 0.344: No (29/12)
        ped > 0.344: Yes (41/5)


Evaluation on training data (200 cases):

	    Decision Tree   
	  ----------------  
	  Size      Errors  

	     4   37(18.5%)   <<


	   (a)   (b)    <-classified as
	  ----  ----
	   127     5    (a): class No
	    32    36    (b): class Yes


	Attribute usage:

	100.00%	glu
	 45.50%	bmi
	 35.00%	ped


Time: 0.0 secs
> plot(C50_tree3)


> pre_C50_Cla2 = predict(C50_tree2, Pima.te, type = "class")
> matrix2 = table(Type = Pima.te$type, predict = pre_C50_Cla2)
> matrix2
     predict
Type   No Yes
  No  193  30
  Yes  58  51
> accuracy_tree2 = sum(diag(matrix2))/sum(matrix2)
> accuracy_tree2
[1] 0.7349398
> pre_C50_Cla3 = predict(C50_tree3, Pima.te, type = "class")
> matrix3 = table(Type = Pima.te$type, predict = pre_C50_Cla3)
> matrix3
     predict
Type   No Yes
  No  195  28
  Yes  60  49
> accuracy_tree3 = sum(diag(matrix3))/sum(matrix3)
> accuracy_tree3
[1] 0.7349398

We find that pruning makes no difference to the model's accuracy on the test set, but the pruned model is clearly easier to interpret.
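
A related option worth noting (an assumption, not used in the original analysis): the C50 package can also express the model as a set of rules rather than a tree, which ties in with the rule-extraction step of Section 1.4.

C50_rules <- C5.0(type ~ ., Pima.tr, rules = TRUE)  # rule-based model instead of a tree
summary(C50_rules)                                  # prints the extracted rules with their coverage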

4.3 CHAID Decision Tree

> # CHAID decision tree analysis
> # CHAID handles only discrete attributes, so the continuous variables in the data must first be discretised; there is no pruning step to consider.
> install.packages("CHAID") # if the package cannot be found, download it from https://r-forge.r-project.org/R/?group_id=343 and install it
> library(CHAID)
> # Load the training and test data sets
> data("Pima.tr")
> data("Pima.te")
> # Combine the two data sets
> Pima = rbind(Pima.tr, Pima.te)
> # Discretise the data and print the resulting levels
> level_name = {}
> for(i in 1:7)
+ {
+   Pima[,i] = cut(Pima[,i], breaks = 3, ordered_result = TRUE, include.lowest = TRUE)
+   level_name <- rbind(level_name, levels(Pima[,i]))
+ }
> level_name = data.frame(level_name)
> row.names(level_name) = colnames(Pima)[1:7]
> colnames(level_name) = paste("L",1:3,sep="")
> level_name
                  L1           L2          L3
npreg  [-0.017,5.67]  (5.67,11.3]   (11.3,17]
glu       [55.9,104]    (104,151]   (151,199]
bp       [23.9,52.7]  (52.7,81.3]  (81.3,110]
skin     [6.91,37.7]  (37.7,68.3] (68.3,99.1]
bmi      [18.2,34.5]  (34.5,50.8] (50.8,67.1]
ped   [0.0827,0.863] (0.863,1.64] (1.64,2.42]
age        [20.9,41]      (41,61]   (61,81.1]
> # Use the first 200 records as the training set and the remaining 332 as the test set
> Pima.tr = Pima[1:200,]
> Pima.te = Pima[201:nrow(Pima),]
> CHAID_tree = chaid(type~., Pima.tr)
> CHAID_tree

Model formula:
type ~ npreg + glu + bp + skin + bmi + ped + age

Fitted party:
[1] root
|   [2] glu in [55.9,104]
|   |   [3] age in [20.9,41]: No (n = 50, err = 6.0%)
|   |   [4] age in (41,61], (61,81.1]: No (n = 10, err = 40.0%)
|   [5] glu in (104,151]
|   |   [6] age in [20.9,41]: No (n = 86, err = 27.9%)
|   |   [7] age in (41,61], (61,81.1]: Yes (n = 15, err = 26.7%)
|   [8] glu in (151,199]: Yes (n = 39, err = 33.3%)

Number of inner nodes:    3
Number of terminal nodes: 5
> plot(CHAID_tree)



> # Predict on the test set and compute the prediction accuracy
> pre_CHAID_tree = predict(CHAID_tree, Pima.te)
> matrix = table(Type = Pima.te$type, predict = pre_CHAID_tree)
> matrix
     predict
Type   No Yes
  No  199  24
  Yes  47  62
> accuracy_tree = sum(diag(matrix))/sum(matrix)
> accuracy_tree
[1] 0.7861446
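
On the discretised data the CHAID tree reaches a test-set accuracy of about 0.786, the highest of the three models fitted in this article.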