Using the caret Package, Part 4: Model Prediction and Evaluation

Original post: http://xccds.github.io/2011/09/caret_9105.html/

Once a model has been fitted, we can use the predict function to make predictions, for example for the first five samples of the test set:

predict(gbmFit1, newdata = testx)[1:5]
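To compare different models, we can also fit a second model using bagged decision trees, named gbmFit2:
gbmFit2 = train(trainx, trainy, method = "treebag", trControl = fitControl)
models = list(gbmFit1, gbmFit2)
Another way to obtain predictions is the extractPrediction function. Part of its output is shown below: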


predValues = extractPrediction(models,testX = testx, testY = testy)
head(predValues)

     obs     pred model dataType  object
1 Active   Active   gbm Training Object1
2 Active   Active   gbm Training Object1
3 Active Inactive   gbm Training Object1
4 Active   Active   gbm Training Object1
5 Active   Active   gbm Training Object1
6 Active   Active   gbm Training Object1

The output contains both training and test rows, so we extract the predictions for the test samples:
testValues = subset(predValues, dataType == "Test")
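
The subsetting step can be tried without any fitted models. The sketch below builds a small made-up data frame with the same column layout as the head(predValues) output above (the values and the names toy, toy_test and acc_by_model are hypothetical), keeps only the test rows, and computes the share of correct predictions per model in base R:

```r
# Toy data frame mimicking the column layout of predValues.
# All values are invented for illustration only.
lv <- c("Active", "Inactive")
toy <- data.frame(
  obs      = factor(c("Active", "Active", "Inactive", "Active",
                      "Inactive", "Inactive", "Active", "Inactive"), levels = lv),
  pred     = factor(c("Active", "Inactive", "Inactive", "Active",
                      "Inactive", "Inactive", "Active", "Inactive"), levels = lv),
  model    = c("gbm", "gbm", "gbm", "gbm",
               "treebag", "treebag", "treebag", "treebag"),
  dataType = c("Training", "Test", "Test", "Test",
               "Training", "Test", "Test", "Test"),
  stringsAsFactors = FALSE
)

# Keep only the rows that belong to the test set
toy_test <- subset(toy, dataType == "Test")

# Fraction of test rows where the prediction matches the observation,
# computed separately for each model
acc_by_model <- tapply(toy_test$pred == toy_test$obs, toy_test$model, mean)
print(acc_by_model)
```

With real output from extractPrediction, the same tapply call on testValues should give a quick per-model accuracy before looking at the full confusion matrix.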
To obtain predicted class probabilities instead, use the extractProb function:
probValues = extractProb(models,testX = testx, testY = testy)
testProbs = subset(probValues, dataType == "Test")
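For classification problems, the most important step in evaluating performance is to inspect the confusion matrix of the predictions: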
Pred1 = subset(testValues, model == "gbm")
Pred2 = subset(testValues, model == "treebag")
confusionMatrix(Pred1$pred, Pred1$obs)
confusionMatrix(Pred2$pred, Pred2$obs)
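The results are shown below, first for gbm and then for treebag. The first model is slightly more accurate than the second (0.8397 versus 0.8244):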

          Reference
Prediction Active Inactive
  Active       65       12
  Inactive      9       45

Accuracy : 0.8397

          Reference
Prediction Active Inactive
  Active       63       12
  Inactive     11       45

Accuracy : 0.8244
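
The accuracy values can be checked by hand from the counts above: accuracy is the sum of the diagonal divided by the total. The following base R sketch re-enters the two matrices (the names cm_gbm, cm_bag, accuracy and sensitivity are hypothetical helpers, not part of caret) and also computes sensitivity under the assumption that "Active" is the positive class:

```r
# Counts copied from the two confusion matrices above.
# Rows are predictions, columns are reference (observed) classes.
# matrix() fills column by column.
classes <- c("Active", "Inactive")
cm_gbm <- matrix(c(65, 9, 12, 45), nrow = 2,
                 dimnames = list(Prediction = classes, Reference = classes))
cm_bag <- matrix(c(63, 11, 12, 45), nrow = 2,
                 dimnames = list(Prediction = classes, Reference = classes))

# Accuracy: correctly classified samples divided by all samples
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Sensitivity, assuming "Active" is the positive class:
# correctly predicted Active divided by all observed Active
sensitivity <- function(cm) cm["Active", "Active"] / sum(cm[, "Active"])

print(round(c(gbm = accuracy(cm_gbm), treebag = accuracy(cm_bag)), 4))
print(round(c(gbm = sensitivity(cm_gbm), treebag = sensitivity(cm_bag)), 4))
```

Both models classify the Inactive samples identically (45 of 57 correct); the difference in accuracy comes from two extra Active samples that the bagged trees miss.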

Finally, we use the ROCR package to plot the ROC curve:
prob1 = subset(testProbs, model == "gbm")
prob2 = subset(testProbs, model == "treebag")
library(ROCR)
prob1$lable = ifelse(prob1$obs == 'Active', yes = 1, 0)
pred1 = prediction(prob1$Active, prob1$lable)
perf1 = performance(pred1, measure="tpr", x.measure="fpr" )
plot( perf1 )
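
To make clear what the "tpr" and "fpr" measures represent, here is a minimal base R sketch that builds the ROC points by hand and computes the area under the curve with the trapezoidal rule. The scores and labels are made-up values, the variable names are hypothetical, and the sketch assumes there are no tied scores; it is an illustration of the idea, not a description of how ROCR works internally.

```r
# Hypothetical predicted probabilities of "Active" and the matching
# 0/1 labels (1 = Active). Values are invented for illustration.
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3)
labels <- c(1, 1, 0, 1, 0, 0)

# Sort samples by decreasing score. Lowering the threshold past each
# sample in turn classifies one more sample as positive.
ord <- order(scores, decreasing = TRUE)
lab <- labels[ord]

# True positive rate and false positive rate after each step,
# starting from the point (0, 0) where nothing is predicted positive
tpr <- c(0, cumsum(lab == 1) / sum(lab == 1))
fpr <- c(0, cumsum(lab == 0) / sum(lab == 0))

# Area under the ROC curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

Calling plot(fpr, tpr, type = "l") on these vectors draws the same kind of curve as plot(perf1). The AUC also equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, which is 8 of 9 pairs for this toy data.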