
Article Translation: Chapter 7, Sections 10-12


10 Measuring prediction performance using ROCR

A receiver operating characteristic (ROC) curve is a plot that illustrates the performance of a binary classifier by plotting the true positive rate against the false positive rate at different cut points. The plot is most commonly used to calculate the area under the curve (AUC), a single number that summarizes the performance of a classification model. In this recipe, we will demonstrate how to plot an ROC curve and calculate the AUC to measure the performance of a classification model.
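The two rates behind the curve can be computed by hand. The following minimal base-R sketch is not how ROCR is implemented. It sweeps every distinct score as a cut point, predicts "positive" when the score is at or above it, and integrates the resulting curve with the trapezoidal rule. The helper names `roc_points` and `auc_trapezoid` and the toy scores and labels are hypothetical.

```r
# Compute ROC points (FPR, TPR) for every distinct cut point.
# scores: numeric scores, higher = more likely positive
# labels: logical vector, TRUE = positive class
roc_points <- function(scores, labels) {
  cuts <- sort(unique(scores), decreasing = TRUE)
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  # At each cut point, predict positive when score >= cut
  tpr <- sapply(cuts, function(k) sum(scores >= k & labels) / n_pos)
  fpr <- sapply(cuts, function(k) sum(scores >= k & !labels) / n_neg)
  # Prepend (0, 0): a cut point above every score predicts no positives
  data.frame(cutoff = c(Inf, cuts), fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the piecewise-linear ROC curve (trapezoidal rule)
auc_trapezoid <- function(roc) {
  dx <- diff(roc$fpr)
  mean_height <- (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2
  sum(dx * mean_height)
}

# Hypothetical toy data: predicted probabilities of "yes" and true labels
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
labels <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

roc <- roc_points(scores, labels)
print(roc)
auc_trapezoid(roc)   # 0.75 for this toy data
```

Each row of `roc` is one point on the ROC curve. The AUC of 0.75 only reflects the made-up data.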

Getting ready

In this recipe, we will continue using the telecom churn dataset, split into trainset and testset, as our example dataset.

How to do it...

Perform the following steps to plot an ROC curve and calculate the AUC for an svm classifier:

1. First, install and load the ROCR package. The svm function used below comes from the e1071 package, so load that package as well:

> install.packages("ROCR")

> library(ROCR)

> library(e1071)

2. Train an svm model on the training dataset with probability set to TRUE:

> svmfit = svm(churn ~ ., data = trainset, probability = TRUE)

3. Use the trained model to make predictions on the testing dataset, with probability set to TRUE:

> pred = predict(svmfit, testset[, !names(testset) %in% c("churn")],
+                probability = TRUE)

4. Obtain the predicted probability of the "yes" label:

> pred.prob = attr(pred, "probabilities")

> pred.to.roc = pred.prob[, "yes"]  # select the "yes" column by name, not by position

5. Use the prediction function to create a ROCR prediction object:

> pred.rocr = prediction(pred.to.roc, testset$churn)

6. Use the performance function to compute the AUC and the true positive rate against the false positive rate:

> perf.rocr = performance(pred.rocr, measure = "auc", x.measure = "cutoff")

> perf.tpr.rocr = performance(pred.rocr, "tpr", "fpr")

7. Visualize the ROC curve using the plot function, with the AUC value in the title:

> plot(perf.tpr.rocr, colorize = TRUE, main = paste("AUC:", perf.rocr@y.values[[1]]))

Figure 6: The ROC curve for the svm classifier performance
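Step 4 must pass `prediction` the probability of the class that ROCR treats as positive. The ROCR documentation states that, for unordered factor labels, the class ordering is inferred from R's `<` relation. Here that makes "no" the negative class and "yes" the positive class. Because the two probability columns sum to one, scoring with the wrong column reverses the ranking and gives 1 - AUC. The column order of the probability matrix is not something to rely on, so selecting by name, as in `pred.prob[, "yes"]`, is the safer choice.

The matrix below is a hypothetical stand-in for `attr(pred, "probabilities")`. `auc_rank` is a hypothetical helper based on the rank-sum (Mann-Whitney) formula.

```r
# Rank-based (Mann-Whitney) AUC; average ranks handle tied scores.
auc_rank <- function(scores, labels) {
  r <- rank(scores)
  n_pos <- sum(labels)
  n_neg <- sum(!labels)
  (sum(r[labels]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical stand-in for attr(pred, "probabilities"): one column per class
pred.prob <- cbind(no  = c(0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8),
                   yes = c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2))
churn <- factor(c("yes", "yes", "no", "yes", "no", "no", "yes", "no"))

is_yes <- churn == "yes"                           # positive class = "yes"
auc_yes <- auc_rank(pred.prob[, "yes"], is_yes)   # correct column, by name
auc_no  <- auc_rank(pred.prob[, "no"],  is_yes)   # wrong column: gives 1 - AUC
c(auc_yes = auc_yes, auc_no = auc_no)             # 0.75 and 0.25
```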

How it works...

In this recipe, we demonstrated how to generate an ROC curve to illustrate the performance of a binary classifier. First, we install and load the ROCR library. Then, we use svm, from the e1071 package, to train a classification model, and use the model to predict labels for the testing dataset. Next, we use the prediction function (from the ROCR package) to generate prediction results. We then apply the performance function twice: once to compute the AUC, and once to obtain the true positive rate against the false positive rate. Finally, we use the plot function to visualize the ROC curve and add the AUC value to the title. In this example, the AUC value is 0.92, which indicates that the svm classifier performs well in classifying the telecom churn dataset.
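An AUC such as 0.92 has a direct probabilistic reading. It is the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner, with ties counted as one half.

The minimal base-R sketch below computes the AUC straight from that definition by comparing every positive/negative pair. `auc_pairwise` and the toy data are hypothetical. The number of pairs grows quadratically, so this is only suitable for illustration.

```r
# AUC as a probability: compare every (positive, negative) pair of scores.
auc_pairwise <- function(scores, labels) {
  pos <- scores[labels]
  neg <- scores[!labels]
  wins <- outer(pos, neg, ">")    # positive scored strictly higher
  ties <- outer(pos, neg, "==")   # tied pairs count as half a win
  mean(wins + 0.5 * ties)
}

# Hypothetical toy data with one tied pair (0.6 vs 0.6)
scores <- c(0.9, 0.8, 0.6, 0.6, 0.3, 0.2)
labels <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
auc_pairwise(scores, labels)   # 5.5 wins out of 9 pairs
```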

See also

For those interested in the concept and terminology of ROC, refer to:

http://en.wikipedia.org/wiki/Receiver_operating_characteristic

11 Comparing an ROC curve using the caret package

In previous chapters, we introduced many classification methods, each with its own advantages and disadvantages. To choose the best-fitting model, however, you need to compare the performance measures produced by the different prediction models. To make this comparison easy, the caret package lets us generate and compare the performance of models. In this recipe, we will use the functions provided by the caret package to compare models trained with different algorithms on the same dataset.

Getting ready

Here, we will continue to use the telecom churn dataset as our input data source.

How to do it...

Perform the following steps to generate an ROC curve for each fitted model:

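1. Install and load the pROC library. The trainControl and train functions used below come from the caret package, so load that package as well:

> install.packages("pROC")

> library("pROC")

> library(caret)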

2. Set up the training control to use 10-fold cross-validation repeated 3 times:

> control = trainControl(method = "repeatedcv",

+ number = 10,

+ repeats = 3,

+ classProbs = TRUE,

+ summaryFunction = twoClassSummary)

3. Then, train a classifier on the training dataset using glm:

> glm.model = train(churn ~ .,
+                   data = trainset,
+                   method = "glm",
+                   metric = "ROC",
+                   trControl = control)
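With number = 10 and repeats = 3, each model is evaluated on 10 held-out folds in each of 3 repetitions. That is where the "Number of resamples: 30" in the resampling summary comes from.

The base-R sketch below illustrates the idea with a simplified, unstratified fold assignment. caret's own fold construction may differ, for example by stratifying on the outcome. `repeated_kfold` and its fold names are hypothetical.

```r
# Simplified (unstratified) repeated k-fold cross-validation:
# each repeat reshuffles the rows and splits them into k folds.
repeated_kfold <- function(n, k = 10, repeats = 3, seed = 1) {
  set.seed(seed)
  splits <- list()
  for (r in seq_len(repeats)) {
    # Assign each row a fold id in 1..k, in random order
    fold_id <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      name <- sprintf("Fold%02d.Rep%d", f, r)
      # Train on the other k - 1 folds, evaluate on fold f
      splits[[name]] <- list(train = which(fold_id != f),
                             holdout = which(fold_id == f))
    }
  }
  splits
}

splits <- repeated_kfold(n = 100)
length(splits)   # 10 folds x 3 repeats = 30 resamples
```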

1. Resample the three generated models:

> cv.values = resamples(list(glm = glm.model, svm = svm.model,
+                            rpart = rpart.model))

2. Then, obtain a summary of the resampling results:

> summary(cv.values)

Call:
summary.resamples(object = cv.values)

Models: glm, svm, rpart
Number of resamples: 30

ROC
        Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glm   0.7206  0.7847 0.8126 0.8116  0.8371 0.8877    0
svm   0.8337  0.8673 0.8946 0.8929  0.9194 0.9458    0
rpart 0.2802  0.7159 0.7413 0.6769  0.8105 0.8821    0

Sens
         Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glm   0.08824  0.2000 0.2286 0.2194  0.2517 0.3529    0
svm   0.44120  0.5368 0.5714 0.5866  0.6424 0.7143    0
rpart 0.20590  0.3742 0.4706 0.4745  0.5929 0.6471    0

Spec
        Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
glm   0.9442  0.9608 0.9746 0.9701  0.9797 0.9949    0
svm   0.9442  0.9646 0.9746 0.9740  0.9835 0.9949    0
rpart 0.9492  0.9709 0.9797 0.9780  0.9848 0.9949    0

3. Use dotplot to plot the resampling results for the ROC metric:

> dotplot(cv.values, metric = "ROC")

4. You can also use a box-and-whisker plot to plot the resampling results:

> bwplot(cv.values, layout = c(3, 1))

How it works...

In this recipe, we demonstrate how to measure the performance differences among three fitted models using the resampling method. First, we use the resamples function to collect the resampling statistics of each fitted model (svm.model, glm.model, and rpart.model). Then, we use the summary function to obtain the statistics of these three models for the ROC, sensitivity, and specificity metrics. Next, we apply dotplot to the resampling results to see how the ROC varies across the models. Finally, we use bwplot to draw box-and-whisker plots of the ROC, sensitivity, and specificity of the different models in a single figure.
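For each model and metric, the summary table reports the following over the 30 resamples:

- the minimum
- the quartiles
- the mean
- the maximum
- the number of missing values

The base-R sketch below computes the same columns for a single vector of per-resample values using `quantile` and `mean`. It is an illustration rather than caret's code. `summarize_resamples` and the input values are hypothetical.

```r
# Summarize per-resample metric values in the same columns as the
# summary table: Min., quartiles, Mean, Max., and NA count.
summarize_resamples <- function(values) {
  q <- quantile(values, probs = c(0, 0.25, 0.5, 0.75, 1),
                na.rm = TRUE, names = FALSE)
  c(Min. = q[1], `1st Qu.` = q[2], Median = q[3],
    Mean = mean(values, na.rm = TRUE), `3rd Qu.` = q[4], Max. = q[5],
    `NA's` = sum(is.na(values)))
}

# Hypothetical per-resample ROC values for one model, with one failed resample
roc_values <- c(seq(0.1, 0.9, by = 0.1), NA)
round(summarize_resamples(roc_values), 4)
```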


See also

Besides using dotplot and bwplot to measure performance differences, you can use densityplot, splom, and xyplot to visualize the performance differences of the fitted models in the ROC, sensitivity, and specificity metrics.

12 Measuring performance differences between models with the caret package

In the previous recipe, we showed how to generate an ROC curve for each model and plot the curves on the same figure. Besides using ROC curves, you can use the resampling method to generate statistics for each fitted model in the ROC, sensitivity, and specificity metrics. We can then use these statistics to compare the performance of the models. In the following recipe, we will introduce how to measure performance differences between fitted models with the caret package.
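Suppose two models were evaluated on the same resamples, for example by using the same random seed. One simple way to compare them is to look at the per-resample differences in a metric and test whether the mean difference is zero.

The sketch below does this with a one-sample t-test on simulated values. The data are hypothetical, and `compare_paired` is not a caret function. Treat it as an illustration of the idea rather than the recipe's method.

```r
# Compare two models scored on the same resamples via the
# per-resample differences in a metric (here, ROC/AUC).
compare_paired <- function(metric_a, metric_b) {
  d <- metric_a - metric_b
  test <- t.test(d)              # one-sample t-test: is the mean difference 0?
  c(mean_diff = mean(d), p_value = test$p.value)
}

# Hypothetical per-resample ROC values for two models on the same 30 resamples
set.seed(42)
roc_glm <- runif(30, 0.75, 0.85)
roc_svm <- roc_glm + 0.08 + rnorm(30, sd = 0.01)
compare_paired(roc_svm, roc_glm)
```

Pairing the values by resample removes the fold-to-fold variation that both models share. This usually makes a real difference easier to detect than comparing the two sets of values independently.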

Getting ready

You need to have completed the previous recipe, storing the glm, svm, and rpart fitted models in glm.model, svm.model, and rpart.model, respectively.

How to do it...

Perform the following steps to measure performance differences between the fitted models:

--------Excerpted from Baidu Translate

葉新潁
