How To Estimate Model Accuracy in R Using The Caret Package

When you are building a predictive model, you need a way to evaluate the capability of the model on unseen data.

This is typically done by estimating accuracy using data that was not used to train the model, such as a test set, or by using cross validation. The caret package in R provides a number of methods to estimate the accuracy of a machine learning algorithm.

In this post you will discover 5 approaches for estimating model performance on unseen data. You will also have access to recipes in R using the caret package for each method that you can copy and paste into your own project right now.

Estimating Model Accuracy

We have considered model accuracy before in the configuration of test options in a test harness. You can read more in the post: How To Choose The Right Test Options When Evaluating Machine Learning Algorithms.

In this post you are going to discover 5 different methods that you can use to estimate model accuracy.

They are as follows and each will be described in turn:

  • Data Split
  • Bootstrap
  • k-fold Cross Validation
  • Repeated k-fold Cross Validation
  • Leave One Out Cross Validation

Generally, I would recommend Repeated k-fold Cross Validation, but each method has its features and benefits, especially when the amount of data or space and time complexity are considered. Consider which approach best suits your problem.

Data Split

Data splitting involves partitioning the data into an explicit training dataset used to prepare the model and an unseen test dataset used to evaluate the model's performance on unseen data.

It is useful when you have a very large dataset so that the test dataset can provide a meaningful estimation of performance, or for when you are using slow methods and need a quick approximation of performance.

The example below splits the iris dataset so that 80% is used for training a Naive Bayes model and 20% is used to evaluate the model's performance.

Data Split in R

    # load the libraries
    library(caret)
    library(klaR)
    # load the iris dataset
    data(iris)
    # define an 80%/20% train/test split of the dataset
    split <- 0.80
    trainIndex <- createDataPartition(iris$Species, p=split, list=FALSE)
    data_train <- iris[trainIndex,]
    data_test <- iris[-trainIndex,]
    # train a naive bayes model
    model <- NaiveBayes(Species~., data=data_train)
    # make predictions
    x_test <- data_test[,1:4]
    y_test <- data_test[,5]
    predictions <- predict(model, x_test)
    # summarize results
    confusionMatrix(predictions$class, y_test)
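If you want to see what a class-balanced 80%/20% split looks like without caret, the following is a minimal base R sketch. It assumes the split should preserve class proportions, which is how I understand createDataPartition to behave for a factor outcome. The names train_fraction, rows_by_class and train_idx are my own and are not part of caret.

```r
# load the iris dataset
data(iris)
set.seed(7)

# fraction of each class that goes into the training set (assumed value)
train_fraction <- 0.80

# group the row indexes by class
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)

# sample 80% of the row indexes within each class
train_idx <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(train_fraction * length(rows)))
}), use.names = FALSE)

# everything not sampled becomes the unseen test set
data_train <- iris[train_idx, ]
data_test <- iris[-train_idx, ]

# class counts in each partition
print(table(data_train$Species))
print(table(data_test$Species))
```

Because the sampling happens inside each class, both partitions keep the same class balance as the full dataset.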

Bootstrap

Bootstrap resampling involves taking random samples from the dataset (with replacement) against which to evaluate the model. In aggregate, the results provide an indication of the variance of the model's performance. Typically, a large number of resampling iterations are performed (thousands or tens of thousands).

The following example uses a bootstrap with 100 resamples to prepare a Naive Bayes model.

Data Bootstrap in R

    # load the library
    library(caret)
    # load the iris dataset
    data(iris)
    # define training control
    train_control <- trainControl(method="boot", number=100)
    # train the model
    model <- train(Species~., data=iris, trControl=train_control, method="nb")
    # summarize results
    print(model)
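To make "sampling with replacement" concrete, here is a small base R sketch of a single bootstrap resample. It is only an illustration of the idea, not caret's internal code. It assumes the rows that were never drawn (often called out-of-bag rows) are the ones used for evaluation. The variable names are my own.

```r
# load the iris dataset
data(iris)
set.seed(7)
n <- nrow(iris)

# draw n row indexes with replacement: this is one bootstrap resample
boot_idx <- sample(n, size = n, replace = TRUE)

# rows that were never drawn can serve as the evaluation set
oob_idx <- setdiff(seq_len(n), boot_idx)

train_rows <- iris[boot_idx, ]
eval_rows <- iris[oob_idx, ]

# some rows appear more than once in training, others not at all
cat("distinct training rows:", length(unique(boot_idx)), "\n")
cat("out-of-bag rows:", length(oob_idx), "\n")
```

Repeating this many times and averaging the accuracy on the out-of-bag rows gives the bootstrap estimate.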

k-fold Cross Validation

The k-fold cross validation method involves splitting the dataset into k subsets. Each subset is held out in turn while the model is trained on all other subsets. This process is repeated until accuracy is determined for each instance in the dataset, and an overall accuracy estimate is provided.

It is a robust method for estimating accuracy, and the size of k can tune the amount of bias in the estimate, with popular values set to 3, 5, 7 and 10.

The following example uses 10-fold cross validation to estimate Naive Bayes on the iris dataset.

k-fold Cross Validation in R

    # load the library
    library(caret)
    # load the iris dataset
    data(iris)
    # define training control
    train_control <- trainControl(method="cv", number=10)
    # fix the parameters of the algorithm
    # (column names match the tuning parameters; recent caret versions also expect adjust)
    grid <- expand.grid(fL=0, usekernel=FALSE, adjust=1)
    # train the model
    model <- train(Species~., data=iris, trControl=train_control, method="nb", tuneGrid=grid)
    # summarize results
    print(model)
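If you are curious what happens behind a k-fold estimate, the sketch below performs 10-fold cross validation by hand in base R. To keep it free of package dependencies it uses a nearest class centroid classifier as a stand-in, so this is an assumption for illustration and not Naive Bayes. The helper predict_centroid and the variable names are hypothetical.

```r
# load the iris dataset
data(iris)
set.seed(7)
k <- 10
n <- nrow(iris)

# randomly assign every row to one of k folds (15 rows each for iris)
fold_id <- sample(rep(seq_len(k), length.out = n))

# stand-in classifier (not Naive Bayes): predict the nearest class centroid
predict_centroid <- function(train, test) {
  # 4 x 3 matrix: one column of feature means per species
  cents <- sapply(split(train[, 1:4], train$Species), colMeans)
  # 3 x m matrix of squared distances from each test row to each centroid
  d <- apply(as.matrix(test[, 1:4]), 1, function(r) colSums((cents - r)^2))
  colnames(cents)[apply(d, 2, which.min)]
}

# hold out each fold once and train on the other k - 1 folds
fold_acc <- sapply(seq_len(k), function(i) {
  held_out <- iris[fold_id == i, ]
  training <- iris[fold_id != i, ]
  pred <- predict_centroid(training, held_out)
  mean(pred == as.character(held_out$Species))
})

# accuracy per fold, then the overall estimate
print(round(fold_acc, 3))
print(mean(fold_acc))
```

Every row is used for evaluation exactly once, and the overall estimate is the mean of the k fold accuracies.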

Repeated k-fold Cross Validation

The process of splitting the data into k folds can be repeated a number of times. This is called Repeated k-fold Cross Validation. The final model accuracy is taken as the mean across the repeats.

The following example uses 10-fold cross validation with 3 repeats to estimate Naive Bayes on the iris dataset.

Repeated k-fold Cross Validation in R

    # load the library
    library(caret)
    # load the iris dataset
    data(iris)
    # define training control
    train_control <- trainControl(method="repeatedcv", number=10, repeats=3)
    # train the model
    model <- train(Species~., data=iris, trControl=train_control, method="nb")
    # summarize results
    print(model)
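The averaging over repeats can be sketched in a few lines of base R. This is a rough illustration under my own assumptions: cv_accuracy is a hypothetical helper, and a nearest class centroid rule stands in for Naive Bayes so that no extra packages are needed.

```r
# load the iris dataset
data(iris)
set.seed(7)

# one full k-fold pass on a fresh random partition; returns the mean fold accuracy
# (nearest class centroid is a stand-in classifier, not Naive Bayes)
cv_accuracy <- function(data, k) {
  fold_id <- sample(rep(seq_len(k), length.out = nrow(data)))
  fold_acc <- sapply(seq_len(k), function(i) {
    training <- data[fold_id != i, ]
    held_out <- data[fold_id == i, ]
    cents <- sapply(split(training[, 1:4], training$Species), colMeans)
    d <- apply(as.matrix(held_out[, 1:4]), 1, function(r) colSums((cents - r)^2))
    pred <- colnames(cents)[apply(d, 2, which.min)]
    mean(pred == as.character(held_out$Species))
  })
  mean(fold_acc)
}

# repeat the whole procedure 3 times, each with a different partition
repeat_acc <- replicate(3, cv_accuracy(iris, k = 10))
print(repeat_acc)

# the reported estimate is the mean over the repeats
print(mean(repeat_acc))
```

Each repeat uses a different random partition, so the individual estimates differ slightly. Averaging them smooths out the luck of any single split.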

Leave One Out Cross Validation

In Leave One Out Cross Validation (LOOCV), a data instance is left out and a model is constructed on all other data instances in the training set. This is repeated for all data instances.

The following example demonstrates LOOCV to estimate Naive Bayes on the iris dataset.

Leave One Out Cross Validation in R

    # load the library
    library(caret)
    # load the iris dataset
    data(iris)
    # define training control
    train_control <- trainControl(method="LOOCV")
    # train the model
    model <- train(Species~., data=iris, trControl=train_control, method="nb")
    # summarize results
    print(model)
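LOOCV is easy to write by hand for a 1-nearest-neighbour classifier, because "training on all other instances" simply means the held-out row may not be its own neighbour. The base R sketch below uses that classifier as a stand-in, which is my own assumption to keep the example short and is not Naive Bayes.

```r
# load the iris dataset
data(iris)

# pairwise Euclidean distances between all rows (features only)
d <- as.matrix(dist(iris[, 1:4]))

# a row may not be its own neighbour: this is the "leave one out" step
diag(d) <- Inf

# for each held-out row, predict the class of its nearest remaining row
nearest <- apply(d, 1, which.min)
pred <- iris$Species[nearest]

# one prediction per instance, averaged into a single accuracy estimate
loocv_acc <- mean(pred == iris$Species)
print(loocv_acc)
```

For most other algorithms LOOCV means fitting one model per instance, which is why it becomes expensive on large datasets.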

Summary

In this post you discovered 5 different methods that you can use to estimate the accuracy of your model on unseen data.

Those methods were: Data Split, Bootstrap, k-fold Cross Validation, Repeated k-fold Cross Validation, and Leave One Out Cross Validation.

You can learn more about the caret package in R at the caret package homepage and the caret package CRAN page. If you would like to master the caret package, I would recommend the book written by the author of the package, titled: Applied Predictive Modeling, especially Chapter 4 on overfitting models.


