
How To Get Started With Machine Learning Algorithms in R

R is one of the most popular platforms for applied machine learning. When you want to get serious about applied machine learning, you will likely find your way to R.

It is very powerful because so many machine learning algorithms are provided. A problem is that the algorithms are all provided by third parties, which makes their usage very inconsistent. This slows you down, a lot, because you have to learn how to model data and how to make predictions with each algorithm in each package, again and again.

In this post, you will discover how you can overcome this difficulty with machine learning algorithms in R, with pre-prepared recipes that follow a consistent structure.


Lots of Algorithms, Little Consistency

The R ecosystem is enormous. Open source third party packages provide this power, allowing academics and professionals to get the most powerful algorithms available into the hands of us practitioners.

A problem that I experienced when starting out with R was that the usage of each algorithm differs from package to package. This inconsistency also extends to the documentation, with some packages providing worked examples for classification but ignoring regression, and others not providing examples at all.

All this means that if you want to try a few different algorithms from different packages, you must spend time figuring out how to fit and make predictions with each method in turn. This takes a lot of time, especially with the spotty examples and vignettes.

I summarize these difficulties as follows:

  • Inconsistent: Algorithm implementations vary in the way a model is fit to data and the way a model is used to generate predictions. This means that you have to study each package and each algorithm implementation just to put together a working example, let alone adapt it to your problem.
  • Decentralized: Algorithms are implemented across different packages and it can be hard to locate which packages provide an implementation of the algorithm you need, let alone which package provides the most popular implementation. Additionally, the documentation for one package may be spread across multiple help files, websites and vignettes. This means you have to do a lot of searching just to locate an algorithm, let alone compile a list of algorithms from which you can choose.
  • Incomplete: Algorithm documentation is almost always incomplete. An example usage may or may not be provided; if it is, it may or may not be demonstrated on a canonical problem. This means you have no obvious way to quickly understand how to use an implementation.
  • Complexity: Algorithms vary in their complexity of implementation and description. This can take its toll on you as you jump from package to package. You want to focus on how to get the most from the algorithm and its parameters, and not burn energy on parsing reams of PDFs just to get a hello world.
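To make the inconsistency point concrete, here is a minimal sketch using only the stats package that ships with R. The choice of models (lm, glm, smooth.spline) and of the mtcars dataset is my own assumption for illustration, not something taken from a particular recipe. Even within a single package, the way you fit a model and the shape of what predict() returns differ.

```r
# Three models from the stats package, each with a slightly
# different fit/predict convention. Dataset choice (mtcars) is illustrative.
data(mtcars)

# 1. lm: formula interface; predict() returns a named numeric vector.
fit_lm <- lm(mpg ~ wt, data = mtcars)
pred_lm <- predict(fit_lm, newdata = mtcars)

# 2. glm: formula interface, but predict() defaults to the link
#    (log-odds) scale; probabilities need type = "response".
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
pred_link <- predict(fit_glm, newdata = mtcars)
pred_prob <- predict(fit_glm, newdata = mtcars, type = "response")

# 3. smooth.spline: takes x and y vectors rather than a formula;
#    predict() returns a list with components x and y.
fit_ss <- smooth.spline(mtcars$wt, mtcars$mpg)
pred_ss <- predict(fit_ss, x = mtcars$wt)

# Inspect the shape of each result.
str(pred_lm)
str(pred_prob)
str(pred_ss)
```

Third-party packages add further variations on top of these, which is why each new algorithm tends to cost you another round of reading documentation.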

Build an Algorithm Recipe Book

You could get a lot more done if you had an algorithm recipe book you could look up and find examples of machine learning algorithms in R that you could copy-and-paste and adapt for your specific problem.

For the recipe book approach to work, it would have to conform to some key principles:

  • Standalone: Each code example must be standalone, complete and ready to execute.
  • Just Code: Each recipe must focus on the code with minimal exposition on machine learning theory (there are amazing books for that, don’t mix these concerns).
  • Simplicity: Each recipe must be presented in the most common use case, which is probably what you are looking to do when you look it up. You want to consult the official documentation only to look up the parameters so that you can get the most from the algorithm.
  • Portable: All recipes must be provided in a single reference that can be searched and printed, browsed and looked up (a recipe book).
  • Consistent: All code examples are presented consistently and follow the same code structure and style conventions (load data, fit model, make prediction).
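As a rough sketch of what a recipe following these principles might look like, the example below uses the load data, fit model, make prediction structure with ordinary least squares regression. The longley dataset and the RMSE summary are my own assumptions for illustration; both come with base R, so the snippet should run standalone.

```r
# Recipe sketch: Ordinary Least Squares Regression
# (dataset and accuracy measure are illustrative choices)

# load data
data(longley)

# fit model
fit <- lm(Employed ~ ., data = longley)

# summarize the fit
print(fit)

# make predictions
predictions <- predict(fit, newdata = longley)

# summarize accuracy (root mean squared error on the training data)
rmse <- sqrt(mean((longley$Employed - predictions)^2))
print(rmse)
```

To adapt a recipe like this, you would swap in your own data frame and response variable and leave the structure alone.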

An algorithm recipe book would give you the ability to wield the R platform for machine learning and solve complex problems.

  • You could apply algorithms and features directly.
  • You could discover the code you need.
  • You could understand what is going on with a glance.
  • You could own the recipes and use and organize them the way you want.
  • You could get the most out of the algorithms and features.

Algorithm Recipes in R

I have already blocked out examples of what these recipes could look like.

I have provided example machine learning recipes in R, grouped by algorithm type or similarity, as follows:

  • Linear Regression: Ordinary Least Squares Regression, Stepwise Regression, Principal Component Regression and Partial Least Squares Regression.
  • Penalized-Linear Regression: Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO) and ElasticNet.
  • Non-Linear Regression: Multivariate Adaptive Regression Splines (MARS), Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and Neural Network.
  • Non-Linear Decision Tree Regression: Classification and Regression Trees (CART), Conditional Decision Trees, Model Trees, Rule Systems, Bagging CART, Random Forest, Gradient Boosted Machines (GBM) and Cubist.
  • Linear Classification: Logistic Regression, Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis.
  • Non-Linear Classification: Mixture Discriminant Analysis (MDA), Quadratic Discriminant Analysis (QDA), Regularized Discriminant Analysis (RDA), Neural Network, Flexible Discriminant Analysis (FDA), Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and Naive Bayes.
  • Non-Linear Decision Tree Classification: Classification and Regression Trees (CART), C4.5, PART, Bagging CART, Random Forest, Gradient Boosted Machines (GBM) and Boosted C5.0.
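For a sense of how a classification recipe from this list could keep the same shape, here is a hedged sketch of logistic regression using glm from base R. The mtcars dataset, the choice of predictors, and the 0.5 probability threshold are assumptions made for illustration only.

```r
# Recipe sketch: Logistic Regression
# (dataset, predictors and threshold are illustrative choices)

# load data
data(mtcars)

# fit model: am is a 0/1 outcome (transmission type)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial(link = "logit"))

# summarize the fit
print(fit)

# make predictions: probabilities first, then class labels at 0.5
probabilities <- predict(fit, newdata = mtcars, type = "response")
predictions <- ifelse(probabilities > 0.5, 1, 0)

# summarize accuracy (confusion matrix and proportion correct on training data)
print(table(predicted = predictions, actual = mtcars$am))
accuracy <- mean(predictions == mtcars$am)
print(accuracy)
```

Apart from the type = "response" argument and the thresholding step, the structure is the same as for a regression recipe, which is the point of keeping recipes consistent.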

I think these recipes really fit the bill of this mission.

Summary

In this post, you discovered the popularity and power of R for machine learning, and that the cost of that power is the time required to harness it.

You discovered that one approach to addressing this limitation in R is to devise a recipe book of complete and standalone machine learning algorithms that you can look up and apply to your specific problems, as needed.

Finally, you saw examples of machine learning algorithm recipes in R for a wide range of algorithm types.

If you found this approach useful, I’d love to hear about it.



