3-Step Methodology To The Best Machine Learning Algorithm

How do you choose the best algorithm for your dataset?

Machine learning is a problem of induction where general rules are learned from specific observed data from the domain.

It is infeasible (impossible?) to know beforehand what representation or what algorithm will best learn from the data on a specific problem, without knowing the problem so well that you probably don’t need machine learning to begin with.

So what algorithm should you use on a given problem? It’s a question of trial and error, or searching for the best representation, learning algorithm and algorithm parameters.

In this post, you will discover the simple 3-step methodology for finding the best algorithm for your problem proposed by some of the best predictive modelers in the business.

[Image: Steps To The Best Machine Learning Algorithm. Photo by David Goehring, some rights reserved.]

3-Step Methodology

Max Kuhn is the creator and owner of the caret package, which provides a suite of tools for predictive modeling in R. It might be the best R package and the one reason why R is the top choice for serious competitive and applied machine learning.

In their excellent book, “Applied Predictive Modeling”, Kuhn and Johnson outline a process to select the best model for a given problem.

I paraphrase their suggested approach as:

  1. Start with the least interpretable and most flexible models.
  2. Investigate simpler models that are less opaque.
  3. Consider using the simplest model that reasonably approximates the performance of the more complex models.

They comment:

Using this methodology, the modeler can discover the “performance ceiling” for the data set before settling on a model. In many cases, a range of models will be equivalent in terms of performance so the practitioner can weigh the benefits of different methodologies…

For example, here is a general interpretation of this methodology that you could use on your next one-off modeling project:

  1. Investigate a suite of complex models and establish a performance ceiling, such as:
    1. Support Vector Machines
    2. Gradient Boosting Machines
    3. Random Forest
    4. Bagged Decision Trees
    5. Neural Networks
  2. Investigate a suite of simpler, more interpretable models, such as:
    1. Generalized Linear Models
    2. LASSO and Elastic-Net Regularized Generalized Linear Models
    3. Multivariate Adaptive Regression Splines
    4. k-Nearest Neighbors
    5. Naive Bayes
  3. Select the model from (2) that best approximates the accuracy from (1).
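
The selection logic in the three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ModelResult`, `performance_ceiling`, `select_model` and the `tolerance` parameter are hypothetical, the accuracy figures are made-up placeholders rather than real results, and the scores themselves are assumed to have been estimated elsewhere (for example with cross-validation in caret). "Best approximates" is read here as "the simple model with the highest accuracy", and falling back to the best complex model when no simple model gets close enough is an added assumption, not part of the original methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    accuracy: float      # assumed to be estimated elsewhere, e.g. by cross-validation
    interpretable: bool  # True for the simpler models of step 2


def performance_ceiling(results):
    """Step 1: the best accuracy reached by any complex model."""
    complex_scores = [r.accuracy for r in results if not r.interpretable]
    if not complex_scores:
        raise ValueError("need at least one complex model to set a ceiling")
    return max(complex_scores)


def select_model(results, tolerance=0.02):
    """Step 3: pick the best simple model if it is within `tolerance` of the ceiling.

    Assumption: if no simple model is close enough, fall back to the best
    complex model instead of accepting a large loss in accuracy.
    """
    ceiling = performance_ceiling(results)
    simple = [r for r in results if r.interpretable]
    if simple:
        best_simple = max(simple, key=lambda r: r.accuracy)
        if ceiling - best_simple.accuracy <= tolerance:
            return best_simple
    return max((r for r in results if not r.interpretable),
               key=lambda r: r.accuracy)


# Hypothetical placeholder scores, for illustration only.
results = [
    ModelResult("Support Vector Machine", 0.90, False),
    ModelResult("Gradient Boosting Machine", 0.92, False),
    ModelResult("Random Forest", 0.91, False),
    ModelResult("Generalized Linear Model", 0.88, True),
    ModelResult("LASSO", 0.905, True),
    ModelResult("Naive Bayes", 0.84, True),
]

ceiling = performance_ceiling(results)
chosen = select_model(results, tolerance=0.02)
print(f"ceiling={ceiling:.3f}, chosen={chosen.name} ({chosen.accuracy:.3f})")
```

The tolerance makes "reasonably approximates" explicit: a larger value favors interpretability, a smaller one favors accuracy, and the right setting depends on the problem.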

Quick One-Off Models

I think this is a great methodology to use for a one-off project where you need a good result quickly, such as within minutes or hours. Its benefits are:

  • You have a good idea of the spread of accuracy on a problem across models.
  • You have a model that is easier to understand and explain to others.
  • You have a reasonably high-quality model very quickly (maybe in the top 10-to-25% of what is achievable on the problem if you spent days or weeks).

I don’t think this is the best methodology for all problems. Some possible downsides of the methodology are:

  • More complex methods are slower to run and return a result.
  • Sometimes you want the complex model over the simpler models (e.g. domains where accuracy trumps explainability).
  • The performance ceiling is pursued first, rather than last, when there might be the time, pressure and motivation to extract the most from the best methods.

For more information on this strategy, check out Section 4.8, Choosing Between Models, on page 78 of Applied Predictive Modeling. It is a must-have book for any serious machine learning practitioner using R.

Do you have a methodology for finding the best machine learning algorithm for a problem? Leave a comment and share the broader strokes.

Have you used this methodology? Did it work for you?

Any questions? Leave a comment.
