
How to Improve Machine Learning Results

Having one or two algorithms that perform reasonably well on a problem is a good start, but sometimes you may be incentivised to get the best result you can given the time and resources you have available.

In this post, you will review methods you can use to squeeze out extra performance and improve the results you are getting from machine learning algorithms.

When tuning algorithms you must have a high confidence in the results given by your test harness. This means that you should be using techniques that reduce the variance of the performance measure you are using to assess algorithm runs. I suggest cross validation with a reasonably high number of folds (the exact number of which depends on your dataset).
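The idea above can be sketched with scikit-learn (an assumed library choice; the post names no specific tools). A synthetic dataset stands in for your own data, and a 10-fold cross validation gives a lower-variance performance estimate than a single train/test split would:

```python
# Sketch: estimating model performance with k-fold cross validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=1)
model = RandomForestClassifier(random_state=1)

# 10 folds; the right number of splits depends on your dataset size.
cv = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("mean accuracy: %.3f (std %.3f)" % (scores.mean(), scores.std()))
```

The standard deviation across folds is exactly the variance you are trying to keep small before trusting any tuning decisions.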

Tuning Fork
Photo attributed to eurok, some rights reserved

The three strategies you will learn about in this post are:

  • Algorithm Tuning
  • Ensembles
  • Extreme Feature Engineering

Algorithm Tuning

The place to start is to get better results from algorithms that you already know perform well on your problem. You can do this by exploring and fine tuning the configuration for those algorithms.

Machine learning algorithms are parameterized and modification of those parameters can influence the outcome of the learning process. Think of each algorithm parameter as a dimension on a graph with the values of a given parameter as a point along the axis. Three parameters would be a cube of possible configurations for the algorithm, and n-parameters would be an n-dimensional hypercube of possible configurations for the algorithm.

The objective of algorithm tuning is to find the best point or points in that hypercube for your problem. You will be optimizing against your test harness, so again you cannot underestimate the importance of spending the time to build a trusted test harness.

You can approach this search problem by using automated methods that impose a grid on the possibility space and sample where good algorithm configuration might be. You can then use those points in an optimization algorithm to zoom in on the best performance.
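A minimal sketch of imposing a grid on the possibility space, using scikit-learn's GridSearchCV (an assumed tool; the parameter values here are illustrative, not prescriptive):

```python
# Sketch: a coarse grid search over two parameters of an SVM.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=1)

# Each parameter is one dimension of the hypercube;
# the grid samples each dimension coarsely.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1],
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The best grid point can then seed a finer search or a local optimization in the promising region.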

You can repeat this process with a number of well performing methods and explore the best you can achieve with each. I strongly advise that the process is automated and reasonably coarse grained as you can quickly reach points of diminishing returns (fractional percentage performance increases) that may not translate to the production system.

The more tuned the parameters of an algorithm, the more biased the algorithm will be to the training data and test harness. This strategy can be effective, but it can also lead to more fragile models that overfit your test harness and don’t perform as well in practice.

Ensembles

Ensemble methods are concerned with combining the results of multiple methods in order to get improved results. Ensemble methods work well when you have multiple “good enough” models that specialize in different parts of the problem.

This may be achieved through many ways. Three ensemble strategies you can explore are:

  • Bagging: Known more formally as Bootstrapped Aggregation, this is where the same algorithm gains different perspectives on the problem by being trained on different subsets of the training data.
  • Boosting: Models of the same algorithm are trained in sequence, with each new model concentrating on the training examples that the previous models got wrong.
  • Blending: Known more formally as Stacked Aggregation or Stacking, this is where a variety of models make predictions that are taken as input to a new model, which learns how to combine them into an overall prediction.

It is a good idea to get into ensemble methods after you have exhausted more traditional methods. There are two good reasons for this: ensembles are generally more complex than traditional methods, and the traditional methods give you a good baseline from which you can improve and draw on to create your ensembles.

Ensemble Learning
Photo attributed to ancasta1901, some rights reserved

Extreme Feature Engineering

The previous two strategies have looked at getting more from machine learning algorithms. This strategy is about exposing more structure in the problem for the algorithms to learn. In data preparation, we learned about feature decomposition and aggregation in order to better normalize the data for machine learning algorithms. In this strategy, we push that idea to the limits. I call this strategy extreme feature engineering, when really the term "feature engineering" would suffice.

Think of your data as having complex multi-dimensional structures embedded in it that machine learning algorithms know how to find and exploit to make decisions. You want to best expose those structures to algorithms so that the algorithms can do their best work. A difficulty is that some of those structures may be too dense or too complex for the algorithms to find without help. You may also have some knowledge of such structures from your domain expertise.

Take attributes and decompose them widely into multiple features. Technically, what you are doing with this strategy is reducing dependencies and non-linear relationships into simpler independent linear relationships.

This might be a foreign idea, so here are two examples:

  • Categorical: You have a categorical attribute that has the values [red, green, blue]. You could split it into 3 binary attributes of red, green and blue and give each instance a 1 or 0 value for each.
  • Real: You have a real valued quantity that has values ranging from 0 to 1000. You could create 10 binary attributes, each representing a bin of values (0-99 for bin 1, 100-199 for bin 2, etc.) and assign each instance a binary value (1/0) for the bins.
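The two decompositions above can be sketched with pandas (an assumed library choice; the column names are illustrative):

```python
# Sketch: decomposing a categorical and a real-valued attribute
# into binary features.
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "amount": [42, 150, 980, 310],
})

# Categorical: one binary column per category value.
onehot = pd.get_dummies(df["color"], prefix="color")

# Real-valued: 10 equal-width bins over 0-1000,
# then one binary column per observed bin.
bins = pd.cut(df["amount"], bins=range(0, 1001, 100), labels=False)
binned = pd.get_dummies(bins, prefix="bin")

features = pd.concat([onehot, binned], axis=1)
print(features)
```

Each new binary column is a simple, independent feature that linear-leaning algorithms can weight directly, which is the point of the decomposition.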

I recommend performing this process one step at a time, creating a new test/train dataset for each modification you make, and then testing algorithms on that dataset. This will start to give you an intuition for which attributes and features in the dataset are exposing more or less information to the algorithms, and their effects on the performance measure. You can use these results to guide further extreme decompositions or aggregations.

Summary

In this post you learned about three strategies for getting improved results from machine learning algorithms on your problem:

  • Algorithm Tuning where discovering the best models is treated like a search problem through model parameter space.
  • Ensembles where the predictions made by multiple models are combined.
  • Extreme Feature Engineering where the attribute decomposition and aggregation seen in data preparation is pushed to the limits.
