1. 程式人生 > >How to Define Your Machine Learning Problem

How to Define Your Machine Learning Problem

The first step in any project is defining your problem. You can use the most powerful and shiniest algorithms available, but the results will be meaningless if you are solving the wrong problem.

In this post you will learn the process for thinking deeply about your problem before you get started. This is unarguably the most important aspect of applying machine learning.

What is the problem?

What is the problem?
Photo attributed to Eleaf, some rights reserved

Problem Definition Framework

I use a simple framework when defining a new problem to address with machine learning. The framework helps me to quickly understand the elements and motivation for the problem and whether machine learning is suitable or not.

The framework involves answering three questions to varying degrees of thoroughness:

  • Step 1: What is the problem?
  • Step 2: Why does the problem need to be solved?
  • Step 3: How would I solve the problem?

Step 1: What is the Problem

The first step is defining the problem. I use a number of tactics to collect this information.

Informal description

Describe the problem as though you were describing it to a friend or colleague. This can provide a great starting point for highlighting areas that you might need to fill. It also provides the basis for a one sentence description you can use to share your understanding of the problem.

For example: I need a program that will tell me which tweets will get retweets.

Formalism

In a previous blog post defining machine learning you learned about Tom Mitchell’s machine learning formalism. Here it is again to refresh your memory.

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Use this formalism to define the T, P, and E for your problem.

For example:

  • Task (T): Classify a tweet that has not been published as going to get retweets or not.
  • Experience (E): A corpus of tweets for an account where some have retweets and some do not.
  • Performance (P): Classification accuracy, the number of tweets predicted correctly out of all tweets considered as a percentage.

Assumptions

Create a list of assumptions about the problem and it’s phrasing. These may be rules of thumb and domain specific information that you think will get you to a viable solution faster.

It can be useful to highlight questions that can be tested against real data because breakthroughs and innovation occur when assumptions and best practice are demonstrated to be wrong in the face of real data. It can also be useful to highlight areas of the problem specification that may need to be challenged, relaxed or tightened.

For example:

  • The specific words used in the tweet matter to the model.
  • The specific user that retweets does not matter to the model.
  • The number of retweets may matter to the model.
  • Older tweets are less predictive than more recent tweets.
question everything

Question Everything!
Photo attributed to dullhunk, some rights reserved

Similar problems

What other problems have you seen or can you think of that are like the problem you are trying to solve? Other problems can inform the problem you are trying to solve by highlighting limitations in your phrasing of the problem such as time dimensions and conceptual drift (where the concept being modeled changes over time). Other problems can also point to algorithms and data transformations that could be adopted to spot check performance.

For example: A related problem would be email spam discrimination that uses text messages as input data and needs binary classification decision.

Step 2: Why does the the problem need to be solved?

The second step is to think deeply about why you want or need the problem solved.

Motivation

Consider your motivation for solving the problem. What need will be fulfilled when the problem is solved?

For example, you may be solving the problem as a learning exercise. This is useful to clarify as you can decide that you don’t want to use the most suitable method to solve the problem, but instead you want to explore methods that you are not familiar with in order to learn new skills.

Alternatively, you may need to solve the problem as part of a duty at work, ultimately to keep your job.

Solution Benefits

Consider the benefits of having the problem solved. What capabilities does it enable?

It is important to be clear on the benefits of the problem being solved to ensure that you capitalize on them. These benefits can be used to sell the project to colleagues and management to get buy in and additional time or budget resources.

If it benefits you personally, then be clear on what those benefits are and how you will know when you have got them. For example, if it’s a tool or utility, then what will you be able to do with that utility that you can’t do now and why is that meaningful to you?

Solution Use

Consider how the solution to the problem will be used and what type of lifetime you expect the solution to have. As programmers we often think the work is done as soon as the program is written, but really the project is just beginning it’s maintenance lifetime.

The way the solution will be used will influence the nature and requirements of the solution you adopt.

Consider whether you are looking to write a report to present results or you want to operationalize the solution. If you want to operationalize the solution, consider the functional and nonfunctional requirements you have for a solution, just like a software project.

Step 3: How would I solve the problem?

In this third and final step of the problem definition, explore how you would solve the problem manually.

List out step-by-step what data you would collect, how you would prepare it and how you would design a program to solve the problem. This may include prototypes and experiments you would need to perform which are a gold mine because they will highlight questions and uncertainties you have about the domain that could be explored.

This is a powerful tool. It can highlight problems that actually can be solved satisfactorily using a manually implemented solution. It also flushes out important domain knowledge that has been trapped up until now like where the data is actually stored, what types of features would be useful and many other details.

Collect all of these details as they occur to you and update the previous sections of the problem definition. Especially the assumptions and rules of thumb.

We have considered a manually specified solution before when describing complex problems in why machine learning matters.

Summary

In this post you learned the value of being clear on the problem you are solving. You discovered a three step framework for defining your problem with practical tactics at at step:

  • Step 1: What is the problem? Describe the problem informally and formally and list assumptions and similar problems.
  • Step 2: Why does the problem need to be solve? List your motivation for solving the problem, the benefits a solution provides and how the solution will be used.
  • Step 3: How would I solve the problem? Describe how the problem would be solved manually to flush domain knowledge.

How do you define your problem for machine learning? Have you used any of the above tactics and if so, what were your experiences? Leave a comment.

相關推薦

How to Define Your Machine Learning Problem

Tweet Share Share Google Plus The first step in any project is defining your problem. You can us

How To Load Your Machine Learning Data Into R

Tweet Share Share Google Plus You need to be able to load data into R when working on a machine

How to Transform Your Machine Learning Data in Weka

Tweet Share Share Google Plus Often your raw data for machine learning is not in an ideal form f

How To Load CSV Machine Learning Data in Weka (如何在Weka中載入CSV機器學習資料)

How To Load CSV Machine Learning Data in Weka 原文作者:Jason Brownlee 原文地址:https://machinelearningmastery.com/load-csv-machine-learning-data-weka/

QA: How Reliable Are Your Machine Learning Systems?

In this post, you will learn about different aspects of creating a Machine Learning system with high reliability. It should be noted that system reliabilit

How to Apply Industrial Machine Learning

The concept of machine learning is becoming better understood as we increasingly interact with it every day. From Netflix and Amazon recommendations, to Si

How to become a machine learning engineer: A cheat sheet

Machine learning engineers--i.e., advanced programmers who develop artificial intelligence (AI) machines and systems that can learn and apply knowledge--ar

How to deliver on Machine Learning projects

As Machine Learning (ML) is becoming an important part of every industry, the demand for Machine Learning Engineers (MLE) has grown dramatically. MLEs comb

How to unit test machine learning code.

How to unit test machine learning code.Note: The popularity of this post has inspired me to write a machine learning test library. Go check it out!Over the

How to Implement a Machine Learning Algorithm

Tweet Share Share Google Plus Implementing a machine learning algorithm in code can teach you a

How To Become A Machine Learning Engineer: Learning Path

How To Become A Machine Learning Engineer: Learning PathWe will walk you through all the aspects of machine learning from simple linear regressions to the

How to Learn a Machine Learning Algorithm

Tweet Share Share Google Plus The question of how to learn a machine learning algorithm has come

How To Get Better Machine Learning Performance

Tweet Share Share Google Plus 32 Tips, Tricks and Hacks That You Can Use To Make Better Predicti

Quick and Dirty Data Analysis for your Machine Learning Problem

Tweet Share Share Google Plus A part of having a good understanding of the machine learning prob

How To Learn Any Machine Learning Tool

Tweet Share Share Google Plus Machine learning tools save you time by automating aspects of a ma

How to Tune a Machine Learning Algorithm in Weka

Tweet Share Share Google Plus Weka is the perfect platform for learning machine learning. It pro

How to Normalize and Standardize Your Machine Learning Data in Weka

Tweet Share Share Google Plus Machine learning algorithms make assumptions about the dataset you

How to Layout and Manage Your Machine Learning Project

Tweet Share Share Google Plus Project layout is critical for machine learning projects just as i

How to Better Understand Your Machine Learning Data in Weka

Tweet Share Share Google Plus It is important to take your time to learn about your data when st

Your Guide to AI and Machine Learning at re:Invent 2018

re:Invent 2018 is almost here! As you plan your agenda, artificial intelligence (AI) is undoubtedly a hot topic on your list. This year w