How To Get Baseline Results And Why They Matter

In my courses and guides, I teach the preparation of a baseline result before diving into spot checking algorithms.

A student of mine recently asked:

If a baseline is not calculated for a problem, will it make the results of other algorithms questionable?

He went on to ask:

If other algorithms do not give better accuracy than the baseline, what lesson should we take from it? Does it indicate that the data set does not have prediction capability?

These are great questions. They get to the heart of why we create a baseline in the first place and the filtering power it provides.

In this post, you will learn why we create a baseline prediction result, how to create a baseline in general and for specific problem types, and how you can use it to inform you on the data you have available and the algorithms you are using.

Baseline Machine Learning Results
Photo by tracy the astonishing, some rights reserved

Finding Data You Can Model

When you are practicing machine learning, each problem is unique. You very likely have not seen it before and you cannot know what algorithms to use, what data attributes will be useful or even whether the problem can be effectively modeled.

I personally find this the most exciting time.

If you are in this situation, you are very likely collecting the data together yourself from disparate sources and selecting attributes that you think might be valuable. Feature selection and feature engineering will be required.

During this process, you need some indication that the problem you are iteratively defining, and the data you are gathering for it, provide a useful basis for making predictions.

A Useful Point For Comparison

You need to spot check algorithms on the problem to see if you have a useful basis for modeling your prediction problem. But how do you know the results are any good?

You need a basis for comparing results: a meaningful reference point against which every other result can be judged.

Once you start collecting results from different machine learning algorithms, a baseline result can tell you whether a change is adding value.

It is so simple, yet so powerful. Once you have a baseline, you can add or change the data attributes, the algorithms you are trying or the parameters of the algorithms, and know whether you have improved your approach or solution to the problem.
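To make that concrete, here is a minimal sketch of ranking spot-check results against a baseline. The helper name `compare_to_baseline`, the algorithm names and every score in it are invented placeholders for illustration, not results from a real experiment. It also assumes a score where higher is better, such as accuracy.

```python
def compare_to_baseline(baseline_score, candidate_scores):
    """Return (name, score, gain) rows sorted by gain over the baseline.

    Assumes a higher score is better (e.g. classification accuracy).
    """
    rows = [
        (name, score, score - baseline_score)
        for name, score in candidate_scores.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Placeholder numbers, made up purely to show the mechanics.
baseline_accuracy = 0.70
candidate_scores = {
    "decision_tree": 0.78,
    "knn": 0.69,
    "logistic_regression": 0.81,
}

ranked = compare_to_baseline(baseline_accuracy, candidate_scores)
for name, score, gain in ranked:
    verdict = "beats baseline" if gain > 0 else "no better than baseline"
    print(f"{name}: {score:.2f} ({gain:+.2f}) {verdict}")
```

Any row with a gain at or below zero is a change that did not add value over the simplest possible prediction.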

Calculate a Baseline Result

There are a few common ways to calculate a baseline result.

A baseline result is the simplest possible prediction. For some problems, this may be a random result, and in others it may be the most common prediction.

  • Classification: If you have a classification problem, you can select the class that has the most observations and use that class as the result for all predictions. In Weka this is called ZeroR. If the number of observations is equal for all classes in your training dataset, you can select a specific class or enumerate each class and see which gives the better result in your test harness.
  • Regression: If you are working on a regression problem, you can use a central tendency measure as the result for all predictions, such as the mean or the median.
  • Optimization: If you are working on an optimization problem, you can use a fixed number of random samples in the domain.
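The three baselines above can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the Weka implementation: the function names are my own, the toy data is made up, and the optimization baseline assumes a one-dimensional problem that you want to minimize over a fixed interval.

```python
import random
from collections import Counter
from statistics import mean, median


def zero_rule_classifier(train_labels, n_predictions):
    """Predict the most frequent training class for every test row."""
    most_common_class = Counter(train_labels).most_common(1)[0][0]
    return [most_common_class] * n_predictions


def zero_rule_regressor(train_targets, n_predictions, statistic=mean):
    """Predict a central tendency of the training targets for every test row."""
    return [statistic(train_targets)] * n_predictions


def random_search_baseline(objective, low, high, n_samples, seed=0):
    """Minimize `objective` using a fixed number of uniform random samples."""
    rng = random.Random(seed)
    candidates = [rng.uniform(low, high) for _ in range(n_samples)]
    return min(candidates, key=objective)


# Made-up toy data, for illustration only.
train_labels = ["no", "no", "no", "yes", "yes"]
class_preds = zero_rule_classifier(train_labels, 3)

train_targets = [10.0, 12.0, 14.0, 40.0]
mean_preds = zero_rule_regressor(train_targets, 2)
median_preds = zero_rule_regressor(train_targets, 2, statistic=median)

# Hypothetical objective with its minimum at x = 3.
best_x = random_search_baseline(lambda x: (x - 3.0) ** 2, -10.0, 10.0, 100)

print(class_preds)    # ['no', 'no', 'no']
print(mean_preds)     # [19.0, 19.0]
print(median_preds)   # [13.0, 13.0]
print(best_x)
```

Note how the mean and median disagree on this toy data because of the single large value. That is a good reason to evaluate both in your test harness rather than guess which is the stronger baseline.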

It can be a valuable use of your time to brainstorm all of the simplest possible results that you can test for your problem, and then go ahead and evaluate them. The results can be a very effective filtering method. If more advanced modeling methods cannot outperform simple central tendencies then you know you have work to do, most likely better defining or reframing the problem.

The accuracy score you use matters. You must select the accuracy score you plan to use before you calculate your baseline. The score must relate to, and inform, the question you set out to answer by working on the problem in the first place.

If you are working on a classification problem, you may want to look at the Kappa statistic, which gives you an accuracy score that is normalized by the baseline. On this scale the baseline scores 0, and scores above zero show an improvement over the baseline.
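As a small worked example, the sketch below computes plain accuracy and Cohen's kappa by hand. It assumes that the Kappa statistic meant here is Cohen's kappa as commonly defined, (observed agreement - chance agreement) / (1 - chance agreement). The labels and predictions are made up for illustration.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Cohen's kappa: accuracy corrected for agreement expected by chance."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement from the label frequencies on each side.
    expected = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: a single class on both sides
    return (observed - expected) / (1.0 - expected)


# Made-up test labels: 7 "no" and 3 "yes".
actual = ["no"] * 7 + ["yes"] * 3

# Majority-class baseline: always predict "no".
baseline_preds = ["no"] * 10
baseline_accuracy = accuracy(actual, baseline_preds)
baseline_kappa = cohen_kappa(actual, baseline_preds)

# A hypothetical model that gets 9 of the 10 rows right.
model_preds = ["no"] * 7 + ["yes", "yes", "no"]
model_accuracy = accuracy(actual, model_preds)
model_kappa = cohen_kappa(actual, model_preds)

print(f"baseline: accuracy={baseline_accuracy:.2f} kappa={baseline_kappa:.2f}")
print(f"model:    accuracy={model_accuracy:.2f} kappa={model_kappa:.2f}")
```

The baseline looks respectable on raw accuracy (0.70) only because the classes are imbalanced. Its kappa of 0 makes it clear that it has learned nothing, while the hypothetical model scores well above zero.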

Compare Results to the Baseline

It is OK if your baseline is a poor result. It may indicate a particular difficulty with the problem or it may mean that your algorithms have a lot of room for improvement.

It does matter if you cannot get an accuracy better than your baseline. It suggests that the problem may be difficult.

You may need to collect more or different data from which to model. You may need to look into using different and perhaps more powerful machine learning algorithms or algorithm configurations. Ultimately, after rounds of these types of changes, you may have a problem that is resistant to prediction and may need to be re-framed.

Action Steps

Your action step for this post is to start investigating your next data problem with a baseline from which you can compare all results.

If you are already working on a problem, include a baseline result and use that to interpret all other results.

Share your results. What is your problem, and what baseline are you using?
