How to Tune a Machine Learning Algorithm in Weka

Weka is the perfect platform for learning machine learning. It provides a graphical user interface for exploring and experimenting with machine learning algorithms on datasets, without you having to worry about the mathematics or the programming.

In a previous post we looked at how to design and run an experiment with 3 algorithms on a dataset and how to analyse and report the results.

Manhattan Skyline, because we are going to be looking at using Manhattan distance with the k-nearest neighbours algorithm.
Photo by Tim Pearce, Los Gatos, some rights reserved.

In this post you will discover how to use the Weka Experimenter to improve your results and get the most out of a machine learning algorithm. If you follow the step-by-step instructions, you will design and run your own algorithm-tuning experiment in under five minutes.

1. Download Weka and Install

Visit the Weka Download page and locate a version of Weka suitable for your computer (Windows, Mac or Linux).

Weka requires Java. You may already have Java installed and if not, there are versions of Weka listed on the download page (for Windows) that include Java and will install it for you. I’m on a Mac myself, and like everything else on Mac, Weka just works out of the box.

If you are interested in machine learning, then I know you can figure out how to download and install software on your own computer.

2. Start Weka

Start Weka. This may involve finding it in your program launcher or double-clicking on the weka.jar file. This will start the Weka GUI Chooser.

Weka GUI Chooser

The Weka GUI Chooser lets you choose one of the Explorer, Experimenter, KnowledgeFlow and the Simple CLI (command line interface).

Click the “Experimenter” button to launch the Weka Experimenter.

The Weka Experimenter allows you to design your own experiments of running algorithms on datasets, run the experiments and analyze the results. It’s a powerful tool.

3. Design Experiment

Click the “New” button to create a new experiment configuration.

Test Options

The experimenter configures the test options for you with sensible defaults. The experiment is configured to use Cross Validation with 10 folds. It is a “Classification” type problem and each algorithm + dataset combination is run 10 times (iteration control).
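To make those defaults concrete, here is a minimal Python sketch of what 10 runs of 10-fold cross-validation mean in terms of train/test splits. It is not Weka's own code: the function name `repeated_kfold`, the seed handling and the instance count of 351 are assumptions for illustration, and the sketch does not stratify the folds by class.

```python
import random

def repeated_kfold(n_instances, n_folds=10, n_runs=10, seed=1):
    """Yield (run, fold, train_idx, test_idx) for repeated k-fold cross-validation."""
    for run in range(n_runs):
        indices = list(range(n_instances))
        # A different shuffle per run gives a different partition into folds.
        random.Random(seed + run).shuffle(indices)
        for fold in range(n_folds):
            # Every n_folds-th shuffled index goes into this fold's test set.
            test_idx = indices[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield run, fold, train_idx, test_idx

# Assumed dataset size, used only for illustration.
splits = list(repeated_kfold(351))
print(len(splits))  # 10 runs x 10 folds = 100 train/test evaluations
```

Each algorithm in the experiment is trained and evaluated once per split, so there are 100 evaluations per algorithm and dataset combination behind the averaged figures.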

Ionosphere Dataset

Let’s start out by selecting the dataset.

  1. In the “Datasets” section, click the “Add new…” button.
  2. Open the “data” directory and choose the “ionosphere.arff” dataset.

The Ionosphere Dataset is a classic machine learning dataset. The problem is to predict the presence (or not) of free electron structure in the ionosphere given radar signals. It consists of 17 pairs of real-valued radar signals (34 attributes) and a single class attribute with two values: good and bad radar returns.

Tuning k-Nearest Neighbour

In this experiment we are interested in tuning the k-nearest neighbour algorithm (kNN) on the dataset. In Weka this algorithm is called IBk (Instance Based Learner).

The IBk algorithm does not build a model. Instead, it generates a prediction for a test instance just-in-time. It uses a distance measure to locate the k “closest” instances in the training data for each test instance and uses those selected instances to make a prediction.
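The idea can be sketched in a few lines of Python. This is a toy illustration of k-nearest neighbours with a majority vote, not Weka's IBk implementation; the names `knn_predict` and `euclidean` and the tiny training set are made up for the example.

```python
from collections import Counter
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=1, distance=euclidean):
    """Predict a label for query by majority vote among its k nearest training rows.

    train is a list of (features, label) pairs.
    """
    # No model is built: all the work happens at prediction time.
    neighbours = sorted(train, key=lambda row: distance(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical two-attribute training data.
train = [
    ((0.0, 0.0), "bad"),
    ((0.1, 0.2), "bad"),
    ((1.0, 1.0), "good"),
    ((0.9, 0.8), "good"),
]
print(knn_predict(train, (0.8, 0.9), k=3))  # good
```

Because `distance` is passed in as a parameter, swapping the distance measure changes which neighbours are selected without touching the rest of the procedure.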

In this experiment, we want to find out which distance measure works best for the IBk algorithm on the Ionosphere dataset. We will add 3 versions of this algorithm to our experiment:

Euclidean Distance

  1. Click “Add new…” in the “Algorithms” section.
  2. Click the “Choose” button.
  3. Click “IBk” under the “lazy” selection.
  4. Click the “OK” button on the “IBk” configuration.

This will add the IBk algorithm with Euclidean distance, the default distance measure.

Manhattan Distance

  1. Click “Add new…” in the “Algorithms” section.
  2. Click the “Choose” button.
  3. Click “IBk” under the “lazy” selection.
  4. Click on the name of the “nearestNeighborSearchAlgorithm” in the configuration for IBk.
  5. Click the “Choose” button for the “distanceFunction” and select “ManhattanDistance“.
  6. Click the “OK” button on the “nearestNeighborSearchAlgorithm” configuration.
  7. Click the “OK” button on the “IBk” configuration.

Select a distance measure for IBk

This will add the IBk algorithm with Manhattan Distance, also known as city block distance.

Chebyshev Distance

  1. Click “Add new…” in the “Algorithms” section.
  2. Click the “Choose” button.
  3. Click “IBk” under the “lazy” selection.
  4. Click on the name of the “nearestNeighborSearchAlgorithm” in the configuration for IBk.
  5. Click the “Choose” button for the “distanceFunction” and select “ChebyshevDistance“.
  6. Click the “OK” button on the “nearestNeighborSearchAlgorithm” configuration.
  7. Click the “OK” button on the “IBk” configuration.

This will add the IBk algorithm with Chebyshev Distance, also known as chessboard distance.
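If you are curious how the three measures differ, here is a small Python sketch of their textbook definitions. The function names are my own, and this is plain arithmetic on two points rather than Weka's distance classes.

```python
def euclidean(a, b):
    """Square root of the sum of squared differences (straight-line distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    """Sum of absolute differences (city block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute difference on any single attribute (chessboard distance)."""
    return max(abs(x - y) for x, y in zip(a, b))

a, b = (0.0, 0.0), (3.0, 4.0)
print(euclidean(a, b), manhattan(a, b), chebyshev(a, b))  # 5.0 7.0 4.0
```

The same pair of points is 5, 7 or 4 units apart depending on the measure, which is why the choice can change which training instances count as the nearest neighbours.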

4. Run Experiment

Click the “Run” tab at the top of the screen.

Run the experiment in Weka

This tab is the control panel for running the currently configured experiment.

Click the big “Start” button to start the experiment and watch the “Log” and “Status” sections to keep an eye on how it is doing.

5. Review Results

Click the “Analyse” tab at the top of the screen.

This will open up the experiment results analysis panel.

Algorithm Rank

The first thing we want to know is which algorithm was the best. We can do that by ranking the algorithms by the number of times a given algorithm beat the other algorithms.

  1. Click the “Select” button for the “Test base” and choose “Ranking“.
  2. Now click the “Perform test” button.

The ranking table shows the number of statistically significant wins each algorithm has had against all other algorithms on the dataset. A win means an accuracy that is better than the accuracy of another algorithm, where the difference was statistically significant.
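The ranking logic can be sketched roughly as follows in Python. This is my own simplified illustration, not Weka's code: `rank_by_wins` is a made-up name, and the single (winner, loser) pair mirrors the Manhattan versus Euclidean result discussed in this post, with every other pairing assumed to be not significant.

```python
def rank_by_wins(names, significantly_better):
    """Rank algorithms by significant wins minus significant losses.

    significantly_better is a set of (winner, loser) pairs.
    """
    rows = []
    for name in names:
        wins = sum(1 for winner, _ in significantly_better if winner == name)
        losses = sum(1 for _, loser in significantly_better if loser == name)
        rows.append((name, wins - losses, wins, losses))
    # Highest net score first; ties keep their original order.
    return sorted(rows, key=lambda row: row[1], reverse=True)

names = ["Euclidean", "Manhattan", "Chebyshev"]
outcomes = {("Manhattan", "Euclidean")}  # assumed pairwise outcomes
for name, net, wins, losses in rank_by_wins(names, outcomes):
    print(name, net, wins, losses)
```

With these assumed outcomes, Manhattan comes first with a net score of 1, Chebyshev sits in the middle with 0, and Euclidean is last with -1.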

Algorithm ranking in the Weka Experimenter for the Ionosphere dataset

We can see the Manhattan Distance variation is ranked at the top and the Euclidean Distance variation is ranked at the bottom. This is encouraging: it looks like we have found a configuration that is better than the algorithm default for this problem.

Algorithm Accuracy

Next we want to know what scores the algorithms achieved.

  1. Click the “Select” button for the “Test base” and choose the “IBk” algorithm with “Manhattan Distance” in the list and click the “Select” button.
  2. Click the check-box next to “Show std. deviations“.
  3. Now click the “Perform test” button.

In the “Test output” we can see a table with the results for 3 variations of the IBk algorithm. Each algorithm was run 10 times on the dataset, and the accuracy reported is the mean of those 10 runs, with the standard deviation in brackets.
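As a quick illustration of how such a figure is produced, here is a Python sketch that summarises a list of accuracy scores as a mean with the standard deviation in brackets. The scores are made-up numbers, not results from this experiment, and `summarise` is a hypothetical helper name.

```python
import statistics

def summarise(accuracies):
    """Format accuracies as 'mean (sample standard deviation)'."""
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)
    return f"{mean:.2f} ({std:.2f})"

# Hypothetical per-run accuracies, in percent.
accuracies = [91.4, 88.6, 94.3, 85.7, 91.4, 97.1, 88.6, 91.4, 85.7, 94.3]
print(summarise(accuracies))
```

A large standard deviation relative to the gap between two algorithms is a hint that the difference may not be reliable, which is what the significance test addresses.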

Table of algorithm classification accuracy on the Ionosphere dataset in the Weka Experimenter

We can see that IBk with Manhattan Distance achieved an accuracy of 90.74% (+/- 4.57%) which was better than the default of Euclidean Distance that had an accuracy of 87.10% (+/- 5.12%).

The little “*” next to the result for IBk with Euclidean Distance tells us that the accuracy results for the Manhattan Distance and Euclidean Distance variations of IBk were drawn from different populations, that is, the difference in the results is statistically significant.

We can also see that there is no “*” for the results of IBk with Chebyshev Distance indicating that the difference in the results between the Manhattan Distance and Chebyshev Distance variations of IBk was not statistically significant.
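To give a feel for what sits behind the “*”, here is a Python sketch of a plain paired t-test on matched accuracy scores. Treat it as a simplified stand-in: as far as I know, Weka's default tester applies a corrected version of this test that adjusts the variance for overlapping training sets, and the scores below are invented for the example.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """t statistic for the mean of the paired differences scores_a - scores_b."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err

# Two-sided critical value for alpha = 0.05 with 9 degrees of freedom (10 pairs).
T_CRITICAL_DF9 = 2.262

# Hypothetical matched accuracies (percent) for two IBk variations.
manhattan = [91.0, 90.5, 92.1, 89.8, 91.3, 90.2, 90.9, 91.7, 89.5, 90.4]
euclidean = [87.2, 86.9, 88.0, 86.1, 87.5, 86.8, 87.3, 87.9, 86.0, 87.3]

t = paired_t_statistic(manhattan, euclidean)
print("significant" if abs(t) > T_CRITICAL_DF9 else "not significant")
```

The test looks at the differences between matched scores rather than at the two means in isolation, so a small but consistent improvement can still come out as significant.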

Summary

In this post you discovered how to configure a machine learning experiment with one dataset and three variations of an algorithm in Weka. You discovered how you can use the Weka Experimenter to tune the parameters of a machine learning algorithm on a dataset and analyze the results.

If you made it this far, why not:

  • See if you can further tune IBk and get a better result (and leave a comment to tell us)
  • Design and run an experiment to tune the k parameter of IBk.
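
If you would like to see the shape of a k-tuning experiment before building it in the Experimenter, here is a toy Python sketch. It uses leave-one-out evaluation on a tiny made-up dataset rather than Weka's 10-fold cross-validation, and all names in it are my own.

```python
from collections import Counter

def manhattan(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to query."""
    neighbours = sorted(train, key=lambda row: manhattan(row[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Hold out each row in turn and predict it from all the others."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(data)

# Hypothetical data: two well-separated clusters of four points each.
data = [
    ((0, 0), "bad"), ((0, 1), "bad"), ((1, 0), "bad"), ((1, 1), "bad"),
    ((5, 5), "good"), ((5, 6), "good"), ((6, 5), "good"), ((6, 6), "good"),
]
for k in (1, 3, 5, 7):
    print(k, leave_one_out_accuracy(data, k))
```

On this toy data the accuracy stays perfect for small k and collapses at k = 7, where the held-out point is outvoted by the other cluster. In the Experimenter the equivalent is to add several copies of IBk that differ only in their k value.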
