A Tour of the Weka Machine Learning Workbench

Weka is an easy-to-use and powerful machine learning platform.

It provides a large number of machine learning algorithms, feature selection methods and data preparation filters.

In this post you will discover the Weka machine learning workbench and take a tour of the key interfaces that you can use on your machine learning projects.

After reading this post you will know about:

  • The interfaces supported by the Weka machine learning workbench.
  • The interfaces that are recommended for beginners to work through their problems, and those that are not.
  • How to at least click through each key interface you will need in Weka and generate a result.

Let’s get started.

Weka GUI Chooser

The entry point into the Weka interface is the Weka GUI Chooser.

It is an interface that lets you choose and launch a specific Weka environment.

Screenshot of the Weka GUI Chooser

In addition to providing access to the core Weka tools, it also has a number of additional utilities and tools provided in the menu.

There are two important utilities to note in the “Tools” menu:

1. The Package Manager which lets you browse and install third party add-ons to Weka such as new algorithms.

Screenshot of the Weka Package Manager

2. The ARFF-Viewer that allows you to load and transform datasets and save them in ARFF format.

Screenshot of the Weka ARFF-Viewer
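
If you have not seen an ARFF file before, it is plain text: a header that names the dataset and declares each attribute, followed by the data rows. The sketch below builds the text of a tiny ARFF file by hand so you can see the layout. The relation name, attribute names and rows are made up for illustration; this is not how the ARFF-Viewer itself saves files.

```java
/** Illustrative only: builds the text of a tiny ARFF file by hand. */
final class ArffSketch {

    /** Returns ARFF text for a hypothetical two-attribute toy dataset. */
    static String toyArff() {
        // Made-up rows: one comma-separated instance per line.
        String[] rows = {"21.5,yes", "30.0,no", "18.2,yes"};

        StringBuilder sb = new StringBuilder();
        sb.append("@relation toy-weather\n\n");         // dataset name (hypothetical)
        sb.append("@attribute temperature numeric\n");  // a numeric attribute
        sb.append("@attribute play {yes,no}\n\n");      // a nominal attribute and its allowed values
        sb.append("@data\n");                           // data rows follow this marker
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toyArff());
    }
}
```

Saving the printed text to a file with an .arff extension should give you something you can open in the ARFF-Viewer or the Explorer.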

Weka Explorer

The Weka Explorer is designed to investigate your machine learning dataset.

It is useful when you are thinking about different data transforms and modeling algorithms that you could investigate with a controlled experiment later. It is excellent for getting ideas and playing what-if scenarios.

The interface is divided into 6 tabs, each with a specific function:

The preprocess tab is for loading your dataset and applying filters to transform the data into a form that better exposes the structure of the problem to the modeling processes. It also provides some summary statistics about the loaded data.

Load a standard dataset in the data/ directory of your Weka installation, specifically data/breast-cancer.arff. This is a binary classification problem that we will use on this tour.

Screenshot of the Weka Explorer Preprocess Tab

The classify tab is for training and evaluating the performance of different machine learning algorithms on your classification or regression problem. Algorithms are divided up into groups, and results are kept in a result list and summarized in the main Classifier output.

Click the “Start” button to run the ZeroR classifier on the dataset and summarize the results.

Screenshot of the Weka Explorer Classify Tab
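
ZeroR is a useful baseline because it ignores every input attribute and always predicts the most common class. The sketch below shows that idea in plain Java on a made-up list of labels. It is a minimal illustration of the majority-class rule, not Weka's ZeroR implementation, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative majority-class baseline in the spirit of ZeroR. */
final class ZeroRSketch {

    /** Returns the most frequent label; on a tie, the label seen first wins. */
    static String majorityClass(String[] labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    /** Fraction of labels that equal the single constant prediction. */
    static double accuracy(String[] labels, String prediction) {
        int correct = 0;
        for (String label : labels) {
            if (label.equals(prediction)) {
                correct++;
            }
        }
        return (double) correct / labels.length;
    }

    public static void main(String[] args) {
        // Made-up class labels for five instances.
        String[] labels = {"no", "no", "yes", "no", "yes"};
        String prediction = majorityClass(labels);
        System.out.println(prediction + " " + accuracy(labels, prediction));
    }
}
```

Any algorithm you try later should beat this number; if it does not, it has not learned anything useful from the input attributes.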

The cluster tab is for training and evaluating the performance of different unsupervised clustering algorithms on your unlabeled dataset. Like the Classify tab, algorithms are divided into groups, and results are kept in a result list and summarized in the main Clusterer output.

Click the “Start” button to run the EM clustering algorithm on the dataset and summarize the results.

Screenshot of the Weka Explorer Cluster Tab

The associate tab is for automatically finding associations in a dataset. The techniques are often used for market basket analysis and similar data mining problems, and they require data where all attributes are categorical.

Click the “Start” button to run the Apriori association algorithm on the dataset and summarize the results.

Screenshot of the Weka Explorer Associate Tab
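
Association rules are usually scored by support (how often the items appear together) and confidence (how often the rule holds when its left-hand side appears). The sketch below computes both for a handful of made-up shopping baskets. It only illustrates the two measures; it does not perform the candidate search that Apriori does, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative support and confidence for association rules. */
final class AssociationSketch {

    /** Small helper to build an item set. */
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    /** Four made-up shopping baskets. */
    static List<Set<String>> toyBaskets() {
        List<Set<String>> baskets = new ArrayList<>();
        baskets.add(setOf("bread", "milk"));
        baskets.add(setOf("bread", "butter"));
        baskets.add(setOf("bread", "milk", "butter"));
        baskets.add(setOf("milk"));
        return baskets;
    }

    /** Fraction of baskets that contain every item in the given set. */
    static double support(List<Set<String>> baskets, Set<String> items) {
        int hits = 0;
        for (Set<String> basket : baskets) {
            if (basket.containsAll(items)) {
                hits++;
            }
        }
        return (double) hits / baskets.size();
    }

    /** support(antecedent and consequent together) / support(antecedent). */
    static double confidence(List<Set<String>> baskets,
                             Set<String> antecedent, Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return support(baskets, both) / support(baskets, antecedent);
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = toyBaskets();
        System.out.println(support(baskets, setOf("bread", "milk")));
        System.out.println(confidence(baskets, setOf("bread"), setOf("milk")));
    }
}
```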

The select attributes tab is for performing feature selection on the loaded dataset and identifying those features that are most likely to be relevant in developing a predictive model.

Click the “Start” button to run the CfsSubsetEval algorithm with a BestFirst search on the dataset and summarize the results.

Screenshot of the Weka Explorer Select Attributes Tab

The visualize tab is for reviewing a pairwise scatterplot matrix in which each attribute is plotted against every other attribute in the loaded dataset. It is useful for getting an idea of the shape and relationships of attributes, which may aid in data filtering, transformation and modeling.

Increase the point size and the jitter and click the “Update” button to get an improved plot of the categorical attributes of the loaded dataset.

Screenshot of the Weka Explorer Visualize Tab

Weka Experiment Environment

The Weka Experiment Environment is for designing controlled experiments, running them, then analyzing the results collected.

It is the next step after using the Weka Explorer, where you can load up one or more views of your dataset and a suite of algorithms and design an experiment to find the combination that results in the best performance.

The interface is split into 3 tabs.

The setup tab is for designing an experiment. This includes the file where results are written, the test setup in terms of how algorithms are evaluated, the datasets to model and the algorithms to model them. The specifics of an experiment can be saved for later use and modification.

  • Click the “New” button to create a new Experiment.
  • Click the “Add New…” button in the “Datasets” pane and select the data/diabetes.arff dataset.
  • Click the “Add New…” button in the “Algorithms” pane and click “OK” to add the ZeroR algorithm.

Screenshot of the Weka Experiment Environment Setup Tab
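
The test setup decides how each algorithm is evaluated, and k-fold cross-validation is a common choice. The sketch below shows the basic bookkeeping behind it: shuffle the instances with a fixed seed, then deal them into k folds so each instance is used for testing exactly once. This is a simple unstratified illustration under my own assumptions, not the splitting code Weka uses, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative, unstratified k-fold assignment. */
final class KFoldSketch {

    /**
     * Returns an array where entry i is the fold (0..k-1) in which
     * instance i is held out for testing.
     */
    static int[] assignFolds(int numInstances, int k, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            order.add(i);
        }
        // A fixed seed makes the split repeatable between runs.
        Collections.shuffle(order, new Random(seed));

        int[] foldOf = new int[numInstances];
        for (int position = 0; position < numInstances; position++) {
            // Deal shuffled instances into folds round-robin.
            foldOf[order.get(position)] = position % k;
        }
        return foldOf;
    }

    public static void main(String[] args) {
        int[] foldOf = assignFolds(23, 10, 42L);
        for (int i = 0; i < foldOf.length; i++) {
            System.out.println("instance " + i + " -> test fold " + foldOf[i]);
        }
    }
}
```

For each fold, a model would be trained on the instances in the other folds and tested on the held-out ones, and the scores averaged.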

The run tab is for running your designed experiments. Experiments can be started and stopped. There is not a lot to it.

Click the “Start” button to run the small experiment you designed.

Screenshot of the Weka Experiment Environment Run Tab

The analyze tab is for analyzing the results collected from an experiment. Results can be loaded from a file, from the database or from an experiment just completed in the tool. A number of performance measures are collected from a given experiment, which can be compared between algorithms using tools such as statistical significance tests.

  • Click the “Experiment” button in the “Source” pane to load the results from the experiment you just ran.
  • Click the “Perform Test” button to summarize the classification accuracy results for the single algorithm in the experiment.

Screenshot of the Weka Experiment Environment Analyse Tab
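
When you compare two algorithms, a significance test asks whether the difference in their scores is larger than you would expect by chance. The sketch below computes the textbook paired t statistic from two made-up lists of per-run accuracies. It is only meant to show the kind of calculation involved; the tester in the Experiment Environment may apply corrections that are not shown here, and the names are hypothetical.

```java
/** Illustrative textbook paired t statistic for two sets of matched scores. */
final class PairedTSketch {

    /** Returns mean(d) / sqrt(var(d) / n), where d[i] = a[i] - b[i]. */
    static double pairedT(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with at least 2 scores");
        }
        int n = a.length;

        // Mean of the paired differences.
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;

        // Sample variance of the differences (n - 1 in the denominator).
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double deviation = (a[i] - b[i]) - mean;
            sumSquares += deviation * deviation;
        }
        double variance = sumSquares / (n - 1);

        return mean / Math.sqrt(variance / n);
    }

    public static void main(String[] args) {
        // Made-up accuracies from five matched runs of two algorithms.
        double[] algorithmA = {0.80, 0.82, 0.78, 0.81, 0.79};
        double[] algorithmB = {0.75, 0.78, 0.76, 0.77, 0.74};
        System.out.println(pairedT(algorithmA, algorithmB));
    }
}
```

A large absolute value suggests a real difference; the statistic is compared against a t distribution with n - 1 degrees of freedom to get a p-value.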

Weka KnowledgeFlow Environment

The Weka KnowledgeFlow Environment is a graphical workflow tool for designing a machine learning pipeline from data source to results summary, and much more. Once designed, the pipeline can be executed and evaluated within the tool.

Screenshot of the Weka KnowledgeFlow Environment

The KnowledgeFlow Environment is a powerful tool that I do not recommend for beginners until after they have mastered use of the Weka Explorer and Weka Experiment Environment.

Weka Workbench

The Weka Workbench is an environment that combines all of the GUI interfaces into a single interface.

It is useful if you find yourself jumping a lot between two or more different interfaces, such as between the Explorer and the Experiment Environment. This can happen if you try out a lot of what-if scenarios in the Explorer and quickly take what you learn and put it into controlled experiments.

Screenshot of the Weka Workbench

Weka SimpleCLI

Weka can be used from a simple Command Line Interface (CLI).

This is powerful because you can write shell scripts to use the full API from command line calls with parameters, allowing you to build models, run experiments and make predictions without a graphical user interface.

The SimpleCLI provides an environment where you can quickly and easily experiment with the Weka command line interface commands.

Screenshot of the Weka SimpleCLI

Like the Weka KnowledgeFlow Environment, this is a powerful tool that I do not recommend for beginners until they have mastered use of the Weka Explorer and Weka Experiment Environment.

Weka Java API

Weka can also be used from the Java API.

This is for Java programmers and can be useful when you want to incorporate learning or prediction into your own applications.

This is an advanced feature that I do not recommend for beginners until they have mastered use of the Weka Explorer and Weka Experiment Environment.

Summary

In this post you discovered the Weka Machine Learning Workbench. You went on a tour of the key interfaces that you can use to explore and develop predictive machine learning models on your own problems.

Specifically, you learned about:

  • The Weka Explorer for data preparation, feature selection and evaluating algorithms.
  • The Weka Experiment Environment for designing, running and analyzing the results from controlled experiments.
  • The Weka KnowledgeFlow Environment for graphically designing and executing machine learning pipelines.
  • The Weka Workbench that incorporates all of the Weka tools into a single convenient interface.
  • The Weka SimpleCLI for using the Weka API from the command line.
  • The Weka Java API that can be used to incorporate learning and prediction into your own applications.

Do you have any questions about the Weka machine learning workbench or about this post? Ask your questions in the comments below and I will do my best to answer them.

