
Template for Working through Machine Learning Problems in Weka

When you are getting started in Weka, you may feel overwhelmed.

There are so many datasets, so many filters and so many algorithms to choose from.

There is too much choice. There are too many things you could be doing.

Too much Choice
Photo by emilio labrador, some rights reserved.

A structured process is key.

I have talked before about process, and about the need for tasks like spot-checking algorithms, as a way to overcome this overwhelm and start learning useful things about your problem. In this post I want to give you a simplified version of that process that you can use to practice applied machine learning.

Problem Solving Template

This template is a streamlined process that focuses on learning about the problem and finding a good solution, and on doing both very quickly.

It is organized into the six steps of applied machine learning. Each step is broken down into specific questions for you to answer using the Weka Explorer and the Weka Experimenter graphical user interfaces.

The six steps of the process are as follows:

  1. Problem Definition
  2. Data Analysis
  3. Data Preparation
  4. Evaluate Algorithms
  5. Improve Results
  6. Present Results

In the following sections I will summarize the key questions to answer for each step of the process. You might like to print out these questions or copy them into a document to create your own template document.

1. Problem Definition

The objective of the problem definition is to understand and clearly describe the problem that is being solved.

Problem Description

  1. What is an informal description of the problem?
  2. What is a formal description of the problem?
  3. What assumptions do you have about the problem?

Provided Data

  1. What constraints were imposed to select the data?
  2. Define each attribute in the provided dataset.

2. Data Analysis

The objective of data analysis is to understand the information available that will be used to develop a model.

Attribute Histograms Showing Class Values

  1. What data types are the attributes?
  2. Are there missing or corrupted values?
  3. Review the distributions of the attributes. What do you notice?
  4. Review the distribution of the class values. What do you notice?
  5. Review the attribute distributions with class values in the histograms. What do you notice?
  6. Review pairwise scatter plots of the attributes. What do you notice?
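The Explorer's Preprocess tab answers most of these questions for you. If it helps to see what those summaries compute, here is a minimal Python sketch. It assumes the dataset is already loaded as a list of rows, with `None` standing in for a missing value (ARFF files write missing values as `?`). The function name `summarize` and the row layout are illustrative and are not part of Weka.

```python
import statistics
from collections import Counter


def summarize(rows, class_index=-1):
    """Per-attribute summary, loosely like the Explorer's Preprocess tab.

    rows: list of records; None marks a missing value (an assumption of
    this sketch). class_index: position of the class attribute.
    """
    n_attrs = len(rows[0])
    class_index %= n_attrs  # allow -1 to mean "last attribute"
    summary = {}
    for j in range(n_attrs):
        column = [r[j] for r in rows]
        present = [v for v in column if v is not None]
        numeric = bool(present) and all(isinstance(v, (int, float)) for v in present)
        info = {"type": "numeric" if numeric else "nominal",
                "missing": len(column) - len(present)}
        if numeric:
            info.update(min=min(present), max=max(present),
                        mean=statistics.mean(present))
        else:
            info["counts"] = dict(Counter(present))  # value -> frequency
        summary[j] = info
    # Class distribution: is the problem balanced or skewed?
    class_counts = Counter(r[class_index] for r in rows if r[class_index] is not None)
    return summary, class_counts


# Hypothetical three-row dataset: one numeric attribute, one nominal, then the class.
rows = [[5.1, "a", "yes"], [4.9, None, "no"], [6.3, "b", "yes"]]
summary, classes = summarize(rows)
print(summary[1])  # {'type': 'nominal', 'missing': 1, 'counts': {'a': 1, 'b': 1}}
print(classes)     # Counter({'yes': 2, 'no': 1})
```

Skewed class counts or many missing values are exactly the kind of observation worth writing down in your template before moving on to data preparation.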

3. Data Preparation

The objective of data preparation is to discover and expose the structure in the dataset.

  1. Normalize the dataset
  2. Standardize the dataset
  3. Square the dataset
  4. Discretize attributes (if integer)
  5. Remove and/or replace missing values (if present)
  6. Create transforms of the dataset to test assumptions raised in the Problem Definition
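In the Explorer these steps are unsupervised attribute filters such as Normalize, Standardize, Discretize and ReplaceMissingValues. The sketch below shows in plain Python roughly what each transform does to a single numeric column. It is not Weka's implementation, and several choices in it are assumptions: the function names are hypothetical, "square" is read as squaring every value, missing values are filled with the column mean, and discretization uses equal-width bins.

```python
import statistics


def replace_missing(column):
    """Fill None entries with the mean of the observed values."""
    mean = statistics.mean([v for v in column if v is not None])
    return [mean if v is None else v for v in column]


def normalize(column, lo=0.0, hi=1.0):
    """Min-max scaling of values into [lo, hi]."""
    cmin, cmax = min(column), max(column)
    if cmax == cmin:  # constant column: nothing to scale
        return [lo] * len(column)
    return [lo + (v - cmin) * (hi - lo) / (cmax - cmin) for v in column]


def standardize(column):
    """Shift to zero mean and scale to unit (sample) standard deviation."""
    mean, sd = statistics.mean(column), statistics.stdev(column)
    if sd == 0:
        return [0.0] * len(column)
    return [(v - mean) / sd for v in column]


def square(column):
    """Square every value (one possible reading of 'square the dataset')."""
    return [v * v for v in column]


def discretize(column, bins=10):
    """Equal-width binning: map each value to a bin index 0..bins-1."""
    cmin, cmax = min(column), max(column)
    width = (cmax - cmin) / bins or 1.0  # avoid dividing by zero on constant columns
    return [min(int((v - cmin) / width), bins - 1) for v in column]


raw = [2.0, None, 4.0, 6.0]
filled = replace_missing(raw)          # None becomes 4.0, the mean of 2, 4 and 6
print(normalize(filled))               # [0.0, 0.5, 0.5, 1.0]
print(discretize(filled, bins=2))      # [0, 1, 1, 1]
```

Save each transformed copy as its own dataset so you can spot-check algorithms on all of them. When you later evaluate models, compute statistics such as the min, max and mean from the training data only, or information from the test folds leaks into the preparation. In Weka, the FilteredClassifier meta-classifier wraps a filter and a classifier together to achieve this.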

4. Evaluate Algorithms

The objective of evaluating algorithms is to develop a test harness and baseline accuracy from which to improve.

Algorithm ranking when analyzing results in the Weka Experimenter

  1. Explore different classification algorithms
  2. Design and run a spot-check experiment
  3. Review and interpret the algorithm rankings
  4. Review and interpret the algorithm accuracy
  5. Repeat process as needed
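The Weka Experimenter handles the test harness for you. It runs every selected algorithm over the same cross-validation folds and reports accuracy and a ranking. To make the idea concrete, here is a minimal Python sketch of a spot-check, assuming each instance is a `(features, label)` pair with numeric features. The majority-class baseline follows the same idea as Weka's ZeroR, and `one_nn` is a bare-bones nearest-neighbour classifier in the spirit of IBk. Neither is Weka's code, the toy data is made up, and, unlike Weka, this sketch does not stratify the folds by class.

```python
import random
from collections import Counter


def zero_r(train, test_x):
    """Baseline: always predict the most common class in the training data."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return [majority] * len(test_x)


def one_nn(train, test_x):
    """Predict the label of the closest training instance (squared Euclidean distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(train, key=lambda row: dist(row[0], x))[1] for x in test_x]


def cross_val_accuracy(model, data, k=10, seed=1):
    """Mean accuracy of `model` over k folds of a shuffled copy of `data` (needs len(data) >= k)."""
    rows = data[:]
    random.Random(seed).shuffle(rows)  # same seed -> same folds for every model
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        preds = model(train, [x for x, _ in test])
        scores.append(sum(p == y for p, (_, y) in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


def spot_check(models, data, k=10):
    """Evaluate each named model on identical folds; best mean accuracy first."""
    results = {name: cross_val_accuracy(m, data, k) for name, m in models.items()}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two well-separated clusters of ten points each.
data = ([((float(i), float(i)), "a") for i in range(10)]
        + [((i + 20.0, i + 20.0), "b") for i in range(10)])
for name, acc in spot_check({"ZeroR": zero_r, "1-NN": one_nn}, data):
    print(f"{name}: {acc:.2f}")
```

Read the rankings and accuracies against the baseline. An algorithm that barely beats ZeroR is learning little from the attributes.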

5. Improve Results

The objective of improving the results is to build on the results so far to develop more accurate models.

Algorithm Tuning

  1. Explore different algorithm configurations
  2. Design and run an algorithm tuning experiment
  3. Review and interpret the algorithm rankings
  4. Review and interpret the algorithm accuracy
  5. Repeat process as needed
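Algorithm tuning follows the same pattern as the spot-check. The difference is that the candidates are configurations of one algorithm rather than different algorithms. The minimal Python sketch below tunes the number of neighbours of a k-nearest-neighbour classifier, which plays the role of the KNN option of Weka's IBk. The helper names, the candidate values and the toy data are hypothetical choices for illustration.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training instances closest to x."""
    def dist(row):
        return sum((p - q) ** 2 for p, q in zip(row[0], x))
    nearest = sorted(train, key=dist)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, k, folds=5, seed=1):
    """Cross-validated accuracy of k-NN; a fixed seed keeps the folds identical across k."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    parts = [rows[i::folds] for i in range(folds)]
    correct = 0
    for i, test in enumerate(parts):
        train = [r for j, part in enumerate(parts) if j != i for r in part]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)


def tune_k(data, candidates=(1, 3, 5, 7)):
    """Grid search: score every candidate k and return (best_k, scores)."""
    scores = {k: cv_accuracy(data, k) for k in candidates}
    best_k = max(scores, key=scores.get)  # ties go to the first (smallest) candidate
    return best_k, scores


# Hypothetical toy data: one feature, "a" on the left, "b" on the right,
# plus one mislabelled point so that larger k has some noise to smooth over.
data = [((float(i),), "a") for i in range(10)] + [((float(i),), "b") for i in range(10, 20)]
data[3] = ((3.0,), "b")
best_k, scores = tune_k(data)
print(best_k, scores)
```

The winning value of k is chosen from these same cross-validation scores, so its score is an optimistic estimate. Confirm the chosen configuration on data held out from the tuning, or in a fresh experiment. Inside the Explorer, Weka's CVParameterSelection meta-classifier automates a similar cross-validated search.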

Ensemble Methods

  1. Explore different ensemble methods
  2. Design and run an algorithm ensemble experiment
  3. Review and interpret the ensemble rankings
  4. Review and interpret the ensemble accuracy
  5. Repeat process as needed
  6. Can you improve results with other meta algorithms, such as thresholding?
  7. Can you improve results by using other algorithms in the same family as algorithms that are performing well?
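Bagging is one of the simplest ensemble methods to reason about. You train the same algorithm on bootstrap samples of the training data and let the models vote, and Weka offers this as its Bagging meta-classifier. The sketch below illustrates the idea in plain Python and is not Weka's implementation. It assumes numeric features, a one-split decision stump as the base learner (compare Weka's DecisionStump), and made-up toy data, and all names in it are hypothetical.

```python
import random
from collections import Counter


def fit_stump(train):
    """Decision stump: the single (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(train[0][0])):
        for t in sorted({x[f] for x, _ in train}):
            left = [y for x, y in train if x[f] <= t]  # never empty: t is an observed value
            right = [y for x, y in train if x[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label


def bagging(train, n_models=25, seed=1):
    """Fit stumps on bootstrap samples (drawn with replacement); predict by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump([rng.choice(train) for _ in train]) for _ in range(n_models)]

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict


# Hypothetical toy data: one numeric feature, class "a" below 10 and "b" from 10 up.
data = [((float(i),), "a" if i < 10 else "b") for i in range(20)]
ensemble = bagging(data)
print(ensemble((0.0,)), ensemble((19.0,)))
```

Bagging tends to help most with unstable base learners such as stumps and unpruned trees. Voting several different well-performing algorithms (Weka's Vote meta-classifier) or stacking them are other ensembles you can spot-check in the same experiment.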

6. Present Results

The objective of presenting the results is to describe the problem and solution so that they can be understood by third parties.

Complete the following section to summarize the problem and solution.

  1. What is the Problem?
  2. What is the Solution?
  3. What were the Findings?
  4. What are the Limitations?
  5. What are the Conclusions?

How To Use

There are a number of interesting datasets in the “data” directory of the Weka installation. There are also many datasets on the UCI Machine Learning Repository that you can download and work on.

Select a problem and work through it using this template. You will be surprised at how much you learn and how much a structured process like this can help to keep you focused.

Summary

In this post you learned about a structured template for working through the process of applied machine learning. This template can be printed and used step-by-step to work through a problem in the Weka Machine Learning Workbench.

Answering the specific questions in each step of the template will quickly build up a deeper understanding of the problem and of your solution to it as it unfolds. This is invaluable, like a scientist's notebook in the lab.

