How to Transform Your Machine Learning Data in Weka
Often your raw data for machine learning is not in an ideal form for modeling.
You need to prepare or reshape it to meet the expectations of different machine learning algorithms.
In this post you will discover two techniques that you can use to prepare your machine learning data for modeling.
After reading this post you will know:
- How to convert a real-valued attribute into a discrete attribute, a process called discretization.
- How to convert a discrete attribute into multiple binary attributes, called dummy variables.
- When to discretize or create dummy variables from your data.
Let’s get started.
- Update March/2018
Discretize Numerical Attributes
Some machine learning algorithms prefer or find it easier to work with discrete attributes.
For example, decision tree algorithms can choose split points in real-valued attributes, but the resulting trees are much cleaner when split points are chosen between bins or predefined groups in the real-valued attributes.
Discrete attributes that describe a category are called nominal attributes. Attributes that describe a category where the order of the categories has meaning are called ordinal attributes. The process of converting a real-valued attribute into an ordinal attribute, or bins, is called discretization.
You can discretize your real-valued attributes in Weka using the Discretize filter.
The tutorial below demonstrates how to use the Discretize filter. The Pima Indians onset of diabetes dataset is used to demonstrate this filter because the input values are real-valued and grouping them into bins may make sense.
You can download the Pima Indians onset of diabetes dataset from the UCI Machine Learning Repository. You can also find the dataset in your Weka installation, under the data/ directory, by loading the file diabetes.arff.
1. Open the Weka Explorer.
2. Load the Pima Indians onset of diabetes dataset.
3. Click the “Choose” button for the Filter and select Discretize. It is under unsupervised.attribute.Discretize.
4. Click on the filter to configure it. You can select the indices of the attributes to discretize. The default is to discretize all attributes, which is what we will do in this case. Click the “OK” button.
5. Click the “Apply” button to apply the filter.
You can click on each attribute and review the details in the “Selected attribute” window to confirm that the filter was applied successfully.
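To make the idea of binning concrete, here is a minimal sketch of equal-width discretization in plain Python. It is not Weka's implementation. It assumes that the filter's default behaviour is equal-width binning (to my understanding with 10 bins, which you can check in the filter's configuration dialog). The function name, the sample values, and the way values on a bin boundary are handled are all hypothetical choices made for illustration.

```python
def equal_width_bins(values, n_bins=10):
    """Assign each numeric value to one of n_bins equal-width intervals.

    Returns a list of bin indices in the range 0..n_bins-1.
    Illustrative only; boundary handling here is an assumption and
    may differ from what the Weka Discretize filter does.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value would land in bin n_bins, so clamp it.
        bins.append(min(index, n_bins - 1))
    return bins


# Hypothetical ages, not taken from the diabetes dataset.
ages = [21, 25, 33, 47, 52, 60, 81]
age_bins = equal_width_bins(ages, n_bins=3)
print(age_bins)
```

With three bins, the range 21 to 81 is cut into intervals of width 20, so each real value is replaced by the index of the interval it falls in. The bin indices keep their order, which is why the result can be treated as an ordinal attribute.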
Discretizing your real-valued attributes is most useful when working with decision tree type algorithms. It is perhaps more useful still when you believe that there are natural groupings within the values of a given attribute.
Convert Nominal Attributes to Dummy Variables
Some machine learning algorithms prefer to use real-valued inputs and do not support nominal or ordinal attributes.
Nominal attributes can be converted to real values. This is done by creating one new binary attribute for each category. For a given instance, the binary attribute for the category it belongs to is set to 1 and the binary attributes for the other categories are set to 0. This process is called creating dummy variables.
You can create dummy binary variables from nominal attributes in Weka using the NominalToBinary filter.
The recipe below demonstrates how to use the NominalToBinary filter. The Contact Lenses dataset is used to demonstrate this filter because the attributes are all nominal and provide plenty of opportunity for creating dummy variables.
You can download the Contact Lenses dataset from the UCI Machine Learning Repository. You can also find the dataset in your Weka installation, under the data/ directory, by loading the file contact-lenses.arff.
1. Open the Weka Explorer.
2. Load the Contact Lenses dataset.
3. Click the “Choose” button for the Filter and select NominalToBinary. It is under unsupervised.attribute.NominalToBinary.
4. Click on the filter to configure it. You can select the indices of the attributes to convert to binary values. The default is to convert all attributes. Change it to only the first attribute. Click the “OK” button.
5. Click the “Apply” button to apply the filter.
Reviewing the list of attributes will show that the age attribute has been removed and replaced with three new binary attributes: age=young, age=pre-presbyopic and age=presbyopic.
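The same transformation can be sketched in a few lines of plain Python. This is not Weka's code. The function name and the sample rows are hypothetical, and only the age categories and the attribute=value naming pattern are taken from the result described above.

```python
def to_dummy_variables(rows, attribute, categories):
    """Replace one nominal attribute with one 0/1 attribute per category.

    rows is a list of dicts mapping attribute names to values.
    New attributes are named "attribute=category".
    """
    converted = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key != attribute:
                # Leave every other attribute untouched.
                new_row[key] = value
                continue
            for category in categories:
                new_row[f"{attribute}={category}"] = 1 if value == category else 0
        converted.append(new_row)
    return converted


# Hypothetical instances, for illustration only.
rows = [
    {"age": "young", "astigmatism": "no"},
    {"age": "pre-presbyopic", "astigmatism": "yes"},
    {"age": "presbyopic", "astigmatism": "no"},
]
age_categories = ["young", "pre-presbyopic", "presbyopic"]
dummy_rows = to_dummy_variables(rows, "age", age_categories)
for row in dummy_rows:
    print(row)
```

Each instance ends up with exactly one of the three new age attributes set to 1, and the original age attribute no longer appears.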
Creating dummy variables is useful for techniques, such as linear regression and logistic regression, that do not support nominal input variables. It can also prove useful in techniques like k-nearest neighbors and artificial neural networks.
Summary
In this post you discovered how to transform your machine learning data to meet the expectations of different machine learning algorithms.
Specifically, you learned:
- How to convert real-valued input attributes to nominal attributes, a process called discretization.
- How to convert a categorical input variable to multiple binary input attributes, called dummy variables.
- When to use discretization and dummy variables when modeling data.
Do you have any questions about data transforms or about this post? Ask your questions in the comments and I will do my best to answer them.