
Feature Selection in Python with Scikit

Not all data attributes are created equal. More is not always better when it comes to attributes or columns in your dataset.

In this post you will discover how to select attributes in your data before creating a machine learning model using the scikit-learn library.

Update: For a more recent tutorial on feature selection in Python, see the newer feature selection post.

Cut Down on Your Options with Feature Selection
Photo by Josh Friedman, some rights reserved

Select Features

Feature selection is a process where you automatically select those features in your data that contribute most to the prediction variable or output in which you are interested.

Having too many irrelevant features in your data can decrease the accuracy of the models. Three benefits of performing feature selection before modeling your data are:

  • Reduces Overfitting: Less redundant data means less opportunity to make decisions based on noise.
  • Improves Accuracy: Less misleading data means modeling accuracy improves.
  • Reduces Training Time: Less data means that algorithms train faster.

Two different feature selection methods provided by the scikit-learn Python library are Recursive Feature Elimination and feature importance ranking.


Recursive Feature Elimination

The Recursive Feature Elimination (RFE) method is a feature selection approach. It works by recursively removing attributes and building a model on those attributes that remain. It uses the model accuracy to identify which attributes (and combinations of attributes) contribute the most to predicting the target attribute.

This recipe shows the use of RFE on the iris flowers dataset to select 3 attributes.

Recursive Feature Elimination (Python)

# Recursive Feature Elimination
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
# load the iris dataset
dataset = datasets.load_iris()
# create a base classifier used to evaluate a subset of attributes
model = LogisticRegression()
# create the RFE model and select 3 attributes
rfe = RFE(model, n_features_to_select=3)
rfe = rfe.fit(dataset.data, dataset.target)
# summarize the selection of the attributes
print(rfe.support_)
print(rfe.ranking_)

For more information, see the RFE class in the API documentation.
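
The fitted RFE object can also produce a filtered copy of the data. The sketch below is a minimal example of that idea, using the same iris data and logistic regression base model as the recipe above; the max_iter=200 setting is only there to avoid a convergence warning on newer scikit-learn versions and is not part of the original recipe.

# use the fitted RFE model to keep only the 3 selected attributes
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

dataset = datasets.load_iris()
rfe = RFE(LogisticRegression(max_iter=200), n_features_to_select=3)
rfe.fit(dataset.data, dataset.target)
# transform() drops the columns where rfe.support_ is False
reduced_X = rfe.transform(dataset.data)
print(reduced_X.shape)  # (150, 3): 3 of the 4 iris attributes remain
# any model can now be trained on the reduced data
model = LogisticRegression(max_iter=200).fit(reduced_X, dataset.target)
print(model.score(reduced_X, dataset.target))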

Feature Importance

Methods that use ensembles of decision trees (like Random Forest or Extra Trees) can also compute the relative importance of each attribute. These importance values can be used to inform a feature selection process.

This recipe shows the construction of an Extra Trees ensemble on the iris flowers dataset and the display of the relative importance of each feature.

Feature Importance with Extra Trees (Python)

# Feature Importance
from sklearn import datasets
from sklearn.ensemble import ExtraTreesClassifier
# load the iris dataset
dataset = datasets.load_iris()
# fit an Extra Trees model to the data
model = ExtraTreesClassifier()
model.fit(dataset.data, dataset.target)
# display the relative importance of each attribute
print(model.feature_importances_)

For more information, see the ExtraTreesClassifier class in the API documentation.
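
The importance scores can also drive an explicit selection step, for example with scikit-learn's SelectFromModel. The sketch below is one possible way to do that, reusing the same iris data and Extra Trees model as the recipe above; the "mean" threshold is an arbitrary choice for illustration, not something the post prescribes.

# keep only the attributes whose importance exceeds the mean importance
from sklearn import datasets
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

dataset = datasets.load_iris()
model = ExtraTreesClassifier()
model.fit(dataset.data, dataset.target)
print(model.feature_importances_)
# prefit=True reuses the already-fitted model instead of refitting it
selector = SelectFromModel(model, threshold="mean", prefit=True)
reduced_X = selector.transform(dataset.data)
print(reduced_X.shape)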

Summary

Feature selection methods can give you useful information on the relative importance or relevance of features for a given problem. You can use this information to create filtered versions of your dataset and increase the accuracy of your models.
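
As a rough check of that claim, one option is to compare cross-validated accuracy with and without the selection step. The sketch below is illustrative only; the 10-fold cross-validation, the logistic regression classifier, and the Pipeline wrapper (which keeps the RFE step inside each fold) are choices made here, not part of the original recipes.

# compare accuracy on all attributes vs. a pipeline that first applies RFE
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

dataset = datasets.load_iris()
model = LogisticRegression(max_iter=200)
# baseline: all four iris attributes
print(cross_val_score(model, dataset.data, dataset.target, cv=10).mean())
# RFE keeps 3 attributes, refit inside each cross-validation fold
pipeline = Pipeline([
    ("rfe", RFE(LogisticRegression(max_iter=200), n_features_to_select=3)),
    ("model", LogisticRegression(max_iter=200)),
])
print(cross_val_score(pipeline, dataset.data, dataset.target, cv=10).mean())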

In this post you discovered two feature selection methods you can apply in Python using the scikit-learn library.

