Understanding Data and Machine Learning Models with Visualizations (Part 1)

阿新 • Published: 2018-12-29

Examining Feature-PC Correlation

On the non-interactive side, the tool also generates heatmaps with additional information about the principal components (PCs). The figure below displays a correlation matrix between each principal component and our original set of features:

**iris**: Z-normalized (μ=0, σ=1) correlation matrix, examining correlation of each feature (y-axis) and each principal component (x-axis)
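A feature-PC correlation matrix of this kind can be sketched as follows. This is a minimal illustration rather than the tool's actual code: it assumes scikit-learn's copy of the iris dataset, `StandardScaler` for the z-normalization, and a plain Pearson correlation between each standardized feature and each PC score.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Z-normalize every feature to mean 0 and standard deviation 1.
X = StandardScaler().fit_transform(iris.data)

# Keep all components so every PC gets a column in the matrix.
pca = PCA()
scores = pca.fit_transform(X)

# np.corrcoef stacks the feature columns and the PC-score columns into one
# 8x8 correlation matrix; the upper-right block is features (rows) vs PCs (columns).
n_features = X.shape[1]
corr = np.corrcoef(X, scores, rowvar=False)[:n_features, n_features:]

for name, row in zip(iris.feature_names, corr):
    print(f"{name:>18}: " + "  ".join(f"{value:+.2f}" for value in row))
```

Each row of `corr` corresponds to one heatmap row (a feature) and each column to one PC. Because all components are kept, the squared correlations in every row sum to 1.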

At first glance, it’s hard to find meaning in just the correlation between the original features and our PCs. Often, when I show only this heatmap to others, they wonder what to look for.

After some discussion with others, I thought it would be more helpful to normalize the matrix further, so that meaningful information stands out immediately.

Take 2: below, we see the dot product of the explained variance per principal component with the previous correlation matrix. Essentially, we normalize by the explained variance in order to better highlight features that contribute to variance in the data and PCs.

**iris**: The normalized correlation matrix (|Z-normalized(features*PCs)|*explained_variance) gives a much clearer view of the features that contribute to variance in the dataset.
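One possible reading of this normalization is sketched below. It is an interpretation, not the tool's confirmed method: it assumes the absolute feature-PC correlations are scaled column by column with scikit-learn's `explained_variance_ratio_`, and that features are then ranked by their row sums (which is the same as the dot product of the absolute correlation matrix with the explained-variance vector). The extra z-normalization mentioned in the caption is omitted here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)

pca = PCA().fit(X)
scores = pca.transform(X)

# Feature-vs-PC correlation block, as in the first heatmap.
n_features = X.shape[1]
corr = np.corrcoef(X, scores, rowvar=False)[:n_features, n_features:]

# Assumption: weight each PC column by the share of variance that PC explains.
# Broadcasting multiplies column k by explained_variance_ratio_[k].
weighted = np.abs(corr) * pca.explained_variance_ratio_

# Row sums give one score per feature; sort descending to stack-rank the y-axis.
feature_score = weighted.sum(axis=1)
order = np.argsort(feature_score)[::-1]
ranked = [(iris.feature_names[i], float(feature_score[i])) for i in order]

for name, score in ranked:
    print(f"{name:>18}: {score:.3f}")
```

Weighting by explained variance shrinks the columns of low-variance PCs, so a strong correlation with a minor component no longer draws the eye as much as one with PC1.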

Now we can find interesting aspects of the data more easily. As we saw in the earlier interactive plot, PC1 explains a majority of the variance in the dataset (~72.8%). The heatmap shows that its contributors are primarily petal length (cm) and sepal width (cm): features that correlate with the PCs explaining the most variance are now stack-ranked on the y-axis.

Let’s say we want to engineer features in our original dataset to better classify each iris category. Based on the PCA results, we may conclude that petal length (cm) and sepal width (cm) are the primary features of interest. We can then iterate and engineer additional features from them (e.g. the petal length/sepal width ratio, or the petal length*sepal width product) that may improve any classifier we train.
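As a rough sketch of that iteration, the two example features can be appended to the iris matrix and compared against the original feature set. The classifier (logistic regression), the 5-fold cross-validation, and the helper name `mean_cv_accuracy` are assumptions made for illustration. Whether the extra columns help depends on the model, so the scores should be inspected rather than assumed.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X, y = iris.data, iris.target
names = list(iris.feature_names)

petal_length = X[:, names.index("petal length (cm)")]
sepal_width = X[:, names.index("sepal width (cm)")]

# Engineered features suggested in the text: a ratio and a product.
ratio = petal_length / sepal_width
product = petal_length * sepal_width
X_aug = np.column_stack([X, ratio, product])


def mean_cv_accuracy(features, labels):
    """Mean 5-fold cross-validated accuracy (hypothetical helper for this sketch)."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, features, labels, cv=5).mean()


baseline = mean_cv_accuracy(X, y)
augmented = mean_cv_accuracy(X_aug, y)
print(f"original features:    {baseline:.3f}")
print(f"with engineered ones: {augmented:.3f}")
```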
