Learn from Top Kagglers: Advanced Feature Engineering II
These are course notes for the Coursera course
How to Win a Data Science Competition: Learn from Top Kagglers
This article covers feature engineering techniques commonly used in data science competitions; it is the second part of the topic.
If you are viewing this article on a computer, see the original post for the accompanying Jupyter notebook.
Statistics and distance based features
This part focuses on two kinds of advanced feature engineering: features built from various statistics of one feature grouped by another, and features obtained by analyzing the neighborhood of a given point.
groupby and nearest neighbor methods
Example: here is some data from a CTR (click-through rate) task.
We can assume that the ad with the lowest price on the page will attract most of the attention, and that the other ads on the page will be less attractive. Features expressing this intuition are easy to compute: for every ad we can add the lowest and highest price per user and web page. The position of the ad with the lowest price can also be used in this case.
Code implementation
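The original snippet is not reproduced here, so below is a minimal pandas sketch of the idea, with hypothetical column names (user_id, page_id, ad_price):

```python
import pandas as pd

# Hypothetical CTR data: one row per ad impression.
df = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2],
    'page_id':  [10, 10, 10, 20, 20],
    'ad_price': [90, 120, 99, 45, 60],
})

gb = df.groupby(['user_id', 'page_id'])['ad_price']
df['min_price'] = gb.transform('min')   # lowest ad price on this user/page
df['max_price'] = gb.transform('max')   # highest ad price on this user/page
# Flag whether this ad is the cheapest one shown to this user on this page.
df['is_cheapest'] = (df['ad_price'] == df['min_price']).astype(int)
```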
More features:
- How many pages the user visited
- Standard deviation of prices
- Most visited page
- Many, many more
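A minimal sketch of such additional groupby statistics, continuing the hypothetical DataFrame from the snippet above:

```python
# More per-user statistics via groupby/agg on the same hypothetical data.
user_stats = df.groupby('user_id').agg(
    n_pages_visited=('page_id', 'nunique'),               # how many pages the user visited
    price_std=('ad_price', 'std'),                        # standard deviation of prices
    most_visited_page=('page_id', lambda s: s.mode()[0]), # most visited page
)
df = df.merge(user_stats.reset_index(), on='user_id', how='left')
```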
What if there is no explicit feature to group by like this? Then we can use nearest neighbors.
Neighbors
- Explicit group is not needed
- More flexible
- Much harder to implement
Examples
- Number of houses within 500m, 1000m, ...
- Average price per square meter within 500m, 1000m, ...
- Number of schools/supermarkets/parking lots within 500m, 1000m, ...
- Distance to closest subway station
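A minimal sketch of such neighborhood features using sklearn's BallTree; the coordinates and object types here are hypothetical stand-ins:

```python
import numpy as np
from sklearn.neighbors import BallTree

# Hypothetical planar coordinates (in meters) for houses and schools.
rng = np.random.RandomState(0)
houses  = rng.rand(1000, 2) * 5000
schools = rng.rand(50, 2) * 5000

house_tree  = BallTree(houses)
school_tree = BallTree(schools)

features = {}
for r in (500, 1000):
    # Subtract 1 so a house does not count itself.
    features[f'houses_{r}m']  = house_tree.query_radius(houses, r=r, count_only=True) - 1
    features[f'schools_{r}m'] = school_tree.query_radius(houses, r=r, count_only=True)

# Distance from each house to the closest school (same pattern as "closest subway station").
dist, _ = school_tree.query(houses, k=1)
features['dist_closest_school'] = dist.ravel()
```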
The instructor used this approach in the Springleaf competition.
KNN features in Springleaf
- Mean encode all the variables
- For every point, find 2000 nearest neighbors using the Bray-Curtis metric
- Calculate various features from those 2000 neighbors
Evaluate:
- Mean target of nearest 5, 10, 15, 500, 2000 neighbors
- Mean distance to 10 closest neighbors
- Mean distance to 10 closest neighbors with target 1
- Mean distance to 10 closest neighbors with target 0
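A minimal sketch of a few of these features; X and y are hypothetical stand-ins for the mean-encoded variables and the binary target, 10 neighbors replace the lecture's 2000, and the out-of-fold scheme such features need in practice is omitted for brevity:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
X = rng.rand(500, 10)                    # stand-in for mean-encoded features
y = (rng.rand(500) > 0.5).astype(int)    # stand-in for the binary target

# The Bray-Curtis metric requires the brute-force algorithm in sklearn.
nn = NearestNeighbors(n_neighbors=11, metric='braycurtis', algorithm='brute').fit(X)
dist, idx = nn.kneighbors(X)
dist, idx = dist[:, 1:], idx[:, 1:]      # drop each point's self-match

mean_target_10 = y[idx].mean(axis=1)     # mean target of the 10 nearest neighbors
mean_dist_10   = dist.mean(axis=1)       # mean distance to the 10 closest neighbors
# Mean distance to neighbors with target 1 among the 10 (NaN if there are none).
mask = (y[idx] == 1)
with np.errstate(invalid='ignore'):
    mean_dist_10_t1 = (dist * mask).sum(axis=1) / mask.sum(axis=1)
```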
Matrix factorizations for feature extraction
- Example of feature fusion
Notes about Matrix Factorization
- Can be applied only to some columns
- Can provide additional diversity
  - Good for ensembles
- It is a lossy transformation. Its efficiency depends on:
  - The particular task
  - The number of latent factors (usually 5-100)
Implementation
- Several MF methods can be found in sklearn
- SVD and PCA
  - Standard tools for Matrix Factorization
- TruncatedSVD
  - Works with sparse matrices
- Non-negative Matrix Factorization (NMF)
  - Ensures that all latent factors are non-negative
  - Good for count-like data
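A minimal sketch of both tools on a hypothetical sparse count-like matrix (in practice, fit on train, or on train and test together, before transforming):

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import NMF, TruncatedSVD

# Hypothetical sparse, non-negative count-like matrix (e.g. bag of words).
X = sparse_random(100, 50, density=0.1, random_state=0)

# TruncatedSVD works directly on sparse input.
X_svd = TruncatedSVD(n_components=5, random_state=0).fit_transform(X)

# NMF requires non-negative input and suits count-like data.
X_nmf = NMF(n_components=5, random_state=0, max_iter=500).fit_transform(X)
```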
NMF for tree-based methods
Non-negative matrix factorization, NMF for short, transforms data in a way that makes it better suited for decision trees. As the course's plot shows, NMF transforms the data so that it forms lines parallel to the axes.
Factorization trick
The same transformation tricks used for linear models can also be applied before factorizing a matrix.
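My reading of this trick, as a minimal sketch: apply a transform such as log(x + 1), which helps linear models, to hypothetical count data before the factorization:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical skewed count data.
X_counts = np.random.RandomState(0).poisson(3, size=(100, 30)).astype(float)

# Apply the same log(x + 1) transform that helps linear models, then factorize.
X_factors = TruncatedSVD(n_components=5, random_state=0).fit_transform(np.log1p(X_counts))
```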
Conclusion
- Matrix Factorization is a very general approach for dimensionality reduction and feature extraction
- It can be applied to transform categorical features into real-valued ones
- Many of the tricks suitable for linear models are also useful for MF
Feature interactions
All combinations of feature values
- Example: banner selection
Suppose we are building a model that predicts the best ad banner to display on a website.
… | category_ad | category_site | … | is_clicked
---|---|---|---|---
… | auto_part | game_news | … | 0
… | music_tickets | music_news | … | 1
… | mobile_phones | auto_blog | … | 0
Combining the category of the ad banner itself with the category of the site where the banner will be shown forms a very strong feature.
… | ad_site | … | is_clicked
---|---|---|---
… | auto_part_game_news | … | 0
… | music_tickets_music_news | … | 1
… | mobile_phones_auto_blog | … | 0
Joining the two original columns yields the combined feature ad_site.
From a technical point of view, there are two ways to construct such an interaction.
- Example of interactions

Method 1: concatenate the two categorical values into a single string, then one-hot encode the combined feature.
Method 2: one-hot encode each feature separately, then take element-wise products of all pairs of the resulting columns.
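A minimal sketch of both methods (column names hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'category_ad':   ['auto_part', 'music_tickets', 'mobile_phones'],
                   'category_site': ['game_news', 'music_news', 'auto_blog']})

# Method 1: concatenate the values, then one-hot encode the combined feature.
df['ad_site'] = df['category_ad'] + '_' + df['category_site']
ohe_method1 = pd.get_dummies(df['ad_site'])

# Method 2: one-hot encode each feature, then multiply all pairs of columns.
a = pd.get_dummies(df['category_ad'])
b = pd.get_dummies(df['category_site'])
ohe_method2 = pd.DataFrame({f'{ca}_{cb}': a[ca] * b[cb]
                            for ca in a.columns for cb in b.columns})
```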
Similar ideas can also be used for numerical variables. In fact, the interaction is not limited to multiplication; other operations can be used as well:
- Multiplication
- Sum
- Diff
- Division
- ...
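A minimal sketch generating all four operations for every pair of hypothetical numeric columns:

```python
import itertools
import pandas as pd

num = pd.DataFrame({'x1': [1.0, 2.0, 3.0], 'x2': [4.0, 5.0, 6.0]})

inter = {}
for f1, f2 in itertools.combinations(num.columns, 2):
    inter[f'{f1}_mul_{f2}']  = num[f1] * num[f2]
    inter[f'{f1}_sum_{f2}']  = num[f1] + num[f2]
    inter[f'{f1}_diff_{f2}'] = num[f1] - num[f2]
    inter[f'{f1}_div_{f2}']  = num[f1] / num[f2]
interactions = pd.DataFrame(inter)
```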
Practical Notes
- We have a lot of possible interactions: N*N for N features.
  a. Even more if we use several types of interactions
- Need to reduce their number
  a. Dimensionality reduction
  b. Feature selection
This approach generates a huge number of features, so feature selection or dimensionality reduction is used to cut them down. Feature selection serves as the example below.
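A minimal sketch of such selection via random forest feature importances (the dataset and the cutoff of 10 features are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical matrix of original + interaction features.
X, y = make_classification(n_samples=500, n_features=40, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:10]   # keep the 10 most important
X_selected = X[:, top]
```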
Interactions' order
- We looked at 2nd order interactions.
- Such an approach can be generalized for higher orders.
- It is hard to do generation and selection automatically.
- Manual building of high-order interactions is some kind of art.
Extract features from DT
Look at a decision tree: we can map each leaf to a binary feature, and the index of the leaf an object falls into can be used as the value of a new categorical feature. If we use not a single tree but an ensemble of trees, for example a random forest, this operation can be applied to every tree. This is a powerful way to extract high-order interactions.
How to use it

In sklearn:
tree_model.apply()
In xgboost:
booster.predict(pred_leaf=True)
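A minimal runnable sketch of the sklearn variant, one-hot encoding the leaf indices so another model can consume them (data is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=300, random_state=0)
rf = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0).fit(X, y)

leaves = rf.apply(X)     # shape (n_samples, n_trees): leaf index in each tree
# Treat each tree's leaf index as a categorical feature and one-hot encode it.
leaf_features = OneHotEncoder().fit_transform(leaves)
```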
Conclusion
- We looked at ways to build an interaction of categorical attributes
- Extended this approach to real-valued features
- Learned how to extract features via decision trees
t-SNE
t-SNE is used for exploratory data analysis; it can also be viewed as a method for obtaining new features from data.
Practical Notes
- Results heavily depend on hyperparameters (perplexity)
- Good practice is to use several projections with different perplexities (5-100)
- Due to its stochastic nature, tSNE provides different projections even for the same data/hyperparameters
- Train and test should be projected together
- tSNE runs for a long time when there is a big number of features
  - It is common to do dimensionality reduction before projection
- An implementation of tSNE can be found in the sklearn library
  - But personally I prefer the stand-alone python package tsne due to its faster speed
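A minimal sketch following these notes: project train and test together, reduce dimensionality first, and try several perplexities (data and parameter values are hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
X_train, X_test = rng.rand(300, 100), rng.rand(100, 100)

# Project train and test together; reduce dimensionality first to speed tSNE up.
X_all = np.vstack([X_train, X_test])
X_all = PCA(n_components=30, random_state=0).fit_transform(X_all)

projections = {}
for perplexity in (5, 30, 100):
    proj = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(X_all)
    projections[perplexity] = (proj[:len(X_train)], proj[len(X_train):])
```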
Conclusion
- tSNE is a great tool for visualization
- It can be used as a feature as well
- Be careful with interpretation of the results
- Try different perplexities
Additional materials and links

Matrix Factorization:
- Overview of Matrix Decomposition Methods (sklearn): http://scikit-learn.org/stable/modules/decomposition.html

t-SNE:
- Multicore t-SNE implementation: https://github.com/DmitryUlyanov/Multicore-TSNE
- Comparison of Manifold Learning methods (sklearn): http://scikit-learn.org/stable/auto_examples/manifold/plot_compare_methods.html
- How to Use t-SNE Effectively (distill.pub blog): https://distill.pub/2016/misread-tsne/
- tSNE homepage (Laurens van der Maaten): https://lvdmaaten.github.io/tsne/
- Example: tSNE with different perplexities (sklearn): http://scikit-learn.org/stable/auto_examples/manifold/plot_t_sne_perplexity.html#sphx-glr-auto-examples-manifold-plot-t-sne-perplexity-py

Interactions:
- Facebook Research's paper on extracting categorical features from trees: https://research.fb.com/publications/practical-lessons-from-predicting-clicks-on-ads-at-facebook/
- Example: Feature transformations with ensembles of trees (sklearn): http://scikit-learn.org/stable/auto_examples/ensemble/plot_feature_transformation.html