
[Computer Science] [2017.05] Feature Selection with Deep Neural Networks


This is a master's thesis from the University of Liège, Belgium (author: Nicolas Vecoven), 77 pages in total.

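Variable and feature selection have become the focus of much research, with many applications in bioinformatics in particular. Machine learning is a powerful tool for selecting features, but not all machine learning algorithms are equally suited to the task. Many methods have been proposed for feature selection with random forests, which has made them the current go-to model in bioinformatics.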

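Neural networks, on the other hand, have seen a strong resurgence of interest over the past few years thanks to so-called deep learning. They are black-box models, however, and few attempts have been made to analyse the underlying process. Many articles cover feature extraction with neural networks (which does not require understanding the underlying input-output process), while very few address feature selection.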

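This thesis proposes new algorithms for feature selection with deep neural networks. To assess the results, the author generates regression and classification problems that allow each algorithm to be compared on several fronts: performance, computation time and constraints. The results are promising: the proposed algorithms surpass or equal random forests, the chosen state-of-the-art baseline, in every case. Given these results on artificial datasets, the author also tackles the DREAM4 challenge. Because very few samples are available in its datasets, this challenge is supposedly ill-suited to neural networks. The thesis nevertheless reports near state-of-the-art results on it.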

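Finally, extensions are given for most of the methods studied. The algorithms discussed are highly modular and can be adapted to the problem at hand. For example, the thesis explains how one of its algorithms can be adapted to prune neural networks without losing accuracy.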

Variable and feature selection have become the focus of much research, especially in bioinformatics where there are many applications. Machine learning is a powerful tool to select features; however, not all machine learning algorithms are on an equal footing when it comes to feature selection. Indeed, many methods have been proposed to carry out feature selection with random forests, which makes them the current go-to model in bioinformatics. On the other hand, thanks to so-called deep learning, neural networks have benefited from a huge resurgence of interest in the past few years. However, neural networks are black-box models and very few attempts have been made to analyse the underlying process. Indeed, quite a few articles can be found about feature extraction with neural networks (for which the underlying input-output process does not need to be understood), while very few tackle feature selection. In this document, we propose new algorithms to carry out feature selection with deep neural networks. To assess our results, we generate regression and classification problems which allow us to compare each algorithm on multiple fronts: performance, computation time and constraints. The results obtained are really promising since we manage to achieve our goal by surpassing (or equaling) random forest performance in every case (which was set to be our state-of-the-art comparison). Due to the promising results obtained on artificial datasets, we also tackle the DREAM4 challenge. Due to the very small number of samples available in the datasets, this challenge is supposedly an ill-suited problem for neural networks. We were nevertheless able to achieve near state-of-the-art results. Finally, extensions are given for most of our methods. Indeed, the algorithms discussed are very modular and can be adapted to the problem faced. For example, we explain how one of our algorithms can be adapted in order to prune neural networks without losing accuracy.
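The abstract does not describe the proposed algorithms, so the sketch below is not the thesis's method. It only illustrates the general idea of ranking input features with a neural network on an artificial regression problem where the relevant features are known in advance. Everything in it is an assumption made for illustration: the dataset (only features 0, 1 and 2 drive the target), the small one-hidden-layer network trained with plain gradient descent, and the choice of mean absolute input gradient (a generic saliency score) as the importance measure.

```python
import numpy as np


def make_regression_problem(n_samples=500, n_features=10, seed=0):
    """Hypothetical synthetic data: only features 0, 1 and 2 affect the target."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 1.5 * X[:, 2]
    y += 0.1 * rng.normal(size=n_samples)
    y = (y - y.mean()) / y.std()  # standardise the target for stable training
    return X, y


def train_mlp(X, y, n_hidden=16, lr=0.05, n_steps=1500, seed=0):
    """Train a one-hidden-layer tanh network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=1.0 / np.sqrt(n_hidden), size=n_hidden)
    b2 = 0.0
    for _ in range(n_steps):
        H = np.tanh(X @ W1 + b1)              # hidden activations, shape (n, h)
        err = H @ W2 + b2 - y                 # prediction error, shape (n,)
        # Gradients of the loss 0.5 * mean(err ** 2)
        dH = np.outer(err, W2) * (1.0 - H ** 2)
        grad_W1 = X.T @ dH / n
        grad_b1 = dH.mean(axis=0)
        grad_W2 = H.T @ err / n
        grad_b2 = err.mean()
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return W1, b1, W2, b2


def saliency_scores(X, params):
    """Mean absolute gradient of the network output with respect to each input."""
    W1, b1, W2, _ = params
    H = np.tanh(X @ W1 + b1)
    # d(output)/d(x_k) = sum_j W2[j] * (1 - H[:, j] ** 2) * W1[k, j]
    grads = ((1.0 - H ** 2) * W2) @ W1.T      # shape (n, d)
    return np.abs(grads).mean(axis=0)


X, y = make_regression_problem()
params = train_mlp(X, y)
scores = saliency_scores(X, params)
ranking = np.argsort(scores)[::-1]            # most important feature first
print("features ranked by saliency:", ranking)
```

Because the relevant features are known by construction, the ranking can be checked directly. The same artificial problem could also be given to a random forest to obtain a baseline ranking, which mirrors the kind of comparison the abstract describes.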

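1 Introduction
2 Review of Deep Neural Networks and Motivation for Feature Selection
3 Methods: Investigation and Explanation
4 Application: Inference of Gene Regulation
5 Conclusion and Future Work
Appendix A Hypercube Dataset Generation
Appendix B Detailed Description of Hardware and Software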

Download link for the original English text:

http://page5.dfpan.com/fs/elc4j2e21f2951667a7/
