Application of the Markov Blanket-Embedded Genetic Algorithm to Gene Selection

阿新 • Published: 2018-12-20

#Citation

##BibTeX

@article{ZHU20073236,
  title = "Markov blanket-embedded genetic algorithm for gene selection",
  journal = "Pattern Recognition",
  volume = "40",
  number = "11",
  pages = "3236 - 3248",
  year = "2007",
  issn = "0031-3203",
  doi = "https://doi.org/10.1016/j.patcog.2007.02.007",
  url = "http://www.sciencedirect.com/science/article/pii/S0031320307000945",
  author = "Zexuan Zhu and Yew-Soon Ong and Manoranjan Dash",
  keywords = "Microarray, Feature selection, Markov blanket, Genetic algorithm (GA), Memetic algorithm (MA)"
}

##Plain text

Zexuan Zhu, Yew-Soon Ong, Manoranjan Dash, Markov blanket-embedded genetic algorithm for gene selection, Pattern Recognition, Volume 40, Issue 11, 2007, Pages 3236-3248, ISSN 0031-3203, https://doi.org/10.1016/j.patcog.2007.02.007. (http://www.sciencedirect.com/science/article/pii/S0031320307000945) Keywords: Microarray; Feature selection; Markov blanket; Genetic algorithm (GA); Memetic algorithm (MA)

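#Abstract

Microarray technologies: a key issue in microarray classification is identifying the smallest possible set of genes that still achieves good predictive accuracy.

The paper proposes the Markov blanket-embedded genetic algorithm (MBEGA) for the gene selection problem.

Irrelevant and redundant features are eliminated based on both the Markov blanket and the predictive power in the classifier model.

MBEGA is compared with filter, wrapper, and standard GA methods.

Evaluation criteria: classification accuracy, number of selected genes, computational cost, and robustness.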

#Main content


##Markov Blanket

$F$: the set of all features; $C$: the class

The Markov blanket of a feature $F_i$ is defined as follows:

Definition (Markov blanket). Let $M$ be a subset of features that does not contain $F_i$, i.e., $M \subseteq F$ and $F_i \notin M$. $M$ is a Markov blanket of $F_i$ if, given $M$, $F_i$ is conditionally independent of $\left( F \cup C \right) - M - \left\{ F_i \right\}$, i.e., $P \left( F - M - \left\{ F_i \right\}, C | F_i, M \right) = P \left( F - M - \left\{ F_i \right\}, C | M \right)$

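Two attributes $A$ and $B$ are conditionally independent given $X$ if $P \left( A | X, B \right) = P \left( A | X \right)$; that is, $B$ provides no information about $A$ beyond what $X$ already provides. If a feature $F_i$ has a Markov blanket $M$ within the currently selected feature subset, then $F_i$ gives no information about $C$ or the other selected features beyond what $M$ gives, so $F_i$ can be removed safely. However, determining the conditional independence of features is usually very expensive computationally, so a single feature is used to approximate the Markov blanket of $F_i$.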

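Definition (approximate Markov blanket). For two features $F_i$ and $F_j$ ($i \neq j$), $F_j$ is regarded as an approximate Markov blanket of $F_i$ if $SU_{j,C} \geq SU_{i,C}$ and $SU_{i,j} \geq SU_{i,C}$, where the symmetrical uncertainty (SU) measures the correlation between features (including the class $C$) and is defined as: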

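$SU \left( F_i, F_j \right) = 2 \left[ \frac{IG \left( F_i | F_j \right)}{H \left( F_i \right) + H \left( F_j \right)} \right]$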

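$IG \left( F_i | F_j \right)$: the information gain between features $F_i$ and $F_j$

$H \left( F_i \right)$ and $H \left( F_j \right)$: the entropies of features $F_i$ and $F_j$

$SU_{i,C}$: the correlation between feature $F_i$ and the class $C$, called the C-correlation

A feature is considered relevant if its C-correlation is higher than a user-given threshold $\gamma$, i.e., $SU_{i,C} > \gamma$.

A feature that has no approximate Markov blanket is a predominant feature.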
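The definitions above can be sketched in a few lines of Python. This is only an illustration for discrete (already discretized) feature values, assuming the usual form $SU = 2 \cdot IG / (H + H)$ with $IG \left( X | Y \right) = H \left( X \right) - H \left( X | Y \right)$. The function names are made up for this note, and returning 0 when both variables are constant is a convention chosen here, not something taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y) for two equally long sequences of discrete values."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both constant: treated as uncorrelated (assumption of this sketch)
    info_gain = hx - conditional_entropy(x, y)
    return 2.0 * info_gain / (hx + hy)


def is_approx_markov_blanket(f_j, f_i, c):
    """True if f_j is an approximate Markov blanket of f_i with respect to class c."""
    su_ic = symmetrical_uncertainty(f_i, c)
    return (symmetrical_uncertainty(f_j, c) >= su_ic
            and symmetrical_uncertainty(f_i, f_j) >= su_ic)
```

For example, with `c = [0, 0, 1, 1]`, a feature identical to `c` blankets a noisier copy such as `[0, 0, 1, 0]`, but not the other way round.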

##Markov Blanket-Embedded Genetic Algorithm


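If the difference in fitness between two individuals is smaller than $\varepsilon$, the individual with fewer features is considered the better one.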
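A minimal sketch of this tie-breaking rule follows. It assumes that fitness is something to maximize (for example classification accuracy) and that an individual is represented as a `(fitness, n_selected_features)` pair. Both the representation and the default value of `epsilon` are assumptions made for illustration.

```python
def better(ind_a, ind_b, epsilon=1e-3):
    """Return the preferred individual; each one is (fitness, n_selected_features)."""
    fit_a, size_a = ind_a
    fit_b, size_b = ind_b
    if abs(fit_a - fit_b) < epsilon:
        # Fitness values are practically equal: prefer the smaller feature subset.
        return ind_a if size_a <= size_b else ind_b
    # Otherwise the higher fitness wins (assumes fitness is maximized).
    return ind_a if fit_a > fit_b else ind_b
```

With this rule, a 12-gene individual at fitness 0.9004 is preferred over a 30-gene individual at 0.9000, while a clearly fitter individual still wins regardless of its size.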

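Lamarckian learning: the locally improved individual is placed back into the population to compete for reproductive opportunities, which forces the genotype to reflect the improvement.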


$X$: the subset of selected features; $Y$: the subset of excluded features


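The C-correlation of each feature is computed only once.

Search range $L$: defines the maximum number of $Add$ and $Del$ operations. The $L^2$ combinations of $Add$ and $Del$ operations are applied in random order until an improvement is obtained.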
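The control flow of this local search can be sketched as follows. The details of the $Add$ and $Del$ operators are not reproduced in these notes, so `add_op`, `del_op`, and `fitness` are hypothetical callables supplied by the caller. Letting the number of operations range over 1..$L$ (which gives $L^2$ combinations) and treating higher fitness as better are likewise assumptions of this sketch.

```python
import random
from itertools import product


def local_search(chromosome, fitness, add_op, del_op, L=4, rng=random):
    """Try the L*L (n_add, n_del) combinations in random order.

    chromosome: set of selected feature indices
    fitness:    callable, set -> float (higher is better; assumption)
    add_op:     callable, set -> set with one feature added (hypothetical)
    del_op:     callable, set -> set with feature(s) removed (hypothetical)
    """
    base = fitness(chromosome)
    combos = list(product(range(1, L + 1), repeat=2))
    rng.shuffle(combos)
    for n_add, n_del in combos:
        candidate = set(chromosome)
        for _ in range(n_add):
            candidate = add_op(candidate)
        for _ in range(n_del):
            candidate = del_op(candidate)
        if fitness(candidate) > base:
            # Lamarckian learning: the improved genotype replaces the original.
            return candidate
    return set(chromosome)  # no combination improved the fitness
```

Stopping at the first improvement keeps the number of fitness evaluations per individual at most $L^2$, which matters when each evaluation trains a classifier.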


Lamarckian learning process

This is followed by the usual evolutionary operations:

linear ranking selection
uniform crossover
mutation operators with elitism
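As an illustration of these three operators, here is a minimal sketch on bit-string chromosomes (1 = gene selected). The selection pressure, the mutation rate, and the bit-flip form of mutation are generic textbook choices made for this note, not settings taken from the paper. The sketch also does not enforce a cap on the number of selected features.

```python
import random


def linear_ranking_select(population, fitnesses, rng, pressure=1.5):
    """Pick one individual with probability linear in its fitness rank."""
    n = len(population)
    if n == 1:
        return population[0]
    order = sorted(range(n), key=lambda i: fitnesses[i])  # worst first
    # Rank 0 (worst) gets weight 2 - pressure, rank n-1 (best) gets weight pressure.
    weights = [(2 - pressure) + 2 * (pressure - 1) * r / (n - 1) for r in range(n)]
    return population[rng.choices(order, weights=weights, k=1)[0]]


def uniform_crossover(a, b, rng):
    """Each gene of the child comes from either parent with probability 0.5."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def bit_flip_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in bits]


def next_generation(population, fitnesses, rng, mutation_rate=0.02):
    """One generation with elitism: the best individual is copied unchanged."""
    best = max(range(len(population)), key=lambda i: fitnesses[i])
    children = [list(population[best])]
    while len(children) < len(population):
        p1 = linear_ranking_select(population, fitnesses, rng)
        p2 = linear_ranking_select(population, fitnesses, rng)
        child = uniform_crossover(p1, p2, rng)
        children.append(bit_flip_mutation(child, mutation_rate, rng))
    return children
```

Ranking-based selection depends only on the order of the fitness values, so it behaves the same whether accuracies differ by 0.1% or by 10%.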

##Experiments

The MBEGA method is compared against:

the FCBF (fast correlation-based filter)
BIRS (best incremental ranked subset)
standard GA feature selection algorithms

FCBF is a fast correlation-based filter method:

it selects a subset of relevant features whose C-correlation is larger than a given threshold $\gamma$
it sorts the relevant features in descending order of C-correlation
redundant features are eliminated one by one in descending order

A feature is redundant only if it has an approximate Markov blanket.

The result is a set of predominant features with zero redundant features in terms of C-correlation.
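The FCBF steps can be sketched as below. To stay self-contained, the sketch takes precomputed symmetrical uncertainty values instead of raw data: `su_class` and `su_pair` are hypothetical inputs named for this note. Because the features are processed in descending C-correlation order, the condition $SU_{j,C} \geq SU_{i,C}$ already holds for every later feature, so only $SU_{i,j} \geq SU_{i,C}$ needs to be checked.

```python
def fcbf(su_class, su_pair, gamma=0.0):
    """Return the predominant features, ordered by decreasing C-correlation.

    su_class: dict mapping feature name -> SU(feature, class)
    su_pair:  callable (f_i, f_j) -> SU(f_i, f_j)
    """
    # Step 1: keep relevant features (C-correlation above the threshold gamma).
    relevant = [f for f, su in su_class.items() if su > gamma]
    # Step 2: sort by C-correlation, highest first.
    relevant.sort(key=lambda f: su_class[f], reverse=True)
    # Step 3: each kept feature removes the lower-ranked features
    # for which it is an approximate Markov blanket.
    selected = []
    while relevant:
        f_j = relevant.pop(0)
        selected.append(f_j)
        relevant = [f_i for f_i in relevant if su_pair(f_i, f_j) < su_class[f_i]]
    return selected
```

In a toy case with C-correlations `g1: 0.9, g2: 0.8, g3: 0.5, g4: 0.05`, a threshold of 0.1, and `SU(g1, g2) = 0.85`, the feature `g4` is dropped as irrelevant and `g2` is dropped as redundant given `g1`.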

BIRS follows a scheme similar to FCBF, but evaluates the goodness of features using a classifier:

it ranks the genes according to some measure of interest
it sequentially selects the ranked features one by one based on their incremental usefulness

It calls the classifier as many times as there are features.

$BIRS_F$ and $BIRS_W$ rank the features by C-correlation (i.e., the symmetrical uncertainty between feature $F_i$ and the class $C$) and by individual predictive power, respectively.

$BIRS_F$ takes less time.
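The incremental selection loop of BIRS can be sketched as follows. This is a simplification: `evaluate` is a hypothetical callable that stands in for training and scoring a classifier on a feature subset, and a fixed `min_gain` threshold replaces whatever significance criterion the original method uses to decide that a feature is incrementally useful.

```python
def birs(ranked_features, evaluate, min_gain=0.0):
    """Walk down a ranked feature list and keep features that improve the score.

    ranked_features: features sorted from best to worst by some ranking measure
    evaluate:        callable, list of features -> score (higher is better)
    """
    selected, best = [], float("-inf")
    for f in ranked_features:
        score = evaluate(selected + [f])  # one classifier call per feature
        if score > best + min_gain:
            selected.append(f)
            best = score
    return selected
```

The loop makes exactly one call to `evaluate` per ranked feature. Note that in this sketch the top-ranked feature is always accepted, because the initial best score is negative infinity.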

###Synthetic data

Ten 10-fold cross-validations with the C4.5 classifier

10 independent runs

The maximum number of selected features in each chromosome, m, is set to 50.

###Microarray data


The .632+ bootstrap


$K$ resamplings
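For reference, the commonly used form of the .632+ estimator can be sketched as below. It is not copied from the paper. It assumes three precomputed quantities: the resubstitution error, the out-of-bag error averaged over the $K$ bootstrap resamplings, and the no-information error rate. The parameter names are chosen for this note.

```python
def bootstrap_632_plus(resub_err, oob_err, no_info_err):
    """Combine resubstitution and out-of-bag errors with the .632+ weighting.

    resub_err:   error on the training data itself
    oob_err:     average error on samples left out of each bootstrap resample
    no_info_err: error expected if features and labels were independent
    """
    oob = min(oob_err, no_info_err)
    if oob > resub_err and no_info_err > resub_err:
        # Relative overfitting rate, between 0 (none) and 1 (maximal).
        r = (oob - resub_err) / (no_info_err - resub_err)
    else:
        r = 0.0
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * resub_err + w * oob
```

When there is no overfitting (`r = 0`) this reduces to the plain .632 weighting, `0.368 * resub_err + 0.632 * oob_err`. As overfitting grows, the weight shifts toward the out-of-bag error.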

The support vector machine (SVM) with a linear kernel is used for the microarray classification problems.

The one-versus-rest strategy is used for multi-class datasets.
