The Markov Blanket-Embedded Genetic Algorithm for Gene Selection
#Citation
##LaTeX
@article{ZHU20073236,
  author  = {Zexuan Zhu and Yew-Soon Ong and Manoranjan Dash},
  title   = {Markov blanket-embedded genetic algorithm for gene selection},
  journal = {Pattern Recognition},
  volume  = {40},
  number  = {11},
  pages   = {3236--3248},
  year    = {2007},
  issn    = {0031-3203},
  doi     = {https://doi.org/10.1016/j.patcog.2007.02.007},
  url     = {http://www.sciencedirect.com/science/article/pii/S0031320307000945}
}
##Normal
Zexuan Zhu, Yew-Soon Ong, Manoranjan Dash,
Markov blanket-embedded genetic algorithm for gene selection,
Pattern Recognition,
Volume 40, Issue 11,
2007,
Pages 3236-3248,
ISSN 0031-3203,
#Abstract
Microarray data have many genes but few samples; the goal is to identify the smallest possible set of genes that still gives good predictive performance
Markov blanket-embedded genetic algorithm (MBEGA) for the gene selection problem
redundancy judged via the Markov blanket; relevance via predictive power in the classifier model
compared against filter, wrapper, and standard GA approaches
evaluation criteria: classification accuracy, number of selected genes, computational cost, and robustness
#Main Content
##Markov Blanket
$F = \{F_1, \ldots, F_N\}$ — the set of all features; $C$ — the class
The Markov blanket of a feature is defined as follows:
Definition (Markov blanket). For a feature $F_i$, let $M \subseteq F$ be a feature subset not containing $F_i$, i.e., $F_i \notin M$. $M$ is a Markov blanket of $F_i$ if, given $M$, $F_i$ is conditionally independent of all remaining features and the class, i.e., $P(F \setminus M \setminus \{F_i\}, C \mid F_i, M) = P(F \setminus M \setminus \{F_i\}, C \mid M)$.
Given $X$, two attributes $A$ and $B$ are conditionally independent if $P \left( A \mid X, B \right) = P \left( A \mid X \right)$. In particular, if $M$ is a Markov blanket of $F_i$, then $C$ is conditionally independent of $F_i$ given $M$, so $F_i$ adds no predictive information and can be removed.
Definition (approximate Markov blanket). For two distinct features $F_i$ and $F_j$, $F_j$ can be regarded as an approximate Markov blanket of $F_i$ if $SU_{j,c} \geq SU_{i,c}$ and $SU_{i,j} \geq SU_{i,c}$, where the symmetrical uncertainty (SU) measures the correlation between features (including the class $C$) and is defined as:
$$SU(X, Y) = 2 \left[ \frac{IG(X \mid Y)}{H(X) + H(Y)} \right]$$
$IG(X \mid Y)$ — the information gain between $X$ and $Y$
$H(X)$ and $H(Y)$ — the entropies of $X$ and $Y$
$SU_{i,c}$ — the correlation between feature $F_i$ and the class $C$, called the C-correlation
A feature is considered relevant if its C-correlation is above a user-given threshold $\gamma$, i.e., $SU_{i,c} > \gamma$
A relevant feature that has no approximate Markov blanket is a predominant feature
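The symmetrical uncertainty and the approximate-Markov-blanket test above can be sketched in code. This is a minimal illustration for discrete feature vectors, assuming NumPy; the function names are my own, not from the paper:

```python
import numpy as np

def entropy(x):
    """Shannon entropy (base 2) of a discrete array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    # Joint entropy via pairing the two discrete arrays.
    pairs = np.stack([x, y], axis=1)
    _, counts = np.unique(pairs, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_xy = -np.sum(p * np.log2(p))
    ig = hx + hy - h_xy  # IG(X|Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)
    denom = hx + hy
    return 0.0 if denom == 0 else 2.0 * ig / denom

def is_approx_markov_blanket(fj, fi, c):
    """True if F_j is an approximate Markov blanket of F_i w.r.t. class c:
    SU(F_j, C) >= SU(F_i, C) and SU(F_i, F_j) >= SU(F_i, C)."""
    su_ic = symmetrical_uncertainty(fi, c)
    return (symmetrical_uncertainty(fj, c) >= su_ic and
            symmetrical_uncertainty(fi, fj) >= su_ic)
```

For identical features, SU is 1, so a duplicate feature is always an approximate Markov blanket of its copy.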
##Markov Blanket-Embedded Genetic Algorithm (MBEGA)
If the fitness difference between two individuals is smaller than a small threshold $\varepsilon$, the individual with fewer selected features is considered better
Lamarckian learning: the locally improved individual is placed back into the population to compete for reproductive opportunities, forcing the genotype to reflect the improvement
$X$ — the subset of selected features; $Y$ — the subset of excluded features
C-correlations are computed only once and reused
Search range $L$ — defines the maximum number of Add and Del operations; the combinations of (Add, Del) operations are tried in random order until an improvement is obtained
Lamarckian learning process
Afterwards come the usual evolutionary operations:
- linear ranking selection
- uniform crossover
- mutation operators with elitism
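The improvement-first Add/Del local search described above can be sketched as follows. This is a hypothetical simplification, not the paper's implementation: feature subsets stand in for chromosomes, `fitness` is any user-supplied scorer, and `predominant`/`redundant` are assumed to be precomputed from the Markov-blanket analysis:

```python
import random

def local_search(mask, fitness, predominant, redundant, L=3, eps=1e-4, rng=random):
    """Improvement-first Add/Del local search (hypothetical sketch).

    mask        -- set of currently selected feature indices (X)
    fitness     -- callable mapping a feature set to a score (higher is better)
    predominant -- candidate features to Add (relevant, no approx. Markov blanket)
    redundant   -- selected features to Del (covered by an approx. Markov blanket)
    """
    base = fitness(mask)
    # All (add-count, del-count) combinations up to range L, in random order.
    combos = [(a, d) for a in range(L + 1) for d in range(L + 1)]
    rng.shuffle(combos)
    for n_add, n_del in combos:
        adds = [f for f in predominant if f not in mask][:n_add]
        dels = [f for f in redundant if f in mask][:n_del]
        cand = (mask | set(adds)) - set(dels)
        score = fitness(cand)
        # Prefer higher fitness; break near-ties in favour of fewer features.
        if score > base + eps or (abs(score - base) <= eps and len(cand) < len(mask)):
            # Lamarckian step: the improved subset replaces the genotype.
            return cand, score
    return mask, base
```

In MBEGA this local refinement runs inside each generation, before the selection, crossover, and mutation operators listed above.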
##Experiments
MBEGA method
Compared against:
- the FCBF (fast correlation-based filter)
- BIRS (best incremental ranked subset)
- standard GA feature selection algorithms
FCBF — a fast correlation-based filter method
- selects the subset of relevant features whose C-correlations are larger than a given threshold
- sorts the relevant features in descending order of C-correlation
- eliminates redundant features one-by-one in that descending order
A feature is redundant if and only if it has an approximate Markov blanket
The features that survive are the predominant features, i.e., relevant features with zero redundancy in terms of C-correlation
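The FCBF procedure above can be sketched end-to-end. A minimal self-contained illustration for discrete data, assuming NumPy; function names and the `gamma` parameter name are my own:

```python
import numpy as np

def _entropy(x):
    _, n = np.unique(x, return_counts=True)
    p = n / n.sum()
    return -(p * np.log2(p)).sum()

def _su(x, y):
    """Symmetrical uncertainty between two discrete arrays."""
    hx, hy = _entropy(x), _entropy(y)
    _, n = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    p = n / n.sum()
    ig = hx + hy + (p * np.log2(p)).sum()  # IG = H(X) + H(Y) - H(X,Y)
    return 0.0 if hx + hy == 0 else 2.0 * ig / (hx + hy)

def fcbf(X, y, gamma=0.0):
    """FCBF-style selection (sketch): keep only predominant features.

    X: (n_samples, n_features) discrete matrix; y: class labels.
    """
    n_feat = X.shape[1]
    # 1. Relevant features: C-correlation above the threshold gamma.
    su_c = np.array([_su(X[:, j], y) for j in range(n_feat)])
    relevant = [j for j in range(n_feat) if su_c[j] > gamma]
    # 2. Sort by C-correlation, descending.
    relevant.sort(key=lambda j: -su_c[j])
    # 3. Walk the list; drop any feature for which an earlier (stronger)
    #    kept feature is an approximate Markov blanket.
    kept = []
    for j in relevant:
        if not any(_su(X[:, k], X[:, j]) >= su_c[j] for k in kept):
            kept.append(j)
    return kept
```

Because the list is sorted by C-correlation, the first approximate-Markov-blanket condition ($SU_{j,c} \geq SU_{i,c}$) holds automatically for every earlier kept feature, so only the feature-feature SU needs checking.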
BIRS — follows a similar scheme to FCBF but evaluates the goodness of features using a classifier
- ranks the genes according to some measure of interest
- sequentially selects the ranked features one-by-one based on their incremental usefulness
- calls the classifier as many times as there are features
The ranking is based either on C-correlation (i.e., the symmetrical uncertainty between a feature and the class $C$) or on each feature's individual predictive power; the C-correlation variant is less time-consuming
###Synthetic data
ten 10-fold cross-validations with the C4.5 classifier
10 independent runs
The maximum number of selected features in each chromosome, m, is set to 50.
###Microarray data
The .632+ bootstrap with $K$ resamplings is used for error estimation
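To make the bootstrap error estimate concrete, here is a sketch of the plain .632 bootstrap (the paper uses the .632+ variant, which additionally corrects for overfitting via the no-information error rate; that correction is omitted here). The `fit_predict` interface is my own assumption:

```python
import numpy as np

def bootstrap_632_error(X, y, fit_predict, K=50, rng=None):
    """Plain .632 bootstrap error estimate (sketch, not the paper's .632+).

    fit_predict(X_train, y_train, X_test) -> predicted labels for X_test.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    # Resubstitution (training) error on the full sample.
    err_resub = np.mean(fit_predict(X, y, X) != y)
    # Leave-one-out bootstrap error: test each point only on replicates
    # whose bootstrap sample did not contain it (out-of-bag points).
    errs = []
    for _ in range(K):
        idx = rng.integers(0, n, size=n)       # bootstrap resample with replacement
        oob = np.setdiff1d(np.arange(n), idx)  # points not drawn this round
        if len(oob) == 0:
            continue
        pred = fit_predict(X[idx], y[idx], X[oob])
        errs.append(np.mean(pred != y[oob]))
    err_1 = np.mean(errs)
    # Weighted combination: 0.368 * optimistic + 0.632 * pessimistic estimate.
    return 0.368 * err_resub + 0.632 * err_1
```

The weights come from the fact that a bootstrap sample contains on average about 63.2% of the distinct original points.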
the support vector machine (SVM) — microarray classification problems
one-versus-rest strategy — multi-class datasets
the linear kernel SVM
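The classifier setup above can be sketched with scikit-learn (a stand-in, not the paper's implementation): a linear-kernel SVM wrapped in a one-versus-rest strategy for multi-class data. The toy dataset below is invented purely for illustration:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy stand-in for microarray data: 30 samples, 20 "genes", 3 classes.
X = rng.normal(size=(30, 20))
y = np.repeat([0, 1, 2], 10)
X[y == 1] += 1.5   # shift classes apart so they are linearly separable
X[y == 2] -= 1.5

# One-versus-rest: one binary linear SVM per class.
clf = OneVsRestClassifier(SVC(kernel="linear"))
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy on this toy data
```

The linear kernel is the usual choice for microarray problems because the number of genes far exceeds the number of samples, making the classes typically linearly separable.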