Multi-objective evolutionary feature selection for online sales forecasting
#Citation
##BibTeX
@article{JIMENEZ201775, title = "Multi-objective evolutionary feature selection for online sales forecasting", journal = "Neurocomputing", volume = "234", pages = "75 - 92", year = "2017", issn = "0925-2312", doi = "https://doi.org/10.1016/j.neucom.2016.12.045", url = "http://www.sciencedirect.com/science/article/pii/S0925231216315612" }
##Normal
F. Jiménez, G. Sánchez, J.M. García, G. Sciavicco, L. Miralles,
Multi-objective evolutionary feature selection for online sales forecasting,
Neurocomputing,
Volume 234,
2017,
Pages 75-92,
ISSN 0925-2312,
#Abstract
Sales forecasting uses historical sales figures, together with products characteristics and peculiarities, to predict future performance; the forecasts can be used to derive sound financial and business plans.
To obtain an accurate regression model for online sales forecasting, the paper proposes a novel feature selection methodology: the multi-objective evolutionary algorithm ENORA (Evolutionary NOn-dominated Radial slots based Algorithm) is the search strategy of a wrapper method driven by the regression model learner Random Forest.
The methodology integrates feature selection for regression, model evaluation, and decision making, in order to choose the most satisfactory model through an a posteriori process in a multi-objective context.
#Main content
root mean squared error (RMSE)
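RMSE is the square root of the mean of the squared differences between observed and predicted values. A minimal sketch in plain Python follows; the function name `rmse` and its argument names are illustrative and do not come from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of the squared residuals, then the square root.
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Lower values are better, and a perfect prediction gives 0.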
##ENORA (Evolutionary NOn-dominated Radial slots based Algorithm)
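ENORA is an elitist method for multi-objective evolutionary optimization. It uses a (μ + λ) survival strategy with μ = λ = N, where N is the size of the population, together with binary tournament selection and self-adaptive crossover and mutation.
Individuals are compared with a rank-crowding-better function.
The definition uses two quantities: the objective functions after normalization, and the number of objective functions.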
##NSGA-II (Non-dominated Sorting Genetic Algorithm II)
NSGA-II also uses a (μ + λ) strategy, binary tournament selection, and a rank-crowding-better function.
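The rank-crowding-better comparison and the binary tournament built on it can be sketched as follows. This follows the usual NSGA-II convention (lower non-domination rank wins, ties go to the larger crowding distance). The `Individual` class and its field names are hypothetical, and the rank and crowding values are assumed to have been computed already.

```python
import random
from dataclasses import dataclass


@dataclass
class Individual:
    name: str
    rank: int        # non-domination level, lower is better
    crowding: float  # crowding distance, larger is better


def rank_crowding_better(a, b):
    """True if a is preferred: lower rank, ties broken by larger crowding."""
    if a.rank != b.rank:
        return a.rank < b.rank
    return a.crowding > b.crowding


def binary_tournament(population, rng):
    """Draw two distinct individuals at random and return the preferred one."""
    a, b = rng.sample(population, 2)
    return a if rank_crowding_better(a, b) else b
```

Calling `binary_tournament(population, random.Random(0))` repeatedly produces the parents for crossover; passing the random generator explicitly keeps runs reproducible.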
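##Differences between ENORA and NSGA-II
The two algorithms differ in how the ranking of the individuals in the population is calculated:
- ENORA: the non-domination level of the individual in its slot
- NSGA-II: the non-domination level of the individual in the whole population
The difference matters in the binary tournament: it decides whether a dominated individual can win, for example whether individual C can beat individual B. Allowing this increases diversity.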
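The two ranking schemes can be compared on a toy population. The sketch below assumes two minimized objectives already normalized to [0, 1] and assigns radial slots by the angle of each point. That angle-based `radial_slot` function and the `n_slots` parameter are simplifying assumptions for illustration, not the paper's exact slot formula.

```python
import math


def dominates(p, q):
    """p dominates q (minimization): no worse everywhere, better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def non_domination_levels(points):
    """Level 1 is the non-dominated front; peel fronts off one at a time."""
    levels = [0] * len(points)
    remaining = set(range(len(points)))
    level = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            levels[i] = level
        remaining -= front
        level += 1
    return levels


def radial_slot(point, n_slots):
    """Assumed slot rule: split the angle range [0, pi/2] into equal sectors."""
    f1, f2 = point
    angle = math.atan2(f2, f1)
    return min(int(n_slots * angle / (math.pi / 2)), n_slots - 1)


def enora_ranks(points, n_slots):
    """Rank each individual only against the others in its own slot."""
    slots = {}
    for i, p in enumerate(points):
        slots.setdefault(radial_slot(p, n_slots), []).append(i)
    ranks = [0] * len(points)
    for members in slots.values():
        local = non_domination_levels([points[i] for i in members])
        for i, r in zip(members, local):
            ranks[i] = r
    return ranks


# Toy population in the order A, B, C, D; B dominates C.
points = [(0.1, 0.9), (0.3, 0.5), (0.8, 0.6), (0.9, 0.1)]
nsga2 = non_domination_levels(points)     # whole-population ranking
enora = enora_ranks(points, n_slots=2)    # per-slot ranking
```

In this toy case C is dominated by B, so whole-population ranking puts C on level 2. C falls in a different slot from B, so slot-wise ranking puts it on level 1 and it can win a tournament against B on crowding.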
##Feature selection
Algorithms:
- supervised
- unsupervised
- semi-supervised
The category depends on whether the training set is labeled.
Models:
- filter — statistical measures
- wrapper — a search problem
- embedded — model-dependent
Algorithm steps:
- subset generation — greedy hill-climbing approach, sequential forward selection, sequential backward elimination, bi-directional selection, branch and bound, beam search, Las Vegas algorithms, evolutionary algorithms, and particle swarm optimization algorithms.
- subset evaluation — multivariate filter methods (the distance, the uncertainty, the dependence, and the consistency) + wrapper methods (the accuracy)
- stopping criterion
- result validation
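The first three steps can be sketched with sequential forward selection, one of the subset generation strategies listed above. This is a generic wrapper loop, not the paper's evolutionary search. The `evaluate` callback (returning an error to minimize) and the `toy_error` function are hypothetical stand-ins for training and scoring a real model.

```python
def sequential_forward_selection(n_features, evaluate, max_features=None):
    """Greedily add the feature that lowers the error most; stop when none helps."""
    selected = []
    best_score = float("inf")
    limit = n_features if max_features is None else max_features
    while len(selected) < limit:
        # Subset generation: every one-feature extension of the current subset.
        candidates = [f for f in range(n_features) if f not in selected]
        # Subset evaluation: score each candidate subset with the wrapped model.
        score, feature = min((evaluate(selected + [f]), f) for f in candidates)
        # Stopping criterion: no candidate improves on the best score so far.
        if score >= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


def toy_error(subset):
    """Hypothetical error: features 0 and 2 help, every feature costs 0.5."""
    return 10.0 - 4.0 * (0 in subset) - 3.0 * (2 in subset) + 0.5 * len(subset)


result = sequential_forward_selection(4, toy_error)
```

Result validation, the fourth step, would score the returned subset on data that was not used during the search.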
##Multiple objectives
- accuracy
- number of features
- number of instances
- the cardinality and granularity of the subset selection
- the cross-validation accuracy
- the false positive rate
- the false negative rate
- the sensitivity
- the specificity
- measures of consistency, dependency, distance and information
- error identification rate
- undetected identification rate
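##Algorithm
The algorithm simultaneously optimizes the feature representation and the crossover and mutation operators that are applied.
Optimization objectives:
- the root mean squared error
- the cardinality of the subset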
a Bernoulli random variable
maintaining diversity in the population and sustaining the convergence capacity of the evolutionary algorithm
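The notes do not say where the Bernoulli random variable enters. The sketch below assumes a common encoding in which each individual is a binary mask over the features, with bits sampled and flipped by Bernoulli draws. The names `random_mask`, `bit_flip_mutation`, and `p_flip` are hypothetical, and this is not the paper's self-adaptive operator scheme.

```python
import random


def bernoulli(p, rng):
    """Return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def random_mask(n_features, p, rng):
    """Sample a feature mask: bit i is 1 if feature i is selected."""
    return [bernoulli(p, rng) for _ in range(n_features)]


def bit_flip_mutation(mask, p_flip, rng):
    """Flip each bit independently with probability p_flip."""
    return [bit ^ bernoulli(p_flip, rng) for bit in mask]


def cardinality(mask):
    """Number of selected features, the second objective to minimize."""
    return sum(mask)


rng = random.Random(42)
mask = random_mask(20, 0.5, rng)
mutated = bit_flip_mutation(mask, 0.1, rng)
```

A small `p_flip` keeps offspring close to their parents, while larger values push the search toward exploration.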
##Experiments
The data set comes from the Online Product Sales competition on Kaggle, a community that hosts predictive modeling competitions.
The runs use a population size of 1000 for 100 generations (100,000 evaluations), with 10-fold cross-validation.
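The evaluation budget and the fold structure can be checked with a short sketch. The `k_fold_indices` helper is hypothetical and splits contiguous index ranges without shuffling; in practice the data would normally be shuffled first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Budget from the notes: 1000 individuals for 100 generations.
population_size = 1000
generations = 100
evaluations = population_size * generations

folds = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so every instance is used once for validation and k - 1 times for training.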