
Multi-objective evolutionary feature selection for online sales forecasting

#Citation

##BibTeX

@article{JIMENEZ201775, title = "Multi-objective evolutionary feature selection for online sales forecasting", journal = "Neurocomputing", volume = "234", pages = "75 - 92", year = "2017", issn = "0925-2312", doi = "https://doi.org/10.1016/j.neucom.2016.12.045", url = "http://www.sciencedirect.com/science/article/pii/S0925231216315612", author = "F. Jiménez and G. Sánchez and J.M. García and G. Sciavicco and L. Miralles", keywords = "Multi-objective evolutionary algorithms, Feature selection, Random forest, Regression model, Online sales forecasting" }

##Plain text

F. Jiménez, G. Sánchez, J.M. García, G. Sciavicco, L. Miralles, Multi-objective evolutionary feature selection for online sales forecasting, Neurocomputing, Volume 234, 2017, Pages 75-92, ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2016.12.045. (http://www.sciencedirect.com/science/article/pii/S0925231216315612)

Keywords: Multi-objective evolutionary algorithms; Feature selection; Random forest; Regression model; Online sales forecasting

#Abstract

Sales forecasting uses historical sales figures, together with product characteristics and peculiarities, to support sound financial and business plans.

The paper aims at an accurate regression model for online sales forecasting. To get there, it proposes a novel feature selection methodology based on the multi-objective evolutionary algorithm ENORA (Evolutionary NOn-dominated Radial slots based Algorithm). The method is a wrapper, and its regression model learner is Random Forest.

The methodology integrates feature selection for regression, model evaluation, and decision making. It chooses the most satisfactory model through an a posteriori process in a multi-objective context.

#Main content

Model accuracy is measured with the root mean squared error (RMSE).
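For reference, the RMSE over $m$ predictions is $\sqrt{\frac{1}{m}\sum_{i=1}^{m}(y_i-\hat{y}_i)^2}$. The following is a minimal Python sketch. The function name `rmse` is ours and is not taken from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y_i - yhat_i)^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Errors are 1, -1, 2, 0, so the mean squared error is 6 / 4 = 1.5.
print(rmse([3.0, 5.0, 2.0, 7.0], [2.0, 6.0, 0.0, 7.0]))
```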

##ENORA (Evolutionary NOn-dominated Radial slots based Algorithm)

ENORA is an elitist method for multi-objective evolutionary optimization. Its main components are:

  • a (μ + λ) survival strategy with μ = λ = N, where N is the population size
  • binary tournament selection
  • self-adaptive crossover and mutation
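Here is a minimal sketch of the two selection mechanisms named above. It assumes that each individual can be ordered by a single sort key, which is a simplification. In ENORA and NSGA-II, that key comes from rank and crowding computed on the merged population. All function names are illustrative.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def binary_tournament(population: Sequence[T], better: Callable[[T, T], bool],
                      rng: random.Random) -> T:
    """Draw two individuals (distinct positions) at random; return the better one."""
    a, b = rng.sample(list(population), 2)
    return a if better(a, b) else b


def mu_plus_lambda_survival(parents: List[T], offspring: List[T],
                            sort_key: Callable[[T], float], mu: int) -> List[T]:
    """Elitist (mu + lambda) survival: parents and offspring compete together
    and only the `mu` best (smallest sort key) survive."""
    return sorted(parents + offspring, key=sort_key)[:mu]


N = 4  # population size; mu = lambda = N
parents = [5.0, 3.0, 8.0, 1.0]    # toy individuals: a single value to minimize
offspring = [2.0, 9.0, 4.0, 7.0]  # lambda = N children
survivors = mu_plus_lambda_survival(parents, offspring, sort_key=lambda x: x, mu=N)
print(survivors)  # [1.0, 2.0, 3.0, 4.0]
mate = binary_tournament(survivors, better=lambda a, b: a < b, rng=random.Random(0))
```

Because parents and offspring are merged before truncation, a good parent is never lost to a worse child. This is what makes the strategy elitist.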

Individuals are compared with a rank-crowding-better function.
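This is a sketch of a rank-crowding-better comparison in the style of the NSGA-II crowded-comparison operator. It assumes that each individual already carries two values:

  • its rank (non-domination level, where 1 is best)
  • its crowding distance (larger means a less crowded region)

How ENORA computes these values inside its slots is not reproduced here. `Scored` and `rank_crowding_better` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    rank: int        # non-domination level, 1 = best
    crowding: float  # crowding distance, larger = less crowded region


def rank_crowding_better(a: Scored, b: Scored) -> bool:
    """True if a beats b: lower rank wins; on equal rank, larger crowding wins."""
    if a.rank != b.rank:
        return a.rank < b.rank
    return a.crowding > b.crowding


print(rank_crowding_better(Scored(1, 0.2), Scored(2, 5.0)))  # True: better rank
print(rank_crowding_better(Scored(1, 0.2), Scored(1, 0.5)))  # False: more crowded
```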


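ENORA's radial slot computation uses the following quantities:

  • $d = \left\lfloor \sqrt[n-1]{N} \right\rfloor$
  • $h_j^I$: the objective function $f_j^I$ of individual $I$, normalized to $[0,1]$
  • $n$: the number of objective functions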
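Below is a sketch of these quantities in Python. Computing $d$ with integer arithmetic avoids floating-point surprises: `1000 ** (1/3)` comes out slightly below 10, so flooring it would give the wrong answer.

Two parts of the sketch are assumptions for illustration:

  • the min-max normalization over the current population
  • the angular slot assignment for two objectives, `radial_slot_2d`

The paper's exact slot formula is not reproduced in this post.

```python
import math
from typing import List, Sequence


def slots_per_dimension(N: int, n: int) -> int:
    """d = floor((n-1)-th root of N), computed exactly with integers."""
    if n < 2 or N < 1:
        raise ValueError("need n >= 2 objectives and N >= 1")
    k = n - 1
    d = 1
    while (d + 1) ** k <= N:
        d += 1
    return d


def normalize(values: Sequence[float]) -> List[float]:
    """Assumed min-max normalization of one objective over the population."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def radial_slot_2d(h1: float, h2: float, d: int) -> int:
    """Hypothetical slot for n = 2: split the angles [0, pi/2] of the
    normalized point (h1, h2) into d equal sectors."""
    angle = math.atan2(h2, h1)  # in [0, pi/2] because h1, h2 >= 0
    return min(d - 1, int(angle / (math.pi / 2) * d))


print(slots_per_dimension(1000, 3))  # 31, since 31**2 = 961 <= 1000 < 32**2
```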


##NSGA-II (Non-dominated Sorting Genetic Algorithm II)

NSGA-II likewise uses a (μ + λ) strategy, binary tournament selection, and a rank-crowding-better function.
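The crowding part of the comparison in NSGA-II is the crowding distance. Below is a sketch for one front of a minimization problem. The function name is ours.

```python
import math
from typing import List, Sequence


def crowding_distance(objectives: Sequence[Sequence[float]]) -> List[float]:
    """NSGA-II crowding distance for one front.

    objectives[i][j] is objective j of individual i. The two boundary
    individuals of each objective get infinity; interior individuals add the
    gap between their neighbours along that objective, divided by its range.
    """
    size = len(objectives)
    if size == 0:
        return []
    distance = [0.0] * size
    for j in range(len(objectives[0])):
        order = sorted(range(size), key=lambda i: objectives[i][j])
        f_min = objectives[order[0]][j]
        f_max = objectives[order[-1]][j]
        distance[order[0]] = distance[order[-1]] = math.inf
        if f_max == f_min:
            continue  # all equal on this objective: no interior contribution
        for pos in range(1, size - 1):
            gap = objectives[order[pos + 1]][j] - objectives[order[pos - 1]][j]
            distance[order[pos]] += gap / (f_max - f_min)
    return distance


front = [(1.0, 5.0), (2.0, 3.0), (4.0, 2.0), (6.0, 1.0)]
print(crowding_distance(front))  # approximately [inf, 1.35, 1.3, inf]
```

Boundary points get infinite distance, so they always win ties on rank. This keeps the extremes of the front in the population.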

##Differences between ENORA and NSGA-II

The two algorithms differ in how the rank of each individual in the population is calculated:

  • ENORA: the non-domination level of the individual within its slot
  • NSGA-II: the non-domination level of the individual within the whole population
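A small sketch makes the difference concrete:

  • `non_domination_levels` ranks the whole population (NSGA-II style).
  • `slot_levels` ranks each slot separately (ENORA style).

The slot labels in the example are chosen by hand for illustration. They are not computed with ENORA's slot formula.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a Pareto-dominates b (minimization): no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_domination_levels(points: Sequence[Sequence[float]]) -> List[int]:
    """Level 1 = non-dominated points; level k = non-dominated after removing levels < k."""
    level = [0] * len(points)
    remaining = set(range(len(points)))
    current = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            level[i] = current
        remaining -= front
        current += 1
    return level


def slot_levels(points: Sequence[Sequence[float]], slots: Sequence[int]) -> List[int]:
    """ENORA-style ranking: non-domination levels computed inside each slot."""
    groups: Dict[int, List[int]] = {}
    for i, s in enumerate(slots):
        groups.setdefault(s, []).append(i)
    level = [0] * len(points)
    for members in groups.values():
        for i, lv in zip(members, non_domination_levels([points[i] for i in members])):
            level[i] = lv
    return level


points = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
slots = [0, 1, 2, 1, 3]  # hand-picked slot labels
print(non_domination_levels(points))  # [1, 1, 1, 2, 3]
print(slot_levels(points, slots))     # [1, 1, 1, 2, 1]
```

The point (5, 5) is on level 3 in the whole population but on level 1 in its own slot. With slot-based ranking it can therefore still win a binary tournament.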


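The ranking rule decides whether an individual that is dominated can still win a binary tournament. In the paper's example, individual C can be preferred over B, which improves diversity.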

##Feature selection

Algorithms, by type of supervision:

  • supervised
  • unsupervised
  • semi-supervised

The category depends on whether the training set is labeled.

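Models:

  • filter: features are scored with statistical measures, independently of the learner
  • wrapper: selection is treated as a search problem, and candidate subsets are scored with the learning model
  • embedded: selection happens inside model training, so it is model-dependent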
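The paper uses the wrapper approach. For contrast, here is a minimal filter-style sketch that scores each feature by the absolute Pearson correlation with the target, without training any model. Pearson correlation is our choice for illustration; filter methods use many other statistical measures. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_rank(columns: Sequence[Sequence[float]], y: Sequence[float]) -> List[int]:
    """Feature indices sorted by |Pearson correlation| with y, best first."""
    scores = [abs(pearson(col, y)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)


y = [1.0, 2.0, 3.0, 4.0]
columns = [[4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated
           [1.0, 1.0, 1.0, 1.0],   # constant, carries no information
           [1.0, 2.0, 2.0, 4.0]]   # strongly correlated
print(filter_rank(columns, y))  # [0, 2, 1]
```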

Algorithm steps:

  • subset generation: greedy hill-climbing, sequential forward selection, sequential backward elimination, bi-directional selection, branch and bound, beam search, Las Vegas algorithms, evolutionary algorithms, and particle swarm optimization algorithms
  • subset evaluation: multivariate filter methods (measures of distance, uncertainty, dependence, and consistency) and wrapper methods (accuracy)
  • stopping criterion
  • result validation
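The four steps can be sketched with sequential forward selection, one of the generators listed above, wrapped around a caller-supplied error function. This is a greedy, single-objective illustration. The paper itself uses the evolutionary algorithm ENORA as the generator. `toy_error` is a hypothetical stand-in for the cross-validated error of a trained model.

```python
from typing import Callable, FrozenSet, Tuple


def sequential_forward_selection(
    n_features: int,
    evaluate: Callable[[FrozenSet[int]], float],  # lower is better, e.g. CV RMSE
    max_features: int,
) -> Tuple[FrozenSet[int], float]:
    selected: FrozenSet[int] = frozenset()
    best_score = float("inf")
    while len(selected) < max_features:
        # 1. Subset generation: extend the current subset by one unused feature.
        candidates = [selected | {j} for j in range(n_features) if j not in selected]
        if not candidates:
            break
        # 2. Subset evaluation: score every candidate with the wrapped model.
        scores = {c: evaluate(c) for c in candidates}
        subset = min(candidates, key=scores.__getitem__)
        # 3. Stopping criterion: stop when no candidate improves the score.
        if scores[subset] >= best_score:
            break
        selected, best_score = subset, scores[subset]
    # 4. Result validation (e.g. on held-out data) is left to the caller.
    return selected, best_score


def toy_error(subset: FrozenSet[int]) -> float:
    """Hypothetical error: features 0 and 2 help; every feature costs a little."""
    return 10.0 - 4.0 * len(subset & {0, 2}) + 0.5 * len(subset)


print(sequential_forward_selection(4, toy_error, max_features=4))
```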

##Objectives used in multi-objective feature selection

  • accuracy
  • number of features
  • number of instances
  • the cardinality and granularity of the subset selection
  • the cross-validation accuracy
  • the false positive rate
  • the false negative rate
  • the sensitivity
  • the specificity
  • measures of consistency, dependency, distance and information
  • error identification rate
  • undetected identification rate

##Algorithm

The selected features and the crossover and mutation operators used to vary them are optimized simultaneously.


Optimization objectives: minimize the root mean squared error (RMSE) of the model and minimize the cardinality of the selected feature subset.
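Below is a sketch of the resulting objective vector. It assumes a caller-supplied function that trains the regressor on the chosen columns and returns its RMSE. `objective_vector` and the stand-in `fake_cv_rmse` are hypothetical names.

```python
from typing import Callable, Sequence, Tuple


def objective_vector(mask: Sequence[bool],
                     cv_rmse: Callable[[Sequence[int]], float]) -> Tuple[float, int]:
    """Both objectives are minimized: (RMSE, number of selected features)."""
    selected = [j for j, on in enumerate(mask) if on]
    if not selected:
        raise ValueError("this sketch assumes at least one selected feature")
    return cv_rmse(selected), len(selected)


def fake_cv_rmse(columns: Sequence[int]) -> float:
    """Stand-in for 'train the model on these columns and return its CV RMSE'."""
    return 2.0 / len(columns)


print(objective_vector([True, False, True, True], fake_cv_rmse))
```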


A Bernoulli random variable is used in the variation step. The goal is to maintain diversity in the population while sustaining the convergence capacity of the evolutionary algorithm.
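The paper's exact operators are shown in figures that are not reproduced in this post. The following is therefore a hypothetical sketch of self-adaptive variation. Each individual carries its feature mask plus the index of the crossover and mutation operator it uses. These parts are all assumptions:

  • the two operator menus
  • the 0.1 re-sampling probability
  • the rule that the first parent's operator gene picks the crossover
  • all names

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical operator menus (assumptions, not the paper's exact operators).
N_CROSSOVER_TYPES = 2  # 0: uniform crossover, 1: one-point crossover
N_MUTATION_TYPES = 2   # 0: flip one random bit, 1: flip each bit with prob. 1/m
RESAMPLE_PROB = 0.1    # assumed chance of re-drawing an operator gene


@dataclass
class Individual:
    mask: List[bool]  # mask[j] is True if feature j is selected
    cx_type: int      # crossover operator this individual uses
    mut_type: int     # mutation operator this individual uses


def crossover(p: Individual, q: Individual,
              rng: random.Random) -> Tuple[Individual, Individual]:
    m = len(p.mask)
    if p.cx_type == 0:  # uniform: one Bernoulli(0.5) draw per gene (first parent decides)
        swap = [rng.random() < 0.5 for _ in range(m)]
    else:  # one-point: swap every gene after a random cut
        cut = rng.randrange(1, m) if m > 1 else 0
        swap = [j >= cut for j in range(m)]
    a = [y if s else x for x, y, s in zip(p.mask, q.mask, swap)]
    b = [x if s else y for x, y, s in zip(p.mask, q.mask, swap)]
    # Children inherit their parents' operator genes.
    return Individual(a, p.cx_type, p.mut_type), Individual(b, q.cx_type, q.mut_type)


def mutate(ind: Individual, rng: random.Random) -> Individual:
    m = len(ind.mask)
    mask = list(ind.mask)
    if ind.mut_type == 0:
        j = rng.randrange(m)
        mask[j] = not mask[j]
    else:
        mask = [(not g) if rng.random() < 1.0 / m else g for g in mask]
    # Self-adaptation: the operator genes themselves occasionally mutate.
    cx = rng.randrange(N_CROSSOVER_TYPES) if rng.random() < RESAMPLE_PROB else ind.cx_type
    mt = rng.randrange(N_MUTATION_TYPES) if rng.random() < RESAMPLE_PROB else ind.mut_type
    return Individual(mask, cx, mt)


rng = random.Random(7)
p = Individual([True, False, True, False], cx_type=0, mut_type=0)
q = Individual([False, False, True, True], cx_type=1, mut_type=1)
c1, c2 = crossover(p, q, rng)
child = mutate(c1, rng)
```

Operators that produce good offspring spread through the population along with the feature masks they helped create. This is the idea behind letting the search tune its own variation.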

##Experiments


Data set: the Online Product Sales competition, one of the predictive modeling competitions of the Kaggle community.

Settings: a population size of 1000 and 100 generations, i.e. 100,000 evaluations; 10-fold cross-validation.
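Below is a minimal sketch of the 10-fold split. It assumes the rows are already shuffled and uses contiguous folds of near-equal size. `kfold_indices` is a hypothetical helper, not code from the paper.

```python
from typing import List, Tuple


def kfold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split row indices 0..n-1 into k contiguous folds of near-equal size and
    return (train, test) pairs; every row is in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


folds = kfold_indices(23, k=10)
print([len(test) for _, test in folds])  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```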
