
Grid Search for Tensorflow Deep Learning

Grid Search is the traditional tuning method used during the hyperparameter-tuning stage, also known as Exhaustive Search. Before tuning, you pick an algorithm, decide which parameters to tune, and list the candidate values for each of them. Grid Search then exhaustively tries every combination in this parameter space and selects the combination that performs best.
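In code, the idea is nothing more than nested iteration over the candidate values, scoring each combination and keeping the best one. Here is a minimal, framework-agnostic sketch; the evaluate callable is a hypothetical placeholder for whatever validation score you use:

from itertools import product

def grid_search(param_grid, evaluate):
    # param_grid: dict mapping parameter name -> list of candidate values
    # evaluate:   hypothetical callable returning a score for one parameter dict
    best_score, best_params = float('-inf'), None
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        score = evaluate(params)  # e.g. a mean cross-validation score
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score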

 

Outline:

This article uses sklearn's Boston house-price dataset.

First, we take the "just call the library" route and use sklearn's built-in GridSearchCV.

Then we implement it again under TensorFlow, mainly to show how Grid Search is done in a Deep Learning setting.

 

Boston House Price Problem:

This is one of the classic datasets, bundled in sklearn.datasets, so we can load it directly:

from sklearn import datasets

boston = datasets.load_boston()
X = boston["data"]
Y = boston["target"]

print(X.shape)
print(Y.shape)

From the output we can see that there are 506 samples with 13 attributes (columns), and the task is to predict the house price, i.e. a regression problem. To keep things simple, we will not dig into the meaning of each attribute.
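If you do want to see what the 13 columns are, the dataset bundle exposes their names (optional, for reference only):

print(boston["feature_names"])  # CRIM, ZN, INDUS, ..., LSTAT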

 

Grid Search in Sklearn:

sklearn.model_selection.GridSearchCV implements this functionality. In the example below, we pick SVR as the model for predicting Boston house prices. Two parameters are tuned: the kernel type (linear or RBF) and C (1 or 10). 5-fold cross-validation (cv=5) is used to evaluate each model. The code is as follows:

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

model = SVR(gamma='scale')
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
reg = GridSearchCV(model, parameters, cv=5)
reg.fit(X, Y)
sorted(reg.cv_results_.keys())
print(reg.cv_results_)

From the printed results we can see the score of the model built for each parameter combination; in the end the linear kernel with C=1 wins.

{...
 'params': [{'C': 1, 'kernel': 'linear'}, {'C': 1, 'kernel': 'rbf'}, {'C': 10, 'kernel': 'linear'}, {'C': 10, 'kernel': 'rbf'}], 
'split0_test_score': array([ 0.77285459,  0.12029639,  0.77953306, -0.04157249]), 
'split1_test_score': array([ 0.72771739, -0.08134385,  0.72810716,  0.01592944]), 
'split2_test_score': array([ 0.56131914, -0.79967714,  0.63566857, -0.38338425]), 
'split3_test_score': array([0.15056451, 0.09037651, 0.02786433, 0.25941567]),
'split4_test_score': array([ 0.08212844, -0.90391602, -0.07224368, -0.62731013]), 
'mean_test_score': array([ 0.45953725, -0.31399285,  0.42049685, -0.15515943]), 
'std_test_score': array([0.289307  , 0.44498376, 0.36516833, 0.31248031]), 
'rank_test_score': array([1, 4, 2, 3]), 
'split0_train_score': array([0.70714979, 0.39723582, 0.70149448, 0.70558716]), 
'split1_train_score': array([0.68986786, 0.39850963, 0.68696465, 0.68704436]), 
'split2_train_score': array([0.62838757, 0.37872469, 0.64670086, 0.66406787]), 
'split3_train_score': array([0.82850586, 0.38276233, 0.82941506, 0.73598928]), 
'split4_train_score': array([0.69005814, 0.29652628, 0.69148868, 0.64436246]), 
'mean_train_score': array([0.70879385, 0.37075175, 0.71121274, 0.68741023]), 
'std_train_score': array([0.06558667, 0.03791872, 0.06197584, 0.03190121])
}
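Rather than scanning the raw cv_results_ dictionary, the fitted GridSearchCV object also exposes the winner directly:

print(reg.best_params_)  # {'C': 1, 'kernel': 'linear'}
print(reg.best_score_)   # mean test score of the best combination (about 0.46 above)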

As for why we chose an SVR model here instead of a neural network, which would have made the comparison with the TensorFlow code in the next section more direct: the reason is that an arbitrary combination of neural-network parameters can easily prevent the model from converging, or even push the cost function to NaN and throw an exception. So...

 

Grid Search in Tensorflow Deep Learning:

In the example below, we write our own loop to traverse all parameter combinations, build a model for each one, and evaluate its performance. We first define two helper functions. The first fixes the parameter scope; model_configs generates the list of configurations:

def model_configs():
    # define scope of configs
    learning_rate = [0.0001,0.01]
    layer1_nodes = [16,32]
    layer2_nodes = [8,4]
    
    # create configs
    configs = list()
    for i in learning_rate:
        for j in layer1_nodes:
            for k in layer2_nodes:
                cfg = [i,j,k]
                configs.append(cfg)
    print('Total configs: %d' % len(configs))
    return configs

 

The second is add_layer, a helper that adds a hidden layer of the neural network to the TensorFlow graph:

def add_layer(name1, inputs, in_size, out_size, activation_function=None):
    # Weights use Xavier initialization; biases start at a small positive constant
    Weights = tf.get_variable(name1, [in_size, out_size],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    # Apply the activation function if one is given, otherwise keep the linear output
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs

 

Finally, the main driver: it iterates over every parameter combination in the config list and builds the corresponding TensorFlow neural network. Training uses MSE as the cost function and the Adam optimizer; each trained model is then evaluated with MSE on a 20% held-out test set.

import tensorflow as tf
from sklearn.model_selection import train_test_split

cfg_list = model_configs()
error_list = []
for cfg in cfg_list:
    # unpack hyperparameters
    learning_rate, layer1_nodes, layer2_nodes = cfg
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, shuffle=True)

    # define model (rebuild the graph from scratch for each configuration)
    tf.reset_default_graph()
    tf_x = tf.placeholder(tf.float32, [None, 13])
    tf_y = tf.placeholder(tf.float32, [None, 1])  # float labels for regression

    l1 = add_layer('l1', tf_x, 13, layer1_nodes, activation_function=tf.nn.relu)
    l2 = add_layer('l2', l1, layer1_nodes, layer2_nodes, activation_function=tf.nn.relu)
    pred = add_layer('out', l2, layer2_nodes, 1, activation_function=tf.nn.relu)

    with tf.name_scope('loss'):
        loss = tf.losses.mean_squared_error(tf_y, pred)
        tf.summary.scalar("loss", tensor=loss)

    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess = tf.Session()
    sess.run(init_op)

    for j in range(0, 10000):
        sess.run(train_op, {tf_x: X_train, tf_y: y_train.reshape([y_train.shape[0], 1])})
        cost_ = sess.run(loss, {tf_x: X_train, tf_y: y_train.reshape([y_train.shape[0], 1])})
    # evaluate the trained model on the 20% test split
    test_loss = sess.run(loss, feed_dict={tf_x: X_test, tf_y: y_test.reshape([y_test.shape[0], 1])})
    print('test loss: %.2f' % test_loss)
    error_list.append(test_loss)
    sess.close()
print(cfg_list)
print(error_list)

 

[[0.0001, 16, 8], [0.0001, 16, 4], [0.0001, 32, 8], [0.0001, 32, 4],
 [0.01, 16, 8], [0.01, 16, 4], [0.01, 32, 8], [0.01, 32, 4]]

[659.03925, 15.627606, 34.55378, 598.14703,
 579.9314, 10.684119, 25.026648, 103.17941]
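To pick the winning configuration programmatically, we can simply take the index of the smallest test error (a small follow-up snippet, assuming cfg_list and error_list from the run above):

import numpy as np

best_idx = int(np.argmin(error_list))
print('Best config (learning_rate, layer1_nodes, layer2_nodes):', cfg_list[best_idx])
print('Best test MSE: %.2f' % error_list[best_idx])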

 

Finally, my personal feeling is that a naive Grid Search is not a great fit for Deep Learning problems. First, notice how much was held fixed in the example: the model type, the number of layers, the ReLU activations, and the Adam optimizer, and there are still many more knobs that could be tuned, such as the initialization method, the choice of cost function, regularization, and so on. In practice the data scientist has to lay out an optimization plan for the deep learning problem, fix several of the parameters, and only then run Grid Search over the rest. Moreover, with this many parameter dimensions, enumerating every combination is simply infeasible, so smarter tuning strategies should be introduced.
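One such strategy is Random Search, which samples combinations from the parameter space instead of enumerating them all. Below is a minimal sketch over the same configuration space as model_configs; the train_and_evaluate function is a hypothetical stand-in for the TensorFlow training loop above, and n_trials is an assumed budget:

import random

def random_search(n_trials=5):
    # same parameter scope as model_configs(), but sampled instead of enumerated
    space = {
        'learning_rate': [0.0001, 0.01],
        'layer1_nodes': [16, 32],
        'layer2_nodes': [8, 4],
    }
    best_cfg, best_error = None, float('inf')
    for _ in range(n_trials):
        cfg = [random.choice(space['learning_rate']),
               random.choice(space['layer1_nodes']),
               random.choice(space['layer2_nodes'])]
        error = train_and_evaluate(cfg)  # hypothetical: wraps the TF training loop above
        if error < best_error:
            best_cfg, best_error = cfg, error
    return best_cfg, best_error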