
Grid Search for Tensorflow Deep Learning

Grid Search is the traditional tuning method used during the hyperparameter-tuning stage, also known as Exhaustive Search. Before tuning, you pick an algorithm, decide which parameters to tune, and list the candidate values for each of them. Grid Search then exhaustively tries every combination in this parameter space and selects the combination that performs best.
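In code, the idea is nothing more than nested iteration over the candidate values, scoring each combination and keeping the best one. Here is a minimal, framework-agnostic sketch; the evaluate callable is a hypothetical placeholder for whatever validation score you use:

from itertools import product

def grid_search(param_grid, evaluate):
    # param_grid: dict mapping parameter name -> list of candidate values
    # evaluate:   hypothetical callable returning a score for one parameter dict
    best_score, best_params = float('-inf'), None
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        score = evaluate(params)  # e.g. a mean cross-validation score
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score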

 

Outline:

This article uses sklearn's Boston house-price dataset.

First, we take the "just call the library" route and use sklearn's built-in GridSearchCV.

Then we implement it again under TensorFlow, mainly to show how Grid Search is done in a Deep Learning setting.

 

Boston House Price Problem:

This is one of the classic datasets, bundled in sklearn.datasets, so we can load it directly:

from sklearn import datasets

boston = datasets.load_boston()
X = boston["data"]
Y = boston["target"]

print(X.shape)
print(Y.shape)

From the output we can see that there are 506 samples with 13 attributes (columns), and the task is to predict the house price, i.e. a regression problem. To keep things simple, we will not dig into the meaning of each attribute.
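If you do want to see what the 13 columns are, the dataset bundle exposes their names (optional, for reference only):

print(boston["feature_names"])  # CRIM, ZN, INDUS, ..., LSTAT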

 

Grid Search in Sklearn:

sklearn.model_selection.GridSearchCV implements this functionality. In the example below, we pick SVR as the model for predicting Boston house prices. Two parameters are tuned: the kernel type (linear or RBF) and C (1 or 10). 5-fold cross-validation (cv=5) is used to evaluate each model. The code is as follows:

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

model = SVR(gamma='scale')
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
reg = GridSearchCV(model, parameters, cv=5)
reg.fit(X, Y)
sorted(reg.cv_results_.keys())
print(reg.cv_results_)

From the printed results we can see the score of the model built for each parameter combination; in the end the linear kernel with C=1 wins.

{...
 'params': [{'C': 1, 'kernel': 'linear'}, {'C': 1, 'kernel': 'rbf'}, {'C': 10, 'kernel': 'linear'}, {'C': 10, 'kernel': 'rbf'}], 
'split0_test_score': array([ 0.77285459,  0.12029639,  0.77953306, -0.04157249]), 
'split1_test_score': array([ 0.72771739, -0.08134385,  0.72810716,  0.01592944]), 
'split2_test_score': array([ 0.56131914, -0.79967714,  0.63566857, -0.38338425]), 
'split3_test_score': array([0.15056451, 0.09037651, 0.02786433, 0.25941567]),
'split4_test_score': array([ 0.08212844, -0.90391602, -0.07224368, -0.62731013]), 
'mean_test_score': array([ 0.45953725, -0.31399285,  0.42049685, -0.15515943]), 
'std_test_score': array([0.289307  , 0.44498376, 0.36516833, 0.31248031]), 
'rank_test_score': array([1, 4, 2, 3]), 
'split0_train_score': array([0.70714979, 0.39723582, 0.70149448, 0.70558716]), 
'split1_train_score': array([0.68986786, 0.39850963, 0.68696465, 0.68704436]), 
'split2_train_score': array([0.62838757, 0.37872469, 0.64670086, 0.66406787]), 
'split3_train_score': array([0.82850586, 0.38276233, 0.82941506, 0.73598928]), 
'split4_train_score': array([0.69005814, 0.29652628, 0.69148868, 0.64436246]), 
'mean_train_score': array([0.70879385, 0.37075175, 0.71121274, 0.68741023]), 
'std_train_score': array([0.06558667, 0.03791872, 0.06197584, 0.03190121])
}
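Rather than scanning the raw cv_results_ dictionary, the fitted GridSearchCV object also exposes the winner directly:

print(reg.best_params_)  # {'C': 1, 'kernel': 'linear'}
print(reg.best_score_)   # mean test score of the best combination (about 0.46 above)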

As for why we chose an SVR model here instead of a neural network, which would have made the comparison with the TensorFlow code in the next section more direct: the reason is that an arbitrary combination of neural-network parameters can easily prevent the model from converging, or even push the cost function to NaN and throw an exception. So...

 

Grid Search in Tensorflow Deep Learning:

In the example below, we write our own loop to traverse all parameter combinations, build a model for each one, and evaluate its performance. We first define two helper functions. The first fixes the parameter scope; model_configs generates the list of configurations:

def model_configs():
    # define scope of configs
    learning_rate = [0.0001,0.01]
    layer1_nodes = [16,32]
    layer2_nodes = [8,4]
    
    # create configs
    configs = list()
    for i in learning_rate:
        for j in layer1_nodes:
            for k in layer2_nodes:
                cfg = [i,j,k]
                configs.append(cfg)
    print('Total configs: %d' % len(configs))
    return configs

 

The second is add_layer, a helper that adds a hidden layer of the neural network to the TensorFlow graph:

def add_layer(name1, inputs, in_size, out_size, activation_function=None):
    # Weights use Xavier initialization; biases start at a small positive constant
    Weights = tf.get_variable(name1, [in_size, out_size],
                              initializer=tf.contrib.layers.xavier_initializer())
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    # Apply the activation function if one is given, otherwise keep the linear output
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs

 

Finally, the main driver: it iterates over every parameter combination in the config list and builds the corresponding TensorFlow neural network. Training uses MSE as the cost function and the Adam optimizer; each trained model is then evaluated with MSE on a 20% held-out test set.

import tensorflow as tf
from sklearn.model_selection import train_test_split

cfg_list = model_configs()
error_list = []
for cfg in cfg_list:
    # unpack hyperparameters
    learning_rate, layer1_nodes, layer2_nodes = cfg
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, shuffle=True)

    # define model (rebuild the graph from scratch for each configuration)
    tf.reset_default_graph()
    tf_x = tf.placeholder(tf.float32, [None, 13])
    tf_y = tf.placeholder(tf.float32, [None, 1])  # float labels for regression

    l1 = add_layer('l1', tf_x, 13, layer1_nodes, activation_function=tf.nn.relu)
    l2 = add_layer('l2', l1, layer1_nodes, layer2_nodes, activation_function=tf.nn.relu)
    pred = add_layer('out', l2, layer2_nodes, 1, activation_function=tf.nn.relu)

    with tf.name_scope('loss'):
        loss = tf.losses.mean_squared_error(tf_y, pred)
        tf.summary.scalar("loss", tensor=loss)

    train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)

    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess = tf.Session()
    sess.run(init_op)

    for j in range(0, 10000):
        sess.run(train_op, {tf_x: X_train, tf_y: y_train.reshape([y_train.shape[0], 1])})
        cost_ = sess.run(loss, {tf_x: X_train, tf_y: y_train.reshape([y_train.shape[0], 1])})
    # evaluate the trained model on the 20% test split
    test_loss = sess.run(loss, feed_dict={tf_x: X_test, tf_y: y_test.reshape([y_test.shape[0], 1])})
    print('test loss: %.2f' % test_loss)
    error_list.append(test_loss)
    sess.close()
print(cfg_list)
print(error_list)

 

[[0.0001, 16, 8], [0.0001, 16, 4], [0.0001, 32, 8], [0.0001, 32, 4],
 [0.01, 16, 8], [0.01, 16, 4], [0.01, 32, 8], [0.01, 32, 4]]

[659.03925, 15.627606, 34.55378, 598.14703,
 579.9314, 10.684119, 25.026648, 103.17941]
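To pick the winning configuration programmatically, we can simply take the index of the smallest test error (a small follow-up snippet, assuming cfg_list and error_list from the run above):

import numpy as np

best_idx = int(np.argmin(error_list))
print('Best config (learning_rate, layer1_nodes, layer2_nodes):', cfg_list[best_idx])
print('Best test MSE: %.2f' % error_list[best_idx])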

 

Finally, my personal feeling is that a naive Grid Search is not a great fit for Deep Learning problems. First, notice how much was held fixed in the example: the model type, the number of layers, the ReLU activations, and the Adam optimizer, and there are still many more knobs that could be tuned, such as the initialization method, the choice of cost function, regularization, and so on. In practice the data scientist has to lay out an optimization plan for the deep learning problem, fix several of the parameters, and only then run Grid Search over the rest. Moreover, with this many parameter dimensions, enumerating every combination is simply infeasible, so smarter tuning strategies should be introduced.
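One such strategy is Random Search, which samples combinations from the parameter space instead of enumerating them all. Below is a minimal sketch over the same configuration space as model_configs; the train_and_evaluate function is a hypothetical stand-in for the TensorFlow training loop above, and n_trials is an assumed budget:

import random

def random_search(n_trials=5):
    # same parameter scope as model_configs(), but sampled instead of enumerated
    space = {
        'learning_rate': [0.0001, 0.01],
        'layer1_nodes': [16, 32],
        'layer2_nodes': [8, 4],
    }
    best_cfg, best_error = None, float('inf')
    for _ in range(n_trials):
        cfg = [random.choice(space['learning_rate']),
               random.choice(space['layer1_nodes']),
               random.choice(space['layer2_nodes'])]
        error = train_and_evaluate(cfg)  # hypothetical: wraps the TF training loop above
        if error < best_error:
            best_cfg, best_error = cfg, error
    return best_cfg, best_error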