Keras Hyperparameter Tuning





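  1. How to use Keras models in scikit-learn.
  2. How to use grid search in scikit-learn.
  3. How to tune the batch size and the number of training epochs.
  4. How to tune the optimization algorithm.
  5. How to tune the learning rate and momentum.
  6. How to tune the network weight initialization.
  7. How to choose the neuron activation function.
  8. How to tune dropout regularization.
  9. How to tune the number of neurons in the hidden layer.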


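Keras models can be used in scikit-learn by wrapping them with the KerasClassifier or KerasRegressor class.

To use these wrappers, you must define a function that creates and returns your Keras Sequential model, then pass this function to the build_fn argument when constructing the KerasClassifier class.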


    def create_model():
        ...
        return model

    model = KerasClassifier(build_fn=create_model)

The constructor for the KerasClassifier class can take default arguments that are passed on to the calls to model.fit(), such as the number of epochs and the batch size.


    def create_model():
        ...
        return model

    model = KerasClassifier(build_fn=create_model, nb_epoch=10)

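The constructor for the KerasClassifier class can also take new arguments that are passed on to your custom create_model() function. These new arguments must also be defined, with default values, in the signature of your create_model() function.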


    def create_model(dropout_rate=0.0):
        ...
        return model

    model = KerasClassifier(build_fn=create_model, dropout_rate=0.2)
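
One way to picture how the wrapper decides where each keyword argument goes is to look at the signature of the model-building function. The sketch below is plain Python and is not the Keras implementation: `split_params` and the `FIT_ARGS` set are hypothetical names introduced only for illustration, and `create_model` returns a placeholder dictionary instead of a real model.

```python
import inspect

# Hypothetical stand-in for the arguments that model.fit() understands.
FIT_ARGS = {"nb_epoch", "batch_size", "verbose"}

def create_model(dropout_rate=0.0):
    # Placeholder: a real function would build and return a Keras model.
    return {"dropout_rate": dropout_rate}

def split_params(build_fn, **params):
    """Split keyword arguments into those for build_fn and those for fit()."""
    build_names = set(inspect.signature(build_fn).parameters)
    build_kwargs = {k: v for k, v in params.items() if k in build_names}
    fit_kwargs = {k: v for k, v in params.items() if k in FIT_ARGS}
    unknown = set(params) - build_names - FIT_ARGS
    if unknown:
        raise ValueError("Unrecognized parameters: %s" % sorted(unknown))
    return build_kwargs, fit_kwargs

build_kwargs, fit_kwargs = split_params(create_model, dropout_rate=0.2, nb_epoch=10)
print(build_kwargs)  # {'dropout_rate': 0.2}
print(fit_kwargs)    # {'nb_epoch': 10}
```

This also suggests why a new argument needs a default value in the signature of create_model(): the name has to be discoverable there before it can be routed to the function.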

You can learn more about the scikit-learn wrapper in the Keras API documentation.


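Grid search is a model hyperparameter optimization technique.

In scikit-learn, this technique is provided by the GridSearchCV class.

When constructing this class, you must provide a dictionary of hyperparameters to evaluate in the param_grid argument. This is a map from each model parameter name to an array of values to try.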


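By default, the grid search uses only one thread. Setting the n_jobs argument in the GridSearchCV constructor to -1 makes the process use all cores on your machine. Depending on your Keras backend, this may interfere with the main neural network training process.

The GridSearchCV process then constructs and evaluates one model for each combination of parameters. Each individual model is evaluated with cross-validation; 3-fold cross-validation is used by default, although this can be overridden by passing the cv argument to the GridSearchCV constructor.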


    param_grid = dict(nb_epoch=[10, 20, 30])
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
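
To get a feel for how much work a grid implies, the combinations can be enumerated by hand. The following is a minimal sketch in plain Python, not how GridSearchCV is implemented; `expand_grid` is a hypothetical helper and the grid values are illustrative.

```python
from itertools import product

def expand_grid(param_grid):
    """Return every combination of values as a list of dicts."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[name] for name in names))]

# Illustrative grid: three batch sizes times three epoch counts.
param_grid = dict(batch_size=[10, 20, 40], nb_epoch=[10, 50, 100])
combinations = expand_grid(param_grid)

n_folds = 3  # the default number of cross-validation folds described above
total_fits = len(combinations) * n_folds

print(len(combinations))  # 9
print(total_fits)         # 27
print(combinations[0])    # {'batch_size': 10, 'nb_epoch': 10}
```

Because the number of model fits is the product of the grid size and the number of folds, adding one more parameter with a handful of values multiplies the total training time.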

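Once completed, you can access the outcome of the grid search in the result object returned from grid.fit(). The best_score_ member gives the best score observed during the optimization procedure, and best_params_ describes the combination of parameters that achieved the best result.

You can learn more about the GridSearchCV class in the scikit-learn API documentation.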


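Now that we know how to use Keras models with scikit-learn and how to use grid search in scikit-learn, let's look at some examples.

All examples are demonstrated on a small standard machine learning dataset called the Pima Indians onset of diabetes classification dataset. It is a small dataset in which all attributes are numerical, which makes it easy to work with.

Download the dataset and place it in your current working directory with the name pima-indians-diabetes.csv.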





    INFO (theano.gof.compilelock): Waiting for existing lock by process '55614' (I am process '55613')
    INFO (theano.gof.compilelock): To manually release the lock, delete ...








    # Use scikit-learn to grid search the batch size and epochs
    import numpy
    from sklearn.grid_search import GridSearchCV
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier

    # Function to create model, required for KerasClassifier
    def create_model():
        # create model
        model = Sequential()
        model.add(Dense(12, input_dim=8, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, verbose=0)
    # define the grid search parameters
    batch_size = [10, 20, 40, 60, 80, 100]
    epochs = [10, 50, 100]
    param_grid = dict(batch_size=batch_size, nb_epoch=epochs)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    for params, mean_score, scores in grid_result.grid_scores_:
        print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))


Best: 0.686198 using {'nb_epoch': 100, 'batch_size': 20}
0.348958 (0.024774) with: {'nb_epoch': 10, 'batch_size': 10}
0.348958 (0.024774) with: {'nb_epoch': 50, 'batch_size': 10}
0.466146 (0.149269) with: {'nb_epoch': 100, 'batch_size': 10}
0.647135 (0.021236) with: {'nb_epoch': 10, 'batch_size': 20}
0.660156 (0.014616) with: {'nb_epoch': 50, 'batch_size': 20}
0.686198 (0.024774) with: {'nb_epoch': 100, 'batch_size': 20}
0.489583 (0.075566) with: {'nb_epoch': 10, 'batch_size': 40}
0.652344 (0.019918) with: {'nb_epoch': 50, 'batch_size': 40}
0.654948 (0.027866) with: {'nb_epoch': 100, 'batch_size': 40}
0.518229 (0.032264) with: {'nb_epoch': 10, 'batch_size': 60}
0.605469 (0.052213) with: {'nb_epoch': 50, 'batch_size': 60}
0.665365 (0.004872) with: {'nb_epoch': 100, 'batch_size': 60}
0.537760 (0.143537) with: {'nb_epoch': 10, 'batch_size': 80}
0.591146 (0.094954) with: {'nb_epoch': 50, 'batch_size': 80}
0.658854 (0.054904) with: {'nb_epoch': 100, 'batch_size': 80}
0.402344 (0.107735) with: {'nb_epoch': 10, 'batch_size': 100}
0.652344 (0.033299) with: {'nb_epoch': 50, 'batch_size': 100}
0.542969 (0.157934) with: {'nb_epoch': 100, 'batch_size': 100}

We can see that a batch size of 20 and 100 epochs achieved the best result, with an accuracy of about 68%.
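
When reading a table like this, it can help to sort the rows and compare standard deviations as well as means. The sketch below is plain Python over a few rows copied from the output above; the tuple layout and variable names are assumptions made for illustration and are not part of the scikit-learn API.

```python
# (mean accuracy, standard deviation, parameters) for a few rows of the output above
results = [
    (0.348958, 0.024774, {"nb_epoch": 10, "batch_size": 10}),
    (0.686198, 0.024774, {"nb_epoch": 100, "batch_size": 20}),
    (0.665365, 0.004872, {"nb_epoch": 100, "batch_size": 60}),
    (0.658854, 0.054904, {"nb_epoch": 100, "batch_size": 80}),
]

# Rank by mean accuracy, best first.
ranked = sorted(results, key=lambda row: row[0], reverse=True)
best_mean, best_std, best_params = ranked[0]

for mean, std, params in ranked:
    print("%f (%f) with: %r" % (mean, std, params))
```

In this subset the runner-up (batch size 60) has a much smaller standard deviation than the winner, which is the kind of detail a single best score hides.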





Here, we will evaluate the full suite of optimization algorithms supported by the Keras API.


    # Use scikit-learn to grid search the optimization algorithm
    import numpy
    from sklearn.grid_search import GridSearchCV
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier

    # Function to create model, required for KerasClassifier
    def create_model(optimizer='adam'):
        # create model
        model = Sequential()
        model.add(Dense(12, input_dim=8, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
    # define the grid search parameters
    optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
    param_grid = dict(optimizer=optimizer)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    for params, mean_score, scores in grid_result.grid_scores_:
        print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))


Best: 0.704427 using {'optimizer': 'Adam'}
0.348958 (0.024774) with: {'optimizer': 'SGD'}
0.348958 (0.024774) with: {'optimizer': 'RMSprop'}
0.471354 (0.156586) with: {'optimizer': 'Adagrad'}
0.669271 (0.029635) with: {'optimizer': 'Adadelta'}
0.704427 (0.031466) with: {'optimizer': 'Adam'}
0.682292 (0.016367) with: {'optimizer': 'Adamax'}
0.703125 (0.003189) with: {'optimizer': 'Nadam'}



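It is common to pre-select an optimization algorithm to train your network and then tune its parameters. By far the most common optimization algorithm is plain Stochastic Gradient Descent (SGD), because it is so well understood. In this example, we will look at optimizing the SGD learning rate and momentum parameters.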
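
To make the roles of the two parameters concrete, here is a minimal sketch of the classical momentum update on a one-dimensional toy objective. It is plain Python rather than the Keras SGD class, and `sgd_momentum`, `grad`, and the quadratic objective are hypothetical names and choices made only for illustration.

```python
def sgd_momentum(grad, w0, learn_rate=0.01, momentum=0.0, steps=100):
    """Minimize a 1-D function given its gradient, using SGD with momentum."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity keeps a fraction of the previous step and adds the new one.
        velocity = momentum * velocity - learn_rate * grad(w)
        w = w + velocity
    return w

def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)

for lr in [0.001, 0.01, 0.1]:
    for m in [0.0, 0.9]:
        w = sgd_momentum(grad, w0=0.0, learn_rate=lr, momentum=m)
        print("learn_rate=%.3f momentum=%.1f -> w=%.4f" % (lr, m, w))
```

With a fixed budget of 100 steps, a very small learning rate without momentum stays far from the minimum, which mirrors why the two values are best searched together.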



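Generally, it is a good idea to also include the number of epochs in an optimization like this, as there is a dependency between the amount of learning per batch (learning rate), the number of updates per epoch (batch size), and the number of epochs.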
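
The dependency can be seen by counting weight updates. This is a small sketch in plain Python; `total_updates` is a hypothetical helper and the sample count of 500 is an arbitrary assumption, not the size of the dataset used in this post.

```python
import math

def total_updates(n_samples, batch_size, epochs):
    """Number of weight updates performed over a whole training run."""
    updates_per_epoch = math.ceil(n_samples / batch_size)
    return updates_per_epoch * epochs

n_samples = 500  # assumed training set size, for illustration only
for batch_size in [10, 100]:
    for epochs in [10, 100]:
        print(batch_size, epochs, total_updates(n_samples, batch_size, epochs))
```

A small batch size with many epochs applies the learning rate far more often than a large batch size with few epochs, so a learning rate that suits one setting may be too large or too small for the other.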


    # Use scikit-learn to grid search the learning rate and momentum
    import numpy
    from sklearn.grid_search import GridSearchCV
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.optimizers import SGD

    # Function to create model, required for KerasClassifier
    def create_model(learn_rate=0.01, momentum=0):
        # create model
        model = Sequential()
        model.add(Dense(12, input_dim=8, activation='relu'))
        model.add(Dense(1, activation='sigmoid'))
        # Compile model
        optimizer = SGD(lr=learn_rate, momentum=momentum)
        model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
    # define the grid search parameters
    learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
    momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
    param_grid = dict(learn_rate=learn_rate, momentum=momentum)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    for params, mean_score, scores in grid_result.grid_scores_:
        print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))


Best: 0.680990 using {'learn_rate': 0.01, 'momentum': 0.0}
0.348958 (0.024774) with: {'learn_rate': 0.001, 'momentum': 0.0}
0.348958 (0.024774) with: {'learn_rate': 0.001, 'momentum': 0.2}
0.467448 (0.151098) with: {'learn_rate': 0.001, 'momentum': 0.4}
0.662760 (0.012075) with: {'learn_rate': 0.001, 'momentum': 0.6}
0.669271 (0.030647) with: {'learn_rate': 0.001, 'momentum': 0.8}
0.666667 (0.035564) with: {'learn_rate': 0.001, 'momentum': 0.9}
0.680990 (0.024360) with: {'learn_rate': 0.01, 'momentum': 0.0}
0.677083 (0.026557) with: {'learn_rate': 0.01, 'momentum': 0.2}
0.427083 (0.134575) with: {'learn_rate': 0.01, 'momentum': 0.4}
0.427083 (0.134575) with: {'learn_rate': 0.01, 'momentum': 0.6}
0.544271 (0.146518) with: {'learn_rate': 0.01, 'momentum': 0.8}
0.651042 (0.024774) with: {'learn_rate': 0.01, 'momentum': 0.9}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.0}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.2}
0.572917 (0.134575) with: {'learn_rate': 0.1, 'momentum': 0.4}
0.572917 (0.134575) with: {'learn_rate': 0.1, 'momentum': 0.6}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.8}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.9}
0.533854 (0.149269) with: {'learn_rate': 0.2, 'momentum': 0.0}
0.427083 (0.134575) with: {'learn_rate': 0.2, 'momentum': 0.2}
0.427083 (0.134575) with: {'learn_rate': 0.2, 'momentum': 0.4}
0.651042 (0.024774) with: {'learn_rate': 0.2, 'momentum': 0.6}
0.651042 (0.024774) with: {'learn_rate': 0.2, 'momentum': 0.8}
0.651042 (0.024774) with: {'learn_rate': 0.2, 'momentum': 0.9}
0.455729 (0.146518) with: {'learn_rate': 0.3, 'momentum': 0.0}
0.455729 (0.146518) with: {'learn_rate': 0.3, 'momentum': 0.2}
0.455729 (0.146518) with: {'learn_rate': 0.3, 'momentum': 0.4}
0.348958 (0.024774) with: {'learn_rate': 0.3, 'momentum': 0.6}
0.348958 (0.024774) with: {'learn_rate': 0.3, 'momentum': 0.8}
0.348958 (0.024774) with: {'learn_rate': 0.3, 'momentum': 0.9}




Many different weight initialization techniques are now available to choose from; Keras provides a list of the ones it supports.
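
Several of these schemes differ only in how wide a range the initial weights are drawn from. The sketch below uses the commonly published formulas for three uniform schemes; it is plain Python, not Keras code, and `uniform_limit` and `init_weights` are hypothetical helpers written for illustration.

```python
import math
import random

def uniform_limit(scheme, fan_in, fan_out):
    """Half-width of the uniform sampling range for a few common schemes."""
    if scheme == "lecun_uniform":
        return math.sqrt(3.0 / fan_in)
    if scheme == "glorot_uniform":
        return math.sqrt(6.0 / (fan_in + fan_out))
    if scheme == "he_uniform":
        return math.sqrt(6.0 / fan_in)
    raise ValueError("unknown scheme: %s" % scheme)

def init_weights(scheme, fan_in, fan_out, rng):
    """Draw a fan_in x fan_out weight matrix from U(-limit, limit)."""
    limit = uniform_limit(scheme, fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(7)
# 8 inputs feeding 12 hidden neurons, as in the examples in this post.
for scheme in ["lecun_uniform", "glorot_uniform", "he_uniform"]:
    weights = init_weights(scheme, fan_in=8, fan_out=12, rng=rng)
    print("%s limit=%.4f" % (scheme, uniform_limit(scheme, 8, 12)))
```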




    # Use scikit-learn to grid search the weight initialization
    import numpy
    from sklearn.grid_search import GridSearchCV
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier

    # Function to create model, required for KerasClassifier
    def create_model(init_mode='uniform'):
        # create model
        model = Sequential()
        model.add(Dense(12, input_dim=8, init=init_mode, activation='relu'))
        model.add(Dense(1, init=init_mode, activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
    # define the grid search parameters
    init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
    param_grid = dict(init_mode=init_mode)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    for params, mean_score, scores in grid_result.grid_scores_:
        print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))


Best: 0.720052 using {'init_mode': 'uniform'}
0.720052 (0.024360) with: {'init_mode': 'uniform'}
0.348958 (0.024774) with: {'init_mode': 'lecun_uniform'}
0.712240 (0.012075) with: {'init_mode': 'normal'}
0.651042 (0.024774) with: {'init_mode': 'zero'}
0.700521 (0.010253) with: {'init_mode': 'glorot_normal'}
0.674479 (0.011201) with: {'init_mode': 'glorot_uniform'}
0.661458 (0.028940) with: {'init_mode': 'he_normal'}
0.678385 (0.004872) with: {'init_mode': 'he_uniform'}

We can see that the best result was achieved with the uniform weight initialization scheme, reaching an accuracy of about 72%.



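Generally, the rectifier activation function is the most popular, but depending on the problem, the sigmoid and tanh functions may be a better choice.

In this example, we will evaluate and compare the different activation functions available in Keras. We will only use these functions in the hidden layer, as a binary classification problem requires a sigmoid activation function in the output layer.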
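
For reference, several of the activation functions compared here are simple scalar functions. The sketch below writes a few of them out with the standard library using their textbook definitions; it is illustrative only and does not reproduce how Keras computes them on tensors.

```python
import math

def relu(x):
    # Rectifier: passes positive values through and clips negatives to zero.
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real value into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # Smooth approximation of the rectifier: log(1 + e^x).
    return math.log1p(math.exp(x))

def softsign(x):
    # Squashes into (-1, 1), approaching the limits more slowly than tanh.
    return x / (1.0 + abs(x))

activations = {
    "relu": relu,
    "tanh": math.tanh,
    "sigmoid": sigmoid,
    "softplus": softplus,
    "softsign": softsign,
    "linear": lambda x: x,
}

for name, fn in activations.items():
    print(name, [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)])
```

The sigmoid's range of (0, 1) is what makes it a natural fit for the output layer of a binary classifier, since its value can be read as a probability.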



    # Use scikit-learn to grid search the activation function
    import numpy
    from sklearn.grid_search import GridSearchCV
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasClassifier

    # Function to create model, required for KerasClassifier
    def create_model(activation='relu'):
        # create model
        model = Sequential()
        model.add(Dense(12, input_dim=8, init='uniform', activation=activation))
        model.add(Dense(1, init='uniform', activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
    # define the grid search parameters
    activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
    param_grid = dict(activation=activation)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    for params, mean_score, scores in grid_result.grid_scores_:
        print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))


Best: 0.722656 using {'activation': 'linear'}
0.649740 (0.009744) with: {'activation': 'softmax'}
0.720052 (0.032106) with: {'activation': 'softplus'}
0.688802 (0.019225) with: {'activation': 'softsign'}
0.720052 (0.018136) with: {'activation': 'relu'}
0.691406 (0.019401) with: {'activation': 'tanh'}
0.680990 (0.009207) with: {'activation': 'sigmoid'}
0.691406 (0.014616) with: {'activation': 'hard_sigmoid'}
0.722656 (0.003189) with: {'activation': 'linear'}





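Tuning dropout regularization involves fitting both the dropout rate and a weight constraint. We will try dropout rates between 0.0 and 0.9 (1.0 does not make sense) and max-norm weight constraint values between 0 and 5.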
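
The two quantities being tuned can be sketched in a few lines. The code below shows inverted dropout on a list of activations and a max-norm rescaling of a weight vector, in plain Python; `dropout` and `max_norm` are hypothetical helpers for illustration and are not the Keras Dropout layer or maxnorm constraint.

```python
import math
import random

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def max_norm(weights, limit):
    """Rescale a weight vector so that its L2 norm does not exceed `limit`."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= limit:
        return list(weights)
    return [w * limit / norm for w in weights]

rng = random.Random(7)
dropped = dropout([1.0] * 10, rate=0.2, rng=rng)
constrained = max_norm([3.0, 4.0], limit=4)  # original norm is 5

print(dropped)
print(constrained)
```

A rate of 1.0 is rejected because it would drop every activation, which is why the search range stops at 0.9.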


    # Use scikit-learn to grid search the dropout rate
    import numpy
    from sklearn.grid_search import GridSearchCV
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import Dropout
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.constraints import maxnorm

    # Function to create model, required for KerasClassifier
    def create_model(dropout_rate=0.0, weight_constraint=0):
        # create model
        model = Sequential()
        model.add(Dense(12, input_dim=8, init='uniform', activation='linear', W_constraint=maxnorm(weight_constraint)))
        model.add(Dropout(dropout_rate))
        model.add(Dense(1, init='uniform', activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
    # define the grid search parameters
    weight_constraint = [1, 2, 3, 4, 5]
    dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    param_grid = dict(dropout_rate=dropout_rate, weight_constraint=weight_constraint)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    for params, mean_score, scores in grid_result.grid_scores_:
        print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))


Best: 0.723958 using {'dropout_rate': 0.2, 'weight_constraint': 4}
0.696615 (0.031948) with: {'dropout_rate': 0.0, 'weight_constraint': 1}
0.696615 (0.031948) with: {'dropout_rate': 0.0, 'weight_constraint': 2}
0.691406 (0.026107) with: {'dropout_rate': 0.0, 'weight_constraint': 3}
0.708333 (0.009744) with: {'dropout_rate': 0.0, 'weight_constraint': 4}
0.708333 (0.009744) with: {'dropout_rate': 0.0, 'weight_constraint': 5}
0.710937 (0.008438) with: {'dropout_rate': 0.1, 'weight_constraint': 1}
0.709635 (0.007366) with: {'dropout_rate': 0.1, 'weight_constraint': 2}
0.709635 (0.007366) with: {'dropout_rate': 0.1, 'weight_constraint': 3}
0.695312 (0.012758) with: {'dropout_rate': 0.1, 'weight_constraint': 4}
0.695312 (0.012758) with: {'dropout_rate': 0.1, 'weight_constraint': 5}
0.701823 (0.017566) with: {'dropout_rate': 0.2, 'weight_constraint': 1}
0.710938 (0.009568) with: {'dropout_rate': 0.2, 'weight_constraint': 2}
0.710938 (0.009568) with: {'dropout_rate': 0.2, 'weight_constraint': 3}
0.723958 (0.027126) with: {'dropout_rate': 0.2, 'weight_constraint': 4}
0.718750 (0.030425) with: {'dropout_rate': 0.2, 'weight_constraint': 5}
0.721354 (0.032734) with: {'dropout_rate': 0.3, 'weight_constraint': 1}
0.707031 (0.036782) with: {'dropout_rate': 0.3, 'weight_constraint': 2}
0.707031 (0.036782) with: {'dropout_rate': 0.3, 'weight_constraint': 3}
0.694010 (0.019225) with: {'dropout_rate': 0.3, 'weight_constraint': 4}
0.709635 (0.006639) with: {'dropout_rate': 0.3, 'weight_constraint': 5}
0.704427 (0.008027) with: {'dropout_rate': 0.4, 'weight_constraint': 1}
0.717448 (0.031304) with: {'dropout_rate': 0.4, 'weight_constraint': 2}
0.718750 (0.030425) with: {'dropout_rate': 0.4, 'weight_constraint': 3}
0.718750 (0.030425) with: {'dropout_rate': 0.4, 'weight_constraint': 4}
0.722656 (0.029232) with: {'dropout_rate': 0.4, 'weight_constraint': 5}
0.720052 (0.028940) with: {'dropout_rate': 0.5, 'weight_constraint': 1}
0.703125 (0.009568) with: {'dropout_rate': 0.5, 'weight_constraint': 2}
0.716146 (0.029635) with: {'dropout_rate': 0.5, 'weight_constraint': 3}
0.709635 (0.008027) with: {'dropout_rate': 0.5, 'weight_constraint': 4}
0.703125 (0.011500) with: {'dropout_rate': 0.5, 'weight_constraint': 5}
0.707031 (0.017758) with: {'dropout_rate': 0.6, 'weight_constraint': 1}
0.701823 (0.018688) with: {'dropout_rate': 0.6, 'weight_constraint': 2}
0.701823 (0.018688) with: {'dropout_rate': 0.6, 'weight_constraint': 3}
0.690104 (0.027498) with: {'dropout_rate': 0.6, 'weight_constraint': 4}
0.695313 (0.022326) with: {'dropout_rate': 0.6, 'weight_constraint': 5}
0.697917 (0.014382) with: {'dropout_rate': 0.7, 'weight_constraint': 1}
0.697917 (0.014382) with: {'dropout_rate': 0.7, 'weight_constraint': 2}
0.687500 (0.008438) with: {'dropout_rate': 0.7, 'weight_constraint': 3}
0.704427 (0.011201) with: {'dropout_rate': 0.7, 'weight_constraint': 4}
0.696615 (0.016367) with: {'dropout_rate': 0.7, 'weight_constraint': 5}
0.680990 (0.025780) with: {'dropout_rate': 0.8, 'weight_constraint': 1}
0.699219 (0.019401) with: {'dropout_rate': 0.8, 'weight_constraint': 2}
0.701823 (0.015733) with: {'dropout_rate': 0.8, 'weight_constraint': 3}
0.684896 (0.023510) with: {'dropout_rate': 0.8, 'weight_constraint': 4}
0.696615 (0.017566) with: {'dropout_rate': 0.8, 'weight_constraint': 5}
0.653646 (0.034104) with: {'dropout_rate': 0.9, 'weight_constraint': 1}
0.677083 (0.012075) with: {'dropout_rate': 0.9, 'weight_constraint': 2}
0.679688 (0.013902) with: {'dropout_rate': 0.9, 'weight_constraint': 3}
0.669271 (0.017566) with: {'dropout_rate': 0.9, 'weight_constraint': 4}
0.669271 (0.012075) with: {'dropout_rate': 0.9, 'weight_constraint': 5}

We can see that a dropout rate of 0.2 (20%) with a max-norm weight constraint of 4 achieved the best result, an accuracy of about 72%.





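A larger network requires more training, so at a minimum the batch size and the number of epochs should ideally be optimized together with the number of neurons.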
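
One way to see how network size grows with this setting is to count trainable parameters for the single-hidden-layer architecture used in this post (8 inputs, one hidden layer, 1 output). This is a plain Python sketch; `count_parameters` is a hypothetical helper.

```python
def count_parameters(n_inputs, neurons, n_outputs=1):
    """Trainable parameters of a network with one fully connected hidden layer."""
    hidden = (n_inputs + 1) * neurons   # weights plus one bias per hidden neuron
    output = (neurons + 1) * n_outputs  # weights plus one bias per output
    return hidden + output

for neurons in [1, 5, 10, 15, 20, 25, 30]:
    print(neurons, count_parameters(8, neurons))
```

The count grows linearly with the number of hidden neurons, so the widest network in the grid below has roughly thirty times as many parameters to fit as the narrowest.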


    # Use scikit-learn to grid search the number of neurons
    import numpy
    from sklearn.grid_search import GridSearchCV
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import Dropout
    from keras.wrappers.scikit_learn import KerasClassifier
    from keras.constraints import maxnorm

    # Function to create model, required for KerasClassifier
    def create_model(neurons=1):
        # create model
        model = Sequential()
        model.add(Dense(neurons, input_dim=8, init='uniform', activation='linear', W_constraint=maxnorm(4)))
        model.add(Dropout(0.2))
        model.add(Dense(1, init='uniform', activation='sigmoid'))
        # Compile model
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model

    # fix random seed for reproducibility
    seed = 7
    numpy.random.seed(seed)
    # load dataset
    dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
    # split into input (X) and output (Y) variables
    X = dataset[:,0:8]
    Y = dataset[:,8]
    # create model
    model = KerasClassifier(build_fn=create_model, nb_epoch=100, batch_size=10, verbose=0)
    # define the grid search parameters
    neurons = [1, 5, 10, 15, 20, 25, 30]
    param_grid = dict(neurons=neurons)
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
    grid_result = grid.fit(X, Y)
    # summarize results
    print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
    for params, mean_score, scores in grid_result.grid_scores_:
        print("%f (%f) with: %r" % (scores.mean(), scores.std(), params))


Best: 0.714844 using {'neurons': 5}
0.700521 (0.011201) with: {'neurons': 1}
0.714844 (0.011049) with: {'neurons': 5}
0.712240 (0.017566) with: {'neurons': 10}
0.705729 (0.003683) with: {'neurons': 15}
0.696615 (0.020752) with: {'neurons': 20}
0.713542 (0.025976) with: {'neurons': 25}
0.705729 (0.008027) with: {'neurons': 30}




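  • k-fold cross-validation. You can see that the results from the examples in this post show some variance. The default 3-fold cross-validation was used, but k=5 or k=10 might be more stable. Choose your cross-validation configuration carefully to ensure your results are stable.
  • Review the whole grid. Do not just focus on the best result; review the whole grid of results and look for trends that support configuration decisions.
  • Parallelize. Use all of your CPU cores if you can; neural networks are slow to train and we often want to try many different parameters. Consider using AWS instances.
  • Use a sample of your dataset. Because neural networks are slow to train, try training on a smaller sample of your training dataset, just to get an idea of the general direction of the parameters rather than the optimal configuration.
  • Start with a coarse grid. Start with a coarse-grained grid and zoom into finer-grained grids once you have narrowed the scope.
  • Do not transfer results. Results are generally problem specific. Try to avoid reusing your favorite configuration on each new problem. The best result found on one problem is unlikely to transfer to the next; instead, look for broader trends, such as the number of layers or the relationships between parameters.
  • Reproducibility is a problem. Although we set the seed for the random number generator in NumPy, the results are not 100% reproducible. There is more to reproducibility when grid searching wrapped Keras models than is presented in this post.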
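
The first tip depends on how the data is divided into folds. Below is a minimal sketch of contiguous k-fold index splitting in plain Python, useful for seeing how k changes the size of each training and test portion; `k_fold_indices` is a hypothetical helper and not the splitter that scikit-learn uses, which may shuffle or stratify.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

With larger k, each model trains on more of the data and the mean score is averaged over more folds, at the cost of more model fits per parameter combination.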




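  • How to wrap Keras models for use in scikit-learn and how to use grid search.
  • How to grid search a suite of standard neural network parameters for Keras models.
  • How to design your own hyperparameter optimization experiments.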