
Multilayer Perceptron

This section assumes you are already familiar with classifying MNIST digits using logistic regression. All of the code for this section can also be downloaded here.

The next architecture we will build with Theano is the multilayer perceptron (MLP) with a single hidden layer. An MLP can be viewed as a logistic regression classifier whose input is first transformed by a learned nonlinear mapping; this intermediate layer is called the hidden layer. A single hidden layer is sufficient to make an MLP a universal approximator. Later, however, we will see that there are substantial benefits to using many hidden layers, which is the premise of deep learning. This tutorial introduces the MLP, backpropagation of errors, and how to train MLPs.

The Model

A multilayer perceptron (or artificial neural network, ANN) with a single hidden layer can be represented graphically as follows:

[Figure: a single-hidden-layer MLP, with an input layer fully connected to a hidden layer, which is in turn fully connected to the output layer]

Formally, a single-hidden-layer MLP is a function f: R^D -> R^L, where D is the size of the input vector x and L is the size of the output vector f(x). In matrix notation, the MLP model is:

f(x) = G(b^(2) + W^(2) s(b^(1) + W^(1) x))

Here b^(1) and W^(1) are the bias vector and weight matrix connecting the input layer to the hidden layer, and s is the activation function of the hidden layer; b^(2) and W^(2) are the bias vector and weight matrix connecting the hidden layer to the output layer, and G is the activation function of the output layer. Typically s is chosen to be the sigmoid (or tanh) function and G the softmax function.
To train the parameters of an MLP we use minibatch stochastic gradient descent; the gradients are obtained with the backpropagation algorithm. Since Theano performs automatic differentiation, we do not need to cover backpropagation in this tutorial.
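As a concrete illustration of the formula above, here is a minimal NumPy sketch of the forward pass (for illustration only; it follows the shape convention of the Theano code below, where W has shape (n_in, n_out), and the helper names and toy sizes are our own choices):

import numpy

def softmax(z):
    # numerically stable softmax over the last axis
    e = numpy.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, W1, b1, W2, b2):
    # hidden layer: h = s(b^(1) + W^(1) x), with s = tanh
    h = numpy.tanh(numpy.dot(x, W1) + b1)
    # output layer: G(b^(2) + W^(2) h), with G = softmax
    return softmax(numpy.dot(h, W2) + b2)

# toy example: D = 784 inputs, 500 hidden units, L = 10 classes
rng = numpy.random.RandomState(1234)
W1 = rng.uniform(-0.1, 0.1, size=(784, 500))
b1 = numpy.zeros(500)
W2 = rng.uniform(-0.1, 0.1, size=(500, 10))
b2 = numpy.zeros(10)
probs = mlp_forward(rng.rand(20, 784), W1, b1, W2, b2)  # shape (20, 10); each row sums to 1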

From Logistic Regression to a Multilayer Perceptron

This tutorial focuses on an MLP with a single hidden layer. We start with the implementation of a class for the hidden layer; to build an MLP, we then only need to stack a logistic regression layer on top of it.

class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        """
        Typical hidden layer of a MLP: units are fully-connected and have
        sigmoidal activation function. Weight matrix W is of shape (n_in,n_out)
        and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input,W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: Non linearity to be applied in the hidden
                           layer
        """
        self.input = input

The weights of a hidden layer should be initialized by sampling uniformly from a symmetric interval that depends on the activation function. For tanh, the interval is [-sqrt(6/(fan_in+fan_out)), sqrt(6/(fan_in+fan_out))]; for the sigmoid function it is [-4*sqrt(6/(fan_in+fan_out)), 4*sqrt(6/(fan_in+fan_out))], where fan_in is the number of units in the (i-1)-th layer and fan_out is the number of units in the i-th layer. This result comes from [Xavier10].
Such an initialization guarantees that, early in training, each neuron operates in the regime of its activation function where information is easily propagated both forward (activations flowing from inputs to outputs) and backward (gradients flowing from outputs to inputs).

        # `W` is initialized with `W_values` which is uniformly sampled
        # from -sqrt(6./(n_in+n_out)) to sqrt(6./(n_in+n_out))
        # for the tanh activation function
        # the output of uniform is converted using asarray to dtype
        # theano.config.floatX so that the code is runnable on GPU
        # Note : optimal initialization of weights is dependent on the
        #        activation function used (among other things).
        #        For example, results presented in [Xavier10] suggest that you
        #        should use 4 times larger initial weights for sigmoid
        #        compared to tanh
        #        We have no info for other functions, so we use the same as
        #        tanh.
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4

            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

Note that we usually pass in the nonlinearity to be used as the hidden layer's activation function. The default is tanh, but in many cases you may want to use something else.

        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )

If you have followed the hidden-layer output above and the tutorial on classifying MNIST digits with logistic regression, you are ready to look at the implementation of the MLP class below.

class MLP(object):
    """Multi-Layer Perceptron Class

    A multilayer perceptron is a feedforward artificial neural network model
    that has one layer or more of hidden units and nonlinear activations.
    Intermediate layers usually have as activation function tanh or the
    sigmoid function (defined here by a ``HiddenLayer`` class)  while the
    top layer is a softmax layer (defined here by a ``LogisticRegression``
    class).
    """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters for the multilayer perceptron

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
        architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
        which the datapoints lie

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
        which the labels lie

        """

        # Since we are dealing with a one hidden layer MLP, this will translate
        # into a HiddenLayer with a tanh activation function connected to the
        # LogisticRegression layer; the activation function can be replaced by
        # sigmoid or any other nonlinear function
        self.hiddenLayer = HiddenLayer(
            rng=rng,
            input=input,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )

        # The logistic regression layer gets as input the hidden units
        # of the hidden layer
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out
        )

In this tutorial we also use L1/L2 regularization (see the section on L1/L2 regularization). For this, we need to compute the L1 norm and the squared L2 norm of the weight matrices W^(1) and W^(2).

        # L1 norm ; one regularization option is to enforce L1 norm to
        # be small
        self.L1 = (
            abs(self.hiddenLayer.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )

        # square of L2 norm ; one regularization option is to enforce
        # square of L2 norm to be small
        self.L2_sqr = (
            (self.hiddenLayer.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )

        # negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer
        self.negative_log_likelihood = (
            self.logRegressionLayer.negative_log_likelihood
        )
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layers it
        # is made out of
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params

As before, we train this model using minibatch stochastic gradient descent. The difference is that we now add the regularization terms to the cost function. The hyperparameters L1_reg and L2_reg control the weight of the regularization applied to the weight matrices. The code that computes the new cost is:

    # the cost we minimize during training is the negative log likelihood of
    # the model plus the regularization terms (L1 and L2); cost is expressed
    # here symbolically
    cost = (
        classifier.negative_log_likelihood(y)
        + L1_reg * classifier.L1
        + L2_reg * classifier.L2_sqr
    )

We then update the model parameters using the gradients, essentially as in logistic regression. We take the list of parameters from the model's params attribute and compute the gradient of the cost with respect to each of them.

    # compute the gradient of cost with respect to theta (stored in params)
    # the resulting gradients will be stored in a list gparams
    gparams = [T.grad(cost, param) for param in classifier.params]

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs

    # given two lists A = [a1, a2, a3, a4] and B = [b1, b2, b3, b4] of the
    # same length, zip generates a list C of the same size, where each element
    # is a pair formed from the two lists :
    #    C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
    updates = [
        (param, param - learning_rate * gparam)
        for param, gparam in zip(classifier.params, gparams)
    ]

    # compiling a Theano function `train_model` that returns the cost, but
    # in the same time updates the parameter of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

Putting It All Together

Having explained all the basic concepts, the following code is the complete MLP implementation.

"""
This tutorial introduces the multilayer perceptron using Theano.

 A multilayer perceptron is a logistic regressor where
instead of feeding the input to the logistic regression you insert a
intermediate layer, called the hidden layer, that has a nonlinear
activation function (usually tanh or sigmoid) . One can use many such
hidden layers making the architecture deep. The tutorial will also tackle
the problem of MNIST digit classification.

.. math::

    f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))),

References:

    - textbooks: "Pattern Recognition and Machine Learning" -
                 Christopher M. Bishop, section 5

"""
__docformat__ = 'restructedtext en'


import os
import sys
import time

import numpy

import theano
import theano.tensor as T


from logistic_sgd import LogisticRegression, load_data


# start-snippet-1
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out, W=None, b=None,
                 activation=T.tanh):
        """
        Typical hidden layer of a MLP: units are fully-connected and have
        sigmoidal activation function. Weight matrix W is of shape (n_in,n_out)
        and the bias vector b is of shape (n_out,).

        NOTE : The nonlinearity used here is tanh

        Hidden unit activation is given by: tanh(dot(input,W) + b)

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dmatrix
        :param input: a symbolic tensor of shape (n_examples, n_in)

        :type n_in: int
        :param n_in: dimensionality of input

        :type n_out: int
        :param n_out: number of hidden units

        :type activation: theano.Op or function
        :param activation: Non linearity to be applied in the hidden
                           layer
        """
        self.input = input
        # end-snippet-1

        # `W` is initialized with `W_values` which is uniformly sampled
        # from -sqrt(6./(n_in+n_out)) to sqrt(6./(n_in+n_out))
        # for the tanh activation function
        # the output of uniform is converted using asarray to dtype
        # theano.config.floatX so that the code is runnable on GPU
        # Note : optimal initialization of weights is dependent on the
        #        activation function used (among other things).
        #        For example, results presented in [Xavier10] suggest that you
        #        should use 4 times larger initial weights for sigmoid
        #        compared to tanh
        #        We have no info for other functions, so we use the same as
        #        tanh.
        if W is None:
            W_values = numpy.asarray(
                rng.uniform(
                    low=-numpy.sqrt(6. / (n_in + n_out)),
                    high=numpy.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)
                ),
                dtype=theano.config.floatX
            )
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4

            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )
        # parameters of the model
        self.params = [self.W, self.b]


# start-snippet-2
class MLP(object):
    """Multi-Layer Perceptron Class

    A multilayer perceptron is a feedforward artificial neural network model
    that has one layer or more of hidden units and nonlinear activations.
    Intermediate layers usually have as activation function tanh or the
    sigmoid function (defined here by a ``HiddenLayer`` class)  while the
    top layer is a softmax layer (defined here by a ``LogisticRegression``
    class).
    """

    def __init__(self, rng, input, n_in, n_hidden, n_out):
        """Initialize the parameters for the multilayer perceptron

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.TensorType
        :param input: symbolic variable that describes the input of the
        architecture (one minibatch)

        :type n_in: int
        :param n_in: number of input units, the dimension of the space in
        which the datapoints lie

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type n_out: int
        :param n_out: number of output units, the dimension of the space in
        which the labels lie

        """

        # Since we are dealing with a one hidden layer MLP, this will translate
        # into a HiddenLayer with a tanh activation function connected to the
        # LogisticRegression layer; the activation function can be replaced by
        # sigmoid or any other nonlinear function
        self.hiddenLayer = HiddenLayer(
            rng=rng,
            input=input,
            n_in=n_in,
            n_out=n_hidden,
            activation=T.tanh
        )

        # The logistic regression layer gets as input the hidden units
        # of the hidden layer
        self.logRegressionLayer = LogisticRegression(
            input=self.hiddenLayer.output,
            n_in=n_hidden,
            n_out=n_out
        )
        # end-snippet-2 start-snippet-3
        # L1 norm ; one regularization option is to enforce L1 norm to
        # be small
        self.L1 = (
            abs(self.hiddenLayer.W).sum()
            + abs(self.logRegressionLayer.W).sum()
        )

        # square of L2 norm ; one regularization option is to enforce
        # square of L2 norm to be small
        self.L2_sqr = (
            (self.hiddenLayer.W ** 2).sum()
            + (self.logRegressionLayer.W ** 2).sum()
        )

        # negative log likelihood of the MLP is given by the negative
        # log likelihood of the output of the model, computed in the
        # logistic regression layer
        self.negative_log_likelihood = (
            self.logRegressionLayer.negative_log_likelihood
        )
        # same holds for the function computing the number of errors
        self.errors = self.logRegressionLayer.errors

        # the parameters of the model are the parameters of the two layers it
        # is made out of
        self.params = self.hiddenLayer.params + self.logRegressionLayer.params
        # end-snippet-3


def test_mlp(learning_rate=0.01, L1_reg=0.00, L2_reg=0.0001, n_epochs=1000,
             dataset='mnist.pkl.gz', batch_size=20, n_hidden=500):
    """
    Demonstrate stochastic gradient descent optimization for a multilayer
    perceptron

    This is demonstrated on MNIST.

    :type learning_rate: float
    :param learning_rate: learning rate used (factor for the stochastic
    gradient)

    :type L1_reg: float
    :param L1_reg: L1-norm's weight when added to the cost (see
    regularization)

    :type L2_reg: float
    :param L2_reg: L2-norm's weight when added to the cost (see
    regularization)

    :type n_epochs: int
    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string
    :param dataset: the path of the MNIST dataset file from
                 http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz


   """
    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size

    ######################
    # BUILD ACTUAL MODEL #
    ######################
    print '... building the model'

    # allocate symbolic variables for the data
    index = T.lscalar()  # index to a [mini]batch
    x = T.matrix('x')  # the data is presented as rasterized images
    y = T.ivector('y')  # the labels are presented as 1D vector of
                        # [int] labels

    rng = numpy.random.RandomState(1234)

    # construct the MLP class
    classifier = MLP(
        rng=rng,
        input=x,
        n_in=28 * 28,
        n_hidden=n_hidden,
        n_out=10
    )

    # start-snippet-4
    # the cost we minimize during training is the negative log likelihood of
    # the model plus the regularization terms (L1 and L2); cost is expressed
    # here symbolically
    cost = (
        classifier.negative_log_likelihood(y)
        + L1_reg * classifier.L1
        + L2_reg * classifier.L2_sqr
    )
    # end-snippet-4

    # compiling a Theano function that computes the mistakes that are made
    # by the model on a minibatch
    test_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: test_set_x[index * batch_size:(index + 1) * batch_size],
            y: test_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        inputs=[index],
        outputs=classifier.errors(y),
        givens={
            x: valid_set_x[index * batch_size:(index + 1) * batch_size],
            y: valid_set_y[index * batch_size:(index + 1) * batch_size]
        }
    )

    # start-snippet-5
    # compute the gradient of cost with respect to theta (stored in params)
    # the resulting gradients will be stored in a list gparams
    gparams = [T.grad(cost, param) for param in classifier.params]

    # specify how to update the parameters of the model as a list of
    # (variable, update expression) pairs

    # given two lists A = [a1, a2, a3, a4] and B = [b1, b2, b3, b4] of the
    # same length, zip generates a list C of the same size, where each element
    # is a pair formed from the two lists :
    #    C = [(a1, b1), (a2, b2), (a3, b3), (a4, b4)]
    updates = [
        (param, param - learning_rate * gparam)
        for param, gparam in zip(classifier.params, gparams)
    ]

    # compiling a Theano function `train_model` that returns the cost, but
    # in the same time updates the parameter of the model based on the rules
    # defined in `updates`
    train_model = theano.function(
        inputs=[index],
        outputs=cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # end-snippet-5

    ###############
    # TRAIN MODEL #
    ###############
    print '... training'

    # early-stopping parameters
    patience = 10000  # look as this many examples regardless
    patience_increase = 2  # wait this much longer when a new best is
                           # found
    improvement_threshold = 0.995  # a relative improvement of this much is
                                   # considered significant
    validation_frequency = min(n_train_batches, patience / 2)
                                  # go through this many
                                  # minibatche before checking the network
                                  # on the validation set; in this case we
                                  # check every epoch

    best_validation_loss = numpy.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in xrange(n_train_batches):

            minibatch_avg_cost = train_model(minibatch_index)
            # iteration number
            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:
                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in xrange(n_valid_batches)]
                this_validation_loss = numpy.mean(validation_losses)

                print(
                    'epoch %i, minibatch %i/%i, validation error %f %%' %
                    (
                        epoch,
                        minibatch_index + 1,
                        n_train_batches,
                        this_validation_loss * 100.
                    )
                )

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:
                    #improve patience if loss improvement is good enough
                    if (
                        this_validation_loss < best_validation_loss *
                        improvement_threshold
                    ):
                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss
                    best_iter = iter

                    # test it on the test set
                    test_losses = [test_model(i) for i
                                   in xrange(n_test_batches)]
                    test_score = numpy.mean(test_losses)

                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print(('Optimization complete. Best validation score of %f %% '
           'obtained at iteration %i, with test performance %f %%') %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print >> sys.stderr, ('The code for file ' +
                          os.path.split(__file__)[1] +
                          ' ran for %.2fm' % ((end_time - start_time) / 60.))


if __name__ == '__main__':
    test_mlp()

The expected output is of the form:

Optimization complete. Best validation score of 1.690000 % obtained at iteration 2070000, with test performance 1.650000 %
The code for file mlp.py ran for 97.34m

On a machine with an Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz, the code ran at approximately 10.3 epochs/minute and took 828 epochs to reach a test error of 1.65%.
Readers can also compare this result against other MNIST classification results listed on this page.

Tips and Tricks for Training MLPs

Several of the hyperparameters in the code above cannot be optimized by gradient descent. Strictly speaking, finding an optimal set of values for these hyperparameters is not a feasible problem. First, we cannot simply optimize each of them independently. Second, we cannot readily apply the gradient techniques described earlier (some hyperparameters are discrete, others are real-valued). Third, the optimization problem is non-convex and prone to local minima.
The good news is that over the last 25 years, researchers have devised rules of thumb for choosing hyperparameters in a neural network. A very good overview is Efficient BackProp by LeCun et al. Here we summarize the most important methods and techniques used in our code.

Nonlinearity

The most common activation functions are the sigmoid and tanh functions. As explained in Section 4.4 of that reference, nonlinearities that are symmetric around the origin are preferred because they tend to produce zero-mean outputs (a desirable property). Empirically, we have observed that tanh (the hyperbolic tangent) has better convergence properties.
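As a quick toy check of the zero-mean point (our own illustration, not part of the tutorial code):

import numpy

x = numpy.random.RandomState(0).randn(100000)  # zero-mean inputs
print(numpy.tanh(x).mean())                    # close to 0: tanh is symmetric around the origin
print((1. / (1. + numpy.exp(-x))).mean())      # close to 0.5: sigmoid outputs are not centered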

Weight Initialization

At initialization we want the weights to be around zero and small enough that the activation function operates in its near-linear regime, where gradients are largest. Another desirable property, especially for deep networks, is to preserve the variance of the activations and of the back-propagated gradients from layer to layer. This allows information to flow well both forward and backward through the network and reduces discrepancies between layers. For the mathematical derivation, see [Xavier10].

Learning Rate

A large part of the literature focuses on choosing a good learning rate. The simplest solution is a constant rate. Rule of thumb: try several log-spaced values (0.1, 0.01, ...), then narrow a (logarithmic) grid search to the region where you obtain the lowest validation error.
Decreasing the learning rate over time is sometimes also a good idea. One simple rule is mu / (1 + d*t), where mu is the initial rate (chosen, for example, with the grid search described above), d is the decrease constant controlling how fast the rate decays (typically a small positive number, 0.001 or smaller), and t is the epoch or iteration number; see the sketch after this section.
Section 4.7 of the reference above details procedures for choosing a learning rate for each parameter of the network and adapting them based on the classifier's error.
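A minimal sketch of this decay schedule (the function and variable names are our own, not part of the tutorial code):

def decayed_learning_rate(mu, d, t):
    # learning rate at step t: mu / (1 + d * t)
    return mu / (1. + d * t)

# example: an initial rate of 0.01 decayed with d = 0.001
for t in (0, 1000, 10000, 100000):
    print(decayed_learning_rate(0.01, 0.001, t))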

Number of Hidden Units

This hyperparameter is very dataset-dependent. Loosely speaking, the more complicated the input distribution, the more capacity the network needs in order to model it, and hence the larger the number of hidden units required. In fact, the number of weights in a layer, i.e. the product of its input and output dimensions, is perhaps a more direct measure of capacity (see the count below).
Unless we employ some regularization scheme (early stopping or L1/L2 penalties), a plot of the number of hidden units versus generalization performance is typically U-shaped (that is, beyond some point, adding more hidden units no longer improves generalization).
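For instance, with the MNIST settings used in the code above (784 inputs, 500 hidden units, 10 outputs), a quick count of the weights (our own arithmetic) gives:

n_in, n_hidden, n_out = 28 * 28, 500, 10
print(n_in * n_hidden)   # 392000 weights between the input and hidden layers
print(n_hidden * n_out)  # 5000 weights between the hidden and output layers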

Regularization Parameter

Typical values to try for the L1/L2 regularization parameter lambda are 0.01, 0.001, and so on. Although in the framework described above it did not significantly improve performance, regularization remains a worthwhile hyperparameter to explore.
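A minimal sketch of such a search, reusing the test_mlp function defined above (the candidate values and the shortened run length are our own choices, not part of the tutorial):

# train once per candidate value and compare the printed validation errors
for l2 in (0.01, 0.001, 0.0001):
    print('L2_reg = %f' % l2)
    test_mlp(L2_reg=l2, n_epochs=50)  # shorter runs are usually enough to compare settings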
