Neural networks, particularly lightweight ones such as MobileNet, often use depthwise separable convolution.




The example below shows how depthwise separable convolution works in practice. The depth multiplier is set to 1, which is the usual setting for this kind of layer.


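Standard (src) convolution:

  input                  output
  M*N*Cin  ----------->  M*N*Cout

Depthwise separable convolution:

  input                  output1                  output2
  M*N*Cin  ----------->  M*N*Cin  ----------->    M*N*Cout
            depthwise               pointwise
            16*3*3                  16*32*1*1

(The kernel shapes shown correspond to Cin = 16 and Cout = 32.)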
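To make the diagram concrete, here is a minimal sketch of the per-layer multiplication count for both variants. The function name `layer_costs` and its parameters are illustrative and not from the original; stride 1 and an M*N output are assumed.

```python
def layer_costs(m, n, cin, cout, k=3):
    """Multiplications for one layer with an M*N output (illustrative helper)."""
    standard = m * n * k * k * cin * cout
    depthwise = m * n * k * k * cin      # one k*k filter per input channel
    pointwise = m * n * cin * cout       # 1*1 convolution that mixes channels
    return standard, depthwise + pointwise


standard, separable = layer_costs(m=1, n=1, cin=16, cout=32)
print(standard, separable)       # 4608 656
print(separable / standard)      # equals 1/cout + 1/k**2
```

The ratio simplifies to 1/Cout + 1/k^2, so with 3*3 kernels and many output channels the separable form needs roughly 1/9 of the multiplications.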




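The core idea of MobileNet is to factorize a standard convolution into a depthwise convolution followed by a 1*1 pointwise convolution. Put simply, one convolution layer is split into two. In the first layer, each filter convolves with exactly one channel of the input. The second layer is responsible for combining, that is, merging the per-channel results of the first layer across channels.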

[Figure: depthwise convolution]


[Figure: pointwise convolution]
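The two steps can be sketched in plain Python on nested lists. This is a minimal illustration rather than MobileNet's actual implementation: the function names, stride 1, zero padding and a depth multiplier of 1 are all assumptions made here.

```python
def depthwise_conv(x, dw_kernels):
    """x: H x W x Cin nested lists; dw_kernels: Cin kernels, each k x k.
    Zero padding keeps the output at H x W x Cin (stride 1)."""
    h, w, cin = len(x), len(x[0]), len(x[0][0])
    k = len(dw_kernels[0])
    pad = k // 2
    out = [[[0.0] * cin for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for c in range(cin):  # each channel uses only its own kernel
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:
                            acc += x[ii][jj][c] * dw_kernels[c][di][dj]
                out[i][j][c] = acc
    return out


def pointwise_conv(x, pw_weights):
    """pw_weights: Cin x Cout matrix, i.e. a 1*1 convolution that mixes channels."""
    cin, cout = len(pw_weights), len(pw_weights[0])
    return [[[sum(px[c] * pw_weights[c][o] for c in range(cin))
              for o in range(cout)]
             for px in row]
            for row in x]


x = [[[1.0, 1.0] for _ in range(4)] for _ in range(4)]   # 4*4*2 input
dw = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]    # 2 kernels of 3*3
pw = [[1.0] * 3 for _ in range(2)]                        # 2*3 pointwise weights
y = pointwise_conv(depthwise_conv(x, dw), pw)
print(len(y), len(y[0]), len(y[0][0]))                    # 4 4 3
```

The depthwise step never mixes channels; only the pointwise step does, which is why the channel count changes from Cin to Cout only in the second step.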



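The first layer is a regular convolution. Every layer after it is a depthwise convolution + pointwise convolution pair, and the network ends with a pooling layer and a fully connected layer. Counting each depthwise and each pointwise convolution as a separate layer, there are 28 layers in total.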
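As a quick check of the layer count, here is a small sketch that lists the layers in order. It assumes that each depthwise and each pointwise convolution counts as its own layer and that the pooling layer is not counted, which is one way to arrive at 28; the labels are illustrative.

```python
# 1 standard conv, 13 depthwise separable blocks (2 layers each), 1 fully connected.
layers = ["conv"] + ["depthwise", "pointwise"] * 13 + ["fc"]
print(len(layers))  # 28
```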


# The cost estimate below only needs namedtuple. TensorFlow itself is not used
# here (and tf.contrib.slim is unavailable in TensorFlow 2.x).
from collections import namedtuple

# Conv and DepthSepConv namedtuple define layers of the MobileNet architecture
# Conv defines 3x3 convolution layers
# DepthSepConv defines 3x3 depthwise convolution followed by 1x1 convolution.
# stride is the stride of the convolution
# depth is the number of channels or filters in a layer
Conv = namedtuple('Conv', ['kernel', 'stride', 'depth'])
DepthSepConv = namedtuple('DepthSepConv', ['kernel', 'stride', 'depth'])

# _CONV_DEFS specifies the MobileNet body
_CONV_DEFS = [
    Conv(kernel=[3, 3], stride=2, depth=32),
    DepthSepConv(kernel=[3, 3], stride=1, depth=64),
    DepthSepConv(kernel=[3, 3], stride=2, depth=128),
    DepthSepConv(kernel=[3, 3], stride=1, depth=128),
    DepthSepConv(kernel=[3, 3], stride=2, depth=256),
    DepthSepConv(kernel=[3, 3], stride=1, depth=256),
    DepthSepConv(kernel=[3, 3], stride=2, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=2, depth=1024),
    DepthSepConv(kernel=[3, 3], stride=1, depth=1024)
]

# Baseline: multiplication count if every layer were a standard convolution.
input_size = 160
inputdepth = 3
conv_defs = _CONV_DEFS
sumcost = 0
for conv_def in conv_defs:
    stride = conv_def.stride
    kernel = conv_def.kernel
    outdepth = conv_def.depth
    # Output size with no padding: shrink by kernel - 1 (odd kernels), then divide by stride.
    output_size = round((input_size - int(kernel[0] / 2) * 2) / stride)
    # Conv and DepthSepConv layers are both counted as a full k*k convolution here.
    sumcost += output_size * output_size * kernel[0] * kernel[0] * inputdepth * outdepth
    inputdepth = outdepth
    input_size = output_size
print("src conv:    ", sumcost)

# Same walk over the layers, but each DepthSepConv layer is counted as a
# depthwise step (k*k per input channel) plus a pointwise step (1*1 across channels).
input_size = 160
inputdepth = 3
conv_defs = _CONV_DEFS
sumcost1 = 0
for conv_def in conv_defs:
    stride = conv_def.stride
    kernel = conv_def.kernel
    outdepth = conv_def.depth
    output_size = round((input_size - int(kernel[0] / 2) * 2) / stride)
    if isinstance(conv_def, Conv):
        sumcost1 += output_size * output_size * kernel[0] * kernel[0] * inputdepth * outdepth
    elif isinstance(conv_def, DepthSepConv):
        depthwise = output_size * output_size * inputdepth * kernel[0] * kernel[0]
        pointwise = output_size * output_size * inputdepth * outdepth * 1 * 1
        sumcost1 += depthwise + pointwise
    inputdepth = outdepth
    input_size = output_size
print("DepthSepConv:", sumcost1)
# Ratio of separable cost to standard cost.
print("compare:", sumcost1 / sumcost)

Output:

src conv:     1045417824
DepthSepConv: 126373376
compare: 0.12088312739538674
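The loops above subtract kernel - 1 from the feature-map size at every layer, i.e. they assume no padding, so with a 160-pixel input the size reaches zero by the tenth layer and the later terms are no longer meaningful. For comparison, here is a hedged sketch of the same estimate assuming SAME padding (output size = ceil(input / stride)). That padding choice, the `LAYERS` tuple encoding and the `estimate_costs` helper are assumptions for illustration, not part of the original script.

```python
import math

# (is_separable, stride, depth) for the same 14 layers as _CONV_DEFS.
LAYERS = (
    [(False, 2, 32), (True, 1, 64), (True, 2, 128), (True, 1, 128),
     (True, 2, 256), (True, 1, 256), (True, 2, 512)]
    + [(True, 1, 512)] * 5
    + [(True, 2, 1024), (True, 1, 1024)]
)


def estimate_costs(input_size=160, in_depth=3, k=3):
    """Return (standard_cost, separable_cost, sizes), assuming SAME padding."""
    standard = separable = 0
    sizes = []
    for is_separable, stride, out_depth in LAYERS:
        # SAME padding: only the stride shrinks the feature map.
        size = math.ceil(input_size / stride)
        full = size * size * k * k * in_depth * out_depth
        standard += full
        if is_separable:
            separable += size * size * (in_depth * k * k + in_depth * out_depth)
        else:
            separable += full
        sizes.append(size)
        input_size, in_depth = size, out_depth
    return standard, separable, sizes


standard, separable, sizes = estimate_costs()
print(sizes)
print("compare:", separable / standard)
```

Under this assumption the feature map stays positive through all 14 layers, and the ratio still comes out in the same range, a little above 1/9.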