
Introduction to Deep Learning with PyTorch (Part 7): PyTorch Essentials in Practice

Preface
PyTorch provides two main features:
(1) An n-dimensional Tensor, similar to numpy arrays but able to run on GPUs.
(2) Automatic differentiation for building and training neural networks.
We will use a fully connected ReLU network as the running example. The network has a single hidden layer and is trained with gradient descent to minimize the Euclidean distance between the network output and the true output (implemented below as the sum of squared errors).


Tensors

Warm-up: numpy


Before introducing PyTorch, we first implement the network using numpy.
Numpy provides an n-dimensional array object and functions for manipulating these arrays; it is a generic framework for scientific computing. It knows nothing about computation graphs, deep learning, or gradients, but we can still use it to fit a two-layer network to random data by manually implementing the forward and backward passes through the network with numpy operations.

# -*- coding: utf-8 -*-
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

PyTorch: Tensors


Numpy is a great framework, but unfortunately it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so numpy alone cannot meet the computational demands of deep learning.
Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on Tensors. Like numpy arrays, PyTorch Tensors know nothing about deep learning, computation graphs, or gradients; they are a generic tool for scientific computing.
Unlike numpy, however, PyTorch Tensors can utilize GPUs to accelerate their numerical computations. To run a Tensor on the GPU, we simply cast it to a CUDA data type.
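For example (a minimal sketch assuming a CUDA-enabled build of PyTorch, not part of the original listing), the data type can be chosen at runtime:

import torch

# Fall back to the CPU type when no GPU is available.
dtype = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor
x = torch.randn(3, 3).type(dtype)  # this Tensor lives on the GPU when CUDA is available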
Here we use PyTorch Tensors to fit a two-layer network to random data. As in the numpy example above, we manually implement the forward and backward passes through the network.

# -*- coding: utf-8 -*-

import torch


dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in).type(dtype)
y = torch.randn(N, D_out).type(dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

Autograd

PyTorch: Variables and autograd


In the examples above we had to manually implement both the forward and backward passes of our network. Manually implementing the backward pass is not a big deal for a small two-layer network, but it quickly becomes very hairy for large, complex networks.
Thankfully, we can use automatic differentiation to automate the computation of the backward pass. The autograd package in PyTorch provides exactly this functionality. When using autograd, the forward pass of your network defines a computational graph; nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to compute gradients with very little effort.
This sounds complicated, but it is quite simple to use in practice. We wrap our PyTorch Tensors in Variable objects; a Variable represents a node in the computational graph. If x is a Variable, then x.data is a Tensor, and x.grad is another Variable holding the gradient of x with respect to some scalar value.
PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation that you can perform on a Tensor also works on Variables. The difference is that using Variables defines a computational graph, allowing gradients to be computed automatically.
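For example (a tiny sketch, not part of the original tutorial), computing the gradient of a scalar with respect to a Variable takes just one call to backward():

import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = (x * x + 3 * x).sum()  # y is a scalar Variable built from operations on x
y.backward()               # populates x.grad with dy/dx
print(x.grad.data)         # 2 * x + 3, i.e. a 2x2 Tensor filled with 5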
Below we use PyTorch Variables and autograd to implement our two-layer network; now we no longer need to manually implement the backward pass.

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()

PyTorch: Defining new autograd functions


Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The forward function computes output Tensors from input Tensors. The backward function receives the gradient of the output Tensors with respect to some scalar value and computes the gradient of the input Tensors with respect to that same scalar value.
In PyTorch we can easily define our own autograd operators by defining a subclass of torch.autograd.Function and implementing the forward and backward functions. We can then use the new autograd operator by constructing an instance and calling it like a function, passing Variables containing the input data.
In this example we define our own custom autograd function for performing the ReLU nonlinearity, and use it to implement our two-layer network.

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    def forward(self, input):
        """
        In the forward pass we receive a Tensor containing the input and return a
        Tensor containing the output. You can cache arbitrary Tensors for use in the
        backward pass using the save_for_backward method.
        """
        self.save_for_backward(input)
        return input.clamp(min=0)

    def backward(self, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = self.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Construct an instance of our MyReLU class to use in our network
    relu = MyReLU()

    # Forward pass: compute predicted y using operations on Variables; we compute
    # ReLU using our custom autograd operation.
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass.
    loss.backward()

    # Update weights using gradient descent
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights
    w1.grad.data.zero_()
    w2.grad.data.zero_()
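As a quick sanity check (a sketch that is not part of the original tutorial), torch.autograd.gradcheck can compare the analytical gradients produced by our backward against numerical estimates. It expects double-precision inputs, and here MyReLU is wrapped in a lambda so that every evaluation gets a fresh instance:

from torch.autograd import Variable, gradcheck

check_input = Variable(torch.randn(8, 8).double(), requires_grad=True)
# Returns True when the analytical and numerical gradients agree within tolerance.
print(gradcheck(lambda inp: MyReLU()(inp), (check_input,), eps=1e-6, atol=1e-4))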

TensorFlow: Static Graphs


PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients. The biggest difference between the two is that TensorFlow's computational graphs are static, while PyTorch uses dynamic computational graphs.
In TensorFlow we define the computational graph once and then execute the same graph over and over again, possibly feeding it different input data. In PyTorch, each forward pass defines a new computational graph.
Static graphs are nice because you can optimize the graph up front; for example, a framework might fuse some graph operations for efficiency, or come up with a strategy for distributing the graph across many GPUs or machines. If you are reusing the same graph over and over, this potentially costly up-front optimization is amortized as the same graph is rerun many times.
One aspect where static and dynamic graphs differ is control flow. For some models we may wish to perform different computation for each data point; for example, a recurrent network might be unrolled for a different number of time steps for each data point, and this unrolling can be implemented as a loop. With a static graph, the loop construct needs to be part of the graph itself, which is why TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build the graph on the fly for each example, we can use ordinary imperative control flow to perform computation that differs for each input.
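As a small illustration (a sketch, not from the original tutorial), an ordinary data-dependent Python loop is simply recorded by autograd as part of that forward pass's graph:

import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
h = x
while h.abs().sum().data[0] < 10:  # ordinary Python control flow, decided by the data
    h = h * 2
h.sum().backward()                 # gradients flow through however many doublings actually ran
print(x.grad.data)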
To contrast with the PyTorch autograd example above, here we use TensorFlow to fit a simple two-layer network.

# -*- coding: utf-8 -*-
import tensorflow as tf
import numpy as np

# First we set up the computational graph:

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)

# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2.
    sess.run(tf.global_variables_initializer())

    # Create numpy arrays holding the actual data for the inputs x and targets
    # y
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    for _ in range(500):
        # Execute the graph many times. Each time it executes we want to bind
        # x_value to x and y_value to y, specified with the feed_dict argument.
        # Each time we execute the graph we want to compute the values for loss,
        # new_w1, and new_w2; the values of these Tensors are returned as numpy
        # arrays.
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})
        print(loss_value)

nn module

PyTorch: nn


Computational graphs and autograd are a very powerful paradigm for defining complex operators and automatically taking derivatives; however, for large neural networks raw autograd can be a bit too low-level.
When building neural networks we frequently think of arranging the computation into layers, some of which have learnable parameters that will be optimized during learning.
In TensorFlow, packages such as Keras, TensorFlow-Slim, and TFLearn provide higher-level abstractions over raw computational graphs that are useful for building neural networks.
In PyTorch, the nn package serves this same purpose. The nn package defines a set of Modules, which are roughly equivalent to neural network layers. A Module receives input Variables and computes output Variables, but may also hold internal state such as Variables containing learnable parameters. The nn package also defines a set of loss functions that are commonly used when training neural networks.
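For instance (a small sketch, not part of the original listing), a single nn.Linear Module already owns the weight and bias Variables it applies:

import torch
from torch.autograd import Variable

layer = torch.nn.Linear(1000, 100)             # one fully connected layer
out = layer(Variable(torch.randn(64, 1000)))   # maps a (64, 1000) batch to (64, 100)
print(out.size())
print(layer.weight.size(), layer.bias.size())  # learnable parameters held inside the Module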
In the example below we use the nn package to implement our two-layer network.

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Variables for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function.
loss_fn = torch.nn.MSELoss(size_average=False)

learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Variable of input data to the Module and it produces
    # a Variable of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Variables containing the predicted and true
    # values of y, and the loss function returns a Variable containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Zero the gradients before running the backward pass.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Variables with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Variable, so
    # we can access its data and gradients like we did before.
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data

PyTorch: optim


Up to this point we have updated the weights of our models by manually mutating the .data member of the Variables holding the learnable parameters. That is not a huge burden for simple optimization algorithms like stochastic gradient descent, but in practice we often train neural networks with more sophisticated optimizers such as AdaGrad, RMSProp, or Adam.
The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms.
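Switching between these optimizers is a one-line change (a sketch with a throwaway stand-in model, not part of the original listing):

import torch

# A hypothetical stand-in model just to show the optimizer interface.
model = torch.nn.Linear(1000, 10)

# Any of these torch.optim optimizers can drive the same training loop:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)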
In the example below we use the nn package to define our model as before, but we optimize the model with the Adam algorithm provided by the optim package.

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables.
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(size_average=False)

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Variables it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.data[0])

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable weights
    # of the model)
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()

PyTorch: Custom nn Modules


Sometimes you will want to specify models that are more complex than a sequence of existing Modules. For these cases you can define your own Module by subclassing nn.Module and defining a forward that receives input Variables and produces output Variables, using other Modules or other autograd operations on Variables.
In this example we implement our two-layer network as a custom Module subclass.

# -*- coding: utf-8 -*-
import torch
from torch.autograd import Variable


class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and assign them as
        member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must return
        a Variable of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Variables.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = TwoLayerNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the two
# nn.Linear modules which are members of the model.
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

PyTorch: Control Flow + Weight Sharing


As an example of dynamic graphs and weight sharing, we implement a rather strange model: a fully connected ReLU network that on each forward pass chooses a random number between 1 and 4 and uses that many hidden layers, reusing the same weights multiple times to compute the innermost hidden layers.
For this model we can use normal Python flow control to implement the loop, and we can implement weight sharing simply by reusing the same Module multiple times when defining the forward pass.
We implement this model as a Module subclass.

# -*- coding: utf-8 -*-
import random
import torch
from torch.autograd import Variable


class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we construct three nn.Linear instances that we will use
        in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        """
        For the forward pass of the model, we randomly choose either 0, 1, 2, or 3
        and reuse the middle_linear Module that many times to compute hidden layer
        representations.

        Since each forward pass builds a dynamic computation graph, we can use normal
        Python control-flow operators like loops or conditional statements when
        defining the forward pass of the model.

        Here we also see that it is perfectly safe to reuse the same Module many
        times when defining a computational graph. This is a big improvement from Lua
        Torch, where each Module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred


# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs, and wrap them in Variables
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct our model by instantiating the class defined above
model = DynamicNet(D_in, H, D_out)

# Construct our loss function and an Optimizer. Training this strange model with
# vanilla stochastic gradient descent is tough, so we use momentum
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x)

    # Compute and print loss
    loss = criterion(y_pred, y)
    print(t, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Summary
This article walked through the key PyTorch modules and how to use them, which is essential groundwork for the hands-on exercises to come. Work through every module in this article carefully; ideally, type out and run each code example yourself.