PyTorch Learning: Autograd (Automatic Differentiation)
Automatic derivation / automatic differentiation
In some references, taking the derivative (求導) is simply called differentiation (微分); the two terms refer to the same thing here.
Central to all neural networks in PyTorch is the autograd package. Let's first briefly visit this, and we will then go on to training our first neural network.
The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
Let's see this in simpler terms with some examples.
TENSOR
torch.Tensor is the central class of the package. If you set its attribute .requires_grad as True, it starts to track all operations on it. When you finish your computation you can call .backward() and have all the gradients computed automatically. The gradient for this tensor will be accumulated into the .grad attribute.
To stop a tensor from tracking history, you can call .detach() to detach it from the computation history, and to prevent future computation from being tracked.
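For illustration, here is a minimal sketch (my own addition, not from the original tutorial) of what .detach() does: the detached tensor holds the same values, but it is cut off from the computation history, so no gradient flows through it.

import torch

x = torch.ones(2, 2, requires_grad=True)
y = x * 3
z = y.detach()          # same values as y, but detached from the graph

print(y.requires_grad)  # True  -- y is still part of the graph
print(z.requires_grad)  # False -- history is no longer tracked for z
print(z.grad_fn)        # None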
To prevent tracking history (and using memory), you can also wrap the code block in with torch.no_grad():. This can be particularly helpful when evaluating a model, because the model may have trainable parameters with requires_grad=True for which we don't need the gradients.
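As a hedged sketch of that evaluation use case (the small nn.Linear model here is only a stand-in of my own, not something from the tutorial): inside the with torch.no_grad(): block no history is recorded, even though the model's parameters have requires_grad=True.

import torch
import torch.nn as nn

model = nn.Linear(4, 2)        # its weight and bias have requires_grad=True
inputs = torch.randn(8, 4)

with torch.no_grad():          # no computation history is recorded here
    outputs = model(inputs)

print(outputs.requires_grad)   # False -- nothing to backprop through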
There's one more class which is very important for the autograd implementation - a Function.
Tensor and Function are interconnected and build up an acyclic graph that encodes a complete history of computation. Each tensor has a .grad_fn attribute that references the Function that has created the Tensor (except for Tensors created by the user - their grad_fn is None).
If you want to compute the derivatives, you can call .backward() on a Tensor. If the Tensor is a scalar (i.e. it holds one element of data), you don't need to specify any arguments to backward(); however, if it has more elements, you need to specify a gradient argument that is a tensor of matching shape.
import torch
Create a tensor and set requires_grad=True to track computation with it:
x = torch.ones(2, 2, requires_grad=True)
print(x)
tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
Do an operation on the tensor:
y = x + 2
print(y)
tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward>)
# y was created as a result of an operation, so it has a grad_fn.
print(y.grad_fn)
<AddBackward object at 0x00000192745E2240>
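By contrast (my own addition for completeness), a tensor created directly by the user is a leaf of the graph and has no grad_fn:

print(x.grad_fn)  # None -- x was created by the user, not by an operation
print(x.is_leaf)  # True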
Do more operations on y:
z = y * y * 3
out = z.mean()
print(z, out)
tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward>) tensor(27., grad_fn=<MeanBackward1>)
requires_grad_( … ) changes an existing Tensor's requires_grad flag in-place. The input flag defaults to False if not given.
# Create a tensor
a = torch.randn(2, 2)
# Do some operations on the tensor
a = (a * 3) / (a - 1)
# Check the current value of requires_grad
print(a.requires_grad)
# Set it to True in-place
a.requires_grad_(True)
print(a.requires_grad)
# Compute again
b = (a * a).sum()
print(b.grad_fn)
False
True
<SumBackward0 object at 0x0000019276E47208>
Let's backprop now. Because out contains a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1)).
out.backward()
Print the gradients d(out)/dx:
print(x.grad)
tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
The derivative above is computed as follows.
You should have got a matrix of 4.5. Let's call the out Tensor $o$. We have that $o = \frac{1}{4}\sum_i z_i$, $z_i = 3(x_i+2)^2$ and $z_i\big|_{x_i=1} = 27$. Therefore, $\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, hence $\frac{\partial o}{\partial x_i}\big|_{x_i=1} = \frac{9}{2} = 4.5$.
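As a sanity check (my own addition, not part of the original), the analytic gradient $\frac{3}{2}(x_i+2)$ can be compared against what autograd stored in x.grad:

import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()
out.backward()

analytic = 1.5 * (x.detach() + 2)        # d(out)/dx_i = 3/2 * (x_i + 2) = 4.5
print(torch.allclose(x.grad, analytic))  # True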
You can do many crazy things with autograd!
# Create a tensor and set requires_grad=True to track computation
x = torch.randn(3, requires_grad=True)
# Do some computation
y = x + 2
while y.data.norm() < 1000:
    y = y * 2
print(y)
tensor([1091.3589, 1317.4091,  271.5838], grad_fn=<MulBackward>)
An explanation of y.data.norm():
In [15]: x = torch.randn(3, requires_grad=True)
In [16]: y = x * 2
In [17]: y.data
Out[17]: tensor([-1.2510, -0.6302,  1.2898])
In [18]: y.data.norm()
Out[18]: tensor(1.9041)
# computing the norm using elementary operations
In [19]: torch.sqrt(torch.sum(torch.pow(y, 2)))
Out[19]: tensor(1.9041)
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
# y is not a scalar, so backward() needs a tensor of matching shape;
# x.grad becomes the vector-Jacobian product J^T * gradients
y.backward(gradients)
print(x.grad)
tensor([ 51.2000, 512.0000,   0.0512])
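A small sketch of what happened above (my own addition): passing a vector v to y.backward(v) computes the vector-Jacobian product $J^T v$, which is equivalent to calling .backward() on the scalar (y * v).sum().

import torch

x1 = torch.randn(3, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)   # same values, separate graph
v = torch.tensor([0.1, 1.0, 0.0001])

(x1 * 2).backward(v)               # vector-Jacobian product
((x2 * 2) * v).sum().backward()    # equivalent scalar formulation

print(torch.allclose(x1.grad, x2.grad))  # True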
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
True
True
False