DL Study Notes [18]: The Criterions in the nn Package
Many things go unfinished not because they are hard, but because they are never started. Come on, you can do this!
Based on https://github.com/torch/nn/blob/master/doc/criterion.md
Criterions
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Classification criterions
[output] forward(input, target)
Computes the value of the loss function for this criterion; output must be a scalar.
The state variable self.output should be updated after a call to forward().
[gradInput] backward(input, target)
The state variable self.gradInput should be updated after a call to backward().
BCECriterion
Binary cross-entropy applied to sigmoid outputs (the two-class special case of ClassNLLCriterion).
The formula:
loss(o, t) = - 1/n sum_i (t[i] * log(o[i]) + (1 - t[i]) * log(1 - o[i]))
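As an independent check of the formula above, here is a plain-Python sketch (not Torch; `bce_loss` is a helper name introduced here); o must hold probabilities in (0, 1), e.g. sigmoid outputs:

```python
import math

def bce_loss(o, t):
    """Binary cross-entropy, averaged over the n elements."""
    n = len(o)
    return -sum(ti * math.log(oi) + (1 - ti) * math.log(1 - oi)
                for oi, ti in zip(o, t)) / n

# A confident correct prediction gives a small loss,
# a confident wrong one a large loss.
print(bce_loss([0.9, 0.1], [1, 0]))  # small
print(bce_loss([0.1, 0.9], [1, 0]))  # large
```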
ClassNLLCriterion
criterion = nn.ClassNLLCriterion([weights])
To use this criterion, add a LogSoftMax layer as the last layer of the network; if you don't want the extra layer, use CrossEntropyCriterion instead. The loss can be described as:
loss(x, class) = -x[class]
Here x is the LogSoftMax output (a vector of log-probabilities) and class is the target label y, so -x[class] is the negative log-likelihood of the correct class; combined with LogSoftMax this is exactly the cross-entropy.
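To make the formula concrete, a plain-Python sketch with made-up numbers (`log_softmax` and `nll_loss` are helper names introduced here, not nn functions):

```python
import math

def log_softmax(x):
    """Plain log-softmax: x[i] - log(sum_j exp(x[j]))."""
    z = math.log(sum(math.exp(v) for v in x))
    return [v - z for v in x]

def nll_loss(log_probs, target):
    """ClassNLLCriterion for one sample: loss(x, class) = -x[class]."""
    return -log_probs[target]

scores = [2.0, 1.0, 0.1]   # raw network outputs
lp = log_softmax(scores)   # log-probabilities
print(nll_loss(lp, 0))     # low loss: class 0 has the highest score
print(nll_loss(lp, 2))     # higher loss for an unlikely class
```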
CrossEntropyCriterion
criterion = nn.CrossEntropyCriterion([weights])
Used for multi-class classification.
The loss can be described as:
loss(x, class) = -log(exp(x[class]) / (\sum_j exp(x[j]))) = -x[class] + log(\sum_j exp(x[j]))
Size averaging can be disabled by setting sizeAverage to false on the wrapped NLL criterion:
crit = nn.CrossEntropyCriterion(weights)
crit.nll.sizeAverage = false
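A quick plain-Python check (`cross_entropy` is a helper name introduced here) that the two forms of the formula above agree:

```python
import math

def cross_entropy(x, cls):
    """loss(x, class) = -x[class] + log(sum_j exp(x[j]))."""
    return -x[cls] + math.log(sum(math.exp(v) for v in x))

x, cls = [2.0, 1.0, 0.1], 0
# Equivalent form: -log(exp(x[class]) / sum_j exp(x[j]))
alt = -math.log(math.exp(x[cls]) / sum(math.exp(v) for v in x))
print(cross_entropy(x, cls), alt)  # the two forms agree
```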
ClassSimplexCriterion
criterion = nn.ClassSimplexCriterion(nClasses)
For each class, this criterion learns an embedding: each class is mapped to a point on a simplex, replacing the extremely sparse one-hot target encoding with a lower-dimensional one.
Before this criterion the network needs two extra layers, NormalizedLinearNoBias and Normalize (see the example below). The nn docs link to the paper behind this criterion; it is reported to be more robust than multinomial logistic regression.
nInput = 10
nClasses = 30
nHidden = 100
mlp = nn.Sequential()
mlp:add(nn.Linear(nInput, nHidden)):add(nn.ReLU())
mlp:add(nn.NormalizedLinearNoBias(nHidden, nClasses))
mlp:add(nn.Normalize(2))
criterion = nn.ClassSimplexCriterion(nClasses)
function gradUpdate(mlp, x, y, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   mlp:zeroGradParameters()
   local t = criterion:backward(pred, y)
   mlp:backward(x, t)
   mlp:updateParameters(learningRate)
end
MarginCriterion
criterion = nn.MarginCriterion([margin])
Binary classification with a margin (hinge) loss.
Example code:
function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate)
end
mlp = nn.Sequential()
mlp:add(nn.Linear(5, 1))
x1 = torch.rand(5)
x1_target = torch.Tensor{1}
x2 = torch.rand(5)
x2_target = torch.Tensor{-1}
criterion = nn.MarginCriterion(1)
for i = 1, 1000 do
   gradUpdate(mlp, x1, x1_target, criterion, 0.01)
   gradUpdate(mlp, x2, x2_target, criterion, 0.01)
end
print(mlp:forward(x1))
print(mlp:forward(x2))
print(criterion:forward(mlp:forward(x1), x1_target))
print(criterion:forward(mlp:forward(x2), x2_target))
Output:
1.0043
[torch.Tensor of dimension 1]
-1.0061
[torch.Tensor of dimension 1]
0
0
By default, the losses are averaged over observations for each minibatch. However, if the field sizeAverage is set to false, the losses are instead summed.
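For reference, the nn docs give MarginCriterion's loss as loss(x, y) = sum_i max(0, margin - y[i]*x[i]) / n, a hinge loss. A plain-Python sketch (`margin_loss` is a helper name introduced here) shows why the two losses printed above are 0:

```python
def margin_loss(x, y, margin=1.0):
    """Hinge loss, averaged: max(0, margin - y_i * x_i)."""
    return sum(max(0.0, margin - yi * xi) for xi, yi in zip(x, y)) / len(x)

# Outputs past the margin on the correct side give zero loss,
# matching the 0s printed in the example above.
print(margin_loss([1.0043], [1]))    # 0.0
print(margin_loss([-1.0061], [-1]))  # 0.0
print(margin_loss([0.5], [1]))       # 0.5: inside the margin, penalized
```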
SoftMarginCriterion
criterion = nn.SoftMarginCriterion()
Binary classification with a smooth (logistic) version of the margin loss.
Example code:
function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate)
end
mlp = nn.Sequential()
mlp:add(nn.Linear(5, 1))
x1 = torch.rand(5)
x1_target = torch.Tensor{1}
x2 = torch.rand(5)
x2_target = torch.Tensor{-1}
criterion = nn.SoftMarginCriterion()
for i = 1, 1000 do
   gradUpdate(mlp, x1, x1_target, criterion, 0.01)
   gradUpdate(mlp, x2, x2_target, criterion, 0.01)
end
print(mlp:forward(x1))
print(mlp:forward(x2))
print(criterion:forward(mlp:forward(x1), x1_target))
print(criterion:forward(mlp:forward(x2), x2_target))
Output:
0.7471
[torch.DoubleTensor of size 1]
-0.9607
[torch.DoubleTensor of size 1]
0.38781049558836
0.32399356957564
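The underlying formula (from the nn docs) is loss(x, y) = sum_i log(1 + exp(-y[i]*x[i])) / n. A plain-Python sketch (`soft_margin_loss` is a helper name introduced here) reproduces the printed losses:

```python
import math

def soft_margin_loss(x, y):
    """loss(x, y) = sum_i log(1 + exp(-y_i * x_i)) / n."""
    return sum(math.log(1 + math.exp(-yi * xi))
               for xi, yi in zip(x, y)) / len(x)

# Matches the example output above (up to rounding of the inputs):
print(soft_margin_loss([0.7471], [1]))    # ~0.3878
print(soft_margin_loss([-0.9607], [-1]))  # ~0.3240
```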
MultiMarginCriterion
criterion = nn.MultiMarginCriterion(p, [weights], [margin])
Multi-class classification with a margin-based (hinge) loss.
It works on similarities, so when the network produces distances they are usually negated first, e.g.:
mlp = nn.Sequential()
mlp:add(nn.Euclidean(n, m)) -- outputs a vector of distances
mlp:add(nn.MulConstant(-1)) -- distance to similarity
The per-sample loss is loss(x, y) = sum_i max(0, margin - x[y] + x[i])^p / N over i ≠ y, i.e. the multi-class hinge loss.
MultiLabelMarginCriterion
criterion = nn.MultiLabelMarginCriterion()
One input can belong to several classes at once (multi-label classification).
Example code:
criterion = nn.MultiLabelMarginCriterion()
input = torch.randn(2, 4)
target = torch.Tensor{{1, 3, 0, 0}, {4, 0, 0, 0}} -- zero-values are ignored
criterion:forward(input, target)
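The per-sample formula from the nn docs is loss(x, y) = sum over target indices j and non-target indices i of max(0, 1 - (x[y[j]] - x[i])), divided by the number of classes; y holds 1-based class indices and zeros are ignored. A plain-Python sketch with made-up numbers (`multilabel_margin_loss` is a helper name introduced here):

```python
def multilabel_margin_loss(x, y):
    """Per-sample MultiLabelMarginCriterion (nn docs formula).
    y holds 1-based target class indices, zero-padded; zeros are ignored."""
    targets = [t - 1 for t in y if t > 0]          # convert to 0-based
    others = [i for i in range(len(x)) if i not in targets]
    loss = sum(max(0.0, 1 - (x[j] - x[i]))
               for j in targets for i in others)
    return loss / len(x)

# Classes 1 and 3 (1-based) are the positives here.
print(multilabel_margin_loss([0.1, 0.2, 0.4, 0.8], [1, 3, 0, 0]))  # 1.25
```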
MultiLabelSoftMarginCriterion
criterion = nn.MultiLabelSoftMarginCriterion()
A multi-label one-versus-all loss based on binary cross-entropy between the input (passed through a sigmoid) and the target.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Regression criterions
AbsCriterion
criterion = nn.AbsCriterion()
The formula:
loss(x, y) = 1/n \sum |x_i - y_i|
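A plain-Python sketch of this L1 loss (`abs_loss` is a helper name introduced here), with a flag mirroring sizeAverage:

```python
def abs_loss(x, y, size_average=True):
    """L1 loss; with size_average=False the sum is not divided by n."""
    total = sum(abs(xi - yi) for xi, yi in zip(x, y))
    return total / len(x) if size_average else total

print(abs_loss([1.0, 2.0, 3.0], [0.0, 2.0, 5.0]))         # (1+0+2)/3 = 1.0
print(abs_loss([1.0, 2.0, 3.0], [0.0, 2.0, 5.0], False))  # 3.0
```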
If x and y are d-dimensional Tensors with n elements in total, the sum still runs over all n elements and is divided by n. The division can be avoided by setting sizeAverage to false:
criterion = nn.AbsCriterion()
criterion.sizeAverage = false
(Question: without the division, won't the summed loss grow with the tensor size? Presumably the learning rate then has to compensate for the scale.)
SmoothL1Criterion
criterion = nn.SmoothL1Criterion()
A smoothed version of AbsCriterion: quadratic for small errors, linear for large ones. Usage:
criterion = nn.SmoothL1Criterion()
criterion.sizeAverage = false
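The per-element smooth-L1 loss is 0.5*z^2 when |z| < 1 and |z| - 0.5 otherwise (z = x_i - y_i), averaged unless sizeAverage is off. A plain-Python sketch (`smooth_l1_loss` is a helper name introduced here):

```python
def smooth_l1_loss(x, y, size_average=True):
    """Smooth L1: quadratic near zero, linear beyond |d| = 1."""
    def f(d):
        d = abs(d)
        return 0.5 * d * d if d < 1 else d - 0.5
    total = sum(f(xi - yi) for xi, yi in zip(x, y))
    return total / len(x) if size_average else total

print(smooth_l1_loss([0.5], [0.0]))  # 0.125 (quadratic region)
print(smooth_l1_loss([3.0], [0.0]))  # 2.5   (linear region)
```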
MSECriterion
criterion = nn.MSECriterion()
Mean squared error. Usage:
criterion = nn.MSECriterion()
criterion.sizeAverage = false
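A plain-Python sketch of the MSE computation (`mse_loss` is a helper name introduced here):

```python
def mse_loss(x, y, size_average=True):
    """Mean squared error; size_average=False returns the plain sum."""
    total = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return total / len(x) if size_average else total

print(mse_loss([1.0, 2.0], [0.0, 4.0]))         # (1 + 4)/2 = 2.5
print(mse_loss([1.0, 2.0], [0.0, 4.0], False))  # 5.0
```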
SpatialAutoCropMSECriterion
criterion = nn.SpatialAutoCropMSECriterion()
An MSE criterion that automatically center-crops the target so its spatial dimensions match the input's, useful when the target is slightly larger than the output. sizeAverage works as before:
criterion = nn.SpatialAutoCropMSECriterion()
criterion.sizeAverage = false
DistKLDivCriterion
criterion = nn.DistKLDivCriterion()
The Kullback-Leibler divergence; the input is expected to contain log-probabilities and the target probabilities.
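A plain-Python sketch of the KL term (`kl_div_loss` is a helper name introduced here; inputs are log-probabilities, targets probabilities, matching the criterion's convention):

```python
import math

def kl_div_loss(log_probs, target, size_average=True):
    """Sum of t_i * (log(t_i) - x_i); terms with t_i = 0 contribute nothing."""
    total = sum(t * (math.log(t) - lp)
                for lp, t in zip(log_probs, target) if t > 0)
    return total / len(log_probs) if size_average else total

p = [0.7, 0.2, 0.1]
log_q = [math.log(v) for v in [0.6, 0.3, 0.1]]
print(kl_div_loss(log_q, p))                       # small: p and q are close
print(kl_div_loss([math.log(v) for v in p], p))    # 0.0: KL of p with itself
```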
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Embedding criterions (measure whether two inputs are similar or dissimilar)
HingeEmbeddingCriterion
criterion = nn.HingeEmbeddingCriterion([margin])
With y = 1, training pulls the two underlying inputs closer together; with y = -1 it pushes them at least margin apart.
(Note: the criterion itself receives a single tensor x plus the label y; x is typically a distance already computed by a preceding module such as nn.PairwiseDistance, commonly the L1 distance, so no distance is computed inside the criterion.)
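A plain-Python sketch of the per-element rule (`hinge_embedding_loss` is a helper name introduced here; x plays the role of a precomputed distance):

```python
def hinge_embedding_loss(x, y, margin=1.0):
    """x if y == 1 (pull the pair together),
    max(0, margin - x) if y == -1 (push it at least margin apart)."""
    return x if y == 1 else max(0.0, margin - x)

print(hinge_embedding_loss(0.3, 1))   # similar pair: loss is the distance itself
print(hinge_embedding_loss(0.3, -1))  # dissimilar but too close: penalized
print(hinge_embedding_loss(1.5, -1))  # dissimilar and far enough: 0.0
```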
L1HingeEmbeddingCriterion
criterion = nn.L1HingeEmbeddingCriterion([margin])
Computes the loss from the L1 distance between the two input vectors.
CosineEmbeddingCriterion
criterion = nn.CosineEmbeddingCriterion([margin])
Computes the loss from the cosine distance between the two inputs.
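For similar pairs (y = 1) the loss is 1 - cos(x1, x2); for dissimilar pairs it is max(0, cos(x1, x2) - margin). A plain-Python sketch (`cos_sim` and `cosine_embedding_loss` are helper names introduced here):

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cosine_embedding_loss(a, b, y, margin=0.0):
    """1 - cos(a, b) for similar pairs (y = 1);
    max(0, cos(a, b) - margin) for dissimilar pairs (y = -1)."""
    c = cos_sim(a, b)
    return 1 - c if y == 1 else max(0.0, c - margin)

print(cosine_embedding_loss([1, 0], [1, 0], 1))   # identical, similar: 0.0
print(cosine_embedding_loss([1, 0], [0, 1], -1))  # orthogonal, dissimilar: 0.0
```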
DistanceRatioCriterion
criterion = nn.DistanceRatioCriterion(sizeAverage)
Takes three vectors: the first is the anchor, the second is similar to the anchor, and the third is dissimilar. With Ds the anchor-to-similar distance and Dd the anchor-to-dissimilar distance, the formula is:
loss = -log( exp(-Ds) / ( exp(-Ds) + exp(-Dd) ) )
Example code (from the nn docs):
torch.setdefaulttensortype("torch.FloatTensor")
require 'nn'
-- triplet : with batchSize of 32 and dimensionality 512
sample = {torch.rand(32, 512), torch.rand(32, 512), torch.rand(32, 512)}
embeddingModel = nn.Sequential()
embeddingModel:add(nn.Linear(512, 96)):add(nn.ReLU())
tripleModel = nn.ParallelTable()
tripleModel:add(embeddingModel)
tripleModel:add(embeddingModel:clone('weight', 'bias',
'gradWeight', 'gradBias'))
tripleModel:add(embeddingModel:clone('weight', 'bias',
'gradWeight', 'gradBias'))
-- Similar sample distance w.r.t anchor sample
posDistModel = nn.Sequential()
posDistModel:add(nn.NarrowTable(1,2)):add(nn.PairwiseDistance())
-- Different sample distance w.r.t anchor sample
negDistModel = nn.Sequential()
negDistModel:add(nn.NarrowTable(2,2)):add(nn.PairwiseDistance())
distanceModel = nn.ConcatTable():add(posDistModel):add(negDistModel)
-- Complete Model
model = nn.Sequential():add(tripleModel):add(distanceModel)
-- DistanceRatioCriterion
criterion = nn.DistanceRatioCriterion(true)
-- Forward & Backward
output = model:forward(sample)
loss = criterion:forward(output)
dLoss = criterion:backward(output)
model:backward(sample, dLoss)
How it fits together: the ParallelTable applies the weight-shared embedding model to each of the three 32×512 inputs, producing three 32×96 embeddings; the NarrowTable modules select the two pairs, PairwiseDistance turns each pair into one distance per batch element, and the criterion consumes the two distance vectors. 32 is simply the batch size and 96 the embedding dimension; the two numbers are unrelated.
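Independently of the network plumbing, the loss formula itself can be sanity-checked in plain Python (`distance_ratio_loss` is a helper name introduced here; Ds and Dd are hypothetical scalar distances):

```python
import math

def distance_ratio_loss(ds, dd):
    """loss = -log(exp(-Ds) / (exp(-Ds) + exp(-Dd)))."""
    return -math.log(math.exp(-ds) / (math.exp(-ds) + math.exp(-dd)))

# Loss shrinks as the similar pair gets closer than the dissimilar one:
print(distance_ratio_loss(0.1, 2.0))  # small
print(distance_ratio_loss(2.0, 0.1))  # large
print(distance_ratio_loss(1.0, 1.0))  # log(2) when Ds == Dd
```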
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Miscellaneous criterions
MultiCriterion
criterion = nn.MultiCriterion()
Combines several criteria into one, each with a weight; the total loss is the weighted sum. Code:
input = torch.rand(2,10)
target = torch.IntTensor{1,8}
nll = nn.ClassNLLCriterion()
nll2 = nn.CrossEntropyCriterion()
mc = nn.MultiCriterion():add(nll, 0.5):add(nll2)
output = mc:forward(input, target)
ParallelCriterion
criterion = nn.ParallelCriterion([repeatTarget])
Takes a table of inputs and a table of targets, applies the i-th criterion to the i-th input/target pair, and returns the weighted sum of the individual losses. This is useful for networks with several outputs trained jointly (multi-task setups); with repeatTarget = true the same target is reused for every criterion.
MarginRankingCriterion
criterion = nn.MarginRankingCriterion(margin)
Takes as input a pair of tensors {x1, x2} plus a label y (1 or -1); the loss is max(0, -y*(x1 - x2) + margin), so y = 1 trains x1 to score higher than x2 by at least margin, and y = -1 the reverse.
Example code (from the nn docs):
p1_mlp = nn.Linear(5, 2)
p2_mlp = p1_mlp:clone('weight', 'bias')
prl = nn.ParallelTable()
prl:add(p1_mlp)
prl:add(p2_mlp)
mlp1 = nn.Sequential()
mlp1:add(prl)
mlp1:add(nn.DotProduct())
mlp2 = mlp1:clone('weight', 'bias')
mlpa = nn.Sequential()
prla = nn.ParallelTable()
prla:add(mlp1)
prla:add(mlp2)
mlpa:add(prla)
crit = nn.MarginRankingCriterion(0.1)
x = torch.randn(5)
y = torch.randn(5)
z = torch.randn(5)
-- Use a typical generic gradient update function
function gradUpdate(mlp, x, y, criterion, learningRate)
   local pred = mlp:forward(x)
   local err = criterion:forward(pred, y)
   local gradCriterion = criterion:backward(pred, y)
   mlp:zeroGradParameters()
   mlp:backward(x, gradCriterion)
   mlp:updateParameters(learningRate)
end
for i = 1, 100 do
   gradUpdate(mlpa, {{x, y}, {x, z}}, 1, crit, 0.01)
   if true then
      o1 = mlp1:forward{x, y}[1]
      o2 = mlp2:forward{x, z}[1]
      o = crit:forward(mlpa:forward{{x, y}, {x, z}}, 1)
      print(o1, o2, o)
   end
end
print "--"
for i = 1, 100 do
   gradUpdate(mlpa, {{x, y}, {x, z}}, -1, crit, 0.01)
   if true then
      o1 = mlp1:forward{x, y}[1]
      o2 = mlp2:forward{x, z}[1]
      o = crit:forward(mlpa:forward{{x, y}, {x, z}}, -1)
      print(o1, o2, o)
   end
end
In the first loop (y = 1) the first score o1 is trained to exceed the second score o2 by at least the margin; in the second loop (y = -1) the ordering is reversed.
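The ranking loss itself, loss = max(0, -y*(x1 - x2) + margin), can be sketched in plain Python (`margin_ranking_loss` is a helper name introduced here; scores are hypothetical):

```python
def margin_ranking_loss(x1, x2, y, margin=0.1):
    """loss = max(0, -y * (x1 - x2) + margin); y = 1 trains x1 > x2."""
    return max(0.0, -y * (x1 - x2) + margin)

print(margin_ranking_loss(0.5, 0.2, 1))   # 0.0: x1 beats x2 by more than margin
print(margin_ranking_loss(0.2, 0.5, 1))   # positive: wrong order, penalized
print(margin_ranking_loss(0.2, 0.5, -1))  # 0.0: with y = -1 the order flips
```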