Machine Translation Evaluation Metrics: The BLEU Calculation in Detail

阿新 • Published: 2018-11-24

Original article: https://blog.csdn.net/guolindonggld/article/details/56966200

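1. Introduction

BLEU (Bilingual Evaluation Understudy) is an evaluation metric most readers will already know; introductions are easy to find with a quick search on Baidu or Google. The original paper is "BLEU: a Method for Automatic Evaluation of Machine Translation", from IBM.

This article works through one example to show in detail how BLEU is computed, alongside the source code of NLTK's nltk.align.bleu_score module.

First, the formula: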

$BLEU = BP \cdot \exp\left(\sum_{n = 1}^{N} w_{n} \log P_{n}\right)$

where the brevity penalty $BP$ depends on the candidate length $c$ and the reference length $r$:

$BP = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r / c} & \text{if } c \leq r \end{cases}$

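Note that the BLEU score here is computed for a single translation (one sample).

NLTK's nltk.align.bleu_score module implements this formula. It consists of three main functions: two private functions that compute $P_{n}$ and BP, and one function that combines them into the BLEU score.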

# Compute the BLEU score
def bleu(candidate, references, weights)

# (1) Private function: computes the modified n-gram precision
def _modified_precision(candidate, references, n)

# (2) Private function: computes the brevity penalty (BP)
def _brevity_penalty(candidate, references)

Example:

Candidate translation (Predicted):
It is a guide to action which ensures that the military always obeys the commands of the party

Reference translations (Gold Standard):
1: It is a guide to action that ensures that the military will forever heed Party commands
2: It is the guiding principle which guarantees the military forces always being under the command of the Party
3: It is the practical guide for the army always to heed the directions of the party
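Every count in this walkthrough is taken over word tokens. As a minimal sketch, assuming plain whitespace tokenization with case preserved (the names `candidate`, `references`, and `extract_ngrams` are illustrative and not part of NLTK), the example sentences can be prepared like this:

```python
# Assumption: tokens come from splitting on whitespace, case-sensitive,
# so "Party" and "party" are different tokens.
candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()

references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]


def extract_ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


print(len(candidate))                    # 18
print([len(ref) for ref in references])  # [16, 18, 16]
# Number of candidate n-grams for n = 1..4
print([len(extract_ngrams(candidate, n)) for n in range(1, 5)])  # [18, 17, 16, 15]
```

A sentence of 18 tokens yields 18, 17, 16 and 15 n-grams for n = 1 to 4, and these are the denominators of the four precisions.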

2. Computing the Modified n-gram Precision (i.e. $P_{n}$)

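from collections import Counter
from nltk.util import ngrams

def _modified_precision(candidate, references, n):
    # Count every n-gram in the candidate
    counts = Counter(ngrams(candidate, n))

    if not counts:
        return 0

    # For each candidate n-gram, the largest count found in any single reference
    max_counts = {}
    for reference in references:
        reference_counts = Counter(ngrams(reference, n))
        for ngram in counts:
            max_counts[ngram] = max(max_counts.get(ngram, 0), reference_counts[ngram])

    # Clip each candidate count by that maximum
    clipped_counts = dict((ngram, min(count, max_counts[ngram])) for ngram, count in counts.items())

    # Needs true division (Python 3, or from __future__ import division in Python 2)
    return sum(clipped_counts.values()) / sum(counts.values())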

Here $n$ runs from 1 to 4, that is, we compute the 1-gram through 4-gram precisions.

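Modified 1-gram precision:

First count how many times each word occurs in the candidate, then how many times it occurs in each reference. Max is the largest count across the three references, and Min is the smaller of the candidate count and Max (the clipped count).

Word	Candidate	Reference 1	Reference 2	Reference 3	Max	Min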
the	3	1	4	4	4	3
obeys	1	0	0	0	0	0
a	1	1	0	0	1	1
which	1	0	1	0	1	1
ensures	1	1	0	0	1	1
guide	1	1	0	1	1	1
always	1	0	1	1	1	1
is	1	1	1	1	1	1
of	1	0	1	1	1	1
to	1	1	0	1	1	1
commands	1	1	0	0	1	1
that	1	2	0	0	2	1
It	1	1	1	1	1	1
action	1	1	0	0	1	1
party	1	0	0	1	1	1
military	1	1	1	0	1	1

Then sum the Min column, sum the candidate counts, and divide the first by the second. The table has 16 rows: "the" (count 3, Min 3), "obeys" (count 1, Min 0), and 14 words with count 1 and Min 1, so $P_{1} = \frac{3 + 0 + 14 \times 1}{3 + 1 + 14 \times 1} = \frac{17}{18} \approx 0.944444444$
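The Max and Min columns map directly onto two `collections.Counter` operators: union (`|`) keeps the element-wise maximum and intersection (`&`) keeps the element-wise minimum. The sketch below reproduces the 1-gram table that way. It assumes whitespace tokenization, and the variable names are illustrative; it is not the NLTK implementation.

```python
from collections import Counter
from fractions import Fraction

candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]

cand_counts = Counter(candidate)

# "Max" column: element-wise maximum of the per-reference counts.
max_ref_counts = Counter()
for ref in references:
    max_ref_counts |= Counter(ref)

# "Min" column: element-wise minimum of candidate count and Max.
# Words with a clipped count of 0 (here "obeys") drop out of the result.
clipped = cand_counts & max_ref_counts

p1 = Fraction(sum(clipped.values()), sum(cand_counts.values()))
print(p1)         # 17/18
print(float(p1))  # roughly 0.9444
```

Keeping the result as a `Fraction` makes it easy to compare against the hand-computed table before converting to a float.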

Similarly:

Modified 2-gram precision:

2-gram	Candidate	Reference 1	Reference 2	Reference 3	Max	Min
ensures that	1	1	0	0	1	1
guide to	1	1	0	0	1	1
which ensures	1	0	0	0	0	0
obeys the	1	0	0	0	0	0
commands of	1	0	0	0	0	0
that the	1	1	0	0	1	1
a guide	1	1	0	0	1	1
of the	1	0	1	1	1	1
always obeys	1	0	0	0	0	0
the commands	1	0	0	0	0	0
to action	1	1	0	0	1	1
the party	1	0	0	1	1	1
is a	1	1	0	0	1	1
action which	1	0	0	0	0	0
It is	1	1	1	1	1	1
military always	1	0	0	0	0	0
the military	1	1	1	0	1	1

$P_{2} = \frac{10}{17} = 0.588235294$

Modified 3-gram precision:

Reference 2 shares no 3-gram with the candidate, so its column is omitted.

3-gram	Candidate	Reference 1	Reference 3	Max	Min
ensures that the	1	1	0	1	1
which ensures that	1	0	0	0	0
action which ensures	1	0	0	0	0
a guide to	1	1	0	1	1
military always obeys	1	0	0	0	0
the commands of	1	0	0	0	0
commands of the	1	0	0	0	0
to action which	1	0	0	0	0
the military always	1	0	0	0	0
obeys the commands	1	0	0	0	0
It is a	1	1	0	1	1
of the party	1	0	1	1	1
is a guide	1	1	0	1	1
that the military	1	1	0	1	1
always obeys the	1	0	0	0	0
guide to action	1	1	0	1	1

$P_{3} = \frac{7}{16} = 0.4375$

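Modified 4-gram precision:

References 2 and 3 share no 4-gram with the candidate, so their columns are omitted.

4-gram	Candidate	Reference 1	Max	Min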
to action which ensures	1	0	0	0
action which ensures that	1	0	0	0
guide to action which	1	0	0	0
obeys the commands of	1	0	0	0
which ensures that the	1	0	0	0
commands of the party	1	0	0	0
ensures that the military	1	1	1	1
a guide to action	1	1	1	1
always obeys the commands	1	0	0	0
that the military always	1	0	0	0
the commands of the	1	0	0	0
the military always obeys	1	0	0	0
military always obeys the	1	0	0	0
is a guide to	1	1	1	1
It is a guide	1	1	1	1

$P_{4} = \frac{4}{15} = 0.266666667$

We then take uniform weights $w_{1} = w_{2} = w_{3} = w_{4} = 0.25$

Therefore, with $\log$ denoting the natural logarithm:

$\sum_{n = 1}^{N} w_{n} \log P_{n} = 0.25 \cdot \log P_{1} + 0.25 \cdot \log P_{2} + 0.25 \cdot \log P_{3} + 0.25 \cdot \log P_{4} = -0.684055269517$
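The weighted log sum is easy to check numerically. This sketch plugs in the four precisions as the fractions read off the tables (17/18, 10/17, 7/16, 4/15); the variable names are illustrative.

```python
import math
from fractions import Fraction

# Precisions taken from the Min and Candidate column totals of the four tables.
precisions = [Fraction(17, 18), Fraction(10, 17), Fraction(7, 16), Fraction(4, 15)]
weights = [0.25, 0.25, 0.25, 0.25]

# Natural logarithm, matching the exp() in the BLEU formula.
log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
print(log_sum)  # about -0.684055
```

With uniform weights this is simply the log of the geometric mean of the four precisions.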

3. Computing the Brevity Penalty

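import math

def _brevity_penalty(candidate, references):

    c = len(candidate)
    ref_lens = (len(reference) for reference in references)
    # Python compares tuples element by element, e.g. (0, 1) > (1, 0) is False.
    # The key below uses this to pick the reference length closest to the
    # candidate length and, when several are equally close, the shortest one.
    # Example: candidate length 10, reference lengths 9 and 11, so r = 9.
    r = min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))
    print('r:', r)

    if c > r:
        return 1
    else:
        # Needs true division (Python 3, or from __future__ import division in Python 2)
        return math.exp(1 - r / c)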

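Next we compute BP (Brevity Penalty), which penalizes candidates that are too short. From its formula, BP takes values in (0, 1]: the shorter the candidate, the closer BP is to 0.

The candidate has length 18, and the references have lengths 16, 18 and 16.
So $c = 18$ and $r = 18$, the reference length closest to the candidate length.

So $BP = e^{1 - 18 / 18} = e^{0} = 1$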
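To see the tuple-based tie-breaking and the penalty in isolation, here is a small sketch that works on lengths only. The helper names `pick_ref_length` and `brevity_penalty` are hypothetical, chosen for illustration; they mirror the logic of the snippet above rather than any NLTK function.

```python
import math


def pick_ref_length(c, ref_lens):
    """Reference length closest to c; on ties, the shorter one wins."""
    # Tuples compare element by element: distance first, then length.
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))


def brevity_penalty(c, r):
    """BP = 1 if c > r, otherwise exp(1 - r / c)."""
    return 1.0 if c > r else math.exp(1 - r / c)


r = pick_ref_length(18, [16, 18, 16])
print(r, brevity_penalty(18, r))     # 18 1.0 (the worked example)
print(pick_ref_length(10, [9, 11]))  # 9 (tie between 9 and 11)
print(brevity_penalty(12, 18))       # exp(-0.5), a shorter candidate is penalized
```

A candidate of 12 tokens against a reference of 18 gets BP = exp(1 - 18/12) = exp(-0.5), which shows how the penalty shrinks as the candidate gets shorter.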

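4. Putting It Together

Finally, $BLEU = 1 \cdot \exp(-0.684055269517) = 0.504566684006$

BLEU ranges over [0, 1]; 0 is the worst score and 1 is the best.

As the calculation shows, BLEU is essentially a modified n-gram precision combined with a brevity penalty factor.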
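The whole calculation can be reproduced end to end with the standard library alone. This is a minimal sketch under stated assumptions: whitespace tokenization, no smoothing, and a zero precision treated as BLEU = 0. The function names (`ngram_counts`, `modified_precision`, `brevity_penalty`, `bleu_sketch`) are illustrative and should not be read as NLTK's API.

```python
import math
from collections import Counter

candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]


def ngram_counts(tokens, n):
    """Count the contiguous n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram matches divided by the number of candidate n-grams."""
    counts = ngram_counts(candidate, n)
    if not counts:
        return 0.0
    ref_counts = [ngram_counts(ref, n) for ref in references]
    clipped = sum(min(count, max(rc[ngram] for rc in ref_counts))
                  for ngram, count in counts.items())
    return clipped / sum(counts.values())


def brevity_penalty(candidate, references):
    """1 if the candidate is longer than the closest reference, else exp(1 - r / c)."""
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu_sketch(candidate, references, weights=(0.25, 0.25, 0.25, 0.25)):
    """BP * exp(sum over n of w_n * log P_n), without smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, len(weights) + 1)]
    if min(precisions) == 0:
        return 0.0  # assumption: log(0) is undefined, so report 0
    log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(candidate, references) * math.exp(log_sum)


score = bleu_sketch(candidate, references)
print(round(score, 6))  # 0.504567
```

The printed value agrees with the hand calculation above, which is a useful sanity check on the four tables.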
