
Dialogue System Evaluation: The Loebner Prize

The Loebner Prize is an annual competition in artificial intelligence that awards prizes to the computer programs considered by the judges to be the most human-like. The format of the competition is that of a standard Turing test. In each round, a human judge simultaneously holds textual conversations with a computer program and a human being via computer. Based upon the responses, the judge must decide which is which. [2]

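The Loebner Prize [1] was established in 1990 by New York philanthropist Hugh Loebner (1942-2016), who intended to award its grand prize to the first computer to pass the Turing test. Its purpose is to encourage research in artificial intelligence. Because the grand prize has still not been awarded, the competition continues.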

The Loebner Prize 2018

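The 2018 competition had two stages: a qualifying round and a final.

Qualifying round: 12 systems entered. Each had to answer 20 questions in a style similar to previous years, including at least 2 Winograd-style questions, named after T. Winograd [3], which require resolving an ambiguous reference using commonsense knowledge. Human experts then scored each system on how human-like its answers were, ranked the systems from highest to lowest score, and selected the top four for the final.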

The top four entries from the pool of entries that conform to the entry specifications will be selected as follows. Each entry will be provided with a set of 20 questions in English in a similar format to previous competitions, with at least 2 Winograd style questions. The responses from each of the AI systems will be recorded for this question set and then assessed for how human their responses are. The top 4 entries from this process will be entered into the finals of the competition at Bletchley Park.
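The qualifying step described above comes down to ranking entries by their assessed human-likeness and keeping the top four. Below is a minimal Python sketch, assuming each entry's assessment has already been reduced to a single numeric score. The entry names and scores are made up for illustration. The alphabetical tie-break is also an assumption, since the rules quoted above do not say how ties are handled.

```python
from typing import Dict, List


def select_finalists(scores: Dict[str, float], k: int = 4) -> List[str]:
    """Return the names of the k entries with the highest human-likeness scores.

    Ties are broken alphabetically by name. This is an assumption made here so
    the result is deterministic, since the quoted rules do not specify a tie-break.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical entries and made-up scores, for illustration only.
qualifying_scores = {
    "entry_A": 72.5,
    "entry_B": 88.0,
    "entry_C": 64.0,
    "entry_D": 88.0,
    "entry_E": 79.5,
    "entry_F": 70.0,
}
print(select_finalists(qualifying_scores))
# ['entry_B', 'entry_D', 'entry_E', 'entry_A']
```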

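Final: four judges took part over four rounds, one round per system. In each round, each judge communicated by computer with two entities at the same time: the dialogue system for that round and a real human. After up to 25 minutes of questioning, each judge had to decide which entity was the dialogue system and which was the human. If a system fooled at least half of the judges, its creator would receive the silver medal. Otherwise, the judges scored the conversations, and prizes were awarded in order of the ranked scores.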

The contest consists of 4 rounds where in each round, the 4 judges will each interact with two entities using a computer terminal. One of these entities will be a human ‘confederate’ and the other an AI system. After 25 minutes of questioning the judge must decide which entity is the human and which is the AI. If a system can fool half the judges that it is human under these conditions, a solid Silver Medal will be awarded to the creator of that AI system. In the event that this doesn’t happen, prizes will be awarded to the creators of the AI system as follows in accordance with judges’ ranked scores:

1st place - a bronze medal and $4000
2nd place - $1500
3rd place - $1000
4th place - $500
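The final-round outcome can be written as a small decision rule. First, check whether any system fooled at least half of the judges. Only if none did, fall back to the ranked prize table. The sketch below is a hypothetical Python illustration, not official scoring code. The function name `award_2018`, its inputs, the convention that a higher ranked score is better, and the handling of several systems reaching the silver threshold are all assumptions. Only the threshold and the prize amounts come from the rules quoted above.

```python
from typing import Dict, List, Tuple

# Prize table for the 2018 final, taken from the rules quoted above.
RANKED_PRIZES = ["bronze medal and $4000", "$1500", "$1000", "$500"]


def award_2018(
    fooled: Dict[str, int],
    ranked_scores: Dict[str, float],
    num_judges: int = 4,
) -> Tuple[str, List[Tuple[str, str]]]:
    """Hypothetical helper applying the 2018 prize rules.

    fooled: system name -> number of judges who mistook it for the human.
    ranked_scores: system name -> aggregate judges' score (assumed: higher is better).
    """
    # Silver medal: the system fooled at least half of the judges.
    silver = sorted(s for s, n in fooled.items() if 2 * n >= num_judges)
    if silver:
        # Assumption: the rules do not cover several qualifiers, so all are listed.
        return "silver", [(s, "silver medal") for s in silver]
    # Otherwise rank by the judges' scores and hand out the prize table in order.
    order = sorted(ranked_scores, key=lambda s: -ranked_scores[s])
    return "ranked", list(zip(order, RANKED_PRIZES))


print(award_2018({"A": 1, "B": 0, "C": 1, "D": 0},
                 {"A": 10, "B": 14, "C": 7, "D": 9}))
# ('ranked', [('B', 'bronze medal and $4000'), ('A', '$1500'), ('D', '$1000'), ('C', '$500')])
```

With four judges, `2 * n >= num_judges` means a system needs at least two judges to take it for the human, which matches "fool half the judges". The integer comparison avoids any division.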

The Loebner Prize 2017

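The 2017 competition also had two stages: a qualifying round and a final.

Qualifying round: 16 systems entered. Each answered 20 questions, experts scored how similar each system was to a human, and the 4 highest-scoring systems advanced to the final.

Final: each of the four judges ranked the systems. The system closest to a human received 4 points, and the others received 3, 2 and 1 points in decreasing order of human-likeness. A system's final score was the sum of the judges' points.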
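The 2017 final scoring works like a Borda count. Each judge's ranking is converted into 4, 3, 2 and 1 points, and the points are summed per system. Below is a minimal Python sketch. The system names and rankings are invented for illustration and are not the actual 2017 results.

```python
from collections import defaultdict
from typing import Dict, List


def score_2017(judge_rankings: List[List[str]]) -> Dict[str, int]:
    """Sum rank points per system.

    Each inner list is one judge's ranking, most human-like first. With four
    systems, first place earns 4 points and last place earns 1.
    """
    totals: Dict[str, int] = defaultdict(int)
    for ranking in judge_rankings:
        n = len(ranking)
        for position, system in enumerate(ranking):
            totals[system] += n - position  # 4, 3, 2, 1 for n = 4
    return dict(totals)


# Hypothetical rankings from four judges, for illustration only.
rankings = [
    ["S1", "S2", "S3", "S4"],
    ["S2", "S1", "S4", "S3"],
    ["S1", "S3", "S2", "S4"],
    ["S1", "S2", "S4", "S3"],
]
print(score_2017(rankings))
# {'S1': 15, 'S2': 12, 'S3': 7, 'S4': 6}
```

Every judge hands out the same 10 points in total (4 + 3 + 2 + 1). The scores therefore always sum to 10 times the number of judges, here 40, which gives a quick sanity check on the tally.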

The Loebner Prize 2016

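In 2016, 16 systems entered the qualifying round, which was scored on a 100-point scale. The 4 highest-ranked systems advanced to the final. The final produced only a ranking, without individual scores.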

Past Champions

[Figure: past Loebner Prize winners]

Mitsuku

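Mitsuku [4], developed by Steve Worswick, has so far won four bronze medals (no silver or gold medal has ever been awarded). The chatbot is built on AIML [5] and relies mainly on regular-expression-style pattern matching. It can also perform simple reasoning [6].

AIML resources on GitHub: rosie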
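Below is a toy AIML interpreter in Python that shows what this kind of pattern matching looks like. The `<category>`, `<pattern>`, `<template>` and `<star/>` elements are standard AIML. The three categories, however, are hypothetical examples and do not come from Mitsuku. The matcher is also a simplification. It translates each pattern into a Python regular expression and tries patterns with more literal words first. Real AIML interpreters instead use a tree-based ("Graphmaster") matcher with defined wildcard priorities and more elaborate input normalization.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical three-category knowledge base in AIML syntax (not from Mitsuku).
AIML_SOURCE = """
<aiml version="2.0">
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there!</template>
  </category>
  <category>
    <pattern>MY NAME IS *</pattern>
    <template>Nice to meet you, <star/>.</template>
  </category>
  <category>
    <pattern>*</pattern>
    <template>Tell me more.</template>
  </category>
</aiml>
"""

WILDCARDS = ("*", "_")


def normalize(text):
    # Simplified normalization: replace punctuation with spaces, upper-case,
    # and collapse whitespace so patterns can be matched word by word.
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())


def load_categories(source):
    root = ET.fromstring(source.strip())
    categories = []
    for category in root.iter("category"):
        tokens = category.find("pattern").text.split()
        # Each wildcard becomes a capturing group; literal words are escaped.
        regex = "^" + " ".join(
            "(.+)" if tok in WILDCARDS else re.escape(tok) for tok in tokens
        ) + "$"
        literal_words = sum(tok not in WILDCARDS for tok in tokens)
        categories.append((literal_words, re.compile(regex), category.find("template")))
    # Simplification: more specific patterns first, so the bare "*" is the fallback.
    categories.sort(key=lambda c: -c[0])
    return categories


def render(template, stars):
    # Rebuild the template text, replacing each <star/> with the first wildcard match.
    parts = [template.text or ""]
    for child in template:
        if child.tag == "star":
            parts.append(stars[0] if stars else "")
        parts.append(child.tail or "")
    return "".join(parts).strip()


def respond(categories, user_input):
    text = normalize(user_input)
    for _, regex, template in categories:
        match = regex.match(text)
        if match:
            return render(template, match.groups())
    return ""


categories = load_categories(AIML_SOURCE)
print(respond(categories, "Hello!"))               # Hi there!
print(respond(categories, "My name is Alice."))    # Nice to meet you, ALICE.
print(respond(categories, "What's the weather?"))  # Tell me more.
```

The captured name comes back upper-cased because the sketch matches against the normalized text rather than the raw input.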

Suzette, Rosette, Angela, Rose

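Several dialogue systems developed by Bruce Wilcox have placed well in the competition, all of them built on his ChatScript [7]. Like AIML, ChatScript is a language for developing dialogue systems.

ChatScript resources on GitHub: ChatScript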

Summary

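Compared with other dialogue system evaluations, the Loebner Prize has a long history. It keeps the format of the Turing test and requires each dialogue system to adopt a specific persona. The goal of the competition is to reward the systems that play their assigned role most convincingly. Evaluation relies mainly on judges scoring how well each system plays its role, with no objective, public standard. The systems that have done well in this evaluation have all been based on heuristic rules. Historically, the competition spurred frameworks for building dialogue systems such as AIML and ChatScript, and it played an important role in the development of chatbots.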

References:

[1] Loebner Prize
[2] Wikipedia: Loebner Prize
[3] SHRDLU human-computer dialogue system
[4] Mitsuku
[5] AIML
[6] Wikipedia: Mitsuku
[7] Wilcox B, Wilcox S. Making it real: Loebner-winning chatbot design[J]. Arbor: Ciencia, Pensamiento y Cultura, 2013, 189(764): a086.
[8] ChatScript