
Teaching Machines to Understand Us 讓機器理解我們 之三 自然語言學習及深度學習的信仰


Language learning

自然語言學習

Facebook’s New York office is a three-minute stroll up Broadway from LeCun’s office at NYU, on two floors of a building constructed as a department store in the early 20th century. Workers are packed more densely into the open plan than they are at Facebook’s headquarters in Menlo Park, California, but they can still be seen gliding on articulated skateboards past notices for weekly beer pong. Almost half of LeCun’s team of leading AI researchers works here, with the rest at Facebook’s California campus or an office in Paris. Many of them are trying to make neural networks better at understanding language. “I’ve hired all the people working on this that I could,” says LeCun.

從LeCun在紐約大學的辦公室沿著百老匯往上走三分鐘,就到了Facebook的紐約辦公室,它佔據了一棟20世紀初作為百貨商店建成的大樓的兩層。這裏的開放式辦公區比Facebook位於加利福尼亞門洛帕克(Menlo Park)的總部更擁擠,但仍能看到員工踩著兩輪滑板滑過每周啤酒乒乓(beer pong)活動的告示。LeCun團隊裏近半數頂尖人工智能研究者在這裏工作,其餘的在Facebook加利福尼亞園區或巴黎辦公室。他們中很多人都在努力讓神經網絡更好地理解自然語言。LeCun說:“我已經把所有研究這個方向、我能雇到的人都雇來了。”

A neural network can “learn” words by spooling through text and calculating how each word it encounters could have been predicted from the words before or after it. By doing this, the software learns to represent every word as a vector that indicates its relationship to other words—a process that uncannily captures concepts in language. The difference between the vectors for “king” and “queen” is the same as for “husband” and “wife,” for example. The vectors for “paper” and “cardboard” are close together, and those for “large” and “big” are even closer.

神經網絡可以通過逐字掃描文本來“學習”詞語,計算遇到的每個詞語如何由它前後的詞語預測出來。這樣,軟件學會把每個詞語表示成一個向量,用來表示它與其他詞語的關系,而這個過程能出奇準確地捕捉語言中的概念。比如,“國王”與“王後”的向量之差,和“丈夫”與“妻子”的向量之差是一樣的;“紙張”與“硬紙板”的向量彼此接近,而“large”與“big”的向量則更加接近。
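The article does not say which software the Facebook or Google teams use for this, so the following is only a minimal sketch of the idea using the open-source gensim library (an assumption, not their actual tooling): each word is learned by predicting it from its neighbors, and the resulting vectors can then be compared.

```python
# Minimal word-vector sketch using gensim (assumed tooling, not Facebook's own code).
from gensim.models import Word2Vec

# Toy corpus; in practice this would be billions of sentences of real text.
sentences = [
    ["the", "king", "spoke", "to", "the", "queen"],
    ["the", "husband", "spoke", "to", "the", "wife"],
    ["wrap", "it", "in", "paper", "or", "cardboard"],
    ["a", "large", "box", "is", "a", "big", "box"],
]

# Each word is predicted from the words around it; the vectors are a side effect of that training.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=200)

# With a real corpus, king - husband + wife lands near "queen"; on toy data this is only illustrative.
print(model.wv.most_similar(positive=["king", "wife"], negative=["husband"], topn=3))

# Related words end up with nearby vectors (e.g. "paper"/"cardboard", "large"/"big").
print(model.wv.similarity("paper", "cardboard"))
print(model.wv.similarity("large", "big"))
```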

The same approach works for whole sentences (Hinton says it generates “thought vectors”), and Google is looking at using it to bolster its automatic translation service. A recent paper from researchers at a Chinese university and Microsoft’s Beijing lab used a version of the vector technique to make software that beats some humans on IQ-test questions requiring an understanding of synonyms, antonyms, and analogies.

同樣的方法也適用於整個句子(Hinton說這會產生“思想向量”),Google正在研究用它來增強自己的自動翻譯服務。最近,一所中國大學和微軟北京實驗室的研究者發表了一篇論文,用這種向量技術的一個變體做出了軟件,在需要理解同義詞、反義詞和類比的IQ測試題上,擊敗了一些人類參與者。
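The IQ-test software from that paper is not described in detail here; as a rough, hypothetical illustration, an analogy question of the form "A is to B as C is to ?" can be answered with the same vector arithmetic, given pre-trained vectors. The `solve_analogy` helper and the random placeholder vectors below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' by finding the word closest to b - a + c (cosine similarity)."""
    target = vectors[b] - vectors[a] + vectors[c]
    best_word, best_score = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # exclude the question words themselves
        score = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Random placeholder vectors; with embeddings trained as in the earlier sketch,
# the answer to "man is to woman as king is to ?" would be "queen".
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in ["king", "queen", "man", "woman", "paper", "cardboard"]}
print(solve_analogy("man", "woman", "king", vectors))
```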

LeCun’s group is working on going further. “Language in itself is not that complicated,” he says. “What’s complicated is having a deep understanding of language and the world that gives you common sense. That’s what we’re really interested in building into machines.” LeCun means common sense as Aristotle used the term: the ability to understand basic physical reality. He wants a computer to grasp that the sentence “Yann picked up the bottle and walked out of the room” means the bottle left with him. Facebook’s researchers have invented a deep-learning system called a memory network that displays what may be the early stirrings of common sense.

LeCun的小組正致力於更進一步。他說:“語言本身沒有那麽復雜,復雜的是對語言和世界有深入的理解,從而擁有常識。這才是我們真正感興趣、想做進機器裏的東西。”LeCun所說的常識,是亞里士多德使用這個詞時的含義:理解基本物理現實的能力。他希望計算機在理解“Yann拿起瓶子,走出房間”這個句子時,能明白瓶子跟著他一起離開了房間。Facebook的研究者已經發明了一種叫做記憶網絡(memory network)的深度學習系統,它展現出的可能是常識的早期萌芽。

A memory network is a neural network with a memory bank bolted on to store facts it has learned so they don’t get washed away every time it takes in fresh data. The Facebook AI lab has created versions that can answer simple common-sense questions about text they have never seen before. For example, when researchers gave a memory network a very simplified summary of the plot of Lord of the Rings, it could answer questions such as “Where is the ring?” and “Where was Frodo before Mount Doom?” It could interpret the simple world described in the text despite having never previously encountered many of the names or objects, such as “Frodo” or “ring.”

記憶網絡是一種附帶記憶庫的神經網絡,記憶庫用來存儲學到的事實,這樣每次接收新數據時,這些事實不會被沖刷掉。Facebook的人工智能實驗室已經開發出幾個版本,能針對從未見過的文本回答簡單的常識問題。比如,當研究者給記憶網絡一份極度簡化的《魔戒》劇情概要時,它可以回答“戒指在哪裏?”和“到Mount Doom之前Frodo在哪裏?”這樣的問題。儘管“Frodo”、“戒指”等許多名字和物體它之前從未遇到過,它仍能理解文本所描述的簡單世界。
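Facebook's actual memory-network code is not shown in the article. The sketch below is a heavily simplified, hypothetical illustration of the core idea in plain numpy: facts are stored as vectors in a memory bank, a question vector attends over that memory with a softmax, and the most relevant fact is read out. A real memory network learns its embeddings and answer layer end to end rather than using the bag-of-words stand-in here.

```python
import numpy as np

# Bag-of-words embedding over a tiny vocabulary; real memory networks learn this embedding.
vocab = ["frodo", "took", "the", "ring", "moved", "to", "mount", "doom", "where", "is"]
index = {w: i for i, w in enumerate(vocab)}

def embed(sentence):
    """Turn a sentence into a word-count vector (a crude stand-in for a learned embedding)."""
    v = np.zeros(len(vocab))
    for w in sentence.lower().split():
        if w in index:
            v[index[w]] += 1.0
    return v

# Memory bank: one stored vector per fact the network has read.
facts = ["Frodo took the ring", "Frodo moved to Mount Doom"]
memory = np.stack([embed(f) for f in facts])

# The question attends over the memory with a softmax of dot products.
question = embed("Where is the ring")
scores = memory @ question
attention = np.exp(scores - scores.max())
attention /= attention.sum()

# Read out the most strongly attended fact; a full model would map this to an answer word.
print("attention:", np.round(attention, 2))
print("most relevant fact:", facts[int(np.argmax(attention))])
```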

The software learned its rudimentary common sense by being shown how to answer questions about a simple text in which characters do things in a series of rooms, such as “Fred moved to the bedroom and Joe went to the kitchen.” But LeCun wants to expose the software to texts that are far better at capturing the complexity of life and the things a virtual assistant might need to do. A virtual concierge called Moneypenny that Facebook is expected to release could be one source of that data. The assistant is said to be powered by a team of human operators who will help people do things like make restaurant reservations. LeCun’s team could have a memory network watch over Moneypenny’s shoulder before eventually letting it learn by interacting with humans for itself.

軟件學到這些基本常識的方式,是通過示例學習如何回答關於簡單文本的問題,這些文本裏的角色在一系列房間中做事,比如“Fred移到了臥室,Joe去了廚房”。但LeCun希望讓軟件接觸到的文本,能更好地體現生活的復雜性,以及虛擬助理可能需要做的事情。Facebook預計推出的一個名叫Moneypenny的虛擬禮賓助理,可能就是這類數據的一個來源。據說這個助理背後由一組人工操作員支持,幫人做諸如預訂餐廳之類的事。LeCun的團隊可以先讓記憶網絡在Moneypenny旁邊“觀摩”,最終再讓它自己通過與人類互動來學習。
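The training data described above pairs short stories with questions and labelled answers. The triples below are hypothetical examples in that spirit (not Facebook's actual dataset), showing the kind of supervision a memory network would be trained on.

```python
# Hypothetical (story, question, answer) training examples in the style described above.
training_examples = [
    {
        "story": [
            "Fred moved to the bedroom.",
            "Joe went to the kitchen.",
            "Fred picked up the milk.",
        ],
        "question": "Where is the milk?",
        "answer": "bedroom",
    },
    {
        "story": [
            "Joe went to the kitchen.",
            "Joe travelled to the garden.",
        ],
        "question": "Where is Joe?",
        "answer": "garden",
    },
]

# During training, the network reads each story into its memory bank, answers the
# question, and is corrected toward the labelled answer.
for example in training_examples:
    print(example["question"], "->", example["answer"])
```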

Building something that can hold even a basic, narrowly focused conversation still requires significant work. For example, neural networks have shown only very simple reasoning, and researchers haven’t figured out how they might be taught to make plans, says LeCun. But results from the work that has been done with the technology so far leave him confident about where things are going. “The revolution is on the way,” he says.

即使是做出能進行基本的、話題範圍很窄的對話的東西,也仍需要大量工作。比如,LeCun說,神經網絡目前只展示出非常簡單的推理能力,研究者還沒弄清楚怎樣教它們制定計劃。但這項技術到目前為止的工作成果,讓他對事情的走向很有信心。他說:“革命正在路上。”

Some people are less sure. Deep-learning software so far has displayed only the simplest capabilities required for what we would recognize as conversation, says Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence in Seattle. The logic and planning capabilities still needed, he says, are very different from the things neural networks have been doing best: digesting sequences of pixels or acoustic waveforms to decide which image category or word they represent. “The problems of understanding natural language are not reducible in the same way,” he says.

一些人則沒那麽確定。西雅圖艾倫人工智能研究所的CEO Oren Etzioni說,迄今為止,深度學習軟件只展示出我們所認為的對話所需能力中最簡單的部分。他說,仍然欠缺的邏輯與規劃能力,與神經網絡一直最擅長的事情非常不一樣:消化像素序列或聲音波形,判斷它們屬於哪個圖像類別或代表哪個詞。他說:“理解自然語言的問題不能以同樣的方式進行簡化。”

Gary Marcus, a professor of psychology and neural science at NYU who has studied how humans learn language and recently started an artificial-intelligence company called Geometric Intelligence, thinks LeCun underestimates how hard it would be for existing software to pick up language and common sense. Training the software with large volumes of carefully annotated data is fine for getting it to sort images. But Marcus doubts it can acquire the trickier skills needed for language, where the meanings of words and complex sentences can flip depending on context. “People will look back on deep learning and say this is a really powerful technique—it’s the first time that AI became practical,” he says. “They’ll also say those things required a lot of data, and there were domains where people just never had enough.” Marcus thinks language may be one of those domains. For software to master conversation, it would need to learn more like a toddler who picks it up without explicit instruction, he suggests.

Gary Marcus是紐約大學心理學與神經科學教授,研究過人類如何學習語言,最近創辦了一家名叫Geometric Intelligence(幾何智能)的人工智能公司。他認為LeCun低估了現有軟件學會語言和常識的難度。用大量仔細標註的數據訓練軟件,用來給圖像分類是沒問題的,但Marcus懷疑它能否獲得語言所需的更微妙的技能,因為詞語和復雜句子的意思會隨上下文而完全翻轉。他說:“人們將來回望深度學習時,會說這確實是很強大的技術,這是人工智能第一次變得實用;他們也會說,這些東西需要大量數據,而有些領域人們永遠不會有足夠的數據。”Marcus認為語言可能就是這樣一個領域。他提出,軟件要掌握對話,就得更像蹣跚學步的孩子那樣,在沒有明確指令的情況下學會語言。

Deep belief

深度學習的信仰

At Facebook’s headquarters in California, the West Coast members of LeCun’s team sit close to Mark Zuckerberg and Mike Schroepfer, the company’s CTO. Facebook’s leaders know that LeCun’s group is still some way from building something you can talk to, but Schroepfer is already thinking about how to use it. The future Facebook he describes retrieves and coordinates information, like a butler you communicate with by typing or talking as you might with a human one.

在Facebook位於加利福尼亞的總部,LeCun團隊在西海岸的成員與紮克伯格和公司CTO Mike Schroepfer坐得很近。Facebook的領導者知道,LeCun的小組距離做出可以與之交談的東西還有一段距離,但Schroepfer已經在思考怎麽使用它了。他所描述的未來的Facebook能夠檢索和整合信息,就像一個管家,你可以通過打字或說話與它交流,就像與真人管家交流一樣。

“You can engage with a system that can really understand concepts and language at a much higher level,” says Schroepfer. He imagines being able to ask that you see a friend’s baby snapshots but not his jokes, for example. “I think in the near term a version of that is very realizable,” he says. As LeCun’s systems achieve better reasoning and planning abilities, he expects the conversation to get less one-sided. Facebook might offer up information that it thinks you’d like and ask what you thought of it. “Eventually it is like this super-intelligent helper that’s plugged in to all the information streams in the world,” says Schroepfer.

Schroepfer說:“你可以使用一個在高得多的層次上真正理解概念和語言的系統。”比如,他設想你可以要求看到某位朋友的寶寶照片,但不看他講的笑話。他說:“我認為近期內這樣的一個版本是很有可能實現的。”隨著LeCun的系統獲得更好的推理和規劃能力,他預計對話會變得不那麽單向。Facebook可能會主動送上它認為你會喜歡的信息,並問你覺得怎麽樣。Schroepfer說:“最終它會像一個超級智能的幫手,連接著世界上所有的信息流。”

The algorithms needed to power such interactions would also improve the systems Facebook uses to filter the posts and ads we see. And they could be vital to Facebook’s ambitions to become much more than just a place to socialize. As Facebook begins to host articles and video on behalf of media and entertainment companies, for example, it will need better ways for people to manage information. Virtual assistants and other spinouts from LeCun’s work could also help Facebook’s more ambitious departures from its original business, such as the Oculus group working to make virtual reality into a mass-market technology.

支持這種互動所需的算法,也會改進Facebook用來過濾我們所見帖子和廣告的系統。而且,對於Facebook想成為遠不止一個社交場所的雄心來說,這些算法可能至關重要。比如,當Facebook開始代表媒體和娛樂公司託管文章和視頻時,它將需要讓人們更好地管理信息的方式。從LeCun的工作中衍生出的虛擬助理及其他成果,也能幫助Facebook進行偏離其最初業務的更有野心的嘗試,比如正努力讓虛擬現實成為大眾市場技術的Oculus小組。

None of this will happen if the recent impressive results meet the fate of previous big ideas in artificial intelligence. Blooms of excitement around neural networks have withered twice already. But while complaining that other companies or researchers are over-hyping their work is one of LeCun’s favorite pastimes, he says there’s enough circumstantial evidence to stand firm behind his own predictions that deep learning will deliver impressive payoffs. The technology is still providing more accuracy and power in every area of AI where it has been applied, he says. New ideas are needed about how to apply it to language processing, but the still-small field is expanding fast as companies and universities dedicate more people to it. “That will accelerate progress,” says LeCun.

如果最近這些令人印象深刻的成果重蹈人工智能以往那些宏大構想的覆轍,這一切就都不會發生。圍繞神經網絡的興奮熱潮已經兩度凋謝。不過,雖然抱怨其他公司或研究者過度炒作自己的工作是LeCun最喜歡的消遣之一,但他說,已經有足夠的間接證據讓他堅定地支持自己的預言:深度學習將帶來驚人的回報。他說,這項技術在已應用的每個人工智能領域裏,仍在不斷帶來更高的準確性和更強的能力。如何把它應用到語言處理上還需要新的想法,但隨著公司和大學投入更多的人,這個仍然很小的領域正在快速擴張。LeCun說:“這將加速進展。”

It’s still not clear that deep learning can deliver anything like the information butler Facebook envisions. And even if it can, it’s hard to say how much the world really would benefit from it. But we may not have to wait long to find out. LeCun guesses that virtual helpers with a mastery of language unprecedented for software will be available in just two to five years. He expects that anyone who doubts deep learning’s ability to master language will be proved wrong even sooner. “There is the same phenomenon that we were observing just before 2012,” he says. “Things are starting to work, but the people doing more classical techniques are not convinced. Within a year or two it will be the end.”

深度學習能否實現Facebook所設想的那種信息管家,現在還不清楚。即使可以,也很難說這個世界究竟會從中受益多少。但我們可能不用等太久就能知道答案。LeCun猜測,對語言的掌握達到軟件前所未有水平的虛擬助手,將在短短兩到五年內出現。他預計,懷疑深度學習能否掌握語言的人會更早被證明是錯的。他說:“現在出現了和我們在2012年之前觀察到的同樣的現象:東西開始起作用了,但做更傳統技術的人還不信服。再過一兩年,這場爭論就會結束。”
