Embed, encode, attend, predict: the new deep learning formula for state-of-the-art NLP models

By 阿新 • Published: 2018-12-20

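Reposted from: https://explosion.ai/blog/deep-learning-formula-nlp

Over the past six months, a powerful new neural network toolkit has come together for natural language processing. The new approach can be summarised in four steps: embed, encode, attend, predict. This post explains each component of the approach and shows how the components are used in two recent systems.

When people think about improvements in machine learning, they usually think about efficiency and accuracy, but the most important dimension is generality. For example, if you want to write a program that detects abusive posts on a social platform, you should be able to generalise the task to "take a text and predict a class". Whether you are detecting abusive posts or identifying spam email, if the two problems take the same type of input and produce the same type of output, you should be able to reuse the same model code and get different behaviour by training on different data, much as one engine can be used to play different games.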

A four-step deep learning approach to text

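Embedded word representations, also known as word vectors, are now among the most widely used techniques in natural language processing. Word embeddings let you treat individual words as related units of meaning rather than as entirely distinct IDs. However, most NLP problems require understanding longer spans of text, not just isolated words. After the text has been embedded into a sequence of word vectors, a bidirectional RNN encodes those vectors into a sentence matrix. The rows of this matrix can be understood as token vectors: they are sensitive to the context in which each token appears. The final piece is an attention mechanism, which reduces the sentence matrix to a single sentence vector that is ready for prediction.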

1: Embed

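An embedding table maps long, sparse, binary vectors to shorter, dense, continuous vectors. For example, suppose we receive our text as a sequence of ASCII characters. There are 256 possible values, so each value can be represented as a binary vector with 256 dimensions. For a, only position 97 holds a 1 and every other position is 0; b has its 1 at position 98 and 0 everywhere else. This scheme is called one-hot encoding: different values receive entirely different vectors.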
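The difference between a one-hot vector and an embedding lookup can be sketched as follows. This is a minimal illustration rather than the article's own code: the names `one_hot`, `embed`, and `embedding_table`, the width `EMBED_DIM`, and the random table values are all assumptions made for the example. In a real model the table rows would be learned.

```python
import numpy as np

VOCAB_SIZE = 256   # one ID per possible character value, as in the text
EMBED_DIM = 8      # assumed width of the dense vectors

def one_hot(index, size=VOCAB_SIZE):
    """Return a sparse binary vector with a single 1 at `index`."""
    vec = np.zeros(size, dtype="float32")
    vec[index] = 1.0
    return vec

# Hypothetical embedding table: one dense row per ID. Real tables are
# learned during training; these rows are random placeholders.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

def embed(text):
    """Map each character to its dense vector by row lookup.

    Assumes every character code in `text` is below VOCAB_SIZE.
    """
    ids = [ord(ch) for ch in text]
    return embedding_table[ids]

a = one_hot(ord("a"))   # 256 dimensions, a single 1 at position 97
vectors = embed("ab")   # shape (2, EMBED_DIM)
```

Multiplying a one-hot vector by the table selects a single row, so the lookup in `embed` is just an efficient way of computing that product.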

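Most neural network models begin by tokenising the text into words and then embedding those words into vectors. Other models extend the word vector representation with additional information. For instance, it is often useful to pass forward a sequence of part-of-speech tags in addition to the word IDs. You can then learn tag embeddings and concatenate them to the word embeddings. This lets you push some position-sensitive information into the word representation. However, there is a much more powerful way to make the word representations context-specific.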
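Concatenating tag embeddings to word embeddings can be sketched like this. The toy vocabularies, the vector widths, and the function name `embed_with_tags` are assumptions for illustration only, and the tables are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabularies; a real model would build these from data.
word_ids = {"the": 0, "dog": 1, "barks": 2}
tag_ids = {"DET": 0, "NOUN": 1, "VERB": 2}

word_table = rng.normal(size=(len(word_ids), 6)).astype("float32")  # 6-dim word vectors
tag_table = rng.normal(size=(len(tag_ids), 2)).astype("float32")    # 2-dim tag vectors

def embed_with_tags(words, tags):
    """Look up word and tag vectors and join them column-wise per token."""
    w = word_table[[word_ids[word] for word in words]]
    t = tag_table[[tag_ids[tag] for tag in tags]]
    return np.concatenate([w, t], axis=1)

# Each token ends up with 6 word dimensions followed by 2 tag dimensions.
features = embed_with_tags(["the", "dog", "barks"], ["DET", "NOUN", "VERB"])
```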

2: Encode

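Given a sequence of word vectors, the encode step computes a representation that we will call a sentence matrix, in which each row represents the meaning of one token in the context of the rest of the sentence. This is done with a bidirectional RNN; both LSTM and GRU architectures have been shown to work well here. The vector for each token is computed in two parts, one by a forward pass and the other by a backward pass. To get the full vector, we simply stick the two parts together. Simple code for this follows: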

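from numpy import concatenate, ndarray

def encode(fwd_rnn, bwd_rnn, word_vectors):
    # One output row per token for each direction.
    fwd_out = ndarray((len(word_vectors), fwd_rnn.nr_hidden), dtype='float32')
    bwd_out = ndarray((len(word_vectors), bwd_rnn.nr_hidden), dtype='float32')
    fwd_state = fwd_rnn.initial_state()
    bwd_state = bwd_rnn.initial_state()
    for i in range(len(word_vectors)):
        # The forward RNN reads left to right, the backward RNN right to left.
        fwd_state = fwd_rnn(word_vectors[i], fwd_state)
        bwd_state = bwd_rnn(word_vectors[-(i+1)], bwd_state)
        fwd_out[i] = fwd_state
        bwd_out[-(i+1)] = bwd_state
    # Join the per-token outputs (not just the final states) column-wise,
    # giving a sentence matrix with one row per token.
    return concatenate([fwd_out, bwd_out], axis=1)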
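The `encode` function above expects RNN objects that expose `nr_hidden`, `initial_state()`, and a call taking a word vector and a state. To make that interface concrete, here is a self-contained sketch of the same bidirectional pattern with a toy cell. `ToyRNN`, `run_rnn`, and `bidirectional_encode` are hypothetical names, and the plain tanh cell stands in for the LSTM or GRU a real model would use.

```python
import numpy as np

class ToyRNN:
    """Hypothetical minimal RNN cell: new_state = tanh(W x + U state)."""

    def __init__(self, nr_input, nr_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.nr_hidden = nr_hidden
        self.W = rng.normal(scale=0.1, size=(nr_hidden, nr_input))
        self.U = rng.normal(scale=0.1, size=(nr_hidden, nr_hidden))

    def initial_state(self):
        return np.zeros(self.nr_hidden, dtype="float32")

    def __call__(self, word_vector, state):
        return np.tanh(self.W @ word_vector + self.U @ state)

def run_rnn(rnn, word_vectors):
    """Run one direction and return one state per token."""
    state = rnn.initial_state()
    states = []
    for vector in word_vectors:
        state = rnn(vector, state)
        states.append(state)
    return np.stack(states)

def bidirectional_encode(fwd_rnn, bwd_rnn, word_vectors):
    fwd = run_rnn(fwd_rnn, word_vectors)
    # Read right to left, then flip back so row i lines up with token i.
    bwd = run_rnn(bwd_rnn, word_vectors[::-1])[::-1]
    return np.concatenate([fwd, bwd], axis=1)

word_vectors = np.random.default_rng(1).normal(size=(5, 4))
sentence_matrix = bidirectional_encode(
    ToyRNN(4, 3, seed=2), ToyRNN(4, 3, seed=3), word_vectors
)
```

Each row of `sentence_matrix` holds the forward state in its first three columns and the backward state in its last three, so every token's vector reflects both its left and its right context.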

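I think bidirectional RNNs will be one of those insights that come to feel obvious with time. The most direct application of an RNN, however, is to read the text and then predict something from it. What the encode step does instead is compute an intermediate representation, specifically per-token features. Crucially, the representation we get back describes each token in its context. We can learn that the phrase "pick up" differs from the phrase "pick on", even if we processed the two phrases as separate tokens. This has always been a major weakness of NLP models, and now we have a solution.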

3: Attend

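The attend step reduces the matrix representation produced by the encode step to a single vector, so that it can be passed to a standard feed-forward network for prediction. The characteristic advantage of an attention mechanism over other reduction operations is that it takes an auxiliary context vector as input.

By reducing the matrix to a vector, you necessarily lose information. That is why the context vector is crucial: it tells you which information to discard, so that the "summary" vector is tailored to the network that consumes it. Recent research has shown that the attention mechanism is a flexible technique, and new variations of it can be used to create elegant and powerful solutions. For example, Parikh et al. (2016) introduce an attention mechanism that takes two sentence matrices and outputs a single vector.

Yang et al. (2016) introduce an attention mechanism that takes a single matrix and outputs a single vector. Instead of a context vector derived from some aspect of the input, the "summary" is computed with reference to a context vector learned as a model parameter. This makes the attention mechanism a pure reduction operation, which could be used in place of any sum or average pooling step.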
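A reduction guided by a context vector can be sketched as follows. This is a simplified dot-product scoring chosen for brevity, not the exact formulation of either cited paper; the function names `softmax` and `attend` and the random inputs are assumptions for the example.

```python
import numpy as np

def softmax(scores):
    """Normalise scores into positive weights that sum to 1."""
    shifted = scores - scores.max()   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def attend(sentence_matrix, context_vector):
    """Reduce an (n_tokens, width) matrix to a single (width,) vector."""
    scores = sentence_matrix @ context_vector   # one relevance score per token
    weights = softmax(scores)
    summary = weights @ sentence_matrix         # weighted average of the rows
    return summary, weights

rng = np.random.default_rng(0)
sentence_matrix = rng.normal(size=(5, 6))
context_vector = rng.normal(size=6)   # would be a learned parameter in a real model
summary, weights = attend(sentence_matrix, context_vector)
```

With an all-zero context vector every token gets the same weight and the result is plain average pooling, which is one way to see why this kind of attention can stand in for a sum or average pooling step.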

4: Predict

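Once the text, or pair of texts, has been reduced to a single vector, we can learn the target representation: a class label, a real value, a vector, and so on. We can also do structured prediction by using the network as the controller of a state machine, such as a transition-based parser.

Interestingly, most NLP models tend to favour fairly shallow feed-forward networks. This means that some of the most important recent techniques in computer vision, such as residual connections and batch normalization, have so far had relatively little impact on the NLP community.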
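For the class-label case, the prediction step can be sketched as a shallow feed-forward network with a softmax output. The layer sizes, the ReLU activation, the function name `predict`, and the random weights are all assumptions for illustration; a trained model would learn the weights.

```python
import numpy as np

def predict(summary, W_hidden, b_hidden, W_out, b_out):
    """One hidden layer followed by a softmax over class labels."""
    hidden = np.maximum(0.0, W_hidden @ summary + b_hidden)   # ReLU activation
    logits = W_out @ hidden + b_out
    exps = np.exp(logits - logits.max())   # subtract the max for stability
    return exps / exps.sum()

rng = np.random.default_rng(0)
width, nr_hidden, nr_class = 6, 4, 3   # assumed sizes for illustration
summary = rng.normal(size=width)       # stands in for the attend step's output
probs = predict(
    summary,
    rng.normal(size=(nr_hidden, width)), np.zeros(nr_hidden),
    rng.normal(size=(nr_class, nr_hidden)), np.zeros(nr_class),
)
label = int(probs.argmax())   # index of the most probable class
```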
