
NeuralTalk: a Python+numpy example of a multimodal recurrent neural network that describes images with sentences

The pipeline for the NeuralTalk project looks as follows:

The input is a dataset of images, each paired with 5 sentence descriptions that were collected with Amazon Mechanical Turk.

In particular, this code base is set up for the Flickr8K, Flickr30K, and MSCOCO datasets.

In the training stage, the images are fed as input to the RNN, and the RNN is asked to predict the words of the sentence, conditioned on the current word and the previous context as mediated by the hidden layers of the neural network.
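The training-stage forward pass described above can be sketched in numpy. This is a minimal illustration, not the project's actual code: the parameter names (`Wi`, `We`, `Whh`, `Why`, `bh`, `by`), the layer sizes, and the choice to let the image vector initialize the hidden state are all assumptions made for this example.

```python
import numpy as np

def init_params(image_dim, hidden_dim, vocab_size, seed=0):
    # Hypothetical parameter layout; small random weights, zero biases.
    rng = np.random.RandomState(seed)
    scale = 0.1
    return {
        "Wi": rng.randn(hidden_dim, image_dim) * scale,    # image -> hidden
        "We": rng.randn(vocab_size, hidden_dim) * scale,   # word embeddings
        "Whh": rng.randn(hidden_dim, hidden_dim) * scale,  # hidden -> hidden
        "Why": rng.randn(vocab_size, hidden_dim) * scale,  # hidden -> scores
        "bh": np.zeros(hidden_dim),
        "by": np.zeros(vocab_size),
    }

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def rnn_forward(image_vec, word_ids, p):
    # Assumption: the image conditions the RNN through the initial hidden state.
    h = np.tanh(np.dot(p["Wi"], image_vec) + p["bh"])
    probs = []
    for w in word_ids:
        # The new hidden state mixes the current word with the previous context.
        h = np.tanh(p["We"][w] + np.dot(p["Whh"], h) + p["bh"])
        # Distribution over the next word given everything seen so far.
        probs.append(softmax(np.dot(p["Why"], h) + p["by"]))
    return np.array(probs)

params = init_params(image_dim=8, hidden_dim=5, vocab_size=10)
image_vec = np.ones(8)  # stand-in for a real image feature vector
probs = rnn_forward(image_vec, [0, 3, 7], params)
```

Each row of `probs` is the predicted distribution over the vocabulary for the word that follows the corresponding input word.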

In this stage, the parameters of the networks are trained with backpropagation.
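As a small illustration of what backpropagation computes, the sketch below covers only the output layer: the cross-entropy loss of a softmax prediction and its gradient with respect to the scores, checked against a finite-difference estimate. The function names are hypothetical, and a full training loop would propagate this gradient further back through the hidden layers.

```python
import numpy as np

def softmax_xent(scores, target):
    # Softmax probabilities, computed stably.
    shifted = scores - scores.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # Negative log-likelihood of the correct word.
    loss = -np.log(probs[target])
    # Gradient of the loss w.r.t. the scores: probs minus the one-hot target.
    dscores = probs.copy()
    dscores[target] -= 1.0
    return loss, dscores

def numerical_grad(scores, target, eps=1e-5):
    # Central finite differences, used only to check the analytic gradient.
    grad = np.zeros_like(scores)
    for i in range(scores.size):
        up, down = scores.copy(), scores.copy()
        up[i] += eps
        down[i] -= eps
        grad[i] = (softmax_xent(up, target)[0] - softmax_xent(down, target)[0]) / (2 * eps)
    return grad

scores = np.array([1.0, -0.5, 2.0, 0.3])
loss, analytic = softmax_xent(scores, target=2)
numeric = numerical_grad(scores, target=2)
```

If the analytic and numerical gradients agree, the backward step for this layer is likely implemented correctly.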

In the prediction stage, a withheld set of images is passed to the RNN, and the RNN generates the sentence one word at a time.
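Word-by-word generation can be sketched as a greedy decoding loop. The example below is an assumption-laden illustration: `step_fn` is a hypothetical callable standing in for one RNN step, `toy_step` is a fake model used only to make the example runnable, and greedy argmax selection is just one possible decoding strategy.

```python
def greedy_decode(step_fn, h0, start_id, end_id, max_len=20):
    # step_fn(prev_word_id, hidden) -> (probabilities over vocab, new hidden)
    words, prev, h = [], start_id, h0
    for _ in range(max_len):
        probs, h = step_fn(prev, h)
        # Pick the most probable next word.
        prev = max(range(len(probs)), key=lambda i: probs[i])
        if prev == end_id:
            break  # the model signalled the end of the sentence
        words.append(prev)
    return words

def toy_step(prev_id, h):
    # Fake "model": always prefers the word id that follows the previous one.
    vocab_size = 4
    probs = [0.1] * vocab_size
    probs[(prev_id + 1) % vocab_size] = 0.7
    return probs, h

# Word id 0 doubles as the start and end token in this toy setup.
sentence = greedy_decode(toy_step, h0=None, start_id=0, end_id=0)
```

In a real setting, `h0` would be derived from the image and `step_fn` would wrap the trained network.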

The results are evaluated with the BLEU score.
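To make the metric concrete, here is a simplified sentence-level BLEU sketch: clipped n-gram precision against multiple references, a geometric mean over n-gram orders, and a brevity penalty. It is not the evaluation script shipped with the project; the function names are hypothetical, and it applies no smoothing, so any order with zero matches yields a score of 0.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=4):
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        if not cand:
            return 0.0  # candidate too short for this n-gram order
        # For each n-gram, the maximum count seen in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip candidate counts so repeated words are not over-rewarded.
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / float(sum(cand.values()))))
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - float(r) / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)

refs = ["a dog runs on the grass".split(), "a dog is running on grass".split()]
perfect = bleu("a dog runs on the grass".split(), refs)
clipped_score = bleu("the the the".split(), ["the cat".split()], max_n=1)
```

The second call shows clipping at work: only one of the three occurrences of "the" is credited, because the reference contains the word once.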

The code also includes utilities for visualizing the results in HTML.
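A visualization of this kind could be as simple as the sketch below, which renders image paths and generated captions into a single HTML page. The function name, the `image` and `caption` keys, and the markup are assumptions for illustration; they do not describe the project's own utilities.

```python
import html

def render_results(results):
    # results: list of dicts with hypothetical keys "image" and "caption".
    rows = []
    for item in results:
        rows.append(
            '<div><img src="{}" height="200"><p>{}</p></div>'.format(
                html.escape(item["image"], quote=True),  # safe attribute value
                html.escape(item["caption"]),            # safe text content
            )
        )
    return "<html><body>\n{}\n</body></html>".format("\n".join(rows))

page = render_results([{"image": "a.jpg", "caption": "a dog & a cat"}])
```

Writing `page` to a file and opening it in a browser shows each image with its predicted sentence underneath.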

The code was tested on Ubuntu 12.04 with Python 2.7.

Code download link:

http://page5.dfpan.com/fs/4lcjb221e291b62f835/
