視訊動作識別--Two-Stream Convolutional Networks for Action Recognition in Videos

阿新 • • 發佈：2018-12-01

Two-Stream Convolutional Networks for Action Recognition in Videos NIPS2014
http://www.robots.ox.ac.uk/~vgg/software/two_stream_action/

本文針對視訊中的動作分類問題，這裡使用兩個獨立的CNN來分開處理視訊中的空間資訊和時間資訊 spatial 和 tempal，然後我們再後融合 late fusion。 spatial stream 從視訊中的每一幀影象做動作識別，tempal stream 通過輸入稠密光流的運動資訊來識別動作。兩個 stream 都通過 CNN網路來完成。將時間和空間資訊分開來處理，就可以利用現成的資料庫來訓練這兩個網路。

2 Two-stream architecture for video recognition
視訊可以很自然的被分為空間部分和時間部分，空間部分主要對應單張影象中的 appearance，傳遞視訊中描述的場景和物體的相關資訊。時間部分對應連續幀的運動，包含物體和觀察者（相機）的運動資訊。
這裡寫圖片描述

Each stream is implemented using a deep ConvNet, softmax scores of which are combined by late fusion. We consider two fusion methods: averaging and training a multi-class linear SVM [6] on stacked L 2 -normalised softmax scores as features.

Spatial stream ConvNet：這就是對單張影象進行分類，我們可以使用最新的網路結構，在影象分類資料庫上預訓練

3 Optical flow ConvNets
the input to our model is formed by stacking optical flow displacement fields between several consecutive frames. Such input explicitly describes the motion between video frames, which makes the recognition easier
對於 Optical flow ConvNets 我們將若干連續幀影象對應的光流場輸入到 CNN中，這種顯示的運動資訊可以幫助動作分類。

這裡我們考慮基於光流輸入的變體：
3.1 ConvNet input configurations
Optical flow stacking. 這裡我們將光流的水平分量和垂直分量分別打包當做特徵圖輸入 CNN， The horizontal and vertical components of the vector field can be seen as image channels
這裡寫圖片描述

Trajectory stacking，作為另一種運動表達方式，我們可以將運動軌跡資訊輸入 CNN
這裡寫圖片描述

Bi-directional optical flow
雙向光流的計算

Mean flow subtraction：這算是一種輸入的歸一化了，將均值歸一化到 0
It is generally beneficial to perform zero-centering of the network input, as it allows the model to better exploit the rectification non-linearities
In our case, we consider a simpler approach: from each displacement field d we subtract its mean vector.

Individual ConvNets accuracy on UCF-101
這裡寫圖片描述

Temporal ConvNet accuracy on HMDB-51
這裡寫圖片描述

Two-stream ConvNet accuracy on UCF-101
這裡寫圖片描述

Mean accuracy (over three splits) on UCF-101 and HMDB-51
這裡寫圖片描述

--------------------- 本文來自 O天涯海閣O 的CSDN 部落格，全文地址請點選：https://blog.csdn.net/zhangjunhit/article/details/77991038?utm_source=copy

視訊動作識別--Two-Stream Convolutional Networks for Action Recognition in Videos

視訊動作識別--Two-Stream Convolutional Networks for Action Recognition in Videos

24.Two-Stream Convolutional Networks for Action Recognition in Videos

譯：Two-Stream Convolutional Networks for Action Recognition in Videos.md

[論文閱讀筆記]Two-Stream Convolutional Networks for Action Recognition in Videos

Two-Stream Convolutional Networks for Action Recognition in Videos演算法筆記

【CV論文閱讀】Two stream convolutional Networks for action recognition in Vedios

25.Two-Stream Convolutional Networks for Dynamic Saliency Prediction

[行為識別] Two –Stream CNN for Action Recognition in Videos

論文閱讀筆記二十四：Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition（SPPNet CVPR2014）

SPP-net(Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition)

【筆記】SPP-Net : Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

【論文閱讀】Long-Term Recurrent Convolutional Networks for Visual Recognition and Description

SPP(Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition)

long term recurrent convolutional networks for visual recognition and description

深度學習論文翻譯解析（九）：Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

【論文閱讀】Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition

VGGnet論文總結（VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION）

FCN筆記（Fully Convolutional Networks for Semantic Segmentation）

VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNTION（翻譯）

深度學習在視訊動作識別中的應用

視訊動作識別--Two-Stream Convolutional Networks for Action Recognition in Videos

相關推薦