[Computer Science] [2016] A Deep Learning Model for 3D Human Pose Estimation from Monocular Video

This is a 68-page master's thesis by Agnė Grinciūnaitė of Vilnius Gediminas Technical University, Lithuania.

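There is a visual system that can easily recognize and track the position, movements, and actions of the human body without any additional sensing. This system has a processor called the brain, which becomes competent at these tasks after only a few months of training. With more training, it can apply the acquired skills to more complex tasks, such as understanding the interpersonal attitudes, intentions, and emotional states of the person being observed. This system is called a human being, and it is so far the most inspiring work of art for today's creators of artificial intelligence.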

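The most impressive results on complex computer vision and machine learning tasks have recently been achieved by applying various deep learning methods. Deep neural networks have become popular remarkably quickly and are now widely used not only in the research community but also in industry. Convolutional neural networks had the greatest impact: they won several computer vision challenges by a considerable margin and drew widespread attention. These networks are motivated by the known neurophysiology of the brain and the functional properties it requires for cognition.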

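The goal of this thesis is to explore how well convolutional neural networks can handle a task that humans find easy: perceiving another person's location in space and time from the viewer's perspective. A new approach is used that incorporates 3D convolutions to extract valuable features from motion data captured by a monocular video camera and regresses directly to joint positions in 3D camera coordinate space. The research shows that such a network can achieve state-of-the-art results on the selected dataset. The results suggest that an improved implementation could be used in real-world applications such as human-computer interaction, augmented and virtual reality, robotics, surveillance, and smart homes.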

There exists a visual system which can easily recognize and track human body position, movements and actions without any additional sensing. This system has the processor called brain and it is competent after being trained for some months. With a little bit more training it is also able to apply acquired skills for more complicated tasks such as understanding inter-personal attitudes, intentions and emotional states of the observed moving person. This system is called a human being and is so far the most inspirational piece of art for today’s artificial intelligence creators.

The most impressive results of complex computer vision and machine learning tasks were recently achieved by applying various deep learning methods. It is amazing how fast deep neural networks became popular and broadly used not only in research community but also in commercial world. The major impact was made by convolutional neural networks being able to beat some challenges in computer vision by quite a big margin and attract everybody’s attention. These networks are motivated by the known neurophysiology of the brain and its functional properties required for cognition.

The goal of this thesis is to explore the capabilities of convolutional neural network to deal with easily manageable task for human-beings - perceiving other human’s location in spacetime from the perspective of the viewer. New approach of incorporating 3D convolutions to extract valuable features from motion data captured by monocular video camera and directly regress to joint positions in 3D camera coordinate space is used. This research shows the ability of such a network to achieve state of the art results on selected dataset. The achieved results imply that improved realization could possibly be used in real-world applications such as human-computer interaction, augmented and virtual reality, robotics, surveillance, smart homes, etc.
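The abstract's core idea, 3D convolutions over a short video clip followed by direct regression to 3D joint positions, can be illustrated with a toy sketch. This is not the thesis's architecture: the clip size, the number of kernels, the 17-joint skeleton, the pooling step, and every function name below are assumptions chosen for illustration, and the weights are random rather than trained. A real model would use a deep learning framework instead of plain Python loops.

```python
import random


def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation of a T x H x W clip with a kt x kh x kw kernel."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans several frames, so it responds to motion
                # as well as to spatial appearance.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def relu_global_mean(volume):
    """Apply ReLU, then average over time and space: one feature per kernel."""
    values = [max(v, 0.0) for plane in volume for row in plane for v in row]
    return sum(values) / len(values)


def regress_joints(clip, kernels, weights, biases, num_joints):
    """Map a clip to num_joints (x, y, z) tuples: 3D conv features, then a linear layer."""
    features = [relu_global_mean(conv3d_valid(clip, k)) for k in kernels]
    flat = [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]
    return [tuple(flat[3 * j: 3 * j + 3]) for j in range(num_joints)]


def random_tensor(rng, *shape):
    """Nested lists of the given shape filled with uniform values in [-1, 1]."""
    if len(shape) == 1:
        return [rng.uniform(-1, 1) for _ in range(shape[0])]
    return [random_tensor(rng, *shape[1:]) for _ in range(shape[0])]


rng = random.Random(0)
NUM_JOINTS = 17                                  # hypothetical skeleton size
clip = random_tensor(rng, 5, 8, 8)               # 5 grayscale frames of 8 x 8 pixels
kernels = random_tensor(rng, 4, 3, 3, 3)         # 4 untrained 3 x 3 x 3 kernels
weights = random_tensor(rng, NUM_JOINTS * 3, 4)  # untrained linear regression layer
biases = [0.0] * (NUM_JOINTS * 3)

joints = regress_joints(clip, kernels, weights, biases, NUM_JOINTS)
print(len(joints), len(joints[0]))               # number of joints, coordinates per joint
```

The point of the sketch is the data flow: the input is a stack of frames rather than a single image, and the output is a flat vector of 3D coordinates instead of per-joint heatmaps or 2D detections.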

1 Introduction
2 Theoretical Background
3 Related Work
4 Datasets
5 3D Convolutional Neural Network
6 Experiments and Results
7 Conclusions

Download link for the original English text:

http://page5.dfpan.com/fs/4l7cajc2c2411229163/