
NeuroNuggets: CVPR 2018 in Review, Part III

The CVPR 2018 (Computer Vision and Pattern Recognition) conference is long over, but we can’t stop reviewing its wonderful papers; today, Part III is upon us! In the first part, we briefly reviewed the most interesting papers on GANs for computer vision from CVPR 2018; in the second part, we added a human touch and talked about pose estimation and tracking for humans. Today, we turn to one of the main focal points of our own internal research here at Neuromation: synthetic data. As usual, the papers are in no particular order, and our reviews are very brief, so we definitely recommend reading the papers in full.

Synthetic data: imitate to learn

Synthetic data means data that has been generated artificially, either through 3D modeling and rendering (as is usual for computer vision) or by other means, and then used to train machine learning models. Synthetic data is a surprising topic in machine learning, and the most surprising thing is how long it had been mostly neglected. Some works on synthetic data can be traced back to the 2000s, but before 2016 it attracted almost no interest at all. The only field where it saw real use was training self-driving cars, where the need for simulated environments and the impossibility of collecting real datasets come together to make it the perfect fit for synthetic datasets.

Now the interest is rapidly growing: we now have the SUNCG dataset of simulated indoor environments, outdoor environments for driving and navigation, the SURREAL dataset of synthetic humans to learn pose estimation and tracking, and even recent works that apply GANs to generate and refine synthetic data (we hope to get back to this and explain how it works later). So let us see what CVPR 2018 authors have to say about synthetic data. Since this is our main focus, we will consider the works on synthetic data in slightly more detail than usual.




Generating Synthetic Data from GANs: Augmentation and Adaptation in Feature Space

R. Volpi et al., Adversarial Feature Augmentation for Unsupervised Domain Adaptation
S. Sankaranarayanan et al., Generate To Adapt: Aligning Domains using Generative Adversarial Networks



There is a very interesting and promising field of using GANs to produce synthetic datasets to train other models. On the surface it makes little sense: if you have enough data to train a GAN, why not just use it to train the model? Or even better, if you have a trained GAN why don’t you just take the discriminator and use it for your problem?

But this idea becomes much more interesting in the domain adaptation setting. Suppose you have a large source dataset and a small target dataset, and you need to use a model trained on the source dataset for the target, which might be completely unlabeled. Here adversarial domain adaptation techniques train two networks, a generator and a discriminator, and use them to ensure that the network cannot distinguish between the data distributions of the source and target datasets. This field was started in the ICML 2015 paper by Ganin and Lempitsky, where the discriminator is used to ensure that the features stay domain-invariant:
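The gradient reversal trick at the heart of the Ganin and Lempitsky paper can be sketched in a few lines. Below is a toy numpy illustration of the idea only; all names, shapes, and the linear "feature extractor" are our own assumptions, not the paper's implementation:

```python
import numpy as np

# Gradient reversal: the layer is the identity in the forward pass, but
# negates (and scales) the gradient flowing back from the domain classifier,
# pushing the feature extractor toward domain-invariant features.

def grad_reversal_forward(x):
    return x  # identity on the way forward

def grad_reversal_backward(grad, lam=1.0):
    return -lam * grad  # negated, scaled gradient on the way back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                      # a small batch of inputs
grad_from_domain_head = rng.normal(size=(4, 2))  # dL_domain / d(features)

# Gradient the (linear) feature extractor actually receives: it is pushed
# to *hurt* the domain classifier rather than help it.
grad_for_extractor = x.T @ grad_reversal_backward(grad_from_domain_head)
print(grad_for_extractor.shape)  # prints (3, 2)
```

In a real framework this is implemented as a custom autograd operation, so the same backward pass trains the domain classifier normally while adversarially updating the features.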








And here is a schematic depiction of how this idea was slightly generalized in the Adversarial Discriminative Domain Adaptation paper from 2017:



In the CVPR 2018 paper by Volpi et al., researchers from Italy and Stanford made the adversarial training work not on the original images but rather in the feature space itself. The GAN operated on features extracted by a pretrained network, which makes it possible to achieve better domain invariance and ultimately improve the quality of domain adaptation. Here is the overall training procedure as it was adapted by Volpi et al.:



Another approach in the same vein was presented at CVPR 2018 by Sankaranarayanan et al., researchers from the University of Maryland. They use GANs to leverage unsupervised data to bring the source and target distributions closer to each other in the feature space. Basically, the idea is to use the discriminator to ensure that images generated from an embedding remain realistic images for the source distribution even when the embedding was taken from a sample from the target distribution. Here is how it works, and, again, the authors report improved domain adaptation results:
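To make the feature-space idea more concrete, here is a hedged toy sketch of the two adversarial objectives involved: a discriminator tries to tell source features from target features, while the feature extractor would be trained on the opposite objective until the two distributions become indistinguishable. The linear "networks", shapes, and loss form below are our illustrative assumptions, not code from either paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
F = rng.normal(size=(3, 2))              # toy "feature extractor"
w = rng.normal(size=2)                   # toy discriminator weights

src = rng.normal(loc=0.0, size=(8, 3))   # source-domain inputs
tgt = rng.normal(loc=1.0, size=(8, 3))   # target-domain inputs (shifted)
f_src, f_tgt = src @ F, tgt @ F          # features for both domains

# Discriminator objective: label source features 1, target features 0.
d_loss = (-np.mean(np.log(sigmoid(f_src @ w) + 1e-8))
          - np.mean(np.log(1.0 - sigmoid(f_tgt @ w) + 1e-8)))

# Adversarial objective for the extractor: make target features "look source".
g_loss = -np.mean(np.log(sigmoid(f_tgt @ w) + 1e-8))
print(d_loss > 0 and g_loss > 0)  # prints True
```

Alternating updates on these two losses is the standard GAN recipe; the papers differ mainly in what the generator produces (augmented features vs. source-like images) and where the discriminator looks.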








How Well Should You Label? A Study of Label Quality

A. Zlateski et al., On the Importance of Label Quality for Semantic Segmentation



One of the main selling points of synthetic data has always been the pixel-perfect labeling that you can easily achieve with it. A synthetic scene always comes with perfect segmentation — but just how important is that? The authors of this work studied how finely (or how coarsely) you have to label your training set to get good segmentation quality from modern convolutional architectures… and, of course, what better tool to perform this study than synthetic scenes.

The authors used their specially developed Auto City dataset:







And in their experiments, the authors showed that the final segmentation quality, unsurprisingly, is indeed strongly correlated with the amount of time spent producing the labels… but not so much with the quality of each individual label. This suggests that it is better to produce lots of coarse labels (say, with crowdsourcing) than to perform strict quality control for every label.
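As a small illustration of what "coarse" means here, one can measure how far a quick, sloppy annotation strays from the pixel-perfect mask with intersection-over-union (IoU), the standard segmentation metric. The tiny masks below are made up for the example:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 1.0

fine = np.zeros((6, 6), dtype=bool)
fine[1:5, 1:5] = True      # "pixel-perfect" object mask (4x4 square)
coarse = np.zeros((6, 6), dtype=bool)
coarse[0:5, 0:5] = True    # faster, coarser annotation (5x5 square)

print(round(iou(fine, coarse), 2))  # prints 0.64
```

The study's point is that many such 0.6-ish labels, produced quickly, can be worth more for training than a few painstaking 0.99 ones.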



Soccer on Your Tabletop

K. Rematas et al., Soccer on Your Tabletop



Here at Neuromation, we love soccer (yes, the World Cup in Russia cost us a lot of work hours), and this research is just soooooooo cool. The authors present a system that can take a video stream of a soccer game and transform it… into a moving 3D reconstruction that can be projected onto your tabletop and viewed with an augmented reality device!

The system extracts bounding boxes of the players, analyzes the human figures with pose and depth estimation models and produces a quite accurate 3D scene reconstruction. Note how training a model specifically for the soccer domain really improves the results:
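One building block of such a pipeline — lifting an estimated per-pixel depth inside a player's bounding box into a 3D point cloud — can be sketched with a pinhole camera model. The intrinsics and depth values below are invented for illustration and are not the paper's:

```python
import numpy as np

fx = fy = 500.0          # focal lengths in pixels (assumption)
cx, cy = 64.0, 64.0      # principal point (assumption)

v, u = np.mgrid[60:64, 60:64]      # pixel coordinates inside the box
z = np.full(u.shape, 10.0)         # pretend every point is 10 m away

# Pinhole unprojection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = Z.
pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
print(pts.shape)  # prints (4, 4, 3): one (X, Y, Z) point per pixel
```

Doing this for every player, with depth predicted by a soccer-specific model, is what turns flat broadcast footage into a scene you can walk around.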



It additionally warms our hearts that they actually trained on synthetic data extracted from FIFA games! And the results are simply very cool all around:








But wait, there is more…

Thank you for your attention! Next time we might take an even more detailed look at some of the CVPR 2018 papers regarding synthetic data and domain adaptation. Until then!

Sergey Nikolenko
Chief Research Officer, 
Neuromation

Aleksey Artamonov
Senior Researcher, 
Neuromation



