
What’s In a Face (CVPR in Review V)

I have said that she had no face; but that meant she had a thousand faces…
― C.S. Lewis, Till We Have Faces

Today we present to you another installment where we dive into the details of a few papers from the CVPR 2018 (Computer Vision and Pattern Recognition) conference. We’ve had four already: about GANs for computer vision, about pose estimation and tracking for humans, about synthetic data, and, finally, about domain adaptation. In particular, in the fourth part we presented three papers on the same topic whose results were actually numerically comparable.

Today, we turn to a different problem that also warrants a detailed comparison. We will talk about face generation, that is, about synthesizing a realistic picture of a human face, either from scratch or by changing some features of a real photo. Actually, we already touched upon this problem a while ago, in our first post about GANs. But since then, generative adversarial networks (GANs) have remained one of the very hottest topics in machine learning, and it is no wonder that new advances await us today. And again, it is my great pleasure to introduce Anastasia Gaydashenko, with whom I have co-authored this text.


GANs for Face Synthesis and the Importance of Loss Functions

We have already spoken many times about how important a model’s architecture and a good dataset are for deep learning. In this post, one recurrent theme will be the meaning and importance of loss functions, that is, the objectives that a neural network actually optimizes. One could argue that the loss function is part of the architecture, but in practice we usually think about them separately; e.g., the same basic architecture can serve a wide variety of loss functions with only minor changes, and that is something we will see today.

We chose these particular papers because we liked them best, but also because they all use GANs, and they all use them to modify pictures of faces while preserving the person’s identity. This is a well-established application of GANs; classical papers such as ADD used it to predict how a person’s face changes with age or how they would look with a different gender. The papers that we consider today bring this line of research one step further, parceling out certain parts of a person’s appearance (e.g., makeup or emotions) in such a way that they become subject to manipulation.

Thus, in a way all of today’s papers are also solving the same problem and might be comparable with each other. The problem, though, is that the true evaluation of a model’s results can basically be done only by a human: you need to judge how realistic the new picture looks. And in our case, the specific tasks and datasets are somewhat different too, so we will not have a direct comparison of the results; instead, we will extract and compare new interesting ideas.

On to the papers!


Towards Open-Set Identity Preserving Face Synthesis

The authors of the first paper, a joint work of researchers from the University of Science and Technology of China and Microsoft Research (full pdf), aim to disentangle identity and attributes from a single face image. The idea is to decompose a face’s representation into “identity” and “attributes” in such a way that identity corresponds to the person, and attributes correspond to basically everything that could be modified while still preserving identity. Then, using this extracted identity, we can add attributes extracted from a different face. Like this:



Fascinating, right? Let’s investigate how they do it. There are quite a few interesting novel tricks in the paper, but the main contribution of this work is a new GAN-based architecture:




Here the network takes two pictures as input: the identity picture and the attributes picture; the latter serves as the source for everything except the person’s identity: pose, emotion, illumination, and even the background.

The main components of this architecture include:

  • identity encoder I that produces a latent representation (embedding) of the identity input xˢ;

  • attributes encoder A that does the same for the attributes input xᵃ;

  • mixed picture generator G that takes as input both embeddings (concatenated) and produces the picture x’ that is supposed to mix the identity of xˢ and the attributes of xᵃ;

  • identity classifier C that checks whether the person in the generated picture x’ is indeed the same as in xˢ;

  • discriminator D that tries to distinguish real and generated examples to improve generator performance, in the usual GAN fashion.

This is the structure of the model used for training; when all components have been trained, for generation itself it suffices to use only the part inside the dotted line, so the networks C and D are only included in the training phase.
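To make the data flow concrete, here is a minimal PyTorch-style sketch of the generation path, i.e., the part inside the dotted line: the encoders I and A produce embeddings that the generator G decodes into a new face. All module definitions, layer sizes, and embedding dimensions here are our own illustrative assumptions rather than the authors’ implementation; C and D are omitted because they are only needed during training.

```python
# Illustrative sketch only: toy encoders and generator with assumed sizes.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Tiny convolutional encoder producing a flat embedding."""
    def __init__(self, emb_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, emb_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))

class Generator(nn.Module):
    """Decodes the concatenated identity + attribute embeddings into an image."""
    def __init__(self, emb_dim=128, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, 3 * img_size * img_size), nn.Tanh()
        )

    def forward(self, f_id, f_attr):
        z = torch.cat([f_id, f_attr], dim=1)   # concatenated embeddings
        return self.net(z).view(-1, 3, self.img_size, self.img_size)

I, A, G = Encoder(), Encoder(), Generator()    # identity encoder, attributes encoder, generator

x_s = torch.randn(4, 3, 64, 64)                # identity input
x_a = torch.randn(4, 3, 64, 64)                # attributes input
x_prime = G(I(x_s), A(x_a))                    # generated face: identity of x_s, attributes of x_a
print(x_prime.shape)                           # torch.Size([4, 3, 64, 64])
```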


The main problem, of course, is how to disentangle identity from attributes. How can we tell the network what it should take from xˢ and what from xᵃ? The architecture outlined above does not answer this question by itself; the main work here is done by a careful selection of loss functions. There are quite a few of them, so let us review them one by one. The NeuroNugget format does not allow for too many formulas, so we will try to capture the meaning of each part of the loss function (a combined sketch follows the list):

  • the most straightforward part is the softmax classification loss Lᵢ that trains identity encoder I to recognize the identity of people shown on the photos; basically, we train I to serve as a person classifier and then use the last layer of this network as features fᵢ(xˢ);

  • the reconstruction loss Lᵣ is more interesting; we would like the result x’ to reconstruct the original image xᵃ anyway but there are two distinct cases here:

  • if the person on image xᵃ is the same as on the identity image xˢ, there is no question what we should do: we should reconstruct xᵃ as exactly as possible;

  • and if xᵃ and xˢ show two different people (we know all identities in the supervised training phase), we also want to reconstruct xᵃ but with a lower penalty for “errors” (10 times lower in the authors’ experiments); we don’t actually want to reconstruct xᵃ exactly now but still want x’ to be similar to xᵃ;

  • the KL divergence loss Lₖₗ is intended to help the attributes encoder A concentrate on attributes and “lose” the identity as much as possible; it serves as a regularizer that makes the attributes vector distribution similar to a predefined prior (a standard Gaussian);

  • the discriminator loss Lᵈ is standard GAN business: it shows how well D can discriminate between real and fake images; however, there is a twist here as well: instead of just including the discriminator loss Lᵈ, the network starts by using Lᵍᵈ, a feature matching loss that measures how similar the features extracted by D on some intermediate level from x’ and xᵃ are; this is because we cannot expect to fool D right away: the discriminator will always be nearly perfect at the beginning of training, and we have to settle for a weaker loss function first (see the CVAE-GAN paper for more details);

  • and, again, the same trick works for the identity classifier C; we use the basic classification loss Lᶜ but also augment it with the distance Lᵍᶜ between feature representations of x’ and xˢ on some intermediate layer of C.
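Putting these pieces together, here is a hedged sketch of how the terms could be combined into a single objective. The particular distances (L1 for reconstruction, mean squared error for feature matching), the unit weights of the sum, and the assumption that A outputs a Gaussian posterior (mu, logvar) are our simplifications; the paper uses its own exact formulations and coefficients.

```python
# A sketch of the combined objective under the assumptions stated above.
import torch
import torch.nn.functional as F

def total_loss(logits_id, id_labels,        # identity branch: predictions of I and true identities
               x_prime, x_a, same_identity, # generated image x', attribute image, same-person flag
               mu, logvar,                  # Gaussian posterior of the attribute embedding from A
               feats_D_fake, feats_D_attr,  # intermediate features of D for x' and the attribute image
               feats_C_fake, feats_C_id):   # intermediate features of C for x' and the identity image
    # L_i: softmax identity classification loss for the identity encoder I
    L_i = F.cross_entropy(logits_id, id_labels)

    # L_r: reconstruction of the attribute image, down-weighted by a factor
    # of 10 when the two inputs show different people (as in the paper)
    recon = F.l1_loss(x_prime, x_a)
    L_r = recon if same_identity else 0.1 * recon

    # L_kl: pull the attribute embedding toward a standard Gaussian prior
    L_kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # L_gd / L_gc: feature matching against intermediate layers of D and C
    L_gd = F.mse_loss(feats_D_fake, feats_D_attr)
    L_gc = F.mse_loss(feats_C_fake, feats_C_id)

    return L_i + L_r + L_kl + L_gd + L_gc
```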


(Disclaimer: I apologize for slightly messing up notation from the pictures but Medium actually does not support sub/superscripts so I had to make do with existing Unicode symbols.)

That was quite a lot to take in, wasn’t it? Well, this is how modern GAN-based architectures usually work: their final loss function is usually a sum of many different terms, each with its own motivation and meaning. But the resulting architecture works out very nicely; we can now train it in several different ways:

  • first, networks I and C are doing basically the same thing, identifying people; therefore, they can share both the architecture and the weights (which simplifies training), and we can even use a standard pretrained person identification network as a very good initialization for I and C;

  • next, we train the whole thing on a dataset of images of people with known identities; as we have already mentioned, we can pick pairs of xˢ and xᵃ as different images of the same person and have the network try to reconstruct xᵃ exactly, or pick xˢ and xᵃ with different people and train with a lower weight of the reconstruction loss;

  • but even that is not all; publicly available labeled datasets of people are not diverse enough to train the whole architecture end-to-end, but, fortunately, the architecture also allows for unsupervised training: if we don’t know the identity, we can’t train I and C, so we have to ignore their loss functions, but we can still train the rest! And we have already seen that I and C are the easiest to train, so we can assume they have been trained well enough on the supervised part. Thus, we can simply grab some random faces from the Web and add them to the training set without knowing the identities.
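As a small illustration of the last point, the toy snippet below shows how the identity-dependent terms can simply be switched off for unlabeled faces while the rest of the objective keeps training; the names and unit weights are assumptions, and the scalar losses are dummies.

```python
# Semi-supervised toggle: identity terms only contribute for labeled batches.
import torch

def step_loss(adv_loss, recon_loss, kl_loss, identity_loss, labeled: bool):
    loss = adv_loss + recon_loss + kl_loss     # computable for any image
    if labeled:
        loss = loss + identity_loss            # L_i and L_gc need known identities
    return loss

l_adv, l_rec, l_kl, l_id = (torch.tensor(v) for v in (0.7, 0.3, 0.1, 0.5))
print(step_loss(l_adv, l_rec, l_kl, l_id, labeled=True))   # labeled face: all terms
print(step_loss(l_adv, l_rec, l_kl, l_id, labeled=False))  # random Web face: identity terms skipped
```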


Thanks to the careful and precise choice of the architecture, loss functions, and training process, the results are fantastic! Here are two selections from the paper. In the first, we see transformations of faces randomly chosen from the training set, with random faces for attributes:

And in the second, the identities never appeared in the training set! These are people completely unknown to the network (“zero-shot identities”, as the paper calls them)… and it still works just fine:




PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup

This collaboration of researchers from Princeton, Berkeley, and Adobe (full pdf) works in the same vein as the previous paper but tackles a much more specific problem: can we add or modify the makeup in a photograph, rather than all attributes at once, while keeping the face as recognizable as possible? A major problem here, as often happens in machine learning, is with the data: a relatively direct approach would be quite possible if we had a large dataset of aligned photographs of faces with and without makeup… but of course we don’t. So how do we solve this?

The network still gets two images as an input: the source image from which we take the face and the reference image from which we take the makeup style. The model then produces the corresponding output; here are some sample results, and they are very impressive:

This unsupervised learning framework relies on a new model of a cycle-consistent generative adversarial network; it consists of two asymmetric functions: the forward function encodes example-based style transfer, whereas the backward function removes the style. Here is how it works:




The picture shows two coupled networks designed to implement these functions: one that transfers makeup style (G) and another that can remove makeup (F); the idea is to make the output of their successive application to an input photo match the input.

Let us talk about losses again, because they define the approach and capture the main new ideas in this work as well. The only notation we need is that X is the “no makeup” domain and Y is the domain of images with makeup. Now:

  • the discriminator DY tries to discriminate between real samples from domain Y (with makeup) and generated samples, and the generator G aims to fool it; so here we use an adversarial loss to constrain the results of G to look similar to makeup faces from domain Y;

  • the same loss function is used for F for the same reason: to encourage it to generate images indistinguishable from no-makeup faces sampled from domain X;

  • but these loss functions are not enough; they would simply let the generator reproduce the same picture as the reference without any constraints imposed by the source; to prevent this, we use the identity loss for the composition of G and F: if we apply makeup to a face x from X and then immediately remove it, we should get back the input image x exactly;

  • now we have made the output of G belong to Y (faces with makeup) and preserve the identity, but we are still not really using the reference makeup style in any way; to transfer the style, we use two different style losses (see the sketch after this list):

  • the style reconstruction loss Ls says that if we transfer makeup from a face y to a face x with G(x,y), then remove makeup from y with F(y), and then apply the style from G(x,y) back to F(y), we should get y back, i.e., G(F(y), G(x,y)) should be similar to y;

  • and then, on top of all this, we add another discriminator DS that decides whether a given pair of faces has the same makeup; its style discriminator loss LP is the final element of the objective function.
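The two style constraints are easy to misread, so here is a small sketch of the identity (cycle) loss and the style reconstruction loss. G and F are passed in as callables with G(face, makeup_reference); the L1 distances and the identity-mapping stand-ins in the usage example are our assumptions, not the paper’s exact formulation.

```python
# Sketch of the cycle-style constraints described above.
import torch
import torch.nn.functional as F_fn

def paired_cycle_losses(G, F, x, y):
    """x: no-makeup source image, y: reference image with makeup."""
    xy = G(x, y)                               # x wearing the makeup of y
    y_plain = F(y)                             # y with makeup removed

    # Identity (cycle) loss: applying makeup and then removing it returns x
    L_identity = F_fn.l1_loss(F(xy), x)

    # Style reconstruction loss: re-applying the style taken from G(x, y)
    # onto the de-made-up y should give y back, i.e. G(F(y), G(x, y)) = y
    L_style = F_fn.l1_loss(G(y_plain, xy), y)
    return L_identity, L_style

# Toy usage with identity mappings standing in for the real networks:
img = torch.randn(1, 3, 64, 64)
ref = torch.randn(1, 3, 64, 64)
print(paired_cycle_losses(lambda face, makeup: face, lambda face: face, img, ref))
```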


There is more to the paper than just loss functions. For example, another problem was how to acquire a dataset of photos for the training set. The authors found an interesting solution: use beauty bloggers from YouTube! They collected a dataset from makeup tutorial videos (verified manually on Amazon Mechanical Turk), thus ensuring that it would contain a large variety of makeup styles in high resolution.

The results are, again, pretty impressive:




The results become especially impressive if you compare them with previous state-of-the-art models for makeup transfer:

We have a feeling that the next Prisma might very well be lurking somewhere nearby…


Facial Expression Recognition by De-expression Residue Learning

With the last paper for today (full pdf), we turn from makeup to a different kind of very specific facial feature: emotions. How can we disentangle identity and emotions?

In this work, the proposed architecture contains two learning processes: the first is learning to generate standard neutral faces with a conditional GAN (cGAN), and the second is learning from the intermediate layers of the resulting generator. To train the cGAN, we use pairs consisting of a face image that shows some expression (input) and a neutral face image of the same subject (output):

The cGAN is trained as usual: the generator reconstructs the output based on the input image, and then the tuples (input, target, yes) and (input, output, no) are given to the discriminator. The discriminator tries to distinguish generated samples from the ground truth, while the generator tries not only to confuse the discriminator but also to generate an image as close to the target image as possible (composite loss functions again, but this time relatively simple).

The paper calls this process de-expression (removing expression from a face), and the idea is that during de-expression, information related to the actual emotions is still recorded as an expressive component in the intermediate layers of the generator. Thus, for the second learning process we fix the parameters of the generator, and the outputs of intermediate layers are combined and used as input for deep models that do facial expression classification. The overall architecture looks like this:
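Below is a simplified sketch of the two stages under stated assumptions: a toy encoder-decoder stands in for the cGAN generator, its bottleneck features play the role of the expressive component, and a small classifier is trained on them with the generator frozen. The shapes, layers, and the six expression classes are illustrative, not the paper’s exact configuration.

```python
# De-expression residue learning, reduced to a toy two-stage pipeline.
import torch
import torch.nn as nn

class DeExpressionGenerator(nn.Module):
    """Maps an expressive face to a neutral face; also returns intermediate features."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())

    def forward(self, x):
        residue = self.enc(x)                  # the "expressive component"
        return self.dec(residue), residue

gen = DeExpressionGenerator()                  # stage 1: trained as a cGAN generator (not shown)

for p in gen.parameters():                     # stage 2: freeze the generator...
    p.requires_grad = False

classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                           nn.Linear(64, 6))   # ...and classify 6 expressions from the residue

x = torch.randn(2, 3, 64, 64)                  # expressive input faces
neutral, residue = gen(x)                      # neutral reconstruction + intermediate features
logits = classifier(residue)                   # expression prediction
print(neutral.shape, logits.shape)             # torch.Size([2, 3, 64, 64]) torch.Size([2, 6])
```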




After neutral face generation, the expression information can be analyzed by comparing the neutral face and the query expression face at the pixel level or feature level. However, the pixel-level difference is unreliable due to variation between images (e.g., rotation, translation, or lighting), which can cause a large pixel-level difference even without any change in the expression. The feature-level difference is also unstable, as the expression information may vary according to the identity information. Since the difference between the query image and the neutral image is recorded in the intermediate layers, the authors exploit the expressive component from the intermediate layers directly.

The following figure illustrates some samples of the de-expression residue, which are the expressive components for anger, disgust, fear, happiness, sadness, and surprise respectively; the picture shows the corresponding histogram for each expressive component. As we can see, both the expressive components and the corresponding histograms are quite distinguishable:

And here are some sample results on different datasets. In all pictures, the first column is the input image, the third column is the ground-truth neutral face image of the same subject, and the middle is the output of the generative model:

As a result, the authors both get a nice network for de-expression, i.e., removing emotion from a face, and improve state-of-the-art results for emotion recognition by training the emotion classifier on rich features captured by the de-expression network.


Final words

Thank you for reading! With this, we are finally done with CVPR 2018. It is hard to do justice to a conference this large; naturally, there were hundreds of very interesting papers that we have not been able to cover. But still, we hope it has been an interesting and useful selection. We will see you again soon in the next NeuroNugget installments. Good luck!

Sergey Nikolenko
Chief Research Officer, Neuromation

Anastasia Gaydashenko
former Research Intern at Neuromation, currently Machine Learning Intern at Cisco

