
facenet: understanding triplet loss and the train_tripletloss.py code

In FaceNet's face feature extraction, the core of the algorithm, and the part hardest to understand, is the triplet loss function.
[Figure: a triplet of samples - anchor, positive, negative.]
The objective the network learns is to make the distance from the Anchor to the Positive shorter than the distance from the Anchor to the Negative (the Anchor is a sample, a Positive is a sample of the same class as the Anchor, and a Negative is a sample of a different class). In other words, learning should make distances within a class smaller than distances between classes.
Mathematically, with a constant margin \alpha, this is:

\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2, \quad \forall \left(f(x_i^a), f(x_i^p), f(x_i^n)\right) \in \mathcal{T}
The loss function is defined as:

L = \sum_i^N \left[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \right]_+
Training minimizes this loss with stochastic gradient descent. The first norm is the (a, p) distance, the second norm is the (a, n) distance, and alpha is a constant, so the learning process can be understood as making intra-class distances smaller and inter-class distances larger.
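Written out, a minimal TensorFlow 1.x sketch of this loss looks as follows (it mirrors what facenet.triplet_loss in the repository computes over batches of anchor, positive, and negative embeddings; treat it as an illustration rather than the canonical source):

import tensorflow as tf

def triplet_loss(anchor, positive, negative, alpha):
    # Squared L2 distance of each anchor to its positive and to its negative
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)
    # Hinge: only triplets still violating the margin alpha contribute
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    return tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)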
Once the loss function is fixed, the next question is how to find the negative and positive samples for each anchor during training. In theory we should pick the hard positive and the hard negative, as in the paper:

hard positive: \mathrm{argmax}_{x_i^p} \|f(x_i^a) - f(x_i^p)\|_2^2
hard negative: \mathrm{argmin}_{x_i^n} \|f(x_i^a) - f(x_i^n)\|_2^2


But this is not feasible in practice, and training this way may also fail to reach a good optimum. (My understanding: the most expensive part is that finding each anchor's hard negative requires a search over the entire training set, which is prohibitively slow.)
The authors' approach is therefore to search for negatives inside a mini-batch during training, selecting those that satisfy:

\|f(x_i^a) - f(x_i^p)\|_2^2 < \|f(x_i^a) - f(x_i^n)\|_2^2
Negatives found this way are called semi-hard: within the batch, the (a, n) distance only needs to be larger than the (a, p) distance.
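A tiny numpy illustration of this condition (the array names are made up for the example; in the real code, distances to images of the same person are masked out first):

import numpy as np

# Toy batch of 6 embeddings; index 0 is the anchor, index 1 its positive
emb = np.random.randn(6, 128)
pos_dist_sqr = np.sum(np.square(emb[0] - emb[1]))
# Squared distance from the anchor to every embedding in the batch
neg_dists_sqr = np.sum(np.square(emb[0] - emb), 1)
# Semi-hard candidates: farther from the anchor than the positive is
semi_hard = np.where(neg_dists_sqr > pos_dist_sqr)[0]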
That is my understanding of triplet loss.

For how to train the model, see the TensorFlow implementation at https://github.com/davidsandberg/facenet . Below I have annotated train_tripletloss.py; the comments reflect my own understanding, so please point out anything that is wrong. Some parts draw on http://www.mamicode.com/info-detail-2096766.html .
One more note on how the program actually finds the negative: among all samples satisfying d(a, n) - d(a, p) < alpha, it randomly picks one as the negative.
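This is the negative-selection step inside select_triplets (the listing below is cut off just before this point); a condensed numpy sketch of that step, where pick_negative is my own name for the excerpt:

import numpy as np

def pick_negative(neg_dists_sqr, pos_dist_sqr, alpha):
    # neg_dists_sqr: squared distances d(a, n) from the anchor to every image
    #                in the batch, same-class entries masked out (set to NaN)
    # pos_dist_sqr:  squared distance d(a, p) of the current anchor-positive pair
    all_neg = np.where(neg_dists_sqr - pos_dist_sqr < alpha)[0]  # margin violators
    if all_neg.shape[0] > 0:
        # Pick one violating negative at random (VGG Face style selection)
        return all_neg[np.random.randint(all_neg.shape[0])]
    return None  # no violating negative: this (a, p) pair yields no triplet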

"""Training a face recognizer with TensorFlow based on the FaceNet paper
FaceNet: A Unified Embedding for Face Recognition and Clustering: http://arxiv.org/abs/1503.03832
"""
# MIT License
#
# Copyright (c) 2016 David Sandberg
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from datetime import datetime
import os.path
import time
import sys
import tensorflow as tf
import numpy as np
import importlib
import itertools
import argparse
import facenet
import lfw
from tensorflow.python.ops import data_flow_ops
from six.moves import xrange  # @UnresolvedImport


def main(args):
    network = importlib.import_module(args.model_def)

    subdir = datetime.strftime(datetime.now(), '%Y%m%d-%H%M%S')
    log_dir = os.path.join(os.path.expanduser(args.logs_base_dir), subdir)
    if not os.path.isdir(log_dir):  # Create the log directory if it doesn't exist
        os.makedirs(log_dir)
    model_dir = os.path.join(os.path.expanduser(args.models_base_dir), subdir)
    if not os.path.isdir(model_dir):  # Create the model directory if it doesn't exist
        os.makedirs(model_dir)

    # Write arguments to a text file
    facenet.write_arguments_to_file(args, os.path.join(log_dir, 'arguments.txt'))

    # Store some git revision info in a text file in the log directory
    src_path, _ = os.path.split(os.path.realpath(__file__))
    facenet.store_revision_info(src_path, log_dir, ' '.join(sys.argv))

    np.random.seed(seed=args.seed)
    train_set = facenet.get_dataset(args.data_dir)

    print('Model directory: %s' % model_dir)
    print('Log directory: %s' % log_dir)
    if args.pretrained_model:
        print('Pre-trained model: %s' % os.path.expanduser(args.pretrained_model))

    if args.lfw_dir:
        print('LFW directory: %s' % args.lfw_dir)
        # Read the file containing the pairs used for testing
        pairs = lfw.read_pairs(os.path.expanduser(args.lfw_pairs))
        # Get the paths for the corresponding images
        lfw_paths, actual_issame = lfw.get_paths(os.path.expanduser(args.lfw_dir), pairs)

    with tf.Graph().as_default():
        tf.set_random_seed(args.seed)
        global_step = tf.Variable(0, trainable=False)

        # Placeholder for the learning rate
        learning_rate_placeholder = tf.placeholder(tf.float32, name='learning_rate')

        batch_size_placeholder = tf.placeholder(tf.int32, name='batch_size')

        phase_train_placeholder = tf.placeholder(tf.bool, name='phase_train')

        image_paths_placeholder = tf.placeholder(tf.string, shape=(None, 3), name='image_paths')
        labels_placeholder = tf.placeholder(tf.int64, shape=(None, 3), name='labels')

        input_queue = data_flow_ops.FIFOQueue(capacity=100000,
                                              dtypes=[tf.string, tf.int64],
                                              shapes=[(3,), (3,)],
                                              shared_name=None, name=None)
        enqueue_op = input_queue.enqueue_many([image_paths_placeholder, labels_placeholder])
        nrof_preprocess_threads = 4
        images_and_labels = []
        for _ in range(nrof_preprocess_threads):
            filenames, label = input_queue.dequeue()
            images = []
            for filename in tf.unstack(filenames):
                file_contents = tf.read_file(filename)
                image = tf.image.decode_image(file_contents, channels=3)

                if args.random_crop:
                    image = tf.random_crop(image, [args.image_size, args.image_size, 3])
                else:
                    image = tf.image.resize_image_with_crop_or_pad(image, args.image_size, args.image_size)
                if args.random_flip:
                    image = tf.image.random_flip_left_right(image)

                # pylint: disable=no-member
                image.set_shape((args.image_size, args.image_size, 3))
                images.append(tf.image.per_image_standardization(image))
            images_and_labels.append([images, label])

        image_batch, labels_batch = tf.train.batch_join(
            images_and_labels, batch_size=batch_size_placeholder,
            shapes=[(args.image_size, args.image_size, 3), ()], enqueue_many=True,
            capacity=4 * nrof_preprocess_threads * args.batch_size,
            allow_smaller_final_batch=True)
        image_batch = tf.identity(image_batch, 'image_batch')
        image_batch = tf.identity(image_batch, 'input')
        labels_batch = tf.identity(labels_batch, 'label_batch')

        # Build the inference graph
        prelogits, _ = network.inference(image_batch, args.keep_probability,
            phase_train=phase_train_placeholder, bottleneck_layer_size=args.embedding_size,
            weight_decay=args.weight_decay)

        embeddings = tf.nn.l2_normalize(prelogits, 1, 1e-10, name='embeddings')
        # Split embeddings into anchor, positive and negative and calculate triplet loss
        anchor, positive, negative = tf.unstack(tf.reshape(embeddings, [-1, 3, args.embedding_size]), 3, 1)
        triplet_loss = facenet.triplet_loss(anchor, positive, negative, args.alpha)

        learning_rate = tf.train.exponential_decay(learning_rate_placeholder, global_step,
            args.learning_rate_decay_epochs * args.epoch_size, args.learning_rate_decay_factor, staircase=True)
        tf.summary.scalar('learning_rate', learning_rate)

        # Calculate the total losses
        regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
        total_loss = tf.add_n([triplet_loss] + regularization_losses, name='total_loss')

        # Build a Graph that trains the model with one batch of examples and updates the model parameters
        train_op = facenet.train(total_loss, global_step, args.optimizer,
            learning_rate, args.moving_average_decay, tf.global_variables())

        # Create a saver
        saver = tf.train.Saver(tf.trainable_variables(), max_to_keep=3)

        # Build the summary operation based on the TF collection of Summaries.
        summary_op = tf.summary.merge_all()

        # Start running operations on the Graph.
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=args.gpu_memory_fraction)
        sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

        # Initialize variables
        sess.run(tf.global_variables_initializer(), feed_dict={phase_train_placeholder: True})
        sess.run(tf.local_variables_initializer(), feed_dict={phase_train_placeholder: True})

        summary_writer = tf.summary.FileWriter(log_dir, sess.graph)
        coord = tf.train.Coordinator()
        tf.train.start_queue_runners(coord=coord, sess=sess)

        with sess.as_default():

            if args.pretrained_model:
                print('Restoring pretrained model: %s' % args.pretrained_model)
                saver.restore(sess, os.path.expanduser(args.pretrained_model))

            # Training and validation loop
            epoch = 0
            while epoch < args.max_nrof_epochs:
                step = sess.run(global_step, feed_dict=None)
                epoch = step // args.epoch_size
                # Train for one epoch
                train(args, sess, train_set, epoch, image_paths_placeholder, labels_placeholder, labels_batch,
                    batch_size_placeholder, learning_rate_placeholder, phase_train_placeholder, enqueue_op, input_queue, global_step,
                    embeddings, total_loss, train_op, summary_op, summary_writer, args.learning_rate_schedule_file,
                    args.embedding_size, anchor, positive, negative, triplet_loss)

                # Save variables and the metagraph if it doesn't exist already
                save_variables_and_metagraph(sess, saver, summary_writer, model_dir, subdir, step)

                # Evaluate on LFW
                if args.lfw_dir:
                    evaluate(sess, lfw_paths, embeddings, labels_batch, image_paths_placeholder, labels_placeholder,
                        batch_size_placeholder, learning_rate_placeholder, phase_train_placeholder, enqueue_op, actual_issame,
                        args.batch_size, args.lfw_nrof_folds, log_dir, step, summary_writer, args.embedding_size)

    return model_dir


def train(args, sess, dataset, epoch, image_paths_placeholder, labels_placeholder, labels_batch,
          batch_size_placeholder, learning_rate_placeholder, phase_train_placeholder, enqueue_op, input_queue, global_step,
          embeddings, loss, train_op, summary_op, summary_writer, learning_rate_schedule_file,
          embedding_size, anchor, positive, negative, triplet_loss):
    batch_number = 0

    if args.learning_rate > 0.0:
        lr = args.learning_rate
    else:
        lr = facenet.get_learning_rate_from_file(learning_rate_schedule_file, epoch)
    while batch_number < args.epoch_size:
        # Sample people randomly from the dataset
        image_paths, num_per_class = sample_people(dataset, args.people_per_batch, args.images_per_person)

        print('Running forward pass on sampled images: ', end='')
        start_time = time.time()
        # Number of sampled images: nrof_examples = people_per_batch * images_per_person
        nrof_examples = args.people_per_batch * args.images_per_person
        labels_array = np.reshape(np.arange(nrof_examples), (-1, 3))
        image_paths_array = np.reshape(np.expand_dims(np.array(image_paths), 1), (-1, 3))
        sess.run(enqueue_op, {image_paths_placeholder: image_paths_array, labels_placeholder: labels_array})
        # emb_array holds the embedding of every sampled image
        emb_array = np.zeros((nrof_examples, embedding_size))
        # Number of forward-pass batches: total samples divided by batch_size
        nrof_batches = int(np.ceil(nrof_examples / args.batch_size))
        for i in range(nrof_batches):
            batch_size = min(nrof_examples - i * args.batch_size, args.batch_size)
            # Feed one batch and get its embeddings (emb) and labels (lab)
            emb, lab = sess.run([embeddings, labels_batch], feed_dict={batch_size_placeholder: batch_size,
                learning_rate_placeholder: lr, phase_train_placeholder: True})
            emb_array[lab, :] = emb
        print('%.3f' % (time.time() - start_time))

        # Select triplets based on the embeddings
        print('Selecting suitable triplets for training')
        # select_triplets searches the sampled batch for usable triplets
        triplets, nrof_random_negs, nrof_triplets = select_triplets(emb_array, num_per_class,
            image_paths, args.people_per_batch, args.alpha)
        selection_time = time.time() - start_time
        print('(nrof_random_negs, nrof_triplets) = (%d, %d): time=%.3f seconds' %
            (nrof_random_negs, nrof_triplets, selection_time))

        # Perform training on the selected triplets
        # Number of training batches: nrof_triplets triplets * 3 images each / batch_size
        nrof_batches = int(np.ceil(nrof_triplets * 3 / args.batch_size))
        # triplet_paths lists the image paths of all selected triplets
        triplet_paths = list(itertools.chain(*triplets))
        labels_array = np.reshape(np.arange(len(triplet_paths)), (-1, 3))
        triplet_paths_array = np.reshape(np.expand_dims(np.array(triplet_paths), 1), (-1, 3))
        sess.run(enqueue_op, {image_paths_placeholder: triplet_paths_array, labels_placeholder: labels_array})
        nrof_examples = len(triplet_paths)
        train_time = 0
        i = 0
        emb_array = np.zeros((nrof_examples, embedding_size))
        loss_array = np.zeros((nrof_triplets,))
        summary = tf.Summary()
        step = 0
        # Run the training batches one by one
        while i < nrof_batches:
            start_time = time.time()
            batch_size = min(nrof_examples - i * args.batch_size, args.batch_size)
            feed_dict = {batch_size_placeholder: batch_size, learning_rate_placeholder: lr, phase_train_placeholder: True}
            err, _, step, emb, lab = sess.run([loss, train_op, global_step, embeddings, labels_batch], feed_dict=feed_dict)
            emb_array[lab, :] = emb
            loss_array[i] = err
            duration = time.time() - start_time
            print('Epoch: [%d][%d/%d]\tTime %.3f\tLoss %2.3f' %
                  (epoch, batch_number + 1, args.epoch_size, duration, err))
            batch_number += 1
            i += 1
            train_time += duration
            summary.value.add(tag='loss', simple_value=err)

        # Add validation loss and accuracy to summary
        # pylint: disable=maybe-no-member
        summary.value.add(tag='time/selection', simple_value=selection_time)
        summary_writer.add_summary(summary, step)
    return step


def select_triplets(embeddings, nrof_images_per_class, image_paths, people_per_batch, alpha):
    """ Select the triplets for training
    """
    trip_idx = 0
    # emb_start_idx: index in emb_array where the current person's embeddings start
    emb_start_idx = 0
    num_trips = 0
    triplets = []

    # VGG Face: Choosing good triplets is crucial and should strike a balance between
    # selecting informative (i.e. challenging) examples and swamping training with examples that
    # are too hard. This is achieved by extending each pair (a, p) to a triplet (a, p, n) by sampling
    # the image n at random, but only between the ones that violate the triplet loss margin. The
    # latter is a form of hard-negative mining, but it is not as aggressive (and much cheaper) than
    # choosing the maximally violating example, as often done in structured output learning.

    # Iterate over every person in the batch
    for i in xrange(people_per_batch):
        # Number of images for this person
        nrof_images = int(nrof_images_per_class[i])
        # Iterate over this person's images, using each as an anchor in turn
        for j in xrange(1, nrof_images):
            # a_idx: position of the anchor (the (j-1)-th image of person i) in emb_array
            a_idx = emb_start_idx + j - 1
            # neg_dists_sqr: squared Euclidean distance from this image to every other image
            neg_dists_sqr = np.sum(np.square(embeddings[a_idx] - embeddings),