TensorFlow Wide and Deep Model: Detailed Explanation and Application

A Xin • Published: 2018-11-06

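The wide and deep model is a class of models for classification and regression that TensorFlow released around June 2016, and it has been applied to app recommendation in Google Play [1]. Its core idea is to combine the memorization ability of a linear model with the generalization ability of a DNN, and to optimize the parameters of both models simultaneously during training so that the overall model reaches the best predictive performance.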

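Because our product's scenario has much in common with the Google Play recommendation scenario, after investigation and evaluation we also applied the wide and deep model to our product's recommendation ranking model and built a system for offline training and online prediction. Since there are not many descriptions and explanations of the wide and deep model online, we share here what we learned from investigating the model in TensorFlow 1.1 and from applying it, in the hope that it helps others who use it.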

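The figure in the original paper gives a good overview of the wide and deep framework. The wide side is a linear model whose input features can be continuous features or sparse discrete features; discrete features can be crossed to form higher-dimensional discrete features. With L1 regularization, training of the linear model can converge quickly to the effective feature combinations. The deep side is a DNN in which each feature corresponds to a low-dimensional real-valued vector, which we call the feature's embedding. The DNN adjusts the hidden-layer weights through backpropagation and also updates the feature embeddings. The output of the whole wide and deep model is the sum of the linear model's output and the DNN's output.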

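As the original paper notes, the model uses joint training: the training error is fed back to both the linear model and the DNN at the same time to update their parameters. In ensemble learning, each model is trained independently and the models are only combined at prediction time. In joint training, by contrast, the models are combined during training, so the weight updates of each model are affected by the contributions of both the wide side and the deep side to the training error. At feature design time, therefore, the wide model and the deep model only need to focus on what each does best: the wide model memorizes through cross combinations of discrete features, and the deep model generalizes through feature embeddings. This keeps the size and complexity of each individual model under control while still improving the performance of the overall model.

Figure 1. Schematic of the wide and deep model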
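The additive combination and the shared error signal of joint training can be illustrated with a small pure-Python sketch. This is not TensorFlow code: the function names and the scalar logits are assumptions made for illustration, and a binary log loss is assumed.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def joint_prediction(wide_logit, deep_logit, bias=0.0):
    """Add the logits of the two sub-models, then apply the sigmoid."""
    return sigmoid(wide_logit + deep_logit + bias)


def logloss_gradient_wrt_logits(wide_logit, deep_logit, label):
    """Gradient of the binary log loss with respect to each logit.

    Because the logits are summed before the sigmoid, both sub-models
    receive the same error signal (p - y) during joint training.
    """
    error = joint_prediction(wide_logit, deep_logit) - label
    return error, error


# Hypothetical logits produced by the wide side and the deep side
p = joint_prediction(wide_logit=0.4, deep_logit=-0.1)
grad_wide, grad_deep = logloss_gradient_wrt_logits(0.4, -0.1, label=1.0)
```

In an ensemble, by contrast, each sub-model would compute its error from its own prediction only.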

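Defining the Wide and Deep Model

Defining a wide and deep model is fairly simple; the tutorial provides a fairly complete example of building the model:

Getting the input

The model's input is a Python data frame. As in the tutorial's example code, pandas.read_csv can be used to read data from a CSV file and build the data frame.

Defining feature columns

tf.contrib.layers provides a series of functions for defining different types of feature columns:

tf.contrib.layers.sparse_column_with_XXX builds low-dimensional discrete features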

sparse_feature_a = sparse_column_with_hash_bucket(…) 
sparse_feature_b = sparse_column_with_hash_bucket(…)

tf.contrib.layers.crossed_column builds combinations (crosses) of discrete features

sparse_feature_a_x_sparse_feature_b = crossed_column([sparse_feature_a, sparse_feature_b], …)

tf.contrib.layers.real_valued_column builds continuous real-valued features

real_feature_a = real_valued_column(…)

tf.contrib.layers.embedding_column builds embedding features

sparse_feature_a_emb = embedding_column(sparse_id_column=sparse_feature_a, …)

Defining the model

Defining a classification model:

m = tf.contrib.learn.DNNLinearCombinedClassifier(
    n_classes=n_classes,  # number of classes
    weight_column_name=weight_column_name,  # weight of each training example
    model_dir=model_dir,  # model directory
    linear_feature_columns=wide_columns,  # feature columns fed to the linear model
    linear_optimizer=tf.train.FtrlOptimizer(...),  # optimizer that updates the linear model weights
    dnn_feature_columns=deep_columns,  # feature columns fed to the DNN
    dnn_hidden_units=[100, 50],  # number of units in each DNN hidden layer
    dnn_optimizer=tf.train.AdagradOptimizer(...)  # optimizer that updates the DNN weights
)

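Note that the model's model_dir and the export directory mentioned below are two different directories. model_dir holds the model's graph and summary data; if it contains model data from a previous training run, training restores that model from model_dir and continues from it. The model data that TensorBoard loads and displays is also generated in this directory. The export directory is mainly used by TensorFlow Serving, which loads the model's servable instance from it at startup to serve online predictions.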

To use a regression model, define it as follows:

m = tf.contrib.learn.DNNLinearCombinedRegressor(
    weight_column_name=weight_column_name,
    linear_feature_columns=wide_columns,
    linear_optimizer=tf.train.FtrlOptimizer(...),
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50],
    dnn_optimizer=tf.train.AdagradOptimizer(...)
)

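Training and evaluation

Train the model with the fit function, m.fit(input_fn=lambda: input_fn(df_train)), and evaluate it with the evaluate function, m.evaluate(input_fn=lambda: input_fn(df_test)). The input_fn argument must be a callable, hence the lambda. The input_fn function defines how to build the features and labels from the input data frame: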

def input_fn(df):
    # tf.constant builds a constant tensor; df[k].values holds the values of the
    # corresponding feature column
    continuous_cols = {k: tf.constant(df[k].values) for k in CONTINUOUS_COLUMNS}

    # tf.SparseTensor builds a sparse tensor from three dense tensors: indices, values and
    # dense_shape. indices records the positions of the non-zero elements, values holds the
    # value at each of those positions, and dense_shape gives the size of each dimension.
    # The code below builds a 2-D SparseTensor of shape [df[k].size, 1] for each categorical column.
    categorical_cols = {
        k: tf.SparseTensor(indices=[[i, 0] for i in range(df[k].size)],
                           values=df[k].values,
                           dense_shape=[df[k].size, 1])
        for k in CATEGORICAL_COLUMNS
    }
    # (The original article illustrated the resulting sparse tensor with a figure here.)

    # label is a constant tensor holding the label of each example
    label = tf.constant(df[LABEL_COLUMN].values)

    # features is a dict formed from the union of continuous_cols and categorical_cols;
    # each key is a feature column name and each value is the tensor of that column's values
    features = dict(continuous_cols)
    features.update(categorical_cols)
    return features, label
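To make the indices / values / dense_shape layout concrete, here is a minimal pure-Python sketch of the triple that input_fn builds for one single-valued categorical column. The helper name column_to_sparse_triple is hypothetical and no TensorFlow is involved.

```python
def column_to_sparse_triple(values):
    """Return (indices, values, dense_shape) for a single-valued categorical column.

    Every example contributes exactly one entry, at position [row, 0],
    so the dense shape is [number of examples, 1].
    """
    n = len(values)
    indices = [[i, 0] for i in range(n)]
    dense_shape = [n, 1]
    return indices, list(values), dense_shape


indices, vals, shape = column_to_sparse_triple(["male", "female", "female"])
```

A multivalent column would add further entries such as [row, 1] for the same row, and the second dimension of dense_shape would grow accordingly.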

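Export

The model is written to a specified directory with export, and TensorFlow Serving loads the model from that directory to provide the online prediction service:

m.export(export_dir=export_dir, input_fn=export._default_input_fn,
         use_deprecated_input_fn=True, signature_fn=signature_fn)

The input_fn function defines how the features of the model's servable instance are generated, and signature_fn defines the signature of the model's inputs and outputs.
Because export has been deprecated since TensorFlow 1.0 and should be replaced with export_savedmodel, this article does not explain export further and only shows at the end how we used it; we recommend that all users switch to the newer API.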

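The Model in Detail

The wide and deep model is implemented on top of the TF.learn API, and its source code is mainly in tensorflow.contrib.learn.python.learn.estimators. Taking classification as an example, the class that combines wide and deep is DNNLinearCombinedClassifier, implemented in the source file dnn_linear_combined.py. Let us first look at the complete definition of the initializer of DNNLinearCombinedClassifier to see which parameters can be passed when constructing a wide and deep model: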

def __init__(self, model_dir=None, n_classes=2, weight_column_name=None, linear_feature_columns=None,
 linear_optimizer=None, joint_linear_weights=False, dnn_feature_columns=None, 
 dnn_optimizer=None, dnn_hidden_units=None, dnn_activation_fn=nn.relu, dnn_dropout=None,
 gradient_clip_norm=None, enable_centered_bias=False, config=None,
 feature_engineering_fn=None, embedding_lr_multipliers=None):

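The constructor's parameters can be divided into the following groups

Basic parameters

model_dir

The trained model is stored in the directory given by model_dir. To debug the model with TensorBoard, point TensorBoard's logdir at this directory: tensorboard --logdir=$model_dir

n_classes

Number of classes. The default is binary classification; a value greater than 2 gives multi-class classification.

weight_column_name

Defines the weight of each training example. During training, the training error of each example is multiplied by that example's weight before it is used to compute the gradients for the weight updates. To assign a weight to each example, the features returned by input_fn must contain a column named weight_column_name whose length equals the number of training examples; each element is the weight of one example and has type float, as in the following pseudocode: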

weight = tf.constant(df[WEIGHT_COLUMN_NAME].values, dtype=tf.float32)
features[weight_column_name] = weight

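config

Specifies the runtime configuration parameters

feature_engineering_fn

Post-processes the (features, label) output by the input function input_fn into a new (features', label'), which is then passed to the model training function model_fn.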

call_model_fn():
 feature, labels = self._feature_engineering_fn(feature, labels)

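Linear model parameters

linear_feature_columns

The input features of the linear model

linear_optimizer

The optimizer of the linear model, which defines the gradient update algorithm for the weights; FTRL is used by default. All the linear_optimizer and dnn_optimizer values supported by default are defined in the OPTIMIZER_CLS_NAMES variable in optimizers.py.

joint_linear_weights

According to the comments in the code, if joint_linear_weights = True the weights of the linear model are stored in a single tf.Variable, which can speed up training, but every feature column in linear_feature_columns must then be a sparse feature column and each feature column's combiner must be "sum". In our own offline comparison experiments this seemed to have little effect on predictive performance and gave some improvement in training speed; when training the final model we kept the default value.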

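DNN model parameters

dnn_feature_columns

The input features of the DNN

dnn_optimizer

The optimizer of the DNN, which defines the gradient update algorithm for the weights of each layer; Adagrad is used by default.

dnn_hidden_units

The number of neurons in each hidden layer

dnn_activation_fn

The activation function of the hidden layers; ReLU is used by default

dnn_dropout

The dropout ratio of hidden-layer units during training

gradient_clip_norm

Defines gradient clipping, which limits the magnitude of the gradients, mainly to guard against exploding gradients. The wide and deep model uses tf.clip_by_global_norm by default.

embedding_lr_multipliers

A mapping from embedding feature column to float. The gradient of the specified embedding feature column is multiplied by a constant factor, which adjusts how fast that embedding is updated.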
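The global-norm clipping named under gradient_clip_norm can be sketched in plain Python. This mirrors the documented behaviour of tf.clip_by_global_norm (every gradient is scaled by clip_norm / max(global_norm, clip_norm)), but it is only an illustration on nested lists, not the TensorFlow implementation.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Scale all gradients jointly so their global L2 norm is at most clip_norm.

    gradients is a list of lists of floats, one inner list per variable.
    """
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    # The scale is 1.0 when the global norm is already within the limit
    scale = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * scale for g in grad] for grad in gradients]
    return clipped, global_norm


# Two hypothetical variables whose gradients have global norm 5.0
clipped, norm = clip_by_global_norm([[3.0, 0.0], [0.0, 4.0]], clip_norm=1.0)
```

Because all gradients share one scale factor, the direction of the overall update is preserved; only its length is limited.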

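After reading the constructor we have a rough idea of what kind of model the wide side and the deep side each correspond to and which parameters the model takes. To understand the model more deeply, we analyzed the relevant code, aiming to answer two questions: (1) how are the features used to train the linear model and the DNN defined, and how are they implemented internally; (2) how are the linear model and the DNN trained jointly, and how is the training error fed back to the wide model and the deep model? Below we focus on these two aspects, features and model training.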

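Features

A wide and deep model is generally trained on batches of multiple training examples. Training examples are defined along the row dimension: each row is one training example and includes features (feature columns), a label and a weight, as shown in Figure 2. Features are defined along the column dimension: each feature corresponds to one feature column, which consists of one or more tensors along the column dimension, and each element of a tensor is the value of one example in some dimension of that feature column. The definition of feature column can be found in feature_column.py in the source; the class is _FeatureColumn, which defines the basic interface and is the abstract parent class of all feature classes in the wide and deep model.

Figure 2. Schematic of feature_column, label and weight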

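The features used in the wide and deep model fall into two broad categories. Continuous features are mainly used to train the deep model and include real value features, embedding features and so on. Discrete features are mainly used to train the wide model and include sparse features, cross features and so on. The following figure summarizes all the features

Figure 3. Class diagram of the wide and deep model features

Besides inheritance, the figure also marks the composition relationships between feature classes: a _BucketizedColumn is built from a _RealValuedColumn by bucketing the continuous value range, and a _CrossedColumn is built by crossing several _SparseColumn, _BucketizedColumn or _CrossedColumn objects. The features on the left of the figure are discrete features, and those on the right are continuous features.

In practice we usually build features by calling the interfaces that TensorFlow provides. The following are the interfaces for building each kind of feature: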

sparse_column_with_integerized_feature() --> _SparseColumnIntegerized
 
sparse_column_with_hash_bucket() --> _SparseColumnHashed
 
sparse_column_with_keys() --> _SparseColumnKeys
 
sparse_column_with_vocabulary_file() --> _SparseColumnVocabulary
 
weighted_sparse_column() --> _WeightedSparseColumn
 
one_hot_column() --> _OneHotColumn
 
embedding_column() --> _EmbeddingColumn
 
shared_embedding_columns() --> List[_EmbeddingColumn]
 
scattered_embedding_column() --> _ScatteredEmbeddingColumn
 
real_valued_column() --> _RealValuedColumn
 
bucketized_column() -->_BucketizedColumn
 
crossed_column() --> _CrossedColumn

FeatureColumn defines several basic interfaces for extracting and transforming features during model training; they are described in detail later with each specific feature:

def insert_transformed_feature(self, columns_to_tensors):

"""Apply transformation and inserts it into columns_to_tensors."""
The feature output and transformation function of a FeatureColumn. columns_to_tensors is a mapping from FeatureColumn to tensors.

def _to_dnn_input_layer(self, input_tensor, weight_collections=None, trainable=True, output_rank=2):

"""Returns a Tensor as an input to the first layer of neural network."""
Builds the float tensor input of the DNN; see the discussion of RealValuedColumn below.

def _deep_embedding_lookup_arguments(self, input_tensor):

"""Returns arguments to embedding lookup to build an input layer."""
Builds the embedding input of the DNN; see the discussion of EmbeddingColumn below.

def _wide_embedding_lookup_arguments(self, input_tensor):

"""Returns arguments to look up embeddings for this column."""
Builds the input of the linear model; see the discussion of SparseColumn below.

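We start the analysis with discrete features (sparse features). A discrete feature can be seen as a feature made up of a number of keys, such as a user's gender. In the implementation, each key corresponds to an integer id inside the sparse column. The base class of discrete features is _SparseColumn: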

class _SparseColumn(_FeatureColumn,
 collections.namedtuple("_SparseColumn",
 ["column_name", "is_integerized",
 "bucket_size", "lookup_config",
 "combiner", "dtype"])):

The string list in collections.namedtuple gives the names of the input parameters that _SparseColumn receives from the corresponding creation function.

def __new__(cls,
 column_name,
 is_integerized=False,
 bucket_size=None,
 lookup_config=None,
 combiner="sum",
 dtype=dtypes.string):

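How does a sparse feature store these discrete values? That depends on the two parameters bucket_size and lookup_config, of which exactly one is defined in practice. Depending on which is used, sparse features fall into two kinds. Features that define lookup_config keep all of the feature's values in an in-memory dictionary; these include _SparseColumnKeys and _SparseColumnVocabulary, discussed later. Features that define bucket_size store feature values through hashing: each value is mapped by a hash function to one of the buckets; these include _SparseColumnHashed and _SparseColumnIntegerized (is_integerized = True).

dtype specifies the type of the feature values. Besides the string type (dtypes.string), sparse feature columns also support 64-bit integers (dtypes.int64). By default the input discrete feature is assumed to be a string. If is_integerized = True is set, the feature is treated as an integer id feature, so the feature's value can be used directly as its id without building a dedicated mapping.

The combiner parameter controls normalization at the example level: if a feature column has multiple values in a single example, combiner specifies how those values are reduced and normalized. The source comment reads: "combiner: A string specifying how to reduce if the sparse column is multivalent". The meaning of multivalent is spelled out a little more clearly in the definition of the crossed feature column ("combiner: A string specifying how to reduce if there are multiple entries in a single row"). combiner accepts three modes: sum applies no normalization, sqrtn corresponds to L2 normalization, and mean corresponds to L1 normalization. In general, L2 normalization tends to give somewhat higher model accuracy.

A SparseColumn cannot be fed to the DNN directly; it can only be used directly to build the input of the linear model: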

def _wide_embedding_lookup_arguments(self, input_tensor):
 return _LinearEmbeddingLookupArguments( input_tensor=self.id_tensor(input_tensor),
 weight_tensor=self.weight_tensor(input_tensor),
 vocab_size=self.length,
 initializer=init_ops.zeros_initializer(),
 combiner=self.combiner)

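_LinearEmbeddingLookupArguments is a namedtuple (a new subclass of tuple with named fields). input_tensor is the array of feature ids in the training batch, each element of weight_tensor is the weight of the feature in one example, vocab_size is the number of distinct feature values, and initializer is the function that initializes the feature weights, which default to 0.

However, the source shows that _SparseColumn and its subclasses do not use feature weights: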

 def weight_tensor(self, input_tensor):
 """Returns the weight tensor from the given transformed input_tensor."""
 return None

To assign weights to the features of a _SparseColumn, use _WeightedSparseColumn, whose construction function is weighted_sparse_column (Create a _SparseColumn by combining sparse_id_column and weight_column)

class _WeightedSparseColumn(_FeatureColumn, collections.namedtuple(
 "_WeightedSparseColumn",["sparse_id_column", "weight_column_name", "dtype"])):
 
 def __new__(cls, sparse_id_column, weight_column_name, dtype):
 return super(_WeightedSparseColumn, cls).__new__(cls, sparse_id_column, weight_column_name, dtype)

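_WeightedSparseColumn takes 3 parameters: sparse_id_column is the sparse feature column, an object of type _SparseColumn; weight_column_name is the name of the weight column in the input that corresponds to sparse_id_column (the features dict returned by input_fn must contain a tensor under weight_column_name); dtype is the data type of each element of the weight column. There are a few implicit requirements:

(1) dtype must be convertible to a floating-point type, otherwise a TypeError is raised;
(2) the weight column named by weight_column_name can be a SparseTensor or a regular dense tensor; the program converts a dense tensor into a SparseTensor, but the resulting SparseTensor of the weight column must have the same indices and dense_shape as the SparseTensor of sparse_id_column.

The functions with which _WeightedSparseColumn outputs the feature's id tensor and weight tensor are as follows: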

def insert_transformed_feature(self, columns_to_tensors):
    """Inserts a tuple with the id and weight tensors."""
    if self.sparse_id_column not in columns_to_tensors:
        self.sparse_id_column.insert_transformed_feature(columns_to_tensors)

    weight_tensor = columns_to_tensors[self.weight_column_name]
    if not isinstance(weight_tensor, sparse_tensor_py.SparseTensor):
        # The weight tensor can be a regular Tensor, for example one built with
        # tf.constant. In such case, sparsify it with dense_to_sparse_tensor.
        weight_tensor = contrib_sparse_ops.dense_to_sparse_tensor(weight_tensor)

    # The weight_tensor that is finally used has a float dtype
    if not self.dtype.is_floating:
        weight_tensor = math_ops.to_float(weight_tensor)

    # Store a pair for this WeightedSparseColumn: the first element is the id_tensor produced by
    # the sparse feature column's insert_transformed_feature, the second is the weight tensor.
    columns_to_tensors[self] = tuple([columns_to_tensors[self.sparse_id_column], weight_tensor])
 
def id_tensor(self, input_tensor):
 """Returns the id tensor from the given transformed input_tensor."""
 return input_tensor[0]
 
def weight_tensor(self, input_tensor):
 """Returns the weight tensor from the given transformed input_tensor."""
 return input_tensor[1]

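(1) sparse column from keys

This is the simplest discrete feature, analogous to an enumeration type, and is generally used when there are not too many enumerated values. The interface for creating a keys-based sparse feature is sparse_column_with_keys(column_name, keys, default_value=-1, combiner=None); the corresponding class is _SparseColumnKeys, with the constructor: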

def __new__(cls, column_name, keys, default_value=-1, combiner="sum"):
 return super(_SparseColumnKeys, cls).__new__(cls, column_name, combiner=combiner,
 lookup_config=_SparseIdLookupConfig(keys=keys, vocab_size=len(keys),
 default_value=default_value), dtype=dtypes.string)

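keys is a list of strings that defines all the enumerated values. The keys passed in are ultimately stored in lookup_config; each key is a string and corresponds to one id, which is the key's index in the input keys array. Model training actually uses the id of each key.

Before a _SparseColumnKeys feature is fed to the model, each key must be converted to its id. This conversion is implemented in insert_transformed_feature: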

def insert_transformed_feature(self, columns_to_tensors):
    """Handles sparse column to id conversion."""
    input_tensor = self._get_input_sparse_tensor(columns_to_tensors)
    # From the index_table_from_tensor docstring: "Returns a lookup table that converts a
    # string tensor into int64 IDs. This operation constructs a lookup table to convert
    # tensor of strings into int64 IDs. The mapping can be initialized from a string
    # `mapping` 1-D tensor where each element is a key and corresponding index within the
    # tensor is the value."
    table = lookup.index_table_from_tensor(mapping=tuple(self.lookup_config.keys),
        default_value=self.lookup_config.default_value, dtype=self.dtype, name="lookup")
    columns_to_tensors[self] = table.lookup(input_tensor)
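The key-to-id conversion described above amounts to a dictionary lookup with a default. A minimal pure-Python sketch follows; build_key_lookup is a hypothetical helper, and a plain dict stands in for TensorFlow's lookup table.

```python
def build_key_lookup(keys, default_value=-1):
    """Map each key to its index in keys; unknown keys map to default_value."""
    table = {key: idx for idx, key in enumerate(keys)}

    def lookup(values):
        return [table.get(v, default_value) for v in values]

    return lookup


lookup_gender = build_key_lookup(["female", "male"])
ids = lookup_gender(["male", "female", "unknown"])
```

Here ids is [1, 0, -1]: the value that is not among the keys falls back to default_value.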

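(2) sparse column from vocabulary file

sparse column with keys is adequate for ordinary enumerations but becomes unsuitable when there are many values, so an interface is provided for loading the enumerated values from a file:

sparse_column_with_vocabulary_file(column_name, vocabulary_file, num_oov_buckets=0, vocab_size=None,
default_value=-1, combiner="sum", dtype=dtypes.string)

The corresponding constructor is:

def __new__(cls, column_name, vocabulary_file, num_oov_buckets=0, vocab_size=None, default_value=-1,
 combiner="sum", dtype=dtypes.string):

Where are the feature values read from the file stored? Look at the class instance that the constructor finally returns: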

return super(_SparseColumnVocabulary, cls).__new__(cls, column_name,combiner=combiner,
lookup_config=_SparseIdLookupConfig(vocabulary_file=vocabulary_file,num_oov_buckets=num_oov_buckets,
vocab_size=vocab_size,default_value=default_value), dtype=dtype)

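Like _SparseColumnKeys, this feature uses _SparseIdLookupConfig to hold the feature values. vocabulary_file points to the file that defines the enumerated values; each line of vocabulary_file is one enumerated value, and the id of each value is the number of the line it is on (note that line numbers start at 0). vocab_size defines the number of enumerated values. A hash table from feature value to id is built from the feature file described by _SparseIdLookupConfig. Let us see how _SparseColumnVocabulary uses the _SparseIdLookupConfig object when converting vocabulary entries to ids.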

def insert_transformed_feature(self, columns_to_tensors):
    """Handles sparse column to id conversion."""
    st = self._get_input_sparse_tensor(columns_to_tensors)
    if self.dtype.is_integer:
        # Integer-valued input features are converted to their string form
        sparse_string_values = string_ops.as_string(st.values)
        sparse_string_tensor = sparse_tensor_py.SparseTensor(st.indices, sparse_string_values, st.dense_shape)
    else:
        sparse_string_tensor = st

    # From the index_table_from_file docstring: "Returns a lookup table that converts a string
    # tensor into int64 IDs. This operation constructs a lookup table to convert tensor of
    # strings into int64 IDs. The mapping can be initialized from a vocabulary file specified in
    # `vocabulary_file`, where the whole line is the key and the zero-based line number is the ID."
    table = lookup.index_table_from_file(vocabulary_file=self.lookup_config.vocabulary_file,
        num_oov_buckets=self.lookup_config.num_oov_buckets, vocab_size=self.lookup_config.vocab_size,
        default_value=self.lookup_config.default_value, name=self.name + "_lookup")
    columns_to_tensors[self] = table.lookup(sparse_string_tensor)

index_table_from_file builds the table from the vocabulary file in lookup_config. The table variable is a HashTable from string to int64; if num_oov_buckets is defined, table is an IdTableWithHashBuckets object (a string to id wrapper that assigns out-of-vocabulary keys to buckets).
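The vocabulary lookup with out-of-vocabulary (OOV) buckets can be sketched as follows. The helper names are hypothetical, the vocabulary is passed as a list of lines instead of a file, and MD5 is only a deterministic stand-in for TensorFlow's own hash function, so the concrete OOV ids will differ from what TensorFlow assigns.

```python
import hashlib


def stable_hash(text):
    """Deterministic stand-in hash (assumption: TensorFlow uses a different one)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def build_vocab_lookup(lines, num_oov_buckets=0, default_value=-1):
    """The zero-based line number is the id; OOV keys are hashed into extra buckets."""
    vocab = {line.strip(): idx for idx, line in enumerate(lines)}
    vocab_size = len(vocab)

    def lookup(value):
        if value in vocab:
            return vocab[value]
        if num_oov_buckets > 0:
            # OOV ids occupy [vocab_size, vocab_size + num_oov_buckets)
            return vocab_size + stable_hash(value) % num_oov_buckets
        return default_value

    return lookup


vocab_lines = ["bachelors\n", "masters\n", "doctorate\n"]
lookup_with_oov = build_vocab_lookup(vocab_lines, num_oov_buckets=2)
lookup_no_oov = build_vocab_lookup(vocab_lines)
```

With num_oov_buckets=0 an unseen value simply receives default_value, which matches the behaviour of the keys-based column.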

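(3) sparse column with hash bucket

If there is no vocab file defining the enumerated values, we can use a hash bucket feature. Its interface is
sparse_column_with_hash_bucket(column_name, hash_bucket_size, combiner=None, dtype=dtypes.string)
and the constructor of the corresponding class _SparseColumnHashed is: def __new__(cls, column_name, hash_bucket_size, combiner="sum", dtype=dtypes.string):

hash_bucket_size defines the number of hash buckets and is used as the modulus for the hash value. dtype supports integers and strings. When the hash is computed, an integer is first converted to its string representation; the hash is computed on the string and then taken modulo hash_bucket_size, so the transformed feature value is an integer from 0 to hash_bucket_size - 1.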

def insert_transformed_feature(self, columns_to_tensors):
    """Handles sparse column to id conversion."""
    input_tensor = self._get_input_sparse_tensor(columns_to_tensors)
    if self.dtype.is_integer:
        # Integer input is converted to string
        sparse_values = string_ops.as_string(input_tensor.values)
    else:
        sparse_values = input_tensor.values

    sparse_id_values = string_ops.string_to_hash_bucket_fast(sparse_values, self.bucket_size, name="lookup")

    # The hash bucket of each sparse feature value is returned as the id of that value
    columns_to_tensors[self] = sparse_tensor_py.SparseTensor(input_tensor.indices, sparse_id_values,
        input_tensor.dense_shape)

(4) integerized sparse column

A hash bucket sparse feature treats integers as strings when hashing. If we want to use the integer's own numeric value as the hash value, we can use _SparseColumnIntegerized. The corresponding interface is sparse_column_with_integerized_feature:

def sparse_column_with_integerized_feature(column_name,hash_bucket_size,combiner="sum",
 dtype=dtypes.int64)

The corresponding class is _SparseColumnIntegerized:

def __new__(cls, column_name, bucket_size, combiner="sum", dtype=dtypes.int64)

The feature transformation function is defined as:

def insert_transformed_feature(self, columns_to_tensors):
    """Handles sparse column to id conversion."""
    input_tensor = self._get_input_sparse_tensor(columns_to_tensors)

    # Take the feature value modulo bucket_size directly; the result is the id of the value
    sparse_id_values = math_ops.mod(input_tensor.values, self.bucket_size, name="mod")
    columns_to_tensors[self] = sparse_tensor_py.SparseTensor(input_tensor.indices, sparse_id_values,
        input_tensor.dense_shape)
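The difference between the hashed column and the integerized column can be shown side by side in a short sketch. Both function names are hypothetical, and MD5 is a stand-in for the hash behind string_to_hash_bucket_fast, so the bucket numbers are illustrative only.

```python
import hashlib


def hashed_id(value, bucket_size):
    """Hash-bucket style: convert to string, hash the string, take the modulus."""
    text = str(value)
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest, 16) % bucket_size


def integerized_id(value, bucket_size):
    """Integerized style: use the integer value itself, modulo bucket_size."""
    return value % bucket_size


bucket_from_int = hashed_id(12345, 1000)
bucket_from_str = hashed_id("12345", 1000)
direct_id = integerized_id(12345, 1000)
```

In the hashed variant the integer 12345 and the string "12345" land in the same bucket because the integer is stringified first, whereas the integerized variant simply yields 345.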

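(5) crossed column

A crossed column takes the Cartesian product of two or more discrete feature columns to form a high-dimensional crossed feature. Crossing features brings the correlations between features into the model and increases its expressive power. A crossed column only supports crossing the following 3 kinds of discrete features: _SparseColumn, _BucketizedColumn and _CrossedColumn. Its interface is defined as:

def crossed_column(columns, hash_bucket_size, combiner="sum", ckpt_to_load_from=None,
 tensor_name_in_ckpt=None, hash_key=None)

The corresponding class is _CrossedColumn:

def __new__(cls, columns,hash_bucket_size,hash_key, combiner="sum",ckpt_to_load_from=None, 
 tensor_name_in_ckpt=None):

columns is a collection of feature columns, as in the tutorial's example: [age_buckets, education, occupation]. The hash_bucket_size parameter specifies the number of hash buckets; the more combinations the cross produces, the larger hash_bucket_size should be in order to reduce hash collisions.

The logic by which a crossed feature produces the model input can be divided into the following two steps: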

def insert_transformed_feature(self, columns_to_tensors):
    """Handles cross transformation."""
    def _collect_leaf_level_columns(cross):
        """Collects base columns contained in the cross."""
        leaf_level_columns = []
        for c in cross.columns:
            # Recursively expand feature columns of type CrossedColumn
            if isinstance(c, _CrossedColumn):
                leaf_level_columns.extend(_collect_leaf_level_columns(c))
            else:
                # SparseColumn and BucketizedColumn are leaf nodes
                leaf_level_columns.append(c)
        return leaf_level_columns

    # Step 1: recursively expand all features in the crossed column; the expanded
    # feature values are stored in the feature_tensors list
    feature_tensors = []
    for c in _collect_leaf_level_columns(self):
        if isinstance(c, _SparseColumn):
            feature_tensors.append(columns_to_tensors[c.name])
        else:
            if c not in columns_to_tensors:
                c.insert_transformed_feature(columns_to_tensors)
            if isinstance(c, _BucketizedColumn):
                feature_tensors.append(c.to_sparse_tensor(columns_to_tensors[c]))
            else:
                feature_tensors.append(columns_to_tensors[c])

    # Step 2: build the tensor of the cross feature. sparse_feature_cross calls the
    # SparseFeatureCross op through a dynamic library; see sparse_feature_cross_op.cc
    columns_to_tensors[self] = sparse_feature_cross_op.sparse_feature_cross(feature_tensors,
        hashed_output=True, num_buckets=self.hash_bucket_size, hash_key=self.hash_key, name="cross")

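A comment in this part of the source gives an example of the effect of crossing feature columns; we use a figure to present that comment more clearly:

Figure 4. Effect of crossing feature columns

One point to note: crossed features have no weight definition.

Crossing discrete features is widely used in prediction models, but one limitation of this kind of feature is that it generalizes poorly to feature combinations not seen in the training data. The embedding column discussed later strengthens the generalization of discrete features by building a low-dimensional vector representation of them.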
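The crossing step for a single example can be sketched as a Cartesian product followed by hashing into buckets. The function name, the "_X_" join string and the MD5 hash are assumptions for illustration; the real SparseFeatureCross op uses its own hashing scheme.

```python
import hashlib
from itertools import product


def cross_example(feature_values, hash_bucket_size):
    """Cross the values of several columns for one example.

    feature_values holds one list of values per crossed column. Every
    combination (Cartesian product) is hashed into [0, hash_bucket_size).
    """
    ids = []
    for combo in product(*feature_values):
        key = "_X_".join(str(v) for v in combo)
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        ids.append(int(digest, 16) % hash_bucket_size)
    return ids


# One example with two values in the first column and three in the second
crossed_ids = cross_example([["a", "b"], ["x", "y", "z"]], hash_bucket_size=100)
```

The number of crossed ids per example is the product of the number of values in each column, which is why hash_bucket_size should grow with the number of possible combinations.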

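(6) real valued column

A real valued feature column corresponds to a continuous numeric feature. Its interface is

real_valued_column(column_name, dimension=1, default_value=None, dtype=dtypes.float32,normalizer=None):

The corresponding class is _RealValuedColumn:

_RealValuedColumn(column_name, dimension, default_value, dtype,normalizer)

dimension specifies the dimension of the feature column. The default is 1, that is, a one-dimensional float array; dimension can also be an integer greater than 1, which corresponds to a multi-dimensional array. The values of a real valued column can be float32 or int, and int values are converted to float before being fed to the model. normalizer defines the normalization of the feature along the column dimension within a batch of training examples, which amounts to column-level normalization. This differs from the combiner of a sparse feature column, which defines the normalization of a discrete feature within a single example (example-level normalization). The following figure gives an example of the difference between the two:

Figure 5. Difference between combiner and normalizer

The normalizer is called when a real valued feature column is fed to the DNN: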

def insert_transformed_feature(self, columns_to_tensors):
    # Transform the input tensor according to the normalizer function.
    # _normalized_input_tensor calls the normalizer function passed in when the
    # real valued column was constructed
    input_tensor = self._normalized_input_tensor(columns_to_tensors[self.name])
    columns_to_tensors[self] = math_ops.to_float(input_tensor)

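A real valued column is converted into DNN input by calling _to_dnn_input_layer, which produces a two-dimensional array. Each row of the array holds the values of the real valued column for one training example; these values are concatenated with the other continuous features to form the input layer of the DNN.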

def _to_dnn_input_layer(self, input_tensor, weight_collections=None, trainable=True, output_rank=2):
    # The DNN input must be a dense tensor; a sparse tensor is converted by _to_dense_tensor
    input_tensor = self._to_dense_tensor(input_tensor)
    if input_tensor.dtype != dtypes.float32:
        input_tensor = math_ops.to_float(input_tensor)

    # Calls dense_inner_flatten(input_tensor, output_rank).
    # With output_rank = 2 the output is [batch_size, real valued column's input dimension]
    return _reshape_real_valued_tensor(input_tensor, output_rank, self.name)

def _to_dense_tensor(self, input_tensor):
    if isinstance(input_tensor, sparse_tensor_py.SparseTensor):
        default_value = (self.default_value[0] if self.default_value is not None else 0)
        # Convert the sparse tensor to a dense tensor
        return sparse_ops.sparse_tensor_to_dense(input_tensor, default_value=default_value)
    # A dense real valued column returns the input tensor unchanged
    return input_tensor

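(7) bucketized column

A continuous feature becomes a discrete feature through bucketization. The advantages of discretizing continuous features have been discussed online. Take the effect of a restaurant's distance on a user's choice: we usually divide distance into intervals, such as within 100 meters or within 1 kilometer, so that small differences in distance do not affect the final prediction much unless the difference crosses an interval boundary.

The interface of bucketized column is: def bucketized_column(source_column, boundaries)
The corresponding class is _BucketizedColumn, with the constructor: def __new__(cls, source_column, boundaries):

source_column must be a real_valued_column. boundaries is a list of floats that must be in increasing order; for example, boundaries = [0, 100, 200] defines the following intervals: (-INF, 0), [0, 100), [100, 200), [200, INF).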

def insert_transformed_feature(self, columns_to_tensors):
    # Bucketize the source column.
    if self.source_column not in columns_to_tensors:
        self.source_column.insert_transformed_feature(columns_to_tensors)
    columns_to_tensors[self] = bucketization_op.bucketize(columns_to_tensors[self.source_column],
        boundaries=list(self.boundaries), name="bucketize")

The bucketize function calls the BucketizeOp class in the TensorFlow C++ core library to perform the bucketization of the feature.
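The interval semantics of boundaries can be reproduced with the standard library. This sketch assumes the intervals are closed on the left and open on the right, as listed for boundaries = [0, 100, 200]; the bucketize function below is a plain-Python stand-in, not the TensorFlow op.

```python
from bisect import bisect_right


def bucketize(values, boundaries):
    """Return the bucket index of each value for sorted boundaries.

    With boundaries [0, 100, 200] the buckets are
    0: (-inf, 0), 1: [0, 100), 2: [100, 200), 3: [200, inf).
    """
    return [bisect_right(boundaries, v) for v in values]


buckets = bucketize([-5.0, 0.0, 99.9, 100.0, 250.0], [0, 100, 200])
```

The result is [0, 1, 1, 2, 3]: the values 0.0 and 99.9 share a bucket, while 100.0 sits exactly on a boundary and moves to the next one.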

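(8) embedding column

A sparse feature column can be converted by an embedding into a continuous vector and then used as input to the deep model. As mentioned earlier, one weakness of crossed columns is their generalization on the test set. An embedding column makes the discrete feature continuous and learns the feature's vector representation from the labels, much as matrix factorization learns latent factor vectors of items or a word-vector model learns word vectors. The interface of embedding column is: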

def embedding_column(sparse_id_column, dimension, combiner=None, initializer=None, 
 ckpt_to_load_from=None,tensor_name_in_ckpt=None, max_norm=None, trainable=True)

The corresponding class is _EmbeddingColumn:

def __new__(cls,sparse_id_column,dimension,combiner="mean",initializer=None, ckpt_to_load_from=None,
 tensor_name_in_ckpt=None,shared_embedding_name=None, shared_vocab_size=None,max_norm=None,
 trainable = True):

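sparse_id_column is a SparseColumn or WeightedSparseColumn object, and dimension is the vector dimension of the embedding column. Each feature value of the SparseColumn corresponds to an integer id, and that id corresponds in the embedding column to a floating-point vector of dimension dimension. The combiner parameter specifies how the feature vectors are normalized within a single example, and initializer specifies the initialization function of the feature vectors; by default they are drawn from a truncated normal distribution (mean = 0, stddev = 1 / sqrt(length of sparse id column)). max_norm bounds the L2 norm of each embedding vector: a vector whose L2 norm exceeds max_norm is rescaled as embedding_vector = embedding_vector * max_norm / L2_norm(embedding_vector).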

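To understand embedding column further, we can draw a simple diagram:

Figure 6. Schematic of an embedding feature column

As in the figure above, take sparse_column_with_keys(column_name='gender', keys=['female', 'male']) as an example and suppose female corresponds to id = 0 and male to id = 1, with each id corresponding to a 6-dimensional float vector in the embedding feature. In the actual training data, when the gender feature takes the value 'female', the DNN input layer receives the vector for id = 0 (tf.embedding_lookup_sparse). embedding_column has a trainable parameter that specifies whether the feature's embedding is updated according to the model's training error.

The transformation functions of the embedding feature: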

def insert_transformed_feature(self, columns_to_tensors):
    if self.sparse_id_column not in columns_to_tensors:
        self.sparse_id_column.insert_transformed_feature(columns_to_tensors)
    columns_to_tensors[self] = columns_to_tensors[self.sparse_id_column]

def _deep_embedding_lookup_arguments(self, input_tensor):
    return _DeepEmbeddingLookupArguments(
        input_tensor=self.sparse_id_column.id_tensor(input_tensor),
        # When sparse_id_column is a _SparseColumn, weight_tensor = None.
        # When sparse_id_column is a _WeightedSparseColumn, weight_tensor is that column's
        # weight tensor, which must satisfy:
        # 1) weight_tensor.indices = input_tensor.indices
        # 2) weight_tensor.shape = input_tensor.shape
        weight_tensor=self.sparse_id_column.weight_tensor(input_tensor),
        # number of elements of the sparse feature column
        vocab_size=self.length,
        # dimension of the embedding
        dimension=self.dimension,
        # initialization function of the embedding
        initializer=self.initializer,
        # row-wise normalization method of the embedding
        combiner=self.combiner,
        shared_embedding_name=self.shared_embedding_name,
        hash_key=None,
        max_norm=self.max_norm,
        trainable=self.trainable)

The logic that produces the embedding of a sparse feature from _DeepEmbeddingLookupArguments is implemented in the function _embeddings_from_arguments:

def _embeddings_from_arguments(column, args, weight_collections, trainable, output_rank=2):
    # column is the embedding feature column, args is the feature column's
    # _DeepEmbeddingLookupArguments object, weight_collections holds the embedding weights,
    # and output_rank is the rank of the output embedding tensor.

    input_tensor = layers._inner_flatten(args.input_tensor, output_rank)
    weight_tensor = layers._inner_flatten(args.weight_tensor, output_rank)

    # Consider the default case: args.hash_key is None, args.shared_embedding_name is None

    # Get or create the model variable of the embedding.
    # embeddings is a 2-D float array of shape [number of sparse feature ids, embedding dimension];
    # each row is the embedding of one sparse feature id
    embeddings = contrib_variables.model_variable(
        name='weights',
        shape=[args.vocab_size, args.dimension],
        dtype=dtypes.float32,
        initializer=args.initializer,
        # If trainable, the embedding is added as a model variable to GraphKeys.TRAINABLE_VARIABLES
        trainable=(trainable and args.trainable),
        # weight_collections holds the weight of each feature id
        collections=weight_collections)

    # Look up the embedding of each sparse feature id
    return embedding_ops.safe_embedding_lookup_sparse(embeddings, input_tensor,
        sparse_weights=weight_tensor, combiner=args.combiner, name=column.name + 'weights',
        max_norm=args.max_norm)

safe_embedding_lookup_sparse calls tf.embedding_lookup_sparse to get the embedding of each sparse feature id.
tf.embedding_lookup_sparse first calls tf.embedding_lookup to get the embedding vector of each sparse feature id:

# sp_ids is the id tensor of input_tensor
ids = sp_ids.values

embeddings = embedding_lookup(
    # params is the embeddings matrix; each element is a float tensor of size embedding_dimension.
    # params can be seen as the partitions of one embedding tensor, with the partitioning
    # strategy given by partition_strategy
    params,
    # ids is the values array of input_tensor
    ids,
    # strategy for assigning ids to params, either mod or div (default mod);
    # see the documentation of tf.embedding_lookup for the definition
    partition_strategy=partition_strategy,
    # limits the maximum L2 norm of an embedding
    max_norm=max_norm
)

If sparse_weights is not None, the embedding values are multiplied by the weights:
weights = sparse_weights.values
embeddings *= weights

The embeddings are then normalized according to the combiner:

segment_ids = sp_ids.indices[:, 0]
if combiner == "sum":
    # No normalization
    embeddings = math_ops.segment_sum(embeddings, segment_ids, name=name)
elif combiner == "mean":
    # L1 normalization: embeddings = SUM(embeddings * weight) / SUM(weight)
    embeddings = math_ops.segment_sum(embeddings, segment_ids)
    weight_sum = math_ops.segment_sum(weights, segment_ids)
    embeddings = math_ops.div(embeddings, weight_sum, name=name)
elif combiner == "sqrtn":
    # L2 normalization: embeddings = SUM(embeddings * weight) / SQRT(SUM(weight^2))
    embeddings = math_ops.segment_sum(embeddings, segment_ids)
    weights_squared = math_ops.pow(weights, 2)
    weight_sum = math_ops.segment_sum(weights_squared, segment_ids)
    weight_sum_sqrt = math_ops.sqrt(weight_sum)
    embeddings = math_ops.div(embeddings, weight_sum_sqrt, name=name)
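The three combiners can be checked by hand with a small pure-Python sketch that handles the ids of a single example. combine_embeddings and the toy table are hypothetical; the arithmetic follows the sum / mean / sqrtn formulas described above.

```python
import math


def combine_embeddings(embeddings, ids, weights=None, combiner="mean"):
    """Combine the embedding rows of one example's ids into a single vector."""
    if weights is None:
        weights = [1.0] * len(ids)
    dim = len(embeddings[0])
    # Weighted sum of the looked-up rows
    total = [0.0] * dim
    for feature_id, w in zip(ids, weights):
        for d in range(dim):
            total[d] += w * embeddings[feature_id][d]
    if combiner == "sum":
        return total
    if combiner == "mean":
        denom = sum(weights)
    elif combiner == "sqrtn":
        denom = math.sqrt(sum(w * w for w in weights))
    else:
        raise ValueError("unknown combiner: %s" % combiner)
    return [t / denom for t in total]


# Toy embedding table: 3 ids, dimension 2
table = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]
summed = combine_embeddings(table, [0, 2], combiner="sum")
averaged = combine_embeddings(table, [0, 2], combiner="mean")
scaled = combine_embeddings(table, [0, 2], combiner="sqrtn")
```

For an example that carries ids 0 and 2 with unit weights, the sum is [5.0, 4.0], the mean divides it by 2, and sqrtn divides it by the square root of 2.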

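(9) Other feature columns

Besides the feature columns listed above, TensorFlow also supports one hot column, shared embedding column and scattered embedding column. A one hot column applies one-hot encoding to a sparse feature column; if a discrete feature has few values, it can be encoded with a one hot feature column and used to train the DNN. Unlike an embedding column, a one hot feature column does not support updating the feature's embedding through model training. Shared embedding column and scattered embedding column are not discussed here for reasons of space.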
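One-hot encoding of a sparse id can be sketched in a few lines. The function name is hypothetical, and summing the positions for an example with several ids is an assumption about how multivalent input would be handled.

```python
def one_hot_encode(ids, vocab_size):
    """Encode the ids of one example as a fixed, non-trainable vector."""
    vector = [0.0] * vocab_size
    for feature_id in ids:
        vector[feature_id] += 1.0
    return vector


single = one_hot_encode([1], vocab_size=3)
multi = one_hot_encode([0, 2], vocab_size=3)
```

The vector length equals the number of feature values, which is why this encoding only suits discrete features with few values.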
