User Profiling (2): Building a Neural Network Model with the Keras Framework

阿新 • Published: 2018-12-10

import pickle
import pandas as pd
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as K
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, BatchNormalization, Embedding
from keras.callbacks import ModelCheckpoint
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

train = pd.read_csv("…/data/train.csv")

datafile = open('./data_sikuquanshus.pkl', 'rb')
print("load data by pkl")

# Text inputs (train and test)
X_train = pickle.load(datafile)
X_test = pickle.load(datafile)

# Labels (train and test)
Y_train = pickle.load(datafile)
Y_test = pickle.load(datafile)
word_to_index = pickle.load(datafile)
index_to_word = pickle.load(datafile)
word_to_vec_map = pickle.load(datafile)
datafile.close()

#################

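The pretrained word embeddings described here were trained on 400,000 words, with a feature dimension of 50. Note that the shape comments in the code below show a (19529, 300) matrix, so the embeddings actually loaded from the pickle file appear to have a different vocabulary size and dimension.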

def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Create a Keras Embedding() layer and load the pretrained embeddings into it.
    """
    print("Loading pretrained embeddings")
    # Number of words in the vocabulary + 1. Keras needs input_dim to be
    # the largest index + 1, and word indices start at 1.
    vocab_len = len(word_to_index) + 1
    # Read the feature dimension from any word in the map.
    emb_dim = word_to_vec_map["錯"].shape[0]

    # Initialise the embedding matrix to all zeros, shape (vocab_len, emb_dim).
    emb_matrix = np.zeros((vocab_len, emb_dim))  # (19529, 300)
    # Use each word's index as its row number and store that word's embedding
    # in the row. Because indices start at 1, row 0 has no embedding, which
    # is why vocab_len includes the +1 above.
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Create the Keras Embedding layer (trainable, so the weights are fine-tuned).
    embedding_layer = Embedding(input_dim=vocab_len, output_dim=emb_dim, trainable=True)

    # The layer must be built before its weights can be set.
    embedding_layer.build((None,))

    # Set emb_matrix as the layer's weights. At this point we have an
    # embedding layer initialised with the pretrained vectors.
    embedding_layer.set_weights([emb_matrix])

    return embedding_layer
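To make the row layout concrete, here is a minimal sketch of the same construction on a toy vocabulary. It uses plain Python lists in place of NumPy arrays so that it runs without any dependencies. The three-word vocabulary, the 2-dimensional vectors, and the helper names `build_embedding_matrix` and `lookup` are made up for illustration and do not come from the original data or from Keras.

```python
# Toy stand-ins for the pickled dictionaries (hypothetical values).
word_to_index = {"a": 1, "b": 2, "c": 3}  # indices start at 1
word_to_vec_map = {"a": [0.1, 0.2], "b": [0.3, 0.4], "c": [0.5, 0.6]}


def build_embedding_matrix(word_to_vec_map, word_to_index):
    """Return a vocab_len x emb_dim table whose row i holds the vector of word i."""
    vocab_len = len(word_to_index) + 1  # +1 so that row 0 exists
    emb_dim = len(next(iter(word_to_vec_map.values())))
    matrix = [[0.0] * emb_dim for _ in range(vocab_len)]
    for word, index in word_to_index.items():
        matrix[index] = list(word_to_vec_map[word])
    return matrix


def lookup(matrix, indices):
    """Conceptually what an embedding layer does: return one row per index."""
    return [matrix[i] for i in indices]


emb_matrix = build_embedding_matrix(word_to_vec_map, word_to_index)
print(len(emb_matrix), len(emb_matrix[0]))
print(lookup(emb_matrix, [2, 1, 0]))
```

Index 0 maps to the all-zero row, which is what a zero-padded position in an input sequence would look up before any fine-tuning.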

All other classification models can be built on top of the mother_model function.

def mother_model(input_shape, word_to_vec_map, word_to_index):
    """
    Returns: a Keras model.

    Arguments:
    input_shape -- (MAX_COMMENT_TEXT_SEQ,)
    word_to_vec_map -- dictionary mapping words to their embedding vectors
    word_to_index -- dictionary mapping words to their integer indices
    """
    # Input layer: each sample is a list of word indices for one sentence.
    sentence_indices = Input(shape=input_shape, dtype='int32')  # (?, 1000)
    # Create the word embedding layer.
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)

    # Passing the index list through embedding_layer returns the embeddings.
    embeddings = embedding_layer(sentence_indices)  # (?, 1000, 300)
    dr_r = 0.5

    X = BatchNormalization()(embeddings)
    # return_sequences=True: the output holds the hidden state of every time step.
    X = LSTM(128, return_sequences=True)(X)
    X = Dropout(dr_r)(X)
    X = BatchNormalization()(X)
    # return_state=True: the layer returns (output, state_h, state_c). Both
    # output and state_h are the hidden state of the last time step, and
    # state_c is the cell state of the last time step.
    X, _, __ = LSTM(128, return_state=True)(X)
    X = Dropout(dr_r)(X)

    X = BatchNormalization()(X)
    X = Dense(64, activation='relu')(X)

    X = Dropout(dr_r)(X)

    X = BatchNormalization()(X)
    # Six independent sigmoid outputs, one per label.
    X = Dense(6, activation='sigmoid')(X)

    model = Model(inputs=sentence_indices, outputs=X)

    return model
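The input layer expects every sentence as a fixed-length list of word indices. The post loads these arrays ready-made from the pickle file, so the preprocessing step is not shown. The sketch below is one plausible way to produce them, not the author's method: the helper name `sentence_to_indices`, the whitespace tokenisation, and the choice to skip unknown words are all assumptions for illustration.

```python
def sentence_to_indices(sentence, word_to_index, max_len):
    """Map a whitespace-tokenised sentence to a zero-padded list of word indices."""
    # Unknown words are skipped (an assumption; other schemes map them
    # to a dedicated "unknown" index instead).
    indices = [word_to_index[w] for w in sentence.split() if w in word_to_index]
    indices = indices[:max_len]  # truncate sentences that are too long
    # Pad with 0, the row of the embedding matrix that has no pretrained vector.
    return indices + [0] * (max_len - len(indices))


toy_word_to_index = {"good": 1, "bad": 2, "movie": 3}  # hypothetical vocabulary
batch = [
    sentence_to_indices("good movie", toy_word_to_index, max_len=5),
    sentence_to_indices("bad bad bad bad bad bad movie", toy_word_to_index, max_len=5),
]
print(batch)
```

Every row of `batch` has the same length, which is what lets the model declare a fixed `input_shape`. For Chinese text a real word segmenter would be needed in place of `split()`.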

MAX_COMMENT_TEXT_SEQ = 1000
toxic_model = mother_model((MAX_COMMENT_TEXT_SEQ,), word_to_vec_map, word_to_index)
toxic_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model_dir = './models'
filepath = model_dir + '/model-{epoch:02d}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', save_best_only=True, verbose=1)
callbacks_list = [checkpoint]
train_result = toxic_model.fit(X_train, Y_train, epochs=1, batch_size=1000, validation_split=0.07, callbacks=callbacks_list)
loss, accuracy = toxic_model.evaluate(X_test, Y_test)
print("loss", loss)
print("accuracy", accuracy)
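Because the output layer has six independent sigmoid units, `binary_crossentropy` treats each of the six labels as its own yes/no problem and averages the per-label losses. With this loss, Keras resolves the `'accuracy'` metric to element-wise binary accuracy rather than exact-match accuracy. The dependency-free sketch below reproduces both quantities for a single made-up prediction. The numbers are illustrative only, and the two functions are simplified re-implementations, not the Keras source.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of the per-label log losses for one sample."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of labels whose thresholded prediction matches the target."""
    hits = sum(int(p > threshold) == t for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


y_true = [1, 0, 0, 0, 1, 0]               # six labels, as in the Dense(6) output
y_pred = [0.9, 0.2, 0.1, 0.4, 0.3, 0.05]  # made-up sigmoid outputs

print(binary_crossentropy(y_true, y_pred))
print(binary_accuracy(y_true, y_pred))
```

In this example five of the six labels are predicted correctly, so the accuracy is 5/6 even though the sample as a whole is not classified perfectly. On sparse multi-label data this metric can therefore look high while individual positive labels are still being missed.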
