Graph-Based Semi-Supervised Learning: Label Propagation

阿新 • Published: 2019-01-08

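From the discussion on pages 301~304 of the book, graph-based semi-supervised learning has two clear drawbacks:

It performs poorly on large-scale data;

It is hard to classify new samples directly.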
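
To make the first drawback concrete, here is a rough sketch of how memory grows if the graph is stored as a dense n x n affinity matrix of 64-bit floats. That storage layout is an assumption for illustration (it matches a fully connected RBF graph), and `dense_affinity_bytes` is a hypothetical helper, not part of any library.

```python
def dense_affinity_bytes(n_samples, bytes_per_entry=8):
    """Bytes needed for a dense n x n affinity matrix (float64 by default)."""
    return n_samples * n_samples * bytes_per_entry


# Memory grows quadratically with the number of samples.
for n in (150, 10_000, 100_000):
    gib = dense_affinity_bytes(n) / 2**30
    print(f"{n:>7} samples -> {gib:.4f} GiB")
```

The iris data used below (150 samples) is tiny by this measure, so the experiment can only hint at the scaling problem through run time.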

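Below, sklearn's semi-supervised learning module is used to check these properties.
The test uses the 1st and 3rd attributes of the iris dataset. sklearn's semi-supervised algorithms learn by propagating labels over a graph and come in two variants, label propagation (LabelPropagation) and label spreading (LabelSpreading). The official documentation explains the difference clearly, so it is not repeated here. The code below fits LabelSpreading with different proportions of labeled and unlabeled samples and compares the results with an SVM. (The basic label propagation algorithm cannot classify new samples directly. sklearn's implementation handles a new sample at prediction time by weighting the label distributions learned for the training points by their kernel similarity to that sample.) The code is as follows: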

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn import svm
# The old `label_propagation` submodule was removed from recent sklearn releases;
# importing the class from the package works across versions.
from sklearn.semi_supervised import LabelSpreading
import time

rng = np.random.RandomState(0)

iris = datasets.load_iris()

# Use the 1st and 3rd attributes: sepal length and petal length.
X = iris.data[:, [0, 2]]
y = iris.target

# Mark roughly 30% / 80% of the samples as unlabeled (-1).
y_30 = np.copy(y)
y_30[rng.rand(len(y)) < 0.3] = -1
y_80 = np.copy(y)
y_80[rng.rand(len(y)) < 0.8] = -1
labeled_30 = y_30 != -1
labeled_80 = y_80 != -1

# Each entry is (fitted model, training points, training labels).
# The data is left unscaled so the plots stay in the original feature units.
lu80 = (LabelSpreading().fit(X, y_80), X, y_80)
lu30 = (LabelSpreading().fit(X, y_30), X, y_30)
l20 = (LabelSpreading().fit(X[labeled_80], y_80[labeled_80]), X[labeled_80], y_80[labeled_80])
l70 = (LabelSpreading().fit(X[labeled_30], y_30[labeled_30]), X[labeled_30], y_30[labeled_30])
l100 = (LabelSpreading().fit(X, y), X, y)
rbf_svc = (svm.SVC().fit(X, y), X, y)

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .02), np.arange(y_min, y_max, .02))

titles = [ 'spreading with 80% Du, 20% Dl', 'spreading with 30% Du, 70% Dl',
          'spreading with 20% Dl', 'spreading with 70% Dl', 
          'spreading with 100% Dl',  'SVC with RBF kernel']
color_map = {-1: (.5, .5, .5), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}
fig = plt.figure(figsize=(14,18))
for i, (clf, x, y_train) in enumerate((lu80, lu30, l20, l70, l100, rbf_svc)):
    start = time.time()
    ax = fig.add_subplot(3, 2, i + 1)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=.5)

    # Grey marks unlabeled points; explicit RGB colors make a cmap unnecessary.
    colors = [color_map[label] for label in y_train]
    ax.scatter(x[:, 0], x[:, 1], c=colors)
    end = time.time()
    ax.set_title('{0} : {1:.3f}s elapsed'.format(titles[i], end-start), fontsize=16)
    ax.set_xlabel(iris.feature_names[0])
    ax.set_ylabel(iris.feature_names[2])
plt.show()
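
The `LabelSpreading` calls above hide the iteration itself. As a rough illustration, here is a minimal NumPy sketch of the label spreading update F <- alpha * S * F + (1 - alpha) * Y on a dense RBF graph. It is my own simplified reading of the technique, not sklearn's code; the function name is hypothetical, and the defaults `gamma=20`, `alpha=0.2`, `n_iter=30` are chosen to resemble sklearn's defaults.

```python
import numpy as np


def label_spreading_sketch(X, y, gamma=20.0, alpha=0.2, n_iter=30):
    """Spread labels over a dense RBF graph; y uses -1 for unlabeled points."""
    classes = np.unique(y[y != -1])
    # Dense RBF affinities with a zeroed diagonal (no self-loops).
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq_dist)
    np.fill_diagonal(W, 0.0)
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # One-hot rows for labeled points, all-zero rows for unlabeled ones.
    Y = (y[:, None] == classes[None, :]).astype(float)
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return classes[F.argmax(axis=1)]


# Toy data (made up for illustration): two tight clusters, one label each.
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
y_toy = np.array([0, -1, -1, 1, -1, -1])
print(label_spreading_sketch(X_toy, y_toy))
```

Each unlabeled point ends up with the label that dominates among its close neighbours, which is why a small labeled fraction can still cover the whole dataset. The sketch assumes no point is completely isolated, since an all-zero row in W would cause a division by zero.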

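As the figure below shows, label spreading takes roughly twice as long as the SVM, judging by the elapsed time in each panel title (this time covers predicting over the mesh grid and drawing the panel, not fitting). In terms of classification results, semi-supervised learning with 20% labeled and 80% unlabeled data comes close to what 70%~100% labeled data achieves, and the decision regions are close to the SVM's. This suggests that this label-propagation style of semi-supervised learning works reasonably well.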
[Figure: the six decision-region plots produced by the script above]
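
The visual comparison can be backed with numbers. The sketch below is an assumed variation on the experiment, not the original script: it hides about 80% of the labels, times only the `fit` call, scores both models on the hidden labels, and also trains the SVM on the labeled 20% only so that both models see the same labels. `timed_fit` is a hypothetical helper.

```python
import time

import numpy as np
from sklearn import datasets, svm
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)
iris = datasets.load_iris()
X, y = iris.data[:, [0, 2]], iris.target

# Hide roughly 80% of the labels by marking them -1.
unlabeled = rng.rand(len(y)) < 0.8
y_semi = np.copy(y)
y_semi[unlabeled] = -1


def timed_fit(model, X_fit, y_fit):
    """Fit the model and return it together with the elapsed fit time."""
    start = time.perf_counter()
    model.fit(X_fit, y_fit)
    return model, time.perf_counter() - start


spreading, t_spread = timed_fit(LabelSpreading(), X, y_semi)
svc, t_svc = timed_fit(svm.SVC(), X[~unlabeled], y[~unlabeled])

# transduction_ holds the label inferred for every training point.
acc_spread = np.mean(spreading.transduction_[unlabeled] == y[unlabeled])
acc_svc = np.mean(svc.predict(X[unlabeled]) == y[unlabeled])
print(f"LabelSpreading: accuracy on hidden labels {acc_spread:.3f}, fit {t_spread:.4f}s")
print(f"SVC (labeled 20% only): accuracy {acc_svc:.3f}, fit {t_svc:.4f}s")

# A new sample goes through predict without refitting the model.
new_point = np.array([[5.0, 1.5]])
print(spreading.predict(new_point))
```

On a dataset this small the timings are noisy, so a single run says little about the scaling behaviour; repeating the measurement or using a larger dataset would give a fairer picture.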
