TensorFlow實戰— —K-Means聚類

阿新 • • 發佈：2019-01-09

TensorFlow是Google最近開源的人工智慧庫。

這裡寫圖片描述

TensorFlow使用了data-flow graphs（DFG），如下圖

這裡寫圖片描述

從圖中可以看出，DFG是表示計算表示式的一種樹形結構圖。每個節點代表一個運算，非葉子節點是運算子，葉子節點是直接的值（確定的值或者不確定的值），箭頭方向表示了不同節點間的依賴關係。

TensorFlow目前提供兩種API，Python和C++，目前只有Linux和MacOS的安裝方法，暫時沒有Windows版本。官方還推薦用虛擬環境如Docker，VirtualEnv。官方給出了相應的使用說明，一般的程式猿可以參考閱讀官方文件。但假如你是個菜鳥，可以安裝下面的步驟慢慢學習~(貌似網站被牆了。。。)

1.安裝

2.閱讀官方的栗子，知道TensorFlow的程式碼是怎麼寫的，有個大概印象

3.閱讀關於基本元件的官方解釋

4.閱讀這個使用TensorFlow解決一個普通機器學習問題的詳細栗子

5.基本瞭解之後，就可以去看一下Python API或者C++ API

下面給出一段使用TensorFlow寫的K-Means聚類的方法

import tensorflow as tf
from random import choice, shuffle
from numpy import array


def TFKMeansCluster(vectors, noofclusters) 
:
    """
    K-Means Clustering using TensorFlow.
    'vectors' should be a n*k 2-D NumPy array, where n is the number
    of vectors of dimensionality k.
    'noofclusters' should be an integer.
    """

    noofclusters = int(noofclusters)
    assert noofclusters < len(vectors)

    #Find out the dimensionality 

    dim = len(vectors[0])

    #Will help select random centroids from among the available vectors
    vector_indices = list(range(len(vectors)))
    shuffle(vector_indices)

    #GRAPH OF COMPUTATION
    #We initialize a new graph and set it as the default during each run
    #of this algorithm. This ensures that as this function is called
    #multiple times, the default graph doesn't keep getting crowded with
    #unused ops and Variables from previous function calls.

    graph = tf.Graph()

    with graph.as_default():

        #SESSION OF COMPUTATION

        sess = tf.Session()

        ##CONSTRUCTING THE ELEMENTS OF COMPUTATION

        ##First lets ensure we have a Variable vector for each centroid,
        ##initialized to one of the vectors from the available data points
        centroids = [tf.Variable((vectors[vector_indices[i]]))
                     for i in range(noofclusters)]
        ##These nodes will assign the centroid Variables the appropriate
        ##values
        centroid_value = tf.placeholder("float64", [dim])
        cent_assigns = []
        for centroid in centroids:
            cent_assigns.append(tf.assign(centroid, centroid_value))

        ##Variables for cluster assignments of individual vectors(initialized
        ##to 0 at first)
        assignments = [tf.Variable(0) for i in range(len(vectors))]
        ##These nodes will assign an assignment Variable the appropriate
        ##value
        assignment_value = tf.placeholder("int32")
        cluster_assigns = []
        for assignment in assignments:
            cluster_assigns.append(tf.assign(assignment,
                                             assignment_value))

        ##Now lets construct the node that will compute the mean
        #The placeholder for the input
        mean_input = tf.placeholder("float", [None, dim])
        #The Node/op takes the input and computes a mean along the 0th
        #dimension, i.e. the list of input vectors
        mean_op = tf.reduce_mean(mean_input, 0)

        ##Node for computing Euclidean distances
        #Placeholders for input
        v1 = tf.placeholder("float", [dim])
        v2 = tf.placeholder("float", [dim])
        euclid_dist = tf.sqrt(tf.reduce_sum(tf.pow(tf.sub(
            v1, v2), 2)))

        ##This node will figure out which cluster to assign a vector to,
        ##based on Euclidean distances of the vector from the centroids.
        #Placeholder for input
        centroid_distances = tf.placeholder("float", [noofclusters])
        cluster_assignment = tf.argmin(centroid_distances, 0)

        ##INITIALIZING STATE VARIABLES

        ##This will help initialization of all Variables defined with respect
        ##to the graph. The Variable-initializer should be defined after
        ##all the Variables have been constructed, so that each of them
        ##will be included in the initialization.
        init_op = tf.initialize_all_variables()

        #Initialize all variables
        sess.run(init_op)

        ##CLUSTERING ITERATIONS

        #Now perform the Expectation-Maximization steps of K-Means clustering
        #iterations. To keep things simple, we will only do a set number of
        #iterations, instead of using a Stopping Criterion.
        noofiterations = 100
        for iteration_n in range(noofiterations):

            ##EXPECTATION STEP
            ##Based on the centroid locations till last iteration, compute
            ##the _expected_ centroid assignments.
            #Iterate over each vector
            for vector_n in range(len(vectors)):
                vect = vectors[vector_n]
                #Compute Euclidean distance between this vector and each
                #centroid. Remember that this list cannot be named
                #'centroid_distances', since that is the input to the
                #cluster assignment node.
                distances = [sess.run(euclid_dist, feed_dict={
                    v1: vect, v2: sess.run(centroid)})
                             for centroid in centroids]
                #Now use the cluster assignment node, with the distances
                #as the input
                assignment = sess.run(cluster_assignment, feed_dict = {
                    centroid_distances: distances})
                #Now assign the value to the appropriate state variable
                sess.run(cluster_assigns[vector_n], feed_dict={
                    assignment_value: assignment})

            ##MAXIMIZATION STEP
            #Based on the expected state computed from the Expectation Step,
            #compute the locations of the centroids so as to maximize the
            #overall objective of minimizing within-cluster Sum-of-Squares
            for cluster_n in range(noofclusters):
                #Collect all the vectors assigned to this cluster
                assigned_vects = [vectors[i] for i in range(len(vectors))
                                  if sess.run(assignments[i]) == cluster_n]
                #Compute new centroid location
                new_location = sess.run(mean_op, feed_dict={
                    mean_input: array(assigned_vects)})
                #Assign value to appropriate variable
                sess.run(cent_assigns[cluster_n], feed_dict={
                    centroid_value: new_location})

        #Return centroids and assignments
        centroids = sess.run(centroids)
        assignments = sess.run(assignments)
        return centroids, assignments

TensorFlow實戰— —K-Means聚類

TensorFlow是Google最近開源的人工智慧庫。這裡寫圖片描述 TensorFlow使用了data-flow graphs（DFG），如下圖這裡寫圖片描述從圖中可以看出，DFG是表示計算表示式的一種樹形結構圖。每個節點代表一個運算，非葉子節

R語言實戰k-means聚類和關聯規則演算法

1、R語言關於k-means聚類資料集格式如下所示： ,河東路與嶴東路&河東路與聚賢橋路,河東路與嶴東路&新悅路與嶴東路,河東路與嶴東路&火炬路與聚賢橋路,河東路與嶴東路&

K-Means 聚類實戰

首先生成原始資料點： import numpy as np import matplotlib.pyplot as plt from sklearn.cluster import KMeans def gen_clusters(): mean1 = [0,0] c

機器學習實戰之k-means聚類_程式碼註釋

#-*- coding: UTF-8 -*- from numpy import * def loadDataSet(fileName):#函式的輸入為檔名稱，函式的主要作用是將檔案中的每行內容轉換成浮點型， # 每行

Spark 實戰，第 4 部分: 使用 Spark MLlib 做 K-means 聚類分析

引言提起機器學習 (Machine Learning)，相信很多計算機從業者都會對這個技術方向感到興奮。然而學習並使用機器學習演算法來處理資料卻是一項複雜的工作，需要充足的知識儲備，如概率論，數理統計，數值逼近，最優化理論等。機器學習旨在使計算機具有人類一樣的學習能力和模仿能力，這也是實現人工智慧的核

mahout in Action2.2-聚類介紹-K-means聚類算法

過程 swing 浪漫 res cto 等等算法結合 -m 聚類介紹本章包含 1 實戰操作了解聚類 2.了解相似性概念 3 使用mahout執行一個簡單的聚類實例 4.用於聚類的各種不同的

K-Means聚類

rom distance 標簽 fit margin out 結果 nbsp k-means聚類聚類（clustering）　　用於找出不帶標簽數據的相似性的算法 K-Means聚類算法簡介　　與廣義線性模型和決策樹類似，K-Means參數的最優解也是以成本

K-Means 聚類算法原理分析與代碼實現

oat 得到 ssi targe fan readline txt __name__ 輸出轉自穆晨閱讀目錄前言現實中的聚類分析問題 - 總統大選 K-Means 聚類算法 K-Means性能優化二分K-Means算法小結回到頂部前言在

通過IDEA及hadoop平臺實現k-means聚類算法

綜合 tle tostring html map apache cnblogs cos textfile 有段時間沒有操作過，發現自己忘記一些步驟了，這篇文章會記錄相關步驟，並隨時進行補充修改。 1 基礎步驟，即相關環境部署及數據準備數據文件類型為.csv文件，excel

K-Means 聚類

src 選擇高頻進行效率需求沒有框架容易機器學習中的算法主要分為兩類，一類是監督學習，監督學習顧名思義就是在學習的過程中有人監督，即對於每一個訓練樣本，有對應的標記指明它的類型。如識別算法的訓練集中貓的圖片，在訓練之前會人工打上標簽，告訴電腦這些像素組合在一

【轉】使用scipy進行層次聚類和k-means聚類

歐氏距離 generate https then con method 感覺 long average scipy cluster庫簡介 scipy.cluster是scipy下的一個做聚類的package, 共包含了兩類聚類方法: 1. 矢量量化(scipy.cluste

CS229 Machine Learning學習筆記:Note 7(K-means聚類、高斯混合模型、EM算法)

learn 不同的 inf ear 公式 course splay alt spa K-means聚類 ng在coursera的機器學習課上已經講過K-means聚類，這裏不再贅述高斯混合模型問題描述聚類問題：給定訓練集\(\{x^{(1)},\cdots,x^{(m

數學模型：3.非監督學習--聚類分析和K-means聚類

rand tar 聚類分析復制 clust tle 降維算法 generator pro 1. 聚類分析聚類分析（cluster analysis）是一組將研究對象分為相對同質的群組（clusters）的統計分析技術 ---->> 將觀測對象的群體按照

吳恩達老師機器學習筆記K-means聚類演算法（二）

運用K-means聚類演算法進行影象壓縮趁熱打鐵，修改之前的演算法來做第二個練習—影象壓縮原始圖片如下：程式碼如下： X =imread('bird.png'); % 讀取圖片 X =im2double(X); % unit8轉成double型別 [m,n,z]=size

吳恩達老師機器學習筆記K-means聚類演算法（一）

今天接著學習聚類演算法以後堅決要八點之前起床學習！不要浪費每一個早晨。 K-means聚類演算法聚類過程如下：原理基本就是先從樣本中隨機選擇聚類中心，計算樣本到聚類中心的距離，選擇樣本最近的中心作為該樣本的類別。最後某一類樣本的座標平均值作為新聚類中心的座標，如此往復。原

使用Java實現K-Means聚類演算法

第一次寫部落格，隨便寫寫。關於K-Means介紹很多，還不清楚可以查一些相關資料。個人對其實現步驟簡單總結為4步: 1.選出k值,隨機出k個起始質心點。 2.分別計算每個點和k個起始質點之間的距離,就近歸類。 3.最終中心點集可以劃分為k類,

機器學習（十二）讓你輕鬆理解K-means 聚類演算法

前言你還記得菜市場賣菜的嗎？書店賣書的或者是平時去超市買東西時的物品，它們是不是都根據相似性擺放在一起了呢，飲料、啤酒、零食分佈在各自區域，像這樣各級事物的相似特點或特性組織在一起的方法，在機器學習裡面即成為

K-means聚類演算法原理簡單介紹

K-means演算法（1. 剛開始隨機選擇兩個點作為簇重心，然後計算每個資料點離這個重心的距離並把這些點歸為兩個類）（上一步的結果如下圖，所有離藍色叉近的點被標為藍色了，紅色亦然）

【Python例項第20講】手寫數字識別問題的K-Means聚類

機器學習訓練營——機器學習愛好者的自由交流空間（qq 群號：696721295）在這個例子裡，我們在手寫數字識別資料集上，比較 K-means 聚類演算法對於不同的初始化策略對執行時間和結果質量的影響。我們也利用不同的聚類質量測度判別聚類標籤對於參考標籤的擬合優度。這裡使

【機器學習】接地氣地解釋K-means聚類演算法

俗話說“物以類聚，人以群分”，這句話在K-means聚類演算法裡面得到了充分的繼承。而K-means演算法的實際應用範圍可謂是大到無法估量，基本可以說，只要你想不到，沒有聚類聚不起來的東西！ &nbs

TensorFlow實戰— —K-Means聚類

相關推薦