
Source code walkthrough: the k-means++ centroid-initialization method `_k_init` (called by `k_means`)

This article reflects my personal understanding. Since I am new to this topic and my abilities are limited, there may be mistakes; corrections in the comments are welcome, and I will gladly learn from them. Thank you.
```python
import numpy as np
import scipy.sparse as sp
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.utils.extmath import stable_cumsum


def _k_init(X, n_clusters, x_squared_norms, random_state, n_local_trials=None):
    """Initialize the centroids according to k-means++.
    @:parameter X : input data; should be double precision (dtype=np.float64).
    @:parameter n_clusters : integer, the number of centroids.
    @:parameter x_squared_norms : array, shape (n_samples,), the squared
        Euclidean norm of each data point.
    @:parameter random_state : numpy.RandomState, the random number generator
        used to initialize the centers.
    @:parameter n_local_trials : integer, optional. The number of seeding
        trials for each center (except the first), of which the one reducing
        inertia the most is greedily chosen. Set to None to make the number
        of trials depend logarithmically on the number of seeds
        (2 + log(k)); this is the default.

    k-means++ picks the initial cluster centers in a particular way so that
    k-means converges faster.
    """
    n_samples, n_features = X.shape
    centers = np.empty((n_clusters, n_features), dtype=X.dtype)
    assert x_squared_norms is not None, 'x_squared_norms None in _k_init'

    # If the number of local seeding trials was not set, set it here
    if n_local_trials is None:
        # This is what Arthur/Vassilvitskii tried, but did not report
        # specific results for other than mentioning in the conclusion
        # that it helped.
        n_local_trials = 2 + int(np.log(n_clusters))

    # Pick the first center at random
    center_id = random_state.randint(n_samples)
    if sp.issparse(X):
        centers[0] = X[center_id].toarray()
    else:
        centers[0] = X[center_id]

    # Initialize the array of closest distances and compute the current potential
    closest_dist_sq = euclidean_distances(
        centers[0, np.newaxis], X, Y_norm_squared=x_squared_norms,
        squared=True)  # squared distances from every point in X to the first center
    current_pot = closest_dist_sq.sum()  # sum of the squared distances (the "potential")

    # Pick the remaining n_clusters - 1 centers
    for c in range(1, n_clusters):
        # Sample candidate centers with probability proportional to the
        # squared distance to the closest existing center
        rand_vals = random_state.random_sample(n_local_trials) * current_pot
        # Insert rand_vals into the (sorted) cumulative sum of the distance
        # array and return the insertion indices (inverse-CDF sampling)
        candidate_ids = np.searchsorted(stable_cumsum(closest_dist_sq),
                                        rand_vals)

        # Compute squared distances from every point to each candidate
        distance_to_candidates = euclidean_distances(
            X[candidate_ids], X, Y_norm_squared=x_squared_norms, squared=True)

        # Decide which candidate center is the best
        best_candidate = None
        best_pot = None
        best_dist_sq = None
        for trial in range(n_local_trials):
            # Compute potential when including center candidate:
            # element-wise minimum of the two arrays, i.e. the distance to the
            # closest center if this candidate were added
            new_dist_sq = np.minimum(closest_dist_sq,
                                     distance_to_candidates[trial])
            new_pot = new_dist_sq.sum()  # the resulting potential

            # Store the result if it is the best trial so far
            if (best_candidate is None) or (new_pot < best_pot):
                best_candidate = candidate_ids[trial]
                best_pot = new_pot
                best_dist_sq = new_dist_sq

        # Permanently add the best center candidate found in the local
        # trials to the set of centers
        if sp.issparse(X):
            centers[c] = X[best_candidate].toarray()
        else:
            centers[c] = X[best_candidate]
        current_pot = best_pot
        closest_dist_sq = best_dist_sq

    return centers
```
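To make the sampling scheme concrete, here is a minimal dense-only sketch of the same seeding idea in plain NumPy. The helper name `kmeans_pp_init` is my own, and it swaps sklearn's `euclidean_distances`/`stable_cumsum` for direct NumPy computations, so it is an illustration of the technique rather than the library's implementation:

```python
import numpy as np

def kmeans_pp_init(X, n_clusters, rng, n_local_trials=None):
    """Dense-only sketch of k-means++ seeding (hypothetical helper)."""
    n_samples, n_features = X.shape
    centers = np.empty((n_clusters, n_features), dtype=X.dtype)

    if n_local_trials is None:
        n_local_trials = 2 + int(np.log(n_clusters))  # 2 + log(k) trials

    # First center: uniformly at random
    centers[0] = X[rng.randint(n_samples)]

    # Squared distance from every point to its closest chosen center
    closest_dist_sq = ((X - centers[0]) ** 2).sum(axis=1)
    current_pot = closest_dist_sq.sum()

    for c in range(1, n_clusters):
        # D^2 sampling: invert the CDF of closest_dist_sq via searchsorted
        rand_vals = rng.random_sample(n_local_trials) * current_pot
        candidate_ids = np.searchsorted(np.cumsum(closest_dist_sq), rand_vals)

        # Greedily keep the candidate that reduces the potential the most
        best_pot, best_id, best_dist_sq = None, None, None
        for cid in candidate_ids:
            d2 = ((X - X[cid]) ** 2).sum(axis=1)
            new_dist_sq = np.minimum(closest_dist_sq, d2)
            new_pot = new_dist_sq.sum()
            if best_pot is None or new_pot < best_pot:
                best_pot, best_id, best_dist_sq = new_pot, cid, new_dist_sq

        centers[c] = X[best_id]
        current_pot, closest_dist_sq = best_pot, best_dist_sq

    return centers

# Two well-separated blobs: the two seeds should land in different blobs
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 100.0])
centers = kmeans_pp_init(X, 2, rng)
print(centers)  # one seed near (0, 0), one near (100, 100)
```

Because the sampling weight of a point is its squared distance to the nearest existing center, a point in the far blob is overwhelmingly more likely to be drawn as the second seed than a near neighbor of the first seed, which is exactly why k-means++ tends to spread the initial centers out.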

Reference: https://github.com/scikit-learn/scikit-learn