[Scientific Computing] A Hierarchical Clustering Implementation

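For the theory of hierarchical clustering, search Baidu yourself; this post is a simple implementation based on my own understanding.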

Let's look at the data first:

Beer Calories Sodium Alcohol Price
Budweiser 144.00 19.00 4.70 .43
Schlitz 181.00 19.00 4.90 .43
Ionenbrau 157.00 15.00 4.90 .48
Kronensourc 170.00 7.00 5.20 .73
Heineken 152.00 11.00 5.00 .77
Old-milnaukee 145.00 23.00 4.60 .26
Aucsberger 175.00 24.00 5.50 .40
Strchs-bohemi 149.00 27.00 4.70 .42
Miller-lite 99.00 10.00 4.30 .43
Sudeiser-lich 113.00 6.00 3.70 .44
Coors 140.00 16.00 4.60 .44
Coorslicht 102.00 15.00 4.10 .46
Michelos-lich 135.00 11.00 4.20 .50
Secrs 150.00 19.00 4.70 .76
Kkirin 149.00 6.00 5.00 .79
Pabst-extra-l 68.00 15.00 2.30 .36
Hamms 136.00 19.00 4.40 .43
Heilemans-old 144.00 24.00 4.90 .43
Olympia-gold- 72.00 6.00 2.90 .46
Schlite-light 97.00 7.00 4.20 .47

The program is as follows:

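import numpy as np
import pandas as pd

# Whitespace-separated table: the first column is the beer name, the rest are numeric features
data = pd.read_csv('./bear.txt', sep=r'\s+')
X = data.iloc[:, 1:].to_numpy(dtype=float)
names = [[name] for name in data.iloc[:, 0]]

def cluster_step(X, names):
    # Pairwise Euclidean distances between the current cluster positions
    n = len(X)
    dis = np.empty([n, n])
    for i in range(n):
        for j in range(n):
            dis[i][j] = np.sqrt(np.sum(np.square(X[i] - X[j])))
    np.fill_diagonal(dis, np.inf)  # a cluster must never be merged with itself
    # Row and column of the smallest distance; the first minimum always has x < y
    x, y = np.unravel_index(np.argmin(dis), dis.shape)
    X[x] = (X[x] + X[y]) / 2       # new position: mean of the two merged positions
    X = np.delete(X, y, axis=0)
    names[x].extend(names[y])      # carry the beer names along with the merge
    del names[y]                   # delete by position, not by value
    return x, y, X, names, dis

def cluster(X, num, names):
    classes = len(X)
    while classes > num:           # stop once only `num` clusters remain
        _x, _y, X, names, _dis = cluster_step(X, names)
        # Append a log of this step: merged indices, their distance,
        # the full distance matrix, and the current clusters
        with open('./result.txt', 'a') as f:
            f.write('\n' + str(_x))
            f.write('\n' + str(_y))
            f.write('\n' + str(_dis[_x, _y]))
            f.write('\n' + str(_dis))
            f.write('\n' + str(names))
        classes -= 1
    return names

if __name__ == '__main__':
    names = cluster(X, 4, names)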

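The rule is that after each merge the midpoint is taken as the cluster's position (each step merges two positions and uses their mean as the new position), and the distance is Euclidean. Note that this is the unweighted midpoint of the two current positions, not the mean of all member points in the merged cluster.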
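The merge rule can also be sketched without the double loop. The snippet below is only an illustration under assumptions: the helper names `pairwise_euclidean` and `merge_closest` and the toy points are hypothetical and not part of the original program. It computes the distance matrix with NumPy broadcasting and performs one merge step.

```python
import numpy as np

def pairwise_euclidean(X):
    # diff[i, j] is the vector X[i] - X[j]; broadcasting gives shape (n, n, d)
    diff = X[:, None, :] - X[None, :, :]
    dis = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dis, np.inf)  # a point is never its own nearest neighbour
    return dis

def merge_closest(X):
    # One merge step: find the closest pair and replace it with its midpoint
    dis = pairwise_euclidean(X)
    x, y = np.unravel_index(np.argmin(dis), dis.shape)
    merged = X.copy()              # leave the caller's array untouched
    merged[x] = (X[x] + X[y]) / 2
    return int(x), int(y), np.delete(merged, y, axis=0)

# Hypothetical toy points (not taken from the beer data)
toy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 2.0], [10.0, 1.0]])
x, y, reduced = merge_closest(toy)
print(x, y)       # rows 1 and 3 are closest (distance 1)
print(reduced)    # row 1 becomes the midpoint [10.0, 0.5]; row 3 is dropped
```

Copying the array before writing the midpoint is a design choice of this sketch: it keeps the input unchanged, whereas the program above updates `X[x]` in place.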

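In practice, because the number of nodes shrinks after every merge, the indices no longer line up with the original 20 points. That gave me a headache for a while, until I thought of merging the beer names cluster by cluster in every iteration, so there is no need to recover the beer names from indices at the end, which felt rather clever. Since this is hard to picture from a description alone, here is the intermediate output: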

[['Budweiser'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Old-milnaukee', 'Heilemans-old'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-'], ['Schlite-light']]

[['Budweiser'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Old-milnaukee', 'Heilemans-old'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-']]

[['Budweiser', 'Old-milnaukee', 'Heilemans-old'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-']]

[['Budweiser', 'Old-milnaukee', 'Heilemans-old'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors', 'Hamms'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Olympia-gold-']]

... ... ...

At every step the outer list gets one element shorter, and one of the sub-lists grows by the names it absorbed.
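The name bookkeeping can be shown in isolation. This is a minimal sketch, assuming a hypothetical helper `merge_labels` and placeholder labels instead of the beer names; it mirrors the extend-then-delete pattern of the program, but works on a copy.

```python
def merge_labels(names, x, y):
    """Fold the labels of cluster y into cluster x and drop slot y (x != y assumed)."""
    merged = [list(group) for group in names]  # copy so the input stays unchanged
    merged[x].extend(merged[y])
    del merged[y]                              # delete by position, not by value
    return merged

# Hypothetical labels standing in for the beer names
labels = [["a"], ["b"], ["c"], ["d"]]
step1 = merge_labels(labels, 1, 3)
step2 = merge_labels(step1, 0, 1)
print(step1)  # [['a'], ['b', 'd'], ['c']]
print(step2)  # [['a', 'b', 'd'], ['c']]
```

Because the label list is reduced in lockstep with the position array, index `i` in one always refers to the same cluster as index `i` in the other, which is why no mapping back to the original 20 rows is needed.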

Here is the final output:

names
Out[1]:
[['Budweiser',
  'Old-milnaukee',
  'Heilemans-old',
  'Secrs',
  'Strchs-bohemi',
  'Ionenbrau',
  'Heineken',
  'Kkirin',
  'Coors',
  'Hamms',
  'Michelos-lich'],
 ['Schlitz', 'Aucsberger', 'Kronensourc'],
 ['Miller-lite', 'Schlite-light', 'Coorslicht', 'Sudeiser-lich'],
 ['Pabst-extra-l', 'Olympia-gold-']]
