[Scientific Computing] Implementing Hierarchical Clustering
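I won't go over the theory of hierarchical clustering here; look it up yourself (on Baidu, for example). This is a simple implementation based on my own understanding.
Let's look at the data first: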
Beer Calories Sodium Alcohol Price
Budweiser 144.00 19.00 4.70 .43
Schlitz 181.00 19.00 4.90 .43
Ionenbrau 157.00 15.00 4.90 .48
Kronensourc 170.00 7.00 5.20 .73
Heineken 152.00 11.00 5.00 .77
Old-milnaukee 145.00 23.00 4.60 .26
Aucsberger 175.00 24.00 5.50 .40
Strchs-bohemi 149.00 27.00 4.70 .42
Miller-lite 99.00 10.00 4.30 .43
Sudeiser-lich 113.00 6.00 3.70 .44
Coors 140.00 16.00 4.60 .44
Coorslicht 102.00 15.00 4.10 .46
Michelos-lich 135.00 11.00 4.20 .50
Secrs 150.00 19.00 4.70 .76
Kkirin 149.00 6.00 5.00 .79
Pabst-extra-l 68.00 15.00 2.30 .36
Hamms 136.00 19.00 4.40 .43
Heilemans-old 144.00 24.00 4.90 .43
Olympia-gold- 72.00 6.00 2.90 .46
Schlite-light 97.00 7.00 4.20 .47
The program is as follows:
import numpy as np
import pandas as pd

# Whitespace-separated file; the first row is the header.
data = pd.read_csv('./bear.txt', sep=r'\s+')
X = data.iloc[:, 1:].to_numpy(dtype=float)    # .ix was removed from pandas
names = [[name] for name in data.iloc[:, 0]]  # one single-beer cluster per row


def cluster_step(X, names):
    # Pairwise Euclidean distances between the current cluster centres.
    n = len(X)
    dis = np.empty([n, n])
    for i in range(n):
        for j in range(n):
            dis[i][j] = np.sqrt(np.sum(np.square(X[i] - X[j])))
    np.fill_diagonal(dis, np.inf)  # never merge a cluster with itself
    # argmin returns the first minimum in row-major order, so x < y.
    x, y = divmod(int(np.argmin(dis)), n)
    # Merge y into x: the new centre is the midpoint of the two old centres.
    X[x] = (X[x] + X[y]) / 2
    X = np.delete(X, y, axis=0)
    names[x].extend(names[y])
    del names[y]
    return x, y, X, names, dis


def cluster(X, num, names):
    classes = len(X)
    while classes > num:
        _x, _y, X, names, _dis = cluster_step(X, names)
        # Log the merged pair, their distance, the matrix and the clusters.
        with open('./result.txt', 'a') as f:
            f.write('\n' + str(_x))
            f.write('\n' + str(_y))
            f.write('\n' + str(_dis[_x, _y]))
            f.write('\n' + str(_dis))
            f.write('\n' + str(names))
        classes -= 1
    return names


if __name__ == '__main__':
    names = cluster(X, 4, names)
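To make a single merge step concrete, here is a minimal pure-Python sketch of what one step computes: find the closest pair, then replace the two centres by their midpoint. It uses only four rows copied from the table above, and the helper names `closest_pair` and `midpoint` are made up for this illustration; they are not part of the program.

```python
import math
from itertools import combinations

# Four rows copied from the data table: calories, sodium, alcohol, price.
beers = {
    "Budweiser": (144.0, 19.0, 4.7, 0.43),
    "Schlitz": (181.0, 19.0, 4.9, 0.43),
    "Old-milnaukee": (145.0, 23.0, 4.6, 0.26),
    "Heilemans-old": (144.0, 24.0, 4.9, 0.43),
}


def closest_pair(points):
    """Return (distance, name_a, name_b) for the two closest points."""
    return min(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(points, 2)
    )


def midpoint(p, q):
    """Centre of the merged cluster: the plain average of the two centres."""
    return tuple((u + v) / 2 for u, v in zip(p, q))


dist, a, b = closest_pair(beers)
merged = midpoint(beers[a], beers[b])
print(a, b, round(dist, 4))
print(merged)
```

On these four rows the closest pair is Old-milnaukee and Heilemans-old, which is also the first pair merged on the full data set.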
The cluster lists after the first few merges (the last line written to result.txt at each step) look like this:
[['Budweiser'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Old-milnaukee', 'Heilemans-old'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-'], ['Schlite-light']]
[['Budweiser'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Old-milnaukee', 'Heilemans-old'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-']]
[['Budweiser', 'Old-milnaukee', 'Heilemans-old'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Hamms'], ['Olympia-gold-']]
[['Budweiser', 'Old-milnaukee', 'Heilemans-old'], ['Schlitz'], ['Ionenbrau'], ['Kronensourc'], ['Heineken'], ['Aucsberger'], ['Strchs-bohemi'], ['Miller-lite', 'Schlite-light'], ['Sudeiser-lich'], ['Coors', 'Hamms'], ['Coorslicht'], ['Michelos-lich'], ['Secrs'], ['Kkirin'], ['Pabst-extra-l'], ['Olympia-gold-']]
... ... ...
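At every step the outer list gets one element shorter, and one sub-list grows by absorbing the members of the cluster merged into it.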
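The bookkeeping on the name lists can be sketched on its own, without any distances. The function `merge_names` below is a hypothetical helper written for this illustration; unlike the program, it returns a new list instead of modifying its argument, and the indices passed in are picked by hand rather than computed.

```python
def merge_names(names, x, y):
    """Merge cluster y into cluster x and drop y; returns a new list."""
    merged = [list(group) for group in names]  # copy, leave the input alone
    merged[x].extend(merged[y])
    del merged[y]
    return merged


names = [["Budweiser"], ["Old-milnaukee"], ["Heilemans-old"], ["Coors"]]
step1 = merge_names(names, 1, 2)  # two single beers become one cluster
step2 = merge_names(step1, 0, 1)  # Budweiser absorbs that two-beer cluster
print(step1)
print(step2)
```

In the second step the outer list still shrinks by one, but the surviving sub-list grows by two, because the absorbed cluster already held two beers.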
Let's check the final output:
names
Out[1]:
[['Budweiser',
'Old-milnaukee',
'Heilemans-old',
'Secrs',
'Strchs-bohemi',
'Ionenbrau',
'Heineken',
'Kkirin',
'Coors',
'Hamms',
'Michelos-lich'],
['Schlitz', 'Aucsberger', 'Kronensourc'],
['Miller-lite', 'Schlite-light', 'Coorslicht', 'Sudeiser-lich'],
['Pabst-extra-l', 'Olympia-gold-']]