Data analysis before k-means clustering

Raw data

Say you are given a data set where each observed example has a set of features but no labels. Labels are essential to a supervised algorithm such as a Support Vector Machine, which learns a hypothesis function to predict labels from features. Without labels, we can't run supervised learning. What can we do?

One of the most straightforward tasks we can perform on a data set without labels is to find groups of data in our dataset which are similar to one another -- what we call clusters.
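To make "similar" concrete, here is a minimal sketch that assumes similarity is measured by Euclidean distance, which is the usual choice for k-means. The function name `euclidean` and the three sample points are made up for illustration; they are not part of the data set used in this post.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical 2-D points: a and b lie close together, c lies far away.
a, b, c = (1.0, 1.0), (1.5, 1.2), (8.0, 9.0)

# Under this measure, a is more similar to b than to c,
# so a and b are candidates for the same cluster.
print(euclidean(a, b) < euclidean(a, c))
```

Points with a small distance between them are the ones a clustering algorithm will tend to group together.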


#!/usr/bin/python
# Plot the raw 2-D data before clustering, to judge a plausible K by eye.

import matplotlib.pyplot as plt


def readfile(filename):
    """Read a tab-separated file of numbers into a list of float rows."""
    datamat = []
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, which would break float()
            datamat.append([float(s) for s in line.split('\t')])
    return datamat


if __name__ == "__main__":
    datamat = readfile("C:\\kmeans.txt")
    # The first two columns are taken as the x and y coordinates.
    x_data = [row[0] for row in datamat]
    y_data = [row[1] for row in datamat]
    plt.plot(x_data, y_data, 'r*', label='Original data')
    plt.legend()
    plt.show()
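K-means clustering requires the value of K to be chosen in advance. Plotting the data first, as above, gives a rough idea of how many clusters there are.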
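Once a value of K has been read off the plot, it is passed to the clustering step. The following is a minimal sketch of the standard k-means loop (Lloyd's algorithm) in plain Python, not the code used in this post. The function name `kmeans`, its `iterations` and `seed` parameters, and the sample points are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal Lloyd's algorithm on a list of equal-length numeric rows.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid assigned to points[i].
    """
    rng = random.Random(seed)
    # Initialise the centroids with k distinct rows picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical data with two visually obvious groups, so K = 2 is chosen by eye.
sample = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
          [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
centroids, labels = kmeans(sample, k=2)
print(centroids)
print(labels)
```

With the data from this post, the `datamat` list returned by `readfile` could be passed in place of `sample`. The result depends on the random initialisation, so a different `seed` can give a different clustering on less cleanly separated data.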

posted @ 2019-02-26 09:25 东宫得臣
