HelenLee01 - 博客园 (cnblogs)

Summary: Clustering words with K-means first requires word2vec as a preprocessing step to turn the words into vectors. Code (author: LinYimeng): # -*- coding: utf-8 -*- # @Author : LinYimen...

posted @ 2020-03-03 11:25 HelenLee01 Views (251) Comments (0) Recommendations (0)

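Summary: A simple example of plt.cm.Spectral. Experiment: assign colors to 6 elements so that elements 1, 2, 5, and 6 share one color and elements 3 and 4 share another: plt.cm.Spectral([1,1,0,0,1,1]). Full code: #...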

posted @ 2020-03-03 11:18 HelenLee01 Views (1068) Comments (0) Recommendations (0)
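
The full code of that post is truncated here, so the following is only a minimal sketch of what the call plt.cm.Spectral([1,1,0,0,1,1]) returns. The variable names (labels, colors, end_colors) and the use of the "Agg" backend are assumptions made for illustration, not taken from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

labels = [1, 1, 0, 0, 1, 1]

# Integer inputs are used as direct indices into the colormap's lookup table,
# so equal labels always map to identical RGBA rows.
colors = plt.cm.Spectral(labels)

for position, (label, rgba) in enumerate(zip(labels, colors), start=1):
    print(position, label, rgba)

# Float inputs in [0, 1] are treated as positions along the colormap,
# so 0.0 and 1.0 pick the two opposite ends of Spectral.
end_colors = plt.cm.Spectral([0.0, 1.0])
print(end_colors)
```

The result is a 6 x 4 array of RGBA values that can be passed as the c argument of plt.scatter. Because integers 0 and 1 index two neighboring lookup-table entries, the two colors are nearly indistinguishable; scaling the labels to floats between 0 and 1 spreads them across the colormap.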

Summary: K-means is a commonly used clustering algorithm. An advanced version is shown below. Code: import random; from sklearn import datasets; import numpy as np; import matplotl...

posted @ 2020-03-02 13:47 HelenLee01 Views (2464) Comments (0) Recommendations (0)
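
Since the post's own code is cut off after the imports, here is a minimal NumPy sketch of the standard K-means loop (assign each point to its nearest center, then move each center to the mean of its points). The function names, the synthetic two-blob data, and the parameters n_iter and seed are all assumptions for illustration; this is not the author's implementation.

```python
import numpy as np


def nearest_center(points, centers):
    """Return, for every point, the index of its closest center."""
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step.
        labels = nearest_center(points, centers)
        # Update step: mean of each cluster; keep the old center if it is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return nearest_center(points, centers), centers


# Hypothetical toy data: two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
points = np.vstack([blob_a, blob_b])

labels, centers = kmeans(points, k=2)
print(centers)
```

For real work, sklearn.cluster.KMeans adds better initialization (k-means++) and multiple restarts.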

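Summary: An upgraded K-means clustering pipeline: tf-idf + PCA dimensionality reduction + k-means. Code: # coding:utf-8 # 2.0 Use jieba for word segmentation, abandon the inefficient NLPIR entirely, and assign weights with the TextRank algorithm (in testing, te...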

posted @ 2020-03-02 12:01 HelenLee01 Views (241) Comments (0) Recommendations (0)
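
The three stages named in that summary can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the post's code: the documents are made-up, already-tokenized English strings (the original segments Chinese text with jieba and weights terms with TextRank, which is not reproduced here), and the choices of 2 components and 2 clusters are arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus: each string stands in for one segmented document,
# with tokens separated by spaces.
docs = [
    "stock market price rises",
    "stock price falls market",
    "market stock trading price",
    "football match team wins",
    "team loses football match",
    "football team match score",
]

# 1. tf-idf: sparse matrix of shape (n_docs, n_terms).
tfidf = TfidfVectorizer().fit_transform(docs)

# 2. PCA: project to 2 dimensions (converted to a dense array first).
reduced = PCA(n_components=2).fit_transform(tfidf.toarray())

# 3. k-means on the reduced vectors.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

for doc, label in zip(docs, labels):
    print(label, doc)
```

The reduced 2-D coordinates can also be drawn directly as a scatter plot, colored by label, to inspect the clusters visually.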

Summary: K-means is a commonly used clustering algorithm, shown below. Created on 2016-01-06, @author: Eastmount. Code: # coding=utf-8 """ Created on 2016-0...

posted @ 2020-03-02 11:53 HelenLee01 Views (265) Comments (0) Recommendations (0)

Summary: K-means is a commonly used clustering algorithm. An introductory version is shown below. Code: # -*- coding: utf-8 -*- import jieba; from sklearn.feature_extraction.text i...

posted @ 2020-03-02 11:39 HelenLee01 Views (211) Comments (0) Recommendations (0)

Summary: Word segmentation with jieba requires removing stop words; here is a list of common ones. [【]】':：；;1、2、3、4、5、6、7、8、9、10、、(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)rnbrstrong&liem...

posted @ 2020-03-02 11:16 HelenLee01 Views (310) Comments (0) Recommendations (0)

Summary: Segment text with jieba, remove stop words, and add a custom dictionary. #encoding=utf-8 import jieba; filename = "gp.txt"; stopwords_file = "stopwords.txt"; jieba...

posted @ 2020-03-02 11:13 HelenLee01 Views (497) Comments (0) Recommendations (0)
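
The stop-word step from that post can be sketched without jieba itself. In the sketch below, the helper names load_stopwords and remove_stopwords, the temporary stop-word file, and the hand-segmented token list are all hypothetical; in the original, the tokens would come from jieba (for example jieba.lcut) after a custom dictionary is registered with jieba.load_userdict.

```python
import tempfile
from pathlib import Path


def load_stopwords(path):
    """Read one stop word per line, ignoring blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Drop stop words and whitespace-only tokens, keeping the original order."""
    return [t for t in tokens if t.strip() and t not in stopwords]


# Hypothetical stop-word file, written to a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    stopwords_path = Path(tmp) / "stopwords.txt"
    stopwords_path.write_text("的\n了\n，\n", encoding="utf-8")
    stopwords = load_stopwords(stopwords_path)

# Hand-segmented stand-in for the token list jieba would produce.
tokens = ["我", "的", "股票", "涨", "了", "，", " ", "很", "开心"]
filtered = remove_stopwords(tokens, stopwords)
print(filtered)
```

Holding the stop words in a set keeps each membership check constant-time, which matters once the list grows to hundreds of entries.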

Summary: Read the contents of a txt file and segment it with jieba. import jieba; fR = open('gp.txt', 'r', encoding='UTF-8'); sent = fR.read(); sent_list = ji...

posted @ 2020-03-02 11:12 HelenLee01 Views (81) Comments (0) Recommendations (0)

Summary: Read Excel data with xlrd and draw it as a scatter plot. import matplotlib.pyplot as plt; import xlrd; data = xlrd.open_workbook('测试3.xlsx') # Read...

posted @ 2020-03-02 11:09 HelenLee01 Views (760) Comments (0) Recommendations (0)

HelenLee