Post archive for March 2, 2020 - HelenLee01

Summary: K-means is a commonly used clustering algorithm. An advanced version is shown below. Code: import random; from sklearn import datasets; import numpy as np; import matplotl...

posted @ 2020-03-02 13:47 HelenLee01 Views (2464) Comments (0) Recommendations (0)

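Summary: Upgraded K-means clustering: tf-idf + PCA dimensionality reduction + k-means. Code: # coding:utf-8 # 2.0: segment words with jieba, drop the inefficient NLPIR entirely, and assign weights with the TextRank algorithm (in testing, te...

posted @ 2020-03-02 12:01 HelenLee01 Views (241) Comments (0) Recommendations (0)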
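
The tf-idf step named in this entry can be illustrated with a minimal sketch. It is not the post's code (which is truncated in the summary) and uses only the standard library. The function name tfidf and the sample tokens are made up for illustration, and the documents are assumed to be already segmented into tokens. The weighting is the textbook tf * log(N / df) form; libraries such as scikit-learn apply a smoothed, normalized variant, so their numbers will differ.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper for illustration: weight = tf * log(N / df),
    where tf is the term's share of the document and df is the number
    of documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up, already-segmented sample documents.
docs = [
    ["k", "means", "cluster"],
    ["k", "means", "text"],
    ["pca", "text"],
]
weights = tfidf(docs)
```

A term that occurs in every document gets weight 0 under this formula, which is why rare terms dominate the resulting vectors. Those vectors are what a PCA step and then k-means would consume.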

Summary: K-means is a commonly used clustering algorithm, shown below. Created on 2016-01-06 @author: Eastmount. Code: # coding=utf-8 """ Created on 2016-0...

posted @ 2020-03-02 11:53 HelenLee01 Views (265) Comments (0) Recommendations (0)

Summary: K-means is a commonly used clustering algorithm. An introductory version is shown below. Code: # -*- coding: utf-8 -*- import jieba; from sklearn.feature_extraction.text i...

posted @ 2020-03-02 11:39 HelenLee01 Views (211) Comments (0) Recommendations (0)
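
For readers who want to see what K-means itself does, here is a minimal standard-library sketch of the usual assign-then-update loop (Lloyd's algorithm). It is not the post's code, which builds on jieba and scikit-learn and is truncated in the summary. The names kmeans, squared_distance and mean_point, the sample points, and the fixed seed are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k input points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the loop has converged
        labels, centroids = new_labels, new_centroids
    return centroids, labels


# Two well-separated, made-up groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

The result depends on the starting centroids, so on less cleanly separated data it is common to rerun with several seeds and keep the best run.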

Summary: Segmenting text with jieba requires removing stop words; here is a list of common ones. [【]】':：；;1、2、3、4、5、6、7、8、9、10、、(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)rnbrstrong&liem...

posted @ 2020-03-02 11:16 HelenLee01 Views (310) Comments (0) Recommendations (0)

Summary: Segment text with jieba, remove stop words, and add a custom dictionary. #encoding=utf-8 import jieba; filename = "gp.txt"; stopwords_file = "stopwords.txt"; jieba...

posted @ 2020-03-02 11:13 HelenLee01 Views (497) Comments (0) Recommendations (0)
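
The stop-word removal described in this entry can be sketched without any third-party package. This is an illustration, not the post's code: the helper names load_stopwords and remove_stopwords and the sample tokens are made up, and the token list is written by hand so the example runs without jieba installed.

```python
def load_stopwords(lines):
    """Build a stop-word set from an iterable of lines, one word per line."""
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep tokens that are non-blank and not in the stop-word set."""
    return [tok for tok in tokens if tok.strip() and tok not in stopwords]


# Hand-written tokens standing in for a segmenter's output; with jieba
# installed they would come from jieba.lcut(text) instead.
tokens = ["我", "的", "股票", "，", "涨", "了", " "]

# In practice the lines would come from a file such as stopwords.txt,
# e.g. open("stopwords.txt", encoding="utf-8").
stopwords = load_stopwords(["的\n", "了\n", "，\n", "\n"])

filtered = remove_stopwords(tokens, stopwords)
```

With jieba itself, jieba.load_userdict(path) loads a custom dictionary before segmenting, and jieba.lcut(text) returns the token list that would be passed to remove_stopwords.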

Summary: Read the contents of a txt file and segment it with jieba. import jieba; fR = open('gp.txt', 'r', encoding='UTF-8'); sent = fR.read(); sent_list = ji...

posted @ 2020-03-02 11:12 HelenLee01 Views (81) Comments (0) Recommendations (0)

Summary: Read Excel data with xlrd and draw a scatter plot with matplotlib. import matplotlib.pyplot as plt; import xlrd; data = xlrd.open_workbook('测试3.xlsx') # read...

posted @ 2020-03-02 11:09 HelenLee01 Views (760) Comments (0) Recommendations (0)

Summary: Read the contents of an Excel file and print them. import pandas as pd; df=pd.read_excel('测试.xlsx') # by default this reads the first sheet of the Excel file; data=df.head() #...

posted @ 2020-03-02 11:06 HelenLee01 Views (1568) Comments (0) Recommendations (0)

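Summary: When processing Excel content with Python, the first step is of course to read it. import pandas as pd; #1: read specified rows; print("----Read a single specified row; the data is stored in a list----"); df=pd....

posted @ 2020-03-02 11:03 HelenLee01 Views (12414) Comments (0) Recommendations (0)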
