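Compound data types, English word frequency count
Source of this assignment: https://edu.cnblogs.com/campus/gzcc/GZCC-16SE1/homework/2753
1. How to add, delete, modify, look up, and traverse lists, tuples, dictionaries, and sets.
List operations: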
list1 = ['speakingSirqin', 'softqin', 1999, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]
# Output
print(list1)
print(list2)
print(list3)
print(list2[0:2])  # slice from index 0 up to, but not including, index 2
# Add
list1.insert(2, 'lili')
list1.insert(5, 'qin')
print(list1)
# Delete
list2.pop(0)
print(list2)
# Modify
list3[1] = 'A'
print(list3)
# Look up
index = list3.index('c')
print("Index of 'c' in list3:", index)
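The list code above covers insert, pop, item assignment, and index, but not traversal, which the question also asks about. A minimal sketch of the remaining common operations (append, remove, del, the `in` membership test, and iteration with `enumerate`), using a hypothetical list `fruits`:

```python
fruits = ['apple', 'banana', 'cherry']

# Add: append puts an element at the end
fruits.append('date')

# Delete: remove deletes the first matching value, del deletes by index
fruits.remove('banana')
del fruits[0]

# Look up: unlike list.index, which raises ValueError for a missing value,
# the in operator simply returns True or False
has_cherry = 'cherry' in fruits

# Traverse: enumerate yields (index, element) pairs
for i, item in enumerate(fruits):
    print(i, item)
```

After these steps `fruits` is `['cherry', 'date']`, and the loop prints each remaining element with its index.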
2. Tuples
tup1 = ('Google', 'Runoob', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 8)
tup3 = "a", "b", "c", "d"  # parentheses are optional when creating a tuple
# Output
print(tup1)
print(tup2)
print(tup3)
print(tup1[0])
print(tup2[1:3])
# Concatenate tuples
tup4 = tup1 + tup2 + tup3
print(tup4)
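The question also covers dictionaries and sets, and asks how tuples are modified, which the code above does not show. A minimal sketch, using hypothetical variables `tup`, `scores`, and `tags`: tuples are immutable, so "modifying" one means building a new tuple; dictionaries are changed by key; set elements cannot be changed in place, only removed and added.

```python
# Tuple: immutable, so build a new tuple instead of assigning to an index
tup = (1, 2, 3)
tup = tup[:1] + (20,) + tup[2:]   # replaces the middle element
for x in tup:                     # traverse
    print(x)

# Dictionary: key -> value
scores = {'alice': 90, 'bob': 85}
scores['carol'] = 78              # add a new key
scores['bob'] = 88                # modify an existing key
del scores['alice']               # delete (scores.pop('alice') would also return the value)
bob = scores.get('bob')           # look up; get returns None if the key is missing
for name, score in scores.items():
    print(name, score)            # traverse key/value pairs

# Set: unordered, no duplicates
tags = {'a', 'b'}
tags.add('c')                     # add
tags.discard('a')                 # delete; no error if missing (remove raises KeyError)
has_b = 'b' in tags               # look up
for t in sorted(tags):            # sets have no order, so sort for a stable printout
    print(t)
```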
3. Word frequency count
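import pandas as pd

# Read the article and convert it to lowercase
with open('artical.txt', encoding='utf-8') as file:
    text = file.read().lower()

# Strip punctuation, then split on whitespace into a list of words
for ch in '?!",.':
    text = text.replace(ch, '')
words = text.split()

# Count words, skipping common stop words
exclude = {'a', 'the', 'and', 'if', 'you', 'in', 'but', 'not', 'it', 's', 'i'}
word_count = {}
for w in words:
    if w not in exclude:
        word_count[w] = word_count.get(w, 0) + 1
print(word_count)

# Sort the (word, count) pairs by count, highest first
word = list(word_count.items())
word.sort(key=lambda x: x[1], reverse=True)
print(word)

# Print the top 20 words (slicing avoids an IndexError if there are fewer than 20)
for item in word[:20]:
    print(item)

pd.DataFrame(data=word).to_csv('b.csv', encoding='utf-8')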
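Counting by hand works, but the standard library already provides this pattern. A minimal sketch of the same idea with `collections.Counter`, swapped in here in place of the manual dictionary; the sample string stands in for `artical.txt`, and the regex tokenizer (which keeps apostrophes so "there's" stays one word) is an assumption, not the author's cleanup rule:

```python
import re
from collections import Counter

# Hypothetical sample text standing in for the contents of artical.txt
sample = "No, no, there's no way. Let it go, let it be! The girl said no."

exclude = {'a', 'the', 'and', 'if', 'you', 'in', 'but', 'not', 'it', 's', 'i'}

# Lowercase, then keep runs of letters and apostrophes as words
words = re.findall(r"[a-z']+", sample.lower())
counts = Counter(w for w in words if w not in exclude)

# most_common(n) returns (word, count) pairs sorted by count, highest first;
# words with equal counts keep the order in which they were first seen
top = counts.most_common(3)
print(top)  # [('no', 4), ('let', 2), ("there's", 1)]
```

`most_common(20)` on the real text plays the role of the sort-and-slice steps in the script above.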
Output (top 20):
('no', 44)
("there's", 12)
('get', 11)
('let', 9)
('away', 9)
('way', 9)
('for', 9)
('broken', 9)
('to', 8)
('be', 7)
('that', 7)
("don't", 7)
("i'm", 7)
('hope', 7)
('girl', 6)
('wanna', 6)
('cause', 5)
('one', 5)
("can't", 5)
('gotta', 5)
Visualization: word cloud
Save the sorted word list `word` to a CSV file:
import pandas as pd
pd.DataFrame(data=word).to_csv('big.csv',encoding='utf-8')
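By default `to_csv` also writes the DataFrame index as an extra first column and uses `0` and `1` as column headers. If a plain two-column file is preferred, a minimal sketch with the standard `csv` module follows; the file name `words.csv`, the header names, and the three-pair `word` list (taken from the top of the output above) are illustrative assumptions:

```python
import csv

# Sample (word, count) pairs, same shape as the sorted list `word`
word = [('no', 44), ("there's", 12), ('get', 11)]

# newline='' is what the csv module documentation recommends for writing
with open('words.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['word', 'count'])  # header row
    writer.writerows(word)              # one row per (word, count) pair

# Read the file back to check its contents (values come back as strings)
with open('words.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.reader(f))
print(rows)
```

With pandas, passing `index=False` and `header=['word', 'count']` to `to_csv` should give the same layout.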
Generate the word cloud with an online tool:
https://wordart.com/create
posted on 2019-03-25 16:08 hyf751190951