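Composite Data Types and English Word Frequency Counting

This assignment comes from https://edu.cnblogs.com/campus/gzcc/GZCC-16SE1/homework/2753

1. How to add, delete, modify, look up, and traverse elements in lists, tuples, dictionaries, and sets.

List operations: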

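list1 = ['speakingSirqin', 'softqin', 1999, 2000]
list2 = [1, 2, 3, 4, 5]
list3 = ["a", "b", "c", "d"]
# Output
print(list1)
print(list2)
print(list3)
print(list2[0:2])  # slice starts at index 0 and stops before index 2 (the element at index 2 is excluded)
# Add
list1.insert(2, 'lili')
list1.insert(5, 'qin')  # index 5 equals the current length, so this appends to the end
print(list1)
# Delete
list2.pop(0)  # removes and returns the element at index 0
print(list2)
# Modify
list3[1] = 'A'
print(list3)
# Look up
index = list3.index('c')
print("Index of 'c' in list3:", index)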
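The list example above covers adding, deleting, modifying, and looking up, but not traversal. The following is a minimal sketch of traversal plus a few other common list operations; the variable names `letters` and `found` are made up for illustration and are not part of the original assignment.

```python
# Hypothetical sample data, mirroring the initial value of list3.
letters = ["a", "b", "c", "d"]

# Traverse by value.
for item in letters:
    print(item)

# Traverse with index and value together.
for position, item in enumerate(letters):
    print(position, item)

# Other common ways to add and delete.
letters.append("e")   # add to the end
letters.remove("b")   # delete the first element equal to "b"
del letters[0]        # delete by index

# index() raises ValueError for a missing value, so test membership first.
found = letters.index("c") if "c" in letters else -1
print(letters, found)
```

Checking with `in` before calling `index()` is one way to avoid the exception; wrapping the call in `try`/`except ValueError` would work as well.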

2. Tuples

tup1 = ('Google', 'Runoob', 1997, 2000)
tup2 = (1, 2, 3, 4, 5, 8)
tup3 = "a", "b", "c", "d"  # parentheses are optional when creating a tuple
# Output
print(tup1)
print(tup2)
print(tup3)
print(tup1[0])
print(tup2[1:3])
# Concatenate tuples
tup4 = tup1 + tup2 + tup3
print(tup4)

3. Word frequency counting

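import pandas as pd

# Read the article and convert it to lowercase
with open('artical.txt', encoding='utf-8') as file:
    text = file.read().lower()

# Remove punctuation, then split the text into words
for ch in '''?!",.''':
    text = text.replace(ch, '')
text = text.split()

# Count the words, skipping the excluded ones
exclude = ['a', 'the', 'and', 'if', 'you', 'in', 'but', 'not', 'it', ' s', 'i']
counts = {}  # named counts rather than dict, to avoid shadowing the built-in
for w in text:
    if w not in exclude and w not in counts:
        counts[w] = text.count(w)
print(counts)

# Sort the words by count, descending
word = list(counts.items())
word.sort(key=lambda x: x[1], reverse=True)
print(word)

# Print the top 20 words (slicing is safe even with fewer than 20 words)
for item in word[:20]:
    print(item)

pd.DataFrame(data=word).to_csv('b.csv', encoding='utf-8')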
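The counting loop above calls `text.count()` once per distinct word, which rescans the whole word list each time. As an alternative, here is a minimal sketch using `collections.Counter` from the standard library, which counts in a single pass. It is a swapped-in technique, not the original author's method, and the `sample` string is a made-up stand-in for the contents of artical.txt.

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of artical.txt.
sample = "No way, no way! There's no way out. Let it be, let it be."

# Same preprocessing as above: lowercase, then drop the five punctuation marks.
cleaned = sample.lower().translate(str.maketrans("", "", '?!",.'))
words = cleaned.split()

# A set makes the exclusion check fast; entries follow the original list.
exclude = {"a", "the", "and", "if", "you", "in", "but", "not", "it", "i"}
counts = Counter(w for w in words if w not in exclude)

# most_common(20) returns up to 20 (word, count) pairs, highest count first.
top = counts.most_common(20)
print(top)
```

`most_common()` replaces both the manual sort and the top-20 loop, and it simply returns fewer pairs when the text has fewer than 20 distinct words.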

Top 20 output:

('no', 44)
("there's", 12)
('get', 11)
('let', 9)
('away', 9)
('way', 9)
('for', 9)
('broken', 9)
('to', 8)
('be', 7)
('that', 7)
("don't", 7)
("i'm", 7)
('hope', 7)
('girl', 6)
('wanna', 6)
('cause', 5)
('one', 5)
("can't", 5)
('gotta', 5)

Visualization: word cloud

Save the sorted word list word to a CSV file:

import pandas as pd
pd.DataFrame(data=word).to_csv('big.csv',encoding='utf-8')
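By default, pandas' `to_csv` also writes the row index and the integer column labels, so the file above gets an extra leading column and a header row of `0,1`. If a plain two-column file with a readable header is preferred, one option is the standard library `csv` module, sketched below. The sample `word` list and the column names `word` and `count` are assumptions for illustration, and the sketch writes to an in-memory buffer rather than to big.csv.

```python
import csv
import io

# Hypothetical sample standing in for the sorted list `word`.
word = [("no", 44), ("there's", 12), ("get", 11)]

# In-memory buffer for demonstration; for a real file, use
# open('big.csv', 'w', newline='', encoding='utf-8') instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["word", "count"])  # header row (assumed column names)
writer.writerows(word)              # one row per (word, count) pair

print(buffer.getvalue())
```

Whether the online tool expects a header row or a particular delimiter is not stated in the original, so the layout may need adjusting before import.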

Generate the word cloud with an online tool:
https://wordart.com/create

posted on 2019-03-25 16:08 hyf751190951
