Compound data types

Lists, tuples, dictionaries, and sets: adding, removing, updating, looking up, and iterating:

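# Lists: add, remove, update, look up, and iterate
list1 = list('this is a list')  # a list of single characters
list1.append('!')  # append an element to the end
list1.pop(-1)  # remove the element at the given index (default -1) and return it
list1.append('!')
list1.remove('!')  # remove the first matching value; raises ValueError if the value is absent
list1[1:5] = []  # update a slice; assigning an empty list deletes those elements
# look up
if 'a' in list1:
    index = list1.index('a')  # index of the first occurrence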

# iterate over the list
for position, value in enumerate(list1, start=1):
    print("No.: %s  Value: %s" % (position, value))
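
A few more list operations that fit the same add / remove / update / look up pattern, as a minimal sketch. The list `letters` is made-up example data and is not part of the original snippet.

```python
letters = ['a', 'b', 'c']  # hypothetical example data

letters.insert(1, 'x')      # insert before index 1
letters.extend(['d', 'e'])  # append several elements at once
letters[0] = 'A'            # replace a single element by index
del letters[2]              # delete by index without returning the value
has_x = 'x' in letters      # membership test

# enumerate() yields the position together with the value,
# so duplicate values still get the right number
for position, value in enumerate(letters, start=1):
    print(position, value)
```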

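# Tuples: same operations as lists, except that a tuple cannot be modified
# create
tuple1 = ()  # empty tuple
tuple1 = 1,  # one-element tuple; the trailing comma is required
tuple1 = 1, 2, 3
tuple1 = tuple([1, 2, 3, 4])  # build a tuple from a list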
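
To see what "cannot be modified" means in practice, here is a small sketch. The names `point` and `moved` are made up for illustration. Item assignment on a tuple raises TypeError, so a change has to be expressed by building a new tuple.

```python
point = (1, 2, 3)  # hypothetical example data

try:
    point[0] = 10  # item assignment is not supported on tuples
except TypeError as error:
    print("cannot modify:", error)

# Build a new tuple instead of changing the old one
moved = (10,) + point[1:]
print(moved)

# Parentheses alone do not make a tuple; the trailing comma does
print(type((1)), type((1,)))
```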

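# Dictionaries: add, remove, update, look up, and iterate
dict1 = {'key1': 'value1', 'key2': 'value2'}
dict1['key3'] = 'value3'  # assigning to a missing key adds it
dict1.setdefault('key5', 'N/A')  # set a default only if the key does not exist yet
del dict1['key3']  # delete
dict1['key1'] = 'new_value_1'  # update
dict1['key1']  # look up; raises KeyError if the key is missing
for key, value in dict1.items():  # iterate
    print(key, value)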
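
Looking up or deleting a key that does not exist raises KeyError. The sketch below shows the usual ways around that, plus the three forms of iteration. The dictionary `scores` is made-up example data.

```python
scores = {'alice': 90, 'bob': 75}  # hypothetical example data

# get() returns a default instead of raising KeyError for a missing key
missing = scores.get('carol', 0)

# pop() removes a key and returns its value; the default avoids KeyError
removed = scores.pop('bob', None)

# Iterate over keys, values, or key/value pairs
for name in scores:
    print(name)
for value in scores.values():
    print(value)
for name, value in scores.items():
    print(name, value)
```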

# Sets: add, remove, update, look up, and iterate

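setA = {'a', 'b', 'c', 'd', 'f', 'f'}
print(setA)  # e.g. {'c', 'b', 'f', 'a', 'd'}: a set keeps only unique values and is unordered, so the printed order may vary
setA.add('g')  # add an element
# setA.remove('g')  # remove an element
print([v for v in setA])  # a set has no index and no key, so you cannot read or change a single element by position (and there is no need to); you can still iterate over it, and a comprehension like this can also filter
print(sorted(setA))  # sorted() returns a new sorted list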
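
The snippet above covers adding and iterating. A rough sketch of the remaining operations follows, using two made-up sets `set_a` and `set_b`: membership tests, safe removal, the usual set algebra, and "updating" an element by removing the old value and adding the new one.

```python
set_a = {'a', 'b', 'c'}  # hypothetical example data
set_b = {'b', 'c', 'd'}

print('a' in set_a)  # look up: membership test

set_a.discard('z')  # discard() ignores a missing element; remove() would raise KeyError

union = set_a | set_b   # elements in either set
common = set_a & set_b  # elements in both sets
only_a = set_a - set_b  # elements in set_a but not in set_b

# A set element cannot be changed in place: remove the old value, add the new one
set_a.remove('a')
set_a.add('A')

# Filtering with a set comprehension
after_b = {v for v in union if v > 'b'}
```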

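Summary: how lists, tuples, dictionaries, and sets relate and differ:

Dictionaries and sets are written with {}, tuples with (), and lists with []; each type uses a different bracket.

Lists and tuples are ordered sequences that can be indexed by position. Sets are unordered. Dictionaries are accessed by key rather than by position, although since Python 3.7 they preserve insertion order.

A tuple's elements cannot be reassigned after it is created; the other three structures can be modified in place.

Dictionary keys and set elements must be unique; lists and tuples may contain duplicate values, and so may the values of a dictionary.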
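
The differences can be checked by feeding the same data into all four types. This is only an illustrative sketch; `items` and the `as_*` names are made up.

```python
items = ['b', 'a', 'b']  # hypothetical example data with one duplicate

as_list = list(items)              # ordered, duplicates kept, mutable
as_tuple = tuple(items)            # ordered, duplicates kept, immutable
as_set = set(items)                # unordered, duplicates dropped, mutable
as_dict = dict.fromkeys(items, 0)  # unique keys, insertion order kept (Python 3.7+)

# sorted() is used for the set because its iteration order is not guaranteed
print(as_list, as_tuple, sorted(as_set), list(as_dict))
```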

import pandas as pd

# Count how often each word occurs (case-insensitive)
with open('test.txt', 'r', encoding='utf8') as file:
    text = file.read()
textList = text.lower().split()
textDict = {}
for word in textList:
    textDict[word] = textDict.get(word, 0) + 1
words = list(textDict.items())
words.sort(key=lambda x: x[1], reverse=True)  # most frequent first
print(words)

# Words to filter out
removeWord=[
    'the','were',
    'and','that',
    'of','this',
    'a','at',
    'to','by',
    'in','for',
    'with',
    'their',
    'was',
    'on'
]
# Keep only the (word, count) pairs whose word is not in the filter list
words = [pair for pair in words if pair[0] not in removeWord]

for item in words[:20]:  # top 20; slicing also works when there are fewer than 20 words
    print(item)
 
pd.DataFrame(data=words).to_csv('big.csv',encoding='utf-8')
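
The counting and filtering steps above can also be written with collections.Counter from the standard library. The sketch below is one possible alternative, not the code used for this post: the function name `top_words` and the `sample` string (standing in for the contents of test.txt) are made up, and punctuation is not stripped, just as in the original.

```python
from collections import Counter

def top_words(text, stop_words, n=20):
    """Return the n most common words in text, ignoring case and stop words."""
    counts = Counter(
        word for word in text.lower().split() if word not in stop_words
    )
    return counts.most_common(n)

sample = "The cat and the dog saw the cat"  # stand-in for the contents of test.txt
result = top_words(sample, stop_words={'the', 'and'}, n=2)
print(result)
```

Passing the stop words as a set keeps each membership test cheap, which matters once the text grows.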

The generated word cloud is shown below:

Posted 2019-03-11 16:08 by 一觉不觉已千年
