A few tips for working with sequences: order-preserving deduplication, named slices, and the Counter class

1. Removing duplicates while preserving the original order

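def dedupe(items):
    # Collect items in first-seen order. `in` compares with ==, so
    # unhashable items such as dicts work too (at O(n^2) cost).
    h = []
    for item in items:
        if item not in h:
            h.append(item)
    return h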

#a = [1, 5, 2, 1, 9, 1, 5, 10]
a = [ {'x':1, 'y':2}, {'x':1, 'y':3}, {'x':1, 'y':2}, {'x':2, 'y':4}]
b = dedupe(a)
print(b)
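
The `dedupe` function above rescans the result list for every item. As a minimal sketch of an alternative, assuming each item can be mapped to a hashable value, a set can track what has already been seen. The names `dedupe_by_key` and `key` are hypothetical and only for illustration:

```python
def dedupe_by_key(items, key=None):
    """Yield items in first-seen order, skipping items whose key was already seen."""
    seen = set()
    for item in items:
        # key (an assumed helper argument) turns an unhashable item,
        # e.g. a dict, into a hashable value such as a tuple
        val = item if key is None else key(item)
        if val not in seen:
            seen.add(val)
            yield item

a = [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
by_xy = list(dedupe_by_key(a, key=lambda d: (d['x'], d['y'])))  # dedupe on both fields
by_x = list(dedupe_by_key(a, key=lambda d: d['x']))             # dedupe on 'x' only
nums = list(dedupe_by_key([1, 5, 2, 1, 9, 1, 5, 10]))           # hashable items need no key
```

The key function also lets you decide which fields count as "the same", which the plain list-based version cannot do.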

2. Naming slices

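Suppose you have code that extracts specific data fields from a few fixed positions in a record string, using slices to pull out the characters you want.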

record = '....................100 .......513.25 ..........'
cost = int(record[20:23]) * float(record[31:37])

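When there is a lot of code, hard-coded slice indices do not make their meaning clear. To make the code easier to understand, you can name the slices with the built-in slice() function.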

SHARES = slice(20, 23)
PRICE = slice(31, 37)
cost = int(record[SHARES]) * float(record[PRICE])

If you have a slice object a, you can read its a.start, a.stop, and a.step attributes to get more information about it.

>>> a = slice(5, 50, 2)
>>> a.start
5
>>> a.stop
50
>>> a.step
2
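
A slice object can also be fitted to a sequence of a particular length with its `indices(size)` method, which returns a `(start, stop, step)` tuple clipped to that length. A small sketch, where the sample string `s` is made up for illustration:

```python
s = 'HelloWorld'
a = slice(5, 50, 2)

# indices() clips the stop value (50) to the length of s (10)
start, stop, step = a.indices(len(s))

# The clipped values can be passed to range() without risking an IndexError
chars = [s[i] for i in range(start, stop, step)]
```

Here `chars` holds the same characters that `s[a]` would select.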

3. Finding the most frequently occurring elements in a sequence, using the Counter class from the collections module

>>> from collections import Counter
>>> words = ['look', 'into', 'my', 'eyes', 'you', 'will', 'see', 'my', 'eyes', 'in', 'your', 'eyes']
>>> morewords = ['why','are','you','not','looking','in','my','eyes']
>>> mycount1 = Counter(words)
>>> mycount2 = Counter(morewords)

# A Counter is a dict subclass that stores each word and its number of occurrences as a key-value pair
>>> mycount1
Counter({'eyes': 3, 'my': 2, 'in': 1, 'will': 1, 'look': 1, 'into': 1, 'you': 1, 'see': 1, 'your': 1})
>>> mycount2
Counter({'looking': 1, 'are': 1, 'not': 1, 'in': 1, 'eyes': 1, 'you': 1, 'why': 1, 'my': 1})
>>> mycount1['eyes']
3
>>> mycount1['eyes'] + 1
4

# Get the 2 most common words; the result is a list of (word, count) tuples
>>> top_two = mycount1.most_common(2)
>>> top_two
[('eyes', 3), ('my', 2)]

# Counter objects support addition and subtraction (subtraction keeps only positive counts)
>>> c = mycount1 + mycount2
>>> c
Counter({'eyes': 4, 'my': 3, 'in': 2, 'you': 2, 'looking': 1, 'your': 1, 'look': 1, 'will': 1, 'see': 1, 'are': 1, 'not': 1, 'into': 1, 'why': 1})
>>> d = mycount1 - mycount2
>>> d
Counter({'eyes': 2, 'your': 1, 'look': 1, 'into': 1, 'will': 1, 'my': 1, 'see': 1})
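
Besides the `+` and `-` operators, a Counter can be changed in place with its `update()` and `subtract()` methods. The sketch below reuses the two word lists from this section; the variable names `counts`, `diff`, and `kept` are only for illustration:

```python
from collections import Counter

words = ['look', 'into', 'my', 'eyes', 'you', 'will', 'see', 'my', 'eyes', 'in', 'your', 'eyes']
morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']

# update() adds counts in place, here giving the same totals as mycount1 + mycount2
counts = Counter(words)
counts.update(morewords)

# The - operator drops entries whose count falls to zero or below
diff = Counter(words) - Counter(morewords)

# subtract() works in place and keeps zero and negative counts
kept = Counter(words)
kept.subtract(morewords)
```

So 'you' disappears from `diff`, while `kept` still records it with a count of 0 and records 'why' with a count of -1.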

posted @ 2018-07-26 17:19 by 坚强的小蚂蚁
