Reading the Python Cookbook: Data Structures and Algorithms

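1. Unpacking a sequence into separate variables

Any sequence (or, more generally, any iterable) can be unpacked into separate variables with a simple assignment. The only requirement is that the number and structure of the variables match the sequence.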

data = ["Mike", 22, 73, (2017, 12, 28)]
name, age, score, (year, month, date) = data
print(name, age, score, year, month, date)

Mike 22 73 2017 12 28

When unpacking, you can use a throwaway variable name such as _ to discard values you do not need.

data = ["Mike", 22, 73, (2017, 12, 28)]
_, age, score, (_, _, date) = data
print(age, score, date)

22 73 28
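
Unpacking is not limited to lists and tuples. The short sketch below uses made-up sample values to unpack a string, and to show the ValueError raised when the number of variables does not match the number of elements.

```python
# Unpacking works with any iterable, for example a string.
a, b, c = "abc"
print(a, b, c)

# A mismatch between the number of variables and elements raises ValueError.
try:
    x, y = [1, 2, 3]
except ValueError as exc:
    error = str(exc)
    print("ValueError:", error)
```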

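2. Unpacking elements from iterables of arbitrary length

A starred expression (*name) makes the variable collect the remaining elements into a list. That list may hold any number of elements, including zero.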

record = ("Jack", 22, "15012345678", "18099883311")
name, age, *phone = record
print(name, age, phone)

Jack 22 ['15012345678', '18099883311']

Note: a single unpacking assignment may contain only one starred variable.
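
As a small illustration with made-up sample data, the starred variable does not have to come last, and it is still a list when nothing is left over for it:

```python
# The starred variable may appear in any position, not only at the end.
grades = [55, 90, 82, 71, 99]
first, *middle, last = grades
print(first, middle, last)

# The starred variable is always a list, even when nothing is left for it.
name, age, *phones = ("Jack", 22)
print(phones)
```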

3. Keeping the last N items

from collections import deque


def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)


if __name__ == "__main__":
    with open("D:/Test1.txt") as f:
        for line, prelines in search(f, "456", 5):
            for pline in prelines:
                print(pline, end="")
            print(line, end="")
            print("-"*20)

123
456
--------------------

deque from the collections module handles this job well, and adding or removing items at either end of a deque is O(1).
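
The behaviour of a bounded deque can be seen in isolation with a short sketch. The variable name recent and the sample numbers are chosen here only for illustration.

```python
from collections import deque

recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)      # once full, the oldest item is dropped automatically
print(recent)

recent.appendleft(0)      # inserting at the left pushes an item off the right
print(recent)
```

With maxlen set, the deque never grows beyond three items, which is why search() above can keep appending lines without trimming the history itself.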

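4. Finding the largest or smallest N items

① Finding the single largest or smallest item

Use the min() and max() functions. Their time complexity is O(n), where n is the length of the sequence.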

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
maxnum = max(num)
minnum = min(num)
print(maxnum, "----", minnum)

42 ---- -4

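② When N is very small relative to the length of the list (e.g. N = 2)

Use heapify() from the heapq module to arrange the sequence as a heap, in which the first element is always the smallest. Calling heappop() then removes the smallest element, and the next smallest takes its place at the front.

import heapq

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]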
heap = list(num)
heapq.heapify(heap)
print(heap)
print("="*50)

print(heapq.heappop(heap))
print(heap)
print("="*50)

print(heapq.heappop(heap))
print(heap)
print("="*50)

[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]
==================================================
-4
[1, 2, 2, 23, 7, 8, 18, 23, 42, 37]
==================================================
1
[2, 2, 8, 23, 7, 37, 18, 23, 42]
==================================================

Each heappop() call runs in O(log n), where n is the size of the heap; heapify() itself is O(n).
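
Putting the two steps together, the N smallest values can be collected by popping N times. This is only a sketch of the idea, reusing the same sample list.

```python
import heapq

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(num)          # copy so the original list is left untouched
heapq.heapify(heap)       # O(n) to build the heap

# Pop N times to collect the N smallest values in ascending order.
n = 2
smallest = [heapq.heappop(heap) for _ in range(n)]
print(smallest)
```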

③ When N is small relative to the length of the sequence (e.g. N = 4)

Use the nlargest() and nsmallest() functions from the heapq module. Both accept a key argument.

data = [
    {"name": "Jack", "age": 21, "score": 99},
    {"name": "Ben", "age": 22, "score": 90},
    {"name": "Mark", "age": 20, "score": 72},
    {"name": "Cook", "age": 20, "score": 53},
    {"name": "Antony", "age": 23, "score": 94},
    {"name": "Chris", "age": 24, "score": 62},
    {"name": "Ken", "age": 22, "score": 81},
    {"name": "Jackie", "age": 20, "score": 85},
    {"name": "David", "age": 22, "score": 89},
    {"name": "Jackson", "age": 23, "score": 89},
    {"name": "Lucy", "age": 22, "score": 77}
]

scoreMax = heapq.nlargest(4, data, key=lambda s: s["score"])
scoreMin = heapq.nsmallest(4, data, key=lambda s: s["score"])
print(scoreMax)
print(scoreMin)

[{'name': 'Jack', 'age': 21, 'score': 99}, {'name': 'Antony', 'age': 23, 'score': 94}, {'name': 'Ben', 'age': 22, 'score': 90}, {'name': 'David', 'age': 22, 'score': 89}]
[{'name': 'Cook', 'age': 20, 'score': 53}, {'name': 'Chris', 'age': 24, 'score': 62}, {'name': 'Mark', 'age': 20, 'score': 72}, {'name': 'Lucy', 'age': 22, 'score': 77}]

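As the output shows, when several items have the same value, the one that appears earlier in the input is chosen first.

④ When N is close to the size of the sequence

Use sorted() and then slice the result.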

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
lst = sorted(num)
lstmin = lst[:8]          # the 8 smallest, in ascending order
print(lstmin)
lstrev = sorted(num, reverse=True)
lstmax = lstrev[:8]       # the 8 largest, in descending order
print(lstmax)

Or

num = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
lst = sorted(num)
lstmin = lst[:8]          # the 8 smallest
print(lstmin)
lstmax = lst[-8:]         # the 8 largest, still in ascending order
print(lstmax)

5. Implementing a priority queue

Use heappush() and heappop() from the heapq module (heap operations) to implement it.

import heapq


class PriorityQueue(object):
    def __init__(self):
        self._queue = []
        self._index = 0

    def push(self, item, priority):
        heapq.heappush(self._queue, (-priority, self._index, item))
        self._index += 1

    def pop(self):
        return heapq.heappop(self._queue)[-1]


class Item(object):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


if __name__ == "__main__":
    q = PriorityQueue()
    q.push(Item("Jack"), 1)
    q.push(Item("Mike"), 2)
    q.push(Item("Ben"), 3)
    q.push(Item("David"), 1)
    for i in range(q._index):
        print(q.pop())

Ben
Mike
Jack
David
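
The _index field is what lets two items with the same priority coexist in the heap: it breaks ties in insertion order, and it keeps Python from ever comparing the Item objects, which define no ordering. A minimal sketch of that effect, using a stripped-down Item class:

```python
class Item:
    def __init__(self, name):
        self.name = name


# Without an index, equal priorities force Python to compare the Item objects.
try:
    (1, Item("Jack")) < (1, Item("David"))
    comparable = True
except TypeError:
    comparable = False
print(comparable)

# With a unique index in the middle, the comparison never reaches the Item.
print((1, 0, Item("Jack")) < (1, 1, Item("David")))
```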

6. Mapping keys to multiple values in a dictionary

Use the defaultdict class from the collections module.

When the values are lists:

from collections import defaultdict



d = defaultdict(list)
d["a"].append(1)
d["a"].append(1)
d["b"].append(2)
d["c"].append(3)
d["d"].append(4)

for key, values in d.items():
    print(key, ":", values)

a : [1, 1]
b : [2]
c : [3]
d : [4]

When the values are sets:

from collections import defaultdict


d = defaultdict(set)
d["a"].add(1)
d["a"].add(1)
d["b"].add(2)
d["c"].add(3)
d["d"].add(4)

for key, values in d.items():
    print(key, ":", values)

a : {1}
b : {2}
c : {3}
d : {4}

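Note, however, that defaultdict automatically creates an empty entry for a key the first time it is accessed, even if nothing is ever stored under it.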
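
The automatic creation of entries is easy to observe. In this small sketch, a plain lookup of a missing key is enough to add it to the dictionary:

```python
from collections import defaultdict

d = defaultdict(list)
print("a" in d)     # nothing stored yet

d["a"]              # a plain lookup is enough to create the entry
print("a" in d)
print(dict(d))
```

If that side effect is unwanted, d.get("a") looks a key up without creating it.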

The same result can be achieved with the setdefault() method of an ordinary dictionary.

d = {}
d.setdefault("a", []).append(1)
d.setdefault("a", []).append(2)
d.setdefault("b", []).append(3)
d.setdefault("c", []).append(4)

for key, values in d.items():
    print(key, ":", values)

a : [1, 2]
b : [3]
c : [4]

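However, this approach creates a new instance of the initial value (such as [] or set()) on every call, even when the key already exists.

An example of inserting in a loop:

from collections import defaultdict


# Sample (key, value) pairs, used here only for illustration
pairs = [("a", 1), ("a", 2), ("b", 3)]

d = defaultdict(list)

for key, value in pairs:
    d[key].append(value)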

7. Keeping dictionaries in order

Use OrderedDict from the collections module.

from collections import OrderedDict


d = OrderedDict()
d["foo"] = 1
d["bar"] = 2
d["spam"] = 3
d["grok"] = 4
d["foo"] = 5

for k in d:
    print(k, ":", d[k])

foo : 5
bar : 2
spam : 3
grok : 4

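As shown, changing the value of a key that has already been inserted does not change its position in the ordered dictionary.

OrderedDict internally maintains a doubly linked list that records the insertion order of the keys, so it uses roughly twice the memory of an ordinary dictionary.

It is useful, for example, for controlling the order of fields when encoding data as JSON.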
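
To illustrate the JSON use case, here is a minimal sketch using the standard json module. The module does not appear in the original notes and is used here only as an example.

```python
import json
from collections import OrderedDict

d = OrderedDict()
d["foo"] = 1
d["bar"] = 2
d["spam"] = 3
d["grok"] = 4

# json.dumps() writes the fields in the order the dictionary yields them.
encoded = json.dumps(d)
print(encoded)
```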

8. Calculating with dictionaries

prices = {
    "ACME": 45.23,
    "AAPL": 612.78,
    "IBM": 205.55,
    "HPQ": 37.20,
    "FB": 10.75
}

print(max(zip(prices.keys(), prices.values())))
print(min(zip(prices.keys(), prices.values())))
print("-"*10)
prices_sorted = sorted(zip(prices.keys(), prices.values()))
for k in prices_sorted:
    print(k)

('IBM', 205.55)
('AAPL', 612.78)
----------
('AAPL', 612.78)
('ACME', 45.23)
('FB', 10.75)
('HPQ', 37.2)
('IBM', 205.55)

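zip() pairs the keys and values into tuples without changing the dictionary itself. Passing the values first, as in zip(prices.values(), prices.keys()), inverts each pair so that min(), max() and sorted() compare by value. zip() returns an iterator, which can be consumed only once.

Tuples are compared element by element, so the calls above, which put the key first, order the pairs by key. Applying min() or max() to the dictionary directly likewise compares only the keys.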
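
For comparison, the following sketch puts the value first in each pair so that the results are ranked by price, and then shows that a zip object is exhausted after one pass. It reuses the same prices dictionary.

```python
prices = {
    "ACME": 45.23,
    "AAPL": 612.78,
    "IBM": 205.55,
    "HPQ": 37.20,
    "FB": 10.75
}

# Value first: tuples are now compared by price.
min_price = min(zip(prices.values(), prices.keys()))
max_price = max(zip(prices.values(), prices.keys()))
print(min_price, max_price)

# A zip object is an iterator and can be consumed only once.
pairs = zip(prices.values(), prices.keys())
print(min(pairs))
try:
    max(pairs)          # nothing is left to iterate over
    exhausted = False
except ValueError:
    exhausted = True
print(exhausted)
```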

Alternatively, pass a key function to min() and max():

prices = {
    "ACME": 45.23,
    "AAPL": 612.78,
    "IBM": 205.55,
    "HPQ": 37.20,
    "FB": 10.75
}

minItem = min(prices, key=lambda k: prices[k])
maxItem = max(prices, key=lambda k: prices[k])
minValue = prices[minItem]
maxValue = prices[maxItem]
print(minItem, maxItem, "="*5, minValue, maxValue)

FB AAPL ===== 10.75 612.78

9. Finding commonalities in two dictionaries

Use set operations such as & and - on the views returned by keys() and items().

a = {"x": 1, "y": 2, "z": 3}
b = {"w": 10, "x": 11, "y": 2}
# Find Common Keys
print(a.keys() & b.keys())
# Find keys in a but not in b
print(a.keys() - b.keys())
# Find (key, value) pairs in common
print(a.items() & b.items())
# Create a new dictionary with certain keys removed
c = {key: a[key] for key in a.keys() - {"z", "w"}}
print(c)

{'y', 'x'}
{'z'}
{('y', 2)}
{'y': 2, 'x': 1}
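
Key views also support the other common set operators. As a brief sketch on the same two dictionaries, | gives the union and ^ the symmetric difference; the view returned by values() is not set-like and does not support these operators.

```python
a = {"x": 1, "y": 2, "z": 3}
b = {"w": 10, "x": 11, "y": 2}

all_keys = a.keys() | b.keys()       # keys found in either dictionary
only_one = a.keys() ^ b.keys()       # keys found in exactly one dictionary
print(sorted(all_keys))
print(sorted(only_one))
```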

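10. Removing duplicates from a sequence while preserving order

If the values in the sequence are hashable, that is, objects whose hash does not change during their lifetime and that have a __hash__() method, such as integers, floats, strings and tuples: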

def dedupe(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)


a = [1, 5, 2, 1, 9, 1, 5, 10]
lst = list(dedupe(a))
print(lst)

[1, 5, 2, 9, 10]

If the values are not hashable:

def dedupe(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)


b = [{"x": 1, "y": 2},
     {"x": 1, "y": 3},
     {"x": 1, "y": 2},
     {"x": 2, "y": 4},
     ]
lst = list(dedupe(b, key=lambda k: (k["x"], k["y"])))
print(lst)
lst2 = list(dedupe(b, key=lambda k: (k["x"])))
print(lst2)

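The idea is to use the key function to turn each unhashable item into something hashable.

A set can also remove duplicates, but it does not preserve the original order.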
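
For hashable values there is also a one-line alternative that is not part of the original notes: since Python 3.7, dictionaries keep insertion order, so dict.fromkeys() removes duplicates while keeping the order of first appearance.

```python
a = [1, 5, 2, 1, 9, 1, 5, 10]

# dict keys are unique and, from Python 3.7 on, keep insertion order.
unique = list(dict.fromkeys(a))
print(unique)
```

Unlike dedupe(), this builds the whole result at once and accepts no key function, so the generator version remains the more flexible choice.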

11. Naming a slice

s = "Hello World"
a = slice(2, 5)
print(s[a])

llo

You can use indices(size) to fit a slice onto a sequence of a given size, which clips it to a safe range.

s = "HelloWorld"
a = slice(5, 50, 2)
print(a.start)
print(a.stop)
print(a.step)
print(a.indices(len(s)))
for i in range(*a.indices(len(s))):
    print(s[i])

5
50
2
(5, 10, 2)
W
r
d

This way, indexing will not raise an IndexError because the slice bounds exceed the sequence.
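
A named slice can be used anywhere a literal slice is allowed, including assignment and deletion. The sketch below uses a hypothetical list and the name middle for illustration.

```python
items = [0, 1, 2, 3, 4, 5, 6]
middle = slice(2, 4)          # a readable name instead of a hard-coded 2:4

print(items[middle])          # read through the slice
items[middle] = [10, 11]      # assign through the slice
print(items)
del items[middle]             # delete through the slice
print(items)
```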

12. Finding the most frequently occurring items in a sequence

The Counter class in the collections module provides this.

from collections import Counter


words = [
    'ear', 'head', 'nose', 'ear', 'look', 'see',
    'head', 'ear', 'nose', 'ear', 'read', 'see',
    'head', 'see', 'watch', 'look', 'hair', 'see',
    'ear', 'big', 'small', 'do', 'hair', 'nose',
    'head', 'big', 'large', 'ear', 'do', 'ear'
]

word_counter = Counter(words)
most_three_counter = word_counter.most_common(3)
print(most_three_counter)

[('ear', 7), ('head', 4), ('see', 4)]

Incrementing the counts manually:

words = [
    'ear', 'head', 'nose', 'ear', 'look', 'see',
    'head', 'ear', 'nose', 'ear', 'read', 'see',
    'head', 'see', 'watch', 'look', 'hair', 'see',
    'ear', 'big', 'small', 'do', 'hair', 'nose',
    'head', 'big', 'large', 'ear', 'do', 'ear'
]

addwords = ['ear', 'head', 'small', 'big', 'do']

word_counter = Counter(words)

for word in addwords:
    word_counter[word] += 1

print(word_counter)

Or

words = [
    'ear', 'head', 'nose', 'ear', 'look', 'see',
    'head', 'ear', 'nose', 'ear', 'read', 'see',
    'head', 'see', 'watch', 'look', 'hair', 'see',
    'ear', 'big', 'small', 'do', 'hair', 'nose',
    'head', 'big', 'large', 'ear', 'do', 'ear'
]

addwords = ['ear', 'head', 'small', 'big', 'do']

word_counter = Counter(words)

word_counter.update(addwords)

print(word_counter)

Counter({'ear': 8, 'head': 5, 'see': 4, 'nose': 3, 'big': 3, 'do': 3, 'look': 2, 'hair': 2, 'small': 2, 'read': 1, 'watch': 1, 'large': 1})

Counter objects can be added and subtracted:

words = [
    'ear', 'head', 'nose', 'ear', 'look', 'see',
    'head', 'ear', 'nose', 'ear', 'read', 'see',
    'head', 'see', 'watch', 'look', 'hair', 'see',
    'ear', 'big', 'small', 'do', 'hair', 'nose',
    'head', 'big', 'large', 'ear', 'do', 'ear'
]

addwords = ['ear', 'head', 'small', 'big', 'do']

word_counter = Counter(words)
addwords_counter = Counter(addwords)
mix = word_counter + addwords_counter
print(mix)
subtract = word_counter - addwords_counter
print(subtract)

Counter({'ear': 8, 'head': 5, 'see': 4, 'nose': 3, 'big': 3, 'do': 3, 'look': 2, 'hair': 2, 'small': 2, 'read': 1, 'watch': 1, 'large': 1})
Counter({'ear': 6, 'see': 4, 'head': 3, 'nose': 3, 'look': 2, 'hair': 2, 'read': 1, 'watch': 1, 'big': 1, 'do': 1, 'large': 1})

13. Sorting a list of dictionaries by a common key

Use the itemgetter function from the operator module.

from operator import itemgetter

data = [
    {'ID': 1, "Name": "Ben", "score": 88},
    {'ID': 2, "Name": "Jack", "score": 92},
    {'ID': 3, "Name": "Mike", "score": 73},
    {'ID': 4, "Name": "Mark", "score": 81},
    {'ID': 5, "Name": "Lucy", "score": 95},
]

rows_by_Name = sorted(data, key=itemgetter('Name'))
rows_by_Score = sorted(data, key=itemgetter('score'))
print(rows_by_Name)
print(rows_by_Score)

[{'ID': 1, 'Name': 'Ben', 'score': 88}, {'ID': 2, 'Name': 'Jack', 'score': 92}, {'ID': 5, 'Name': 'Lucy', 'score': 95}, {'ID': 4, 'Name': 'Mark', 'score': 81}, {'ID': 3, 'Name': 'Mike', 'score': 73}]
[{'ID': 3, 'Name': 'Mike', 'score': 73}, {'ID': 4, 'Name': 'Mark', 'score': 81}, {'ID': 1, 'Name': 'Ben', 'score': 88}, {'ID': 2, 'Name': 'Jack', 'score': 92}, {'ID': 5, 'Name': 'Lucy', 'score': 95}]

A lambda can be used in place of itemgetter, but itemgetter is usually a little faster.
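
itemgetter also accepts several keys, in which case it returns a tuple and the rows are ordered by the first key and then by the next on ties. The same key function works with min() and max(). The rows below are made-up sample data.

```python
from operator import itemgetter

rows = [
    {"Name": "Ben", "score": 88},
    {"Name": "Jack", "score": 92},
    {"Name": "Adam", "score": 88},
]

# Sort by score first, then by name where the scores are equal.
by_score_name = sorted(rows, key=itemgetter("score", "Name"))
print([r["Name"] for r in by_score_name])

# The same key function can be passed to max() or min().
best = max(rows, key=itemgetter("score"))
print(best["Name"])
```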

14. Sorting objects that do not natively support comparison

For example, such objects can be compared through one of their attributes:

class User(object):

    def __init__(self, id):
        self.id = id

    def __repr__(self):
        return "User({})".format(self.id)


users = [User(2), User(3), User(1)]
print(users)
user_ordered = sorted(users, key=lambda k: k.id)
print(user_ordered)

[User(2), User(3), User(1)]
[User(1), User(2), User(3)]

You can also use attrgetter() from the operator module:

from operator import attrgetter


class User(object):

    def __init__(self, id):
        self.id = id

    def __repr__(self):
        return "User({})".format(self.id)


users = [User(2), User(3), User(1)]
print(users)
user_ordered = sorted(users, key=attrgetter('id'))
print(user_ordered)

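Likewise, attrgetter is usually a little more efficient than a lambda.

15. Grouping records together based on a field

Use itemgetter together with groupby from the itertools module.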

from operator import itemgetter
from itertools import groupby


data = [
    {"Name": "Jack", "Age": 21},
    {"Name": "Ben", "Age": 22},
    {"Name": "Lucy", "Age": 23},
    {"Name": "Lily", "Age": 23},
    {"Name": "Mike", "Age": 21},
    {"Name": "Martin", "Age": 21},
    {"Name": "Susan", "Age": 23},
    {"Name": "Rose", "Age": 22},
    {"Name": "Hill", "Age": 22},
]

sortdata = sorted(data, key=itemgetter('Age'))

for age, rows in groupby(sortdata, key=itemgetter("Age")):
    print(age)
    for row in rows:
        print(" ", row)

21
  {'Name': 'Jack', 'Age': 21}
  {'Name': 'Mike', 'Age': 21}
  {'Name': 'Martin', 'Age': 21}
22
  {'Name': 'Ben', 'Age': 22}
  {'Name': 'Rose', 'Age': 22}
  {'Name': 'Hill', 'Age': 22}
23
  {'Name': 'Lucy', 'Age': 23}
  {'Name': 'Lily', 'Age': 23}
  {'Name': 'Susan', 'Age': 23}

groupby() only groups consecutive items, so sort the sequence by the grouping field before grouping.
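
The effect of skipping the sort can be seen in a short sketch on a shortened copy of the data: without sorting, the same age shows up as more than one group.

```python
from itertools import groupby
from operator import itemgetter

data = [
    {"Name": "Jack", "Age": 21},
    {"Name": "Ben", "Age": 22},
    {"Name": "Lucy", "Age": 23},
    {"Name": "Lily", "Age": 23},
    {"Name": "Mike", "Age": 21},
]
by_age = itemgetter("Age")

# Unsorted input: 21 starts a second group when it reappears.
unsorted_keys = [age for age, _ in groupby(data, key=by_age)]
print(unsorted_keys)

# Sorted input: every age forms exactly one group.
sorted_keys = [age for age, _ in groupby(sorted(data, key=by_age), key=by_age)]
print(sorted_keys)
```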

16. Filtering sequence elements

The built-in filter() function:

def is_int(val):
    # isinstance() already returns a bool, so return it directly
    return isinstance(val, int)


a = [1, 2, "aaa", "b", 5, "cc"]
b = list(filter(is_int, a))
print(b)

[1, 2, 5]

compress from the itertools module:

from itertools import compress


name = ["Jack", "Lucy", "Ben", "Lily", "Mike"]
age = [8, 11, 12, 9, 11]

more = [n > 10 for n in age]
print(more)

l = list(compress(name, more))
print(l)

[False, True, True, False, True]
['Lucy', 'Ben', 'Mike']

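First build a sequence of Boolean values, then pass it to compress() as the selector.

17. Extracting a subset of a dictionary

Use a dictionary comprehension: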

data = {
    "a": 1,
    "b": 3,
    "c": 5,
    "d": 7,
    "f": 8
}

sondata1 = {k: v for k, v in data.items() if v > 4}
dataname = ["a", "c", "f"]
sondata2 = {k: v for k, v in data.items() if k in dataname}
print(sondata1)
print(sondata2)

{'c': 5, 'd': 7, 'f': 8}
{'a': 1, 'c': 5, 'f': 8}

18. Mapping names to sequence elements

Use namedtuple from the collections module.

from collections import namedtuple


name = ("Data", ["Name", "Id"])
Data = namedtuple(name[0], name[1])
data = Data("Mike", 1)
print(data)


def change_data(s):
    return data._replace(**s)


a = {"Name": "Jack", "Id": 2}
b = change_data(a)
print(b)

Data(Name='Mike', Id=1)
Data(Name='Jack', Id=2)

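The fields of a namedtuple cannot be changed in place; _replace() returns a new instance with the given fields replaced. For large amounts of data that must be modified, a class that defines __slots__ may be a better choice.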
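
The immutability can be checked directly. This is a small sketch using the same Data type: fields can be read by name or by position, assignment fails, and _replace() leaves the original untouched.

```python
from collections import namedtuple

Data = namedtuple("Data", ["Name", "Id"])
data = Data("Mike", 1)

# Fields are available by name and by position.
print(data.Name, data[1])

# Assigning to a field fails because the tuple is immutable.
try:
    data.Id = 2
    mutable = True
except AttributeError:
    mutable = False
print(mutable)

# _replace() leaves the original untouched and returns a new instance.
updated = data._replace(Id=2)
print(data, updated)
```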

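19. Transforming and reducing data at the same time

Use a list comprehension to transform the data; a generator expression can also be passed straight to a reducing function such as sum().

num = [1, 2, 3, 4, 5]
squares = [n * n for n in num]  # square every element
print(squares)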

[1, 4, 9, 16, 25]
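
To combine the transformation with a reduction in one step, a generator expression can be handed directly to the reducing function. A minimal sketch on the same numbers:

```python
num = [1, 2, 3, 4, 5]

# The generator expression feeds sum() one square at a time;
# no temporary list is built.
total = sum(n * n for n in num)
print(total)

# The same pattern works with other reducing functions.
smallest_square = min(n * n for n in num)
has_even = any(n % 2 == 0 for n in num)
print(smallest_square, has_even)
```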

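20. Combining multiple mappings into a single mapping

Use ChainMap from the collections module.

When the same key appears in more than one mapping, the value from the first mapping is used. Operations that add, update or delete entries always affect the first mapping.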

from collections import ChainMap


a = {"x": 1, "y": 2}
b = {"z": 3, "y": 4}
c = ChainMap(a, b)
print(c)
print(c["y"])
a["x"] = 5
print(c)

ChainMap({'x': 1, 'y': 2}, {'z': 3, 'y': 4})
2
ChainMap({'x': 5, 'y': 2}, {'z': 3, 'y': 4})
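
The claim that writes and deletions go to the first mapping can be checked with a short sketch on the same two dictionaries:

```python
from collections import ChainMap

a = {"x": 1, "y": 2}
b = {"z": 3, "y": 4}
c = ChainMap(a, b)

c["z"] = 10      # stored in a, even though b already has a "z"
c["w"] = 40      # new keys also go into a
del c["y"]       # removes "y" from a only
print(a)
print(b)
print(c["y"], c["z"])   # "y" now comes from b, "z" from a
```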

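Alternatively, build a new dictionary that merges the two with update(). Unlike a ChainMap, the merged copy does not reflect later changes to the original dictionaries, and for a duplicate key the value from the dictionary merged last wins.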

a = {"x": 1, "y": 2}
b = {"z": 3, "y": 4}
c = dict(a)
c.update(b)
print(c)
a["x"] = 5
print(c)

{'x': 1, 'y': 4, 'z': 3}
{'x': 1, 'y': 4, 'z': 3}

posted @ 2017-12-29 22:50 BeBestJackie
