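The itertools module

The itertools module makes it quick to create iterators and provides a number of useful functions for working with iterables.

Official documentation: https://docs.python.org/3.6/library/itertools.html

Infinite iterators

**`itertools.count`(start=0, step=1)**

Makes an iterator that returns evenly spaced values starting with the number start. Often used as an argument to map() to generate consecutive data points. Also used with zip() to add sequence numbers. Roughly equivalent to: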

def count(start=0, step=1):
    # count(10) --> 10 11 12 13 14 ...
    # count(2.5, 0.5) -> 2.5 3.0 3.5 ...
    n = start
    while True:
        yield n
        n += step

When counting with floating-point numbers, better accuracy can sometimes be achieved by substituting multiplicative code such as: (start + step * i for i in count()).
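As a small illustration of the two uses described above, the sketch below numbers a list with zip() and count(), then compares additive float counting with the multiplicative form. The sample names and the step of 0.1 are arbitrary choices made for this example.

```python
from itertools import count, islice

# Number the items of a sequence by zipping them with count().
names = ['alpha', 'beta', 'gamma']
numbered = list(zip(count(1), names))
print(numbered)  # [(1, 'alpha'), (2, 'beta'), (3, 'gamma')]

# count(0.0, 0.1) adds the step repeatedly, so rounding error can build up.
added = list(islice(count(0.0, 0.1), 11))

# Multiplying an integer counter by the step rounds only once per value.
multiplied = list(islice((0.0 + 0.1 * i for i in count()), 11))

print(added[10], multiplied[10])
```

islice() is used here only to cut the infinite iterators off after eleven values.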

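`itertools.cycle`(iterable)

Makes an iterator that returns elements from the iterable and saves a copy of each. When the iterable is exhausted, returns elements from the saved copy. Repeats indefinitely. Roughly equivalent to: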

def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element


from itertools import cycle
import time

# cycle() never stops on its own, so break out after about one second.
start_time = time.time()
for i in cycle('abcd'):
    print(i)
    stop_time = time.time()
    if stop_time - start_time >= 1:
        break

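Note that this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).

`itertools.repeat`(object[, times])

Makes an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified. Used as an argument to map() for invariant parameters to the called function. Also used with zip() to create the invariant part of a tuple record.

Roughly equivalent to: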

def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object

from itertools import repeat

for i in repeat('abcd', 3):
    print(i)

# A common use of repeat is to supply a stream of constant values to map or zip
print(list(map(pow, range(10), repeat(2))))

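Iterators terminating on the shortest input sequence

**`itertools.accumulate`(iterable[, func])**

Makes an iterator that returns accumulated sums, or accumulated results of other binary functions (specified via the optional func argument). If func is supplied, it should be a function of two arguments. Elements of the input iterable may be any type that can be accepted as arguments to func. (For example, with the default operation of addition, elements may be any addable type including Decimal or Fraction.) If the input iterable is empty, the output iterable will also be empty. Roughly equivalent to: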

import operator
def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total

from itertools import accumulate

for i in accumulate([1, 2, 3, 4, 5, 6, 7, 8]):
    print(i)

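There are a number of uses for the func argument. It can be set to min() for a running minimum, max() for a running maximum, or operator.mul() for a running product. Amortization tables can be built by accumulating interest and applying payments. First-order recurrence relations can be modeled by supplying the initial value in the iterable and using only the accumulated total in the func argument.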
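The uses of func listed above can be sketched as follows. The data values, the 5% rate, and the payment of 90 are illustrative numbers, not anything prescribed by the module.

```python
import operator
from itertools import accumulate

data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8]

running_min = list(accumulate(data, min))
running_max = list(accumulate(data, max))
running_product = list(accumulate(data, operator.mul))

print(running_min)      # [3, 3, 3, 2, 1, 1, 0, 0, 0, 0]
print(running_max)      # [3, 4, 6, 6, 6, 9, 9, 9, 9, 9]
print(running_product)  # [3, 12, 72, 144, 144, 1296, 0, 0, 0, 0]

# Amortization table: a loan of 1000 at 5% interest with four payments of 90.
# The first element is the initial balance; each later element is a payment.
cashflows = [1000, -90, -90, -90, -90]
balances = list(accumulate(cashflows, lambda bal, pmt: bal * 1.05 + pmt))
print([round(b, 2) for b in balances])
```

The amortization line is also an instance of a first-order recurrence: each new balance depends only on the previous balance and the current payment.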

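**`itertools.chain`(*iterables)**

Makes an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. Used for treating consecutive sequences as a single sequence. Roughly equivalent to: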

def chain(*iterables):
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it:
            yield element
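A short usage sketch for chain(). The second half uses chain.from_iterable(), an alternate constructor that this post does not otherwise cover; it takes a single iterable of iterables instead of separate arguments.

```python
from itertools import chain

# Treat two strings as one continuous sequence.
letters = list(chain('ABC', 'DEF'))
print(letters)  # ['A', 'B', 'C', 'D', 'E', 'F']

# Flatten one level of nesting when the inputs arrive as a single iterable.
nested = [[1, 2], [3], [4, 5]]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]
```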

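`itertools.compress`(data, selectors)

Makes an iterator that filters elements from data, returning only those whose corresponding element in selectors evaluates to True. Stops when either the data or the selectors iterable has been exhausted. Roughly equivalent to: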

def compress(data, selectors):
    # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
    return (d for d, s in zip(data, selectors) if s)

from itertools import compress

# The selectors run out after five items, so only 1 and 3 are printed.
for i in compress([1, 2, 3, 4, 5, 6, 7, 8], [True, False, True, False, False]):
    print(i)

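`itertools.dropwhile`(predicate, iterable)

Makes an iterator that drops elements from the iterable as long as the predicate is true; afterwards, it returns every element. Note that the iterator does not produce any output until the predicate first becomes false, so it may have a lengthy start-up time. Roughly equivalent to: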

def dropwhile(predicate, iterable):
    # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
    iterable = iter(iterable)
    for x in iterable:
        if not predicate(x):
            yield x
            break
    for x in iterable:
        yield x
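One way to picture dropwhile() is skipping a leading header. In the sketch below the list of lines is made-up sample data; note that the later comment line is kept, because the predicate is no longer consulted once it has been false.

```python
from itertools import dropwhile

lines = ['# header', '# generated file', 'x = 1', '# trailing comment', 'y = 2']

# Drop only the leading run of comment lines.
body = list(dropwhile(lambda line: line.startswith('#'), lines))
print(body)  # ['x = 1', '# trailing comment', 'y = 2']
```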

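`itertools.filterfalse`(predicate, iterable)

Makes an iterator that filters elements from the iterable, returning only those for which the predicate is False. If predicate is None, returns the items that are false. Roughly equivalent to: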

def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x
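A brief sketch of both modes of filterfalse(), with an explicit predicate and with None. The mixed list is arbitrary sample data.

```python
from itertools import filterfalse

# Keep the values for which x % 2 is falsy, i.e. the even numbers.
evens = list(filterfalse(lambda x: x % 2, range(10)))
print(evens)  # [0, 2, 4, 6, 8]

# With predicate=None, keep the items that are themselves falsy.
falsy = list(filterfalse(None, [0, 1, '', 'a', None, [], [0]]))
print(falsy)  # [0, '', None, []]
```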

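`itertools.groupby`(iterable, key=None): grouping

Makes an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function.

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL's GROUP BY, which aggregates common elements regardless of their input order.

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list. groupby() is roughly equivalent to: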

class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.tgtkey = self.currkey = self.currvalue = object()
    def __iter__(self):
        return self
    def __next__(self):
        while self.currkey == self.tgtkey:
            self.currvalue = next(self.it)    # Exit on StopIteration
            self.currkey = self.keyfunc(self.currvalue)
        self.tgtkey = self.currkey
        return (self.currkey, self._grouper(self.tgtkey))
    def _grouper(self, tgtkey):
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                return
            self.currkey = self.keyfunc(self.currvalue)
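The effect of sorting, and the advice to store each group as a list, can be seen in a small sketch. The word list and the first_letter helper are invented for this example.

```python
from itertools import groupby

words = ['apple', 'bob', 'avocado', 'cherry', 'banana', 'cat']

def first_letter(word):
    """Hypothetical key function: group words by their first character."""
    return word[0]

# Without sorting, a new group starts every time the key changes.
unsorted_keys = [k for k, g in groupby(words, key=first_letter)]
print(unsorted_keys)  # ['a', 'b', 'a', 'c', 'b', 'c']

# Sort on the same key first, and store each group as a list before
# the groupby object advances to the next key.
groups = {}
for k, g in groupby(sorted(words, key=first_letter), key=first_letter):
    groups[k] = list(g)
print(groups)
# {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cherry', 'cat']}
```

Because sorted() is stable, words with the same first letter keep their original relative order inside each group.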

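`itertools.islice`(iterable, stop) / `itertools.islice`(iterable, start, stop[, step])

Makes an iterator that returns selected elements from the iterable. If start is non-zero, elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one, which results in items being skipped. If stop is None, iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. It can be used to extract related fields from data whose internal structure has been flattened (for example, a multi-line report may list a name field on every third line). Roughly equivalent to: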

import sys

def islice(iterable, *args):
    # islice('ABCDEFG', 2) --> A B
    # islice('ABCDEFG', 2, 4) --> C D
    # islice('ABCDEFG', 2, None) --> C D E F G
    # islice('ABCDEFG', 0, None, 2) --> A C E G
    s = slice(*args)
    start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1
    it = iter(range(start, stop, step))
    try:
        nexti = next(it)
    except StopIteration:
        # Consume *iterable* up to the *start* position.
        for i, element in zip(range(start), iterable):
            pass
        return
    try:
        for i, element in enumerate(iterable):
            if i == nexti:
                yield element
                nexti = next(it)
    except StopIteration:
        # Consume to *stop*.
        for i, element in zip(range(i + 1, stop), iterable):
            pass

If start is None, then iteration starts at zero. If step is None, then the step defaults to one.
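The flattened-report case mentioned above might look like the following sketch. The report contents are made up; each record is assumed to occupy three consecutive entries (name, age, city).

```python
from itertools import islice

# A flattened report: name, age, city repeated for each record.
report = ['Ann', '30', 'Paris', 'Bob', '25', 'Rome', 'Cy', '41', 'Oslo']

names = list(islice(report, 0, None, 3))  # every third entry from index 0
ages = list(islice(report, 1, None, 3))   # every third entry from index 1
first_two = list(islice(report, 2))       # a single argument is the stop value

print(names)      # ['Ann', 'Bob', 'Cy']
print(ages)       # ['30', '25', '41']
print(first_two)  # ['Ann', '30']
```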

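`itertools.starmap`(function, iterable)

Makes an iterator that computes the function using arguments obtained from the iterable. Used instead of map() when the argument parameters are already grouped in tuples from a single iterable (the data has been "pre-zipped"). The difference between map() and starmap() parallels the distinction between function(a,b) and function(*c). Roughly equivalent to: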

def starmap(function, iterable):
    # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
    for args in iterable:
        yield function(*args)
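To make the map() versus starmap() distinction concrete, the sketch below computes the same powers both ways. The base and exponent values are the ones from the comment in the code above.

```python
from itertools import starmap

bases = [2, 3, 10]
exponents = [5, 2, 3]

# map() takes one iterable per argument: pow(a, b).
with_map = list(map(pow, bases, exponents))

# starmap() takes one iterable of already-zipped tuples: pow(*c).
pairs = list(zip(bases, exponents))
with_starmap = list(starmap(pow, pairs))

print(with_map)      # [32, 9, 1000]
print(with_starmap)  # [32, 9, 1000]
```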

`itertools.takewhile`(predicate, iterable)

Makes an iterator that returns elements from the iterable as long as the predicate is true. Roughly equivalent to:

def takewhile(predicate, iterable):
    # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4
    for x in iterable:
        if predicate(x):
            yield x
        else:
            break
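takewhile() is the counterpart of dropwhile(): on a list, the two split the input at the first element that fails the predicate. The sketch below also shows a detail that follows from the equivalent code above: when the input is a one-shot iterator, the first failing element is consumed and lost.

```python
from itertools import dropwhile, takewhile

data = [1, 4, 6, 4, 1]

head = list(takewhile(lambda x: x < 5, data))
tail = list(dropwhile(lambda x: x < 5, data))
print(head)  # [1, 4]
print(tail)  # [6, 4, 1]

# With an iterator, the element that ended takewhile() (here 6) is gone.
it = iter(data)
taken = list(takewhile(lambda x: x < 5, it))
remaining = list(it)
print(remaining)  # [4, 1]
```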

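`itertools.tee`(iterable, n=2)

Returns n independent iterators from a single iterable.

The following Python code helps explain what tee does (although the actual implementation is more complex and uses only a single underlying FIFO queue).

Roughly equivalent to: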

import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                try:
                    newval = next(it)   # fetch a new value and
                except StopIteration:
                    return
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)

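Once tee() has made a split, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed.

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().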
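A typical case where tee() fits well is when the two iterators stay close together, so little data has to be buffered. The pairwise helper below is a small function written for this example that yields overlapping pairs; it is a sketch, not part of the module as described in this post.

```python
from itertools import tee

def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)      # advance the second iterator by one position
    return zip(a, b)   # the two iterators are never more than one item apart

pairs = list(pairwise([1, 2, 4, 7]))
print(pairs)  # [(1, 2), (2, 4), (4, 7)]

differences = [y - x for x, y in pairwise([1, 2, 4, 7])]
print(differences)  # [1, 2, 3]
```

Only the tee iterators a and b are used after the split; the original iterable is left alone, as the note above advises.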

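`itertools.zip_longest`(*iterables, fillvalue=None)

Makes an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled in with fillvalue. Iteration continues until the longest iterable is exhausted. Roughly equivalent to: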

from itertools import chain, repeat

class ZipExhausted(Exception):
    pass

def zip_longest(*args, **kwds):
    # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
    fillvalue = kwds.get('fillvalue')
    counter = len(args) - 1
    def sentinel():
        nonlocal counter
        if not counter:
            raise ZipExhausted
        counter -= 1
        yield fillvalue
    fillers = repeat(fillvalue)
    iterators = [chain(it, sentinel(), fillers) for it in args]
    try:
        while iterators:
            yield tuple(map(next, iterators))
    except ZipExhausted:
        pass

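If one of the iterables is potentially infinite, then the zip_longest() function should be wrapped with something that limits the number of calls (for example islice() or takewhile()). If not specified, fillvalue defaults to None.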
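The three points above (an explicit fillvalue, the default of None, and bounding an infinite input) can be sketched as follows. The choice of four items for the bound is arbitrary.

```python
from itertools import count, islice, zip_longest

# Explicit fill value for the shorter input.
padded = list(zip_longest('ABCD', 'xy', fillvalue='-'))
print(padded)  # [('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]

# Without fillvalue, missing entries become None.
default_fill = list(zip_longest([1, 2], [10]))
print(default_fill)  # [(1, 10), (2, None)]

# count() is infinite, so limit the number of results with islice().
bounded = list(islice(zip_longest(count(), 'ab'), 4))
print(bounded)  # [(0, 'a'), (1, 'b'), (2, None), (3, None)]
```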

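Combinatoric iterators

**`itertools.product`(*iterables, repeat=1): Cartesian product, ordered tuples with repetition allowed**

Cartesian product of the input iterables.

Roughly equivalent to nested for-loops in a generator expression. For example, product(A, B) returns the same as ((x,y) for x in A for y in B).

The nested loops cycle like an odometer, with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering, so that if the input iterables are sorted, the product tuples are emitted in sorted order.

To compute the product of an iterable with itself, specify the number of repetitions with the optional repeat keyword argument. For example, product(A, repeat=4) means the same as product(A, A, A, A).

This function is roughly equivalent to the following code, except that the actual implementation does not build up intermediate results in memory: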

def product(*args, repeat=1):
    # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
    # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
    pools = [tuple(pool) for pool in args] * repeat
    result = [[]]
    for pool in pools:
        result = [x+[y] for x in result for y in pool]
    for prod in result:
        yield tuple(prod)
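The equivalences stated above are easy to check directly. In this sketch A and B are short sample strings, and repeat=3 is used instead of 4 to keep the output small.

```python
from itertools import product

A = 'AB'
B = 'xy'

# product(A, B) matches the nested-loop generator expression.
from_product = list(product(A, B))
from_loops = [(x, y) for x in A for y in B]
print(from_product)  # [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y')]

# repeat=n is shorthand for passing the same iterable n times.
same = list(product(A, repeat=3)) == list(product(A, A, A))
print(same)  # True

# The rightmost position advances fastest, like an odometer.
bits = [''.join(map(str, t)) for t in product(range(2), repeat=3)]
print(bits)  # ['000', '001', '010', '011', '100', '101', '110', '111']
```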

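`itertools.permutations`(iterable, r=None): all possible orderings, no repeated elements

Returns successive r-length permutations of elements in the iterable.

If r is not specified or is None, then r defaults to the length of the iterable and all possible full-length permutations are generated.

Permutations are emitted in lexicographic sort order. So, if the input iterable is sorted, the permutation tuples will be produced in sorted order.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeated values in each permutation.

Roughly equivalent to: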

def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = list(range(n))
    cycles = list(range(n, n-r, -1))
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return

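The code for permutations() can also be expressed as a subsequence of product(), filtered to exclude entries with repeated elements (those from the same position in the input pool):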

from itertools import product

def permutations(iterable, r=None):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

The number of items returned is n! / (n-r)! when 0 <= r <= n, or zero when r > n.
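The count formula and the position-based notion of uniqueness can both be checked with a short sketch. permutation_count is a helper written for this example, not a function provided by itertools.

```python
from itertools import permutations
from math import factorial

def permutation_count(n, r):
    """Hypothetical helper: n! / (n-r)! for 0 <= r <= n, otherwise 0."""
    if r > n:
        return 0
    return factorial(n) // factorial(n - r)

perms = list(permutations('ABCD', 2))
print(len(perms), permutation_count(4, 2))  # 12 12

# r larger than the pool yields nothing.
print(list(permutations('ABC', 5)))  # []

# Uniqueness is by position: the two 'A's are treated as different elements,
# so equal-looking tuples appear more than once.
dupes = list(permutations('AAB', 2))
print(dupes)
# [('A', 'A'), ('A', 'B'), ('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'A')]
```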

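`itertools.combinations`(iterable, r): in sorted order, no repeated elements

Returns r-length subsequences of elements from the input iterable.

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeated values in each combination.

Roughly equivalent to: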

def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)

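The code for combinations() can also be expressed as a subsequence of permutations() after filtering out entries where the elements are not in sorted order (according to their position in the input pool):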

from itertools import permutations

def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    for indices in permutations(range(n), r):
        if sorted(indices) == list(indices):
            yield tuple(pool[i] for i in indices)

The number of items returned is n! / r! / (n-r)! when 0 <= r <= n, or zero when r > n.
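As a quick check of the formula, the sketch below compares it with the actual number of tuples produced. combination_count is a helper defined only for this example.

```python
from itertools import combinations
from math import factorial

def combination_count(n, r):
    """Hypothetical helper: n! / r! / (n-r)! for 0 <= r <= n, otherwise 0."""
    if r > n:
        return 0
    return factorial(n) // factorial(r) // factorial(n - r)

combos = list(combinations('ABCD', 2))
print(combos)
# [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
print(len(combos), combination_count(4, 2))  # 6 6

# r larger than the pool yields nothing.
print(list(combinations('ABCD', 5)))  # []
```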

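`itertools.combinations_with_replacement`(iterable, r): in sorted order, with repeated elements

Returns r-length subsequences of elements from the input iterable, allowing individual elements to be repeated more than once.

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, the generated combinations will also be unique.

Roughly equivalent to: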

def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)

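The code for combinations_with_replacement() can also be expressed as a subsequence of product() after filtering out entries where the elements are not in sorted order (according to their position in the input pool):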

from itertools import product

def combinations_with_replacement(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    for indices in product(range(n), repeat=r):
        if sorted(indices) == list(indices):
            yield tuple(pool[i] for i in indices)

The number of items returned is (n+r-1)! / r! / (n-1)! when n > 0.
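The count formula can be checked in the same way as for the other combinatoric iterators. cwr_count is a helper made up for this sketch, and the dice example is just one illustrative application.

```python
from itertools import combinations_with_replacement
from math import factorial

def cwr_count(n, r):
    """Hypothetical helper: (n+r-1)! / r! / (n-1)!, valid for n > 0."""
    return factorial(n + r - 1) // factorial(r) // factorial(n - 1)

pairs = list(combinations_with_replacement('ABC', 2))
print(pairs)
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
print(len(pairs), cwr_count(3, 2))  # 6 6

# Distinct outcomes when rolling two indistinguishable six-sided dice.
rolls = list(combinations_with_replacement(range(1, 7), 2))
print(len(rolls), cwr_count(6, 2))  # 21 21
```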

Posted 2019-06-13 15:35 by 码迷-wjz
