欢迎来到Louis的博客

人生三从境界:昨夜西风凋碧树,独上高楼,望尽天涯路。 衣带渐宽终不悔,为伊消得人憔悴。 众里寻他千百度,蓦然回首,那人却在灯火阑珊处。
扩大
缩小

python标准库——collections模块 的Counter类

1.collections模块

collections模块自Python 2.4版本开始被引入,包含了dict、set、list、tuple以外的一些特殊的容器类型,分别是:

  • OrderedDict类:排序字典,是字典的子类。引入自2.7。
  • namedtuple()函数:命名元组,是一个工厂函数。引入自2.6。
  • Counter类:为hashable对象计数,是字典的子类。引入自2.7。
  • deque:双向队列。引入自2.4。
  • defaultdict:使用工厂函数创建字典,使不用考虑缺失的字典键。引入自2.5。

2.Counter类

Counter类的目的是用来跟踪值出现的次数。它是一个无序的容器类型,以字典的键值对形式存储,其中元素作为key,其计数作为value。计数值可以是任意的Interger(包括0和负数)。

2.1 创建

下面的代码说明了Counter类创建的四种方法:

from collections import Counter

c1 = Counter()   #创建一个空的Counter对象
c2 = Counter('collections') #从一个可interable的对象(list,tuple,dict,string等)创建
c3 = Counter({'k1': 'v1', 'k2': 'v2', 'k3':'v1'}) #从一个字典对象创建
c4 = Counter(a=4, b=2) #关键字参数生成的Counter
print(c1)
print(c2)
print(c3)
print(c4)
'''
Counter()
Counter({'c': 2, 'o': 2, 'l': 2, 'e': 1, 't': 1, 'i': 1, 'n': 1, 's': 1})
Counter({'v1': 1, 'v2': 1, 'k2': 1, 'k1': 1})
Counter({'a': 4, 'b': 2})
'''

2.2 计数值的访问与缺失的键

当所访问的键不存在时,返回0,而不是KeyError;否则返回它的计数。

from collections import Counter
c = Counter(['a', 'b', 'c'])
c['a']
Out[4]: 1
c['d']
Out[5]: 0

2.3 Counter提供的方法

class Counter(dict): #继承字典的方法
    '''Dict subclass for counting hashable items.  Sometimes called a bag
    or multiset.  Elements are stored as dictionary keys and their counts
    are stored as dictionary values.

    >>> c = Counter('abcdeabcdabcaba')  # count elements from a string #生成一个字符串的Counter对象

    >>> c.most_common(3)                # three most common elements  #列出出现次数最多的3个元素
    [('a', 5), ('b', 4), ('c', 3)]
    >>> sorted(c)                       # list all unique elements   #列出所有元素,sortd方法操作字典时只会返回键组成的列表,sortd得出的结果默认是升序
    ['a', 'b', 'c', 'd', 'e']
    >>> ''.join(sorted(c.elements()))   # list elements with repetitions #elements方法返回一个可迭代对象,sortd对齐进行排序,join方法将所有的元素进行拼接,返回一个字符串。
    'aaaaabbbbcccdde'
    >>> sum(c.values())                 # total of all counts #继承dict的values方法,取到values,再求和
    15

    >>> c['a']                          # count of letter 'a' #继承字典的方法,可以通过元素(key)取到出现的次数(value)
    5
    >>> for elem in 'shazam':           # update counts from an iterable #用一个可迭代对象更新counts
    ...     c[elem] += 1                # by adding 1 to each element's count 
    >>> c['a']                          # now there are seven 'a'
    7
    >>> del c['b']                      # remove all 'b'  #与字典删除元素一样,del dict[key],删除当前元素的统计
    >>> c['b']                          # now there are zero 'b'
    0

    >>> d = Counter('simsalabim')       # make another counter  
    >>> c.update(d)                     # add in the second counter  #使用update方法更新Counter
    >>> c['a']                          # now there are nine 'a'
    9

    >>> c.clear()                       # empty the counter    #清空Counter计数器
    >>> c
    Counter()

    Note:  If a count is set to zero or reduced to zero, it will remain #如果一个元素在计数器中的条目被设置为0或者减至0,改元素会依然保留在计速器中,只有删除该元素或者计数器被清零
    in the counter until the entry is deleted or the counter is cleared:

    >>> c = Counter('aaabbc')
    >>> c['b'] -= 2                     # reduce the count of 'b' by two
    >>> c.most_common()                 # 'b' is still in, but its count is zero
    [('a', 3), ('c', 1), ('b', 0)]

    '''
    # References:
    #   http://en.wikipedia.org/wiki/Multiset
    #   http://www.gnu.org/software/smalltalk/manual-base/html_node/Bag.html
    #   http://www.demo2s.com/Tutorial/Cpp/0380__set-multiset/Catalog0380__set-multiset.htm
    #   http://code.activestate.com/recipes/259174/
    #   Knuth, TAOCP Vol. II section 4.6.3

    def __init__(*args, **kwds):
        '''Create a new, empty Counter object.  And if given, count elements
        from an input iterable.  Or, initialize the count from another mapping
        of elements to their counts.

        >>> c = Counter()                           # a new, empty counter
        >>> c = Counter('gallahad')                 # a new counter from an iterable
        >>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
        >>> c = Counter(a=4, b=2)                   # a new counter from keyword args

        '''
        if not args:
            raise TypeError("descriptor '__init__' of 'Counter' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('expected at most 1 arguments, got %d' % len(args))
        super(Counter, self).__init__()
        self.update(*args, **kwds)

    def __missing__(self, key):
        'The count of elements not in the Counter is zero.' # 当访问的key不存在的,不是返回keyerror,而是返回改元素的计数0
# Needed so that self[missing_item] does not raise KeyError return 0 def most_common(self, n=None): '''List the n most common elements and their counts from the most #列出出现出现次数最多的元素和次数,如果n为None,则列出所有元素和出现的次数,返回元素,次数组成的列表。 common to the least. If n is None, then list all element counts. >>> Counter('abcdeabcdabcaba').most_common(3) [('a', 5), ('b', 4), ('c', 3)] ''' # Emulate Bag.sortedByCount from Smalltalk if n is None:        #n 为None时,反馈的列表为升序 return sorted(self.items(), key=_itemgetter(1), reverse=True) return _heapq.nlargest(n, self.items(), key=_itemgetter(1)) def elements(self):
     #返回一个迭代器。元素被重复了多少次,在该迭代器中就包含多少个该元素。元素排列无确定顺序,个数小于1的元素不被包含。
'''Iterator over elements repeating each as many times as its count. >>> c = Counter('ABCABC') >>> sorted(c.elements()) ['A', 'A', 'B', 'B', 'C', 'C'] # Knuth's example for prime factors of 1836: 2**2 * 3**3 * 17**1 >>> prime_factors = Counter({2: 2, 3: 3, 17: 1}) >>> product = 1 >>> for factor in prime_factors.elements(): # loop over factors ... product *= factor # and multiply them >>> product 1836 Note, if an element's count has been set to zero or is a negative number, elements() will ignore it. ''' # Emulate Bag.do from Smalltalk and Multiset.begin from C++. return _chain.from_iterable(_starmap(_repeat, self.items())) # Override dict methods where necessary @classmethod def fromkeys(cls, iterable, v=None): # There is no equivalent method for counters because setting v=1 # means that no element can have a count greater than one. raise NotImplementedError( 'Counter.fromkeys() is undefined. Use Counter(iterable) instead.') def update(*args, **kwds):

    #可以使用一个iterable对象或者另一个Counter对象来更新键值。


    #计数器的更新包括增加和减少两种。其中,增加使用update()方法:

'''Like dict.update() but add counts instead of replacing them.

        Source can be an iterable, a dictionary, or another Counter instance.

        >>> c = Counter('which')
        >>> c.update('witch')           # add elements from another iterable
        >>> d = Counter('watch')
        >>> c.update(d)                 # add elements from another counter
        >>> c['h']                      # four 'h' in which, witch, and watch
        4

        '''
        # The regular dict.update() operation makes no sense here because the
        # replace behavior results in the some of original untouched counts
        # being mixed-in with all of the other counts for a mismash that
        # doesn't have a straight-forward interpretation in most counting
        # contexts.  Instead, we implement straight-addition.  Both the inputs
        # and outputs are allowed to contain zero and negative counts.

        if not args:
            raise TypeError("descriptor 'update' of 'Counter' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('expected at most 1 arguments, got %d' % len(args))
        iterable = args[0] if args else None
        if iterable is not None:
            if isinstance(iterable, Mapping):
                if self:
                    self_get = self.get
                    for elem, count in iterable.items():
                        self[elem] = count + self_get(elem, 0)
                else:
                    super(Counter, self).update(iterable) # fast path when counter is empty
            else:
                _count_elements(self, iterable)
        if kwds:
            self.update(kwds)

    def subtract(*args, **kwds):
        '''Like dict.update() but subtracts counts instead of replacing them.
        Counts can be reduced below zero.  Both the inputs and outputs are
        allowed to contain zero and negative counts.

        Source can be an iterable, a dictionary, or another Counter instance.

        >>> c = Counter('which')
        >>> c.subtract('witch')             # subtract elements from another iterable
        >>> c.subtract(Counter('watch'))    # subtract elements from another counter
        >>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
        0
        >>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
        -1

        '''
        if not args:
            raise TypeError("descriptor 'subtract' of 'Counter' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('expected at most 1 arguments, got %d' % len(args))
        iterable = args[0] if args else None
        if iterable is not None:
            self_get = self.get
            if isinstance(iterable, Mapping):
                for elem, count in iterable.items():
                    self[elem] = self_get(elem, 0) - count
            else:
                for elem in iterable:
                    self[elem] = self_get(elem, 0) - 1
        if kwds:
            self.subtract(kwds)

    def copy(self):
        'Return a shallow copy.'
        return self.__class__(self)

2.4 Counter常用方法

# elements() 按照counter的计数,重复返回元素
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> list(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']

# most_common(n) 按照counter的计数,按照降序,返回前n项组成的list; n忽略时返回全部
>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]

# subtract([iterable-or-mapping]) counter按照相应的元素,计数相减
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c.subtract(d)
>>> c
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

# update([iterable-or-mapping]) 不同于字典的update方法,这里更新counter时,相同的key的value值相加而不是覆盖
# 实例化 Counter 时, 实际也是调用这个方法


# Counter 间的数学集合操作
>>> c = Counter(a=3, b=1, c=5)
>>> d = Counter(a=1, b=2, d=4)
>>> c + d                       # counter相加, 相同的key的value相加
Counter({'c': 5, 'a': 4, 'd': 4, 'b': 3})
>>> c - d                       # counter相减, 相同的key的value相减,只保留正值得value
Counter({'c': 5, 'a': 2})
>>> c & d                       # 交集:  取两者都有的key,value取小的那一个
Counter({'a': 1, 'b': 1})
>>> c | d                       # 并集:  汇聚所有的key, key相同的情况下,取大的value
Counter({'c': 5, 'd': 4, 'a': 3, 'b': 2})

常见做法:
sum(c.values())                 # 继承自字典的.values()方法返回values的列表,再求和
c.clear()                       # 继承自字典的.clear()方法,清空counter
list(c)                         # 返回key组成的list
set(c)                          # 返回key组成的set
dict(c)                         # 转化成字典
c.items()                       # 转化成(元素,计数值)组成的列表
Counter(dict(list_of_pairs))    # 从(元素,计数值)组成的列表转化成Counter
c.most_common()[:-n-1:-1]       # 最小n个计数的(元素,计数值)组成的列表
c += Counter()                  # 利用counter的相加来去除负值和0的值

 

posted on 2017-12-21 18:01  Louiszj  阅读(623)  评论(0编辑  收藏  举报

导航