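How to efficiently count value frequencies in Python?

This post is reposted from Zhihu.
Author: 闻波
Link: https://www.zhihu.com/question/27800240/answer/122682289
Source: Zhihu
Copyright belongs to the author. For commercial reposting, contact the author for permission; for non-commercial reposting, please credit the source.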

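```python
import collections
import time

import numpy as np


def list_to_dict(lst):
    # Naive approach: lst.count(i) rescans the whole list for every element,
    # so this is O(n^2) and impractically slow for a million items.
    dic = {}
    for i in lst:
        dic[i] = lst.count(i)
    return dic


def collect(lst):
    # collections.Counter tallies every element in a single pass.
    return dict(collections.Counter(lst))


def unique(lst):
    # np.unique returns the sorted distinct values together with
    # the number of times each one occurs.
    return dict(zip(*np.unique(lst, return_counts=True)))


def generate_data(num=1000000):
    # num random integers drawn from [0, num // 10).
    # Integer division keeps the upper bound an int in Python 3.
    return np.random.randint(num // 10, size=num)


if __name__ == "__main__":
    # perf_counter() returns seconds and is meant for timing code.
    t1 = time.perf_counter()
    lst = list(generate_data())
    t2 = time.perf_counter()
    print("generate_data took: %.2f s" % (t2 - t1))  # author's machine: 0.12 s

    t1 = t2
    d1 = unique(lst)
    t2 = time.perf_counter()
    print("unique took: %.2f s" % (t2 - t1))  # author's machine: 0.42 s

    t1 = t2
    d2 = collect(lst)
    t2 = time.perf_counter()
    print("collect took: %.2f s" % (t2 - t1))  # author's machine: 1.25 s

    t1 = t2
    d3 = list_to_dict(lst)
    t2 = time.perf_counter()
    print("list_to_dict took: %.2f s" % (t2 - t1))  # author's machine: too slow to finish

    # All three approaches must produce the same counts.
    assert d1 == d2
    assert d1 == d3
```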
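The slowness of list_to_dict comes from calling lst.count(i) once per element, and each call rescans the whole list. Below is a minimal sketch of a single-pass alternative. It assumes small integer data, and `count_once` is a hypothetical helper, not part of the original answer. It keeps the plain-dict approach but visits each element only once, and it is checked against `collections.Counter` and `np.unique` on a tiny list:

```python
import collections

import numpy as np


def count_once(lst):
    # Hypothetical helper (not from the original answer): one pass over the
    # list, incrementing a per-value counter, so it is O(n) rather than the
    # O(n^2) of calling lst.count() for every element.
    dic = {}
    for i in lst:
        dic[i] = dic.get(i, 0) + 1
    return dic


sample = [3, 1, 3, 2, 3, 1]

by_loop = count_once(sample)
by_counter = dict(collections.Counter(sample))
values, counts = np.unique(sample, return_counts=True)
# tolist() converts NumPy scalars to plain Python ints.
by_unique = dict(zip(values.tolist(), counts.tolist()))

print(by_loop)    # {3: 3, 1: 2, 2: 1}
print(by_unique)  # {1: 2, 2: 1, 3: 3}
assert by_loop == by_counter == by_unique
```

The np.unique result comes back sorted by value, while the dict-based versions keep first-seen order. Dict equality ignores order, so all three still compare equal. count_once is still an interpreted Python loop, so it is not expected to beat np.unique or Counter. It only shows that the quadratic cost of list_to_dict comes from count(), not from using a dict.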

 

posted @ 2017-09-29 21:21  chen狗蛋儿  阅读(4004)  评论(0编辑  收藏  举报