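How can you efficiently count the frequency of data in Python?
This article is reposted from Zhihu.
Author: 闻波
Link: https://www.zhihu.com/question/27800240/answer/122682289
Source: Zhihu
Copyright belongs to the author. For commercial reproduction, contact the author for authorization; for non-commercial reproduction, please credit the source.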
import collections
import time

import numpy as np


def list_to_dict(lst):
    # Calls lst.count() once per element, so the total cost is O(n^2).
    dic = {}
    for i in lst:
        dic[i] = lst.count(i)
    return dic


def collect(lst):
    # collections.Counter counts every element in a single pass.
    return dict(collections.Counter(lst))


def unique(lst):
    # np.unique returns the sorted unique values along with their counts.
    return dict(zip(*np.unique(lst, return_counts=True)))


def generate_data(num=1000000):
    # num random integers in [0, num // 10); randint expects an integer bound.
    return np.random.randint(num // 10, size=num)


if __name__ == "__main__":
    # time.time() differences are in seconds, so the output is labelled "s".
    # The figures in the comments are the author's own measurements, which
    # the original listed with an "ms" label.
    t1 = time.time()
    lst = list(generate_data())
    t2 = time.time()
    print("generate_data took : %ss" % (t2 - t1))  # author's machine: 0.12

    t1 = t2
    d1 = unique(lst)
    t2 = time.time()
    print("unique took : %ss" % (t2 - t1))  # author's machine: 0.42

    t1 = t2
    d2 = collect(lst)
    t2 = time.time()
    print("collect took : %ss" % (t2 - t1))  # author's machine: 1.25

    t1 = t2
    d3 = list_to_dict(lst)
    t2 = time.time()
    print("list_to_dict took : %ss" % (t2 - t1))  # author's machine: too slow to finish measuring

    assert d1 == d2
    assert d1 == d3
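The list_to_dict function above is slow because it calls lst.count() once for every element, which makes the whole loop quadratic in the list length. As a rough sketch (not part of the original answer), the same counts can be built in a single pass with a plain dict. The names count_with_dict and count_with_list_count are hypothetical and are used here only for illustration.

```python
from collections import Counter


def count_with_dict(items):
    """Count occurrences in one pass over the data (linear time)."""
    counts = {}
    for item in items:
        # dict.get returns 0 the first time an item is seen.
        counts[item] = counts.get(item, 0) + 1
    return counts


def count_with_list_count(items):
    """Same result, but list.count rescans the list for every element."""
    return {item: items.count(item) for item in items}


sample = [3, 1, 3, 2, 1, 3]
print(count_with_dict(sample))  # {3: 3, 1: 2, 2: 1}

# All three approaches agree on small inputs; only their running time differs.
assert count_with_dict(sample) == count_with_list_count(sample) == dict(Counter(sample))
```

On a small list both versions return immediately. The gap only becomes visible as the input grows, which is consistent with the author's remark that list_to_dict could not be timed on one million elements.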