A Small Script for Processing Wordlists

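Preface:

Many people collect a lot of wordlists over time, but once there are too many of them it becomes hard to know which one to use. So I put together a simple script that deduplicates the collected wordlists and extracts the entries that appear most often.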

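# -*- coding: utf-8 -*-

import os
from collections import Counter

# Version: 1.0
# by: reboot

# Work on the current directory
filedir = os.path.abspath('.')
merged_path = os.path.join(filedir, 'all_in_one.txt')
dedup_path = os.path.join(filedir, 'all_in_one_deduplication.txt')


def is_output_file(filename):  # Files written by an earlier run of this script
    if filename in ('all_in_one.txt', 'all_in_one_deduplication.txt'):
        return True
    return filename.startswith('Top') and filename.endswith('字典.txt')


def MergeTxt():  # Merge all .txt files in the directory
    # Only .txt files are merged, so the script itself is skipped automatically.
    filenames = [name for name in sorted(os.listdir(filedir))
                 if name.endswith('.txt') and not is_output_file(name)
                 and os.path.isfile(os.path.join(filedir, name))]
    with open(merged_path, "wb") as fname:
        for filename in filenames:
            with open(os.path.join(filedir, filename), "rb") as file:
                content = file.read()
            fname.write(content)
            # Keep the last line of one file from fusing with the first line of the next
            if content and not content.endswith(b"\n"):
                fname.write(b"\n")


def top_dict(all_in_one, all_in_one_length):  # Extract the Top-N wordlist
    # all_in_one: list of every line before deduplication; all_in_one_length: its length
    top = input("Enter N to generate a wordlist of the N most frequent entries:\n").strip()
    # Accept only a positive whole number no larger than the number of lines
    if top.isdecimal() and 0 < int(top) <= all_in_one_length:
        counts = Counter(all_in_one)
        data = counts.most_common(int(top))
        result_path = os.path.join(filedir, 'Top' + str(int(top)) + '字典.txt')
        with open(result_path, "wb") as result:
            for line, _count in data:
                result.write(line + b"\n")
    else:
        print("Invalid input, no Top wordlist was generated.\n")


def data_deduplication():  # Deduplicate the merged data
    with open(merged_path, "rb") as merged:
        # Strip line endings so "abc\n" and "abc\r\n" count as the same entry
        fname_set = [line.rstrip(b"\r\n") for line in merged]
    fname_set = [line for line in fname_set if line]  # drop empty lines
    set_out = list(dict.fromkeys(fname_set))  # unique lines, first-seen order kept
    with open(dedup_path, "wb") as file_deduplication:
        for i in set_out:
            file_deduplication.write(i + b"\n")
    print("Deduplication finished.\n")

    # Count frequencies on the non-deduplicated lines and extract the top entries
    top_dict(fname_set, len(fname_set))


if __name__ == '__main__':
    MergeTxt()
    print('Merge finished. Output file: all_in_one.txt\n')
    data_deduplication()  # Deduplicate, then extract the Top-N wordlist
    input('Press Enter to exit...')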
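
The two core steps of the script, deduplicating lines and keeping the most frequent ones, can be tried on a small in-memory sample before running it on real files. This is only a minimal sketch, not the script itself: the helper names `dedupe_lines` and `top_lines` and the sample data are made up for illustration.

```python
from collections import Counter


def dedupe_lines(lines):
    """Return the unique lines, keeping the order in which they first appear."""
    return list(dict.fromkeys(lines))


def top_lines(lines, n):
    """Return the n most frequent lines, most frequent first."""
    return [line for line, _count in Counter(lines).most_common(n)]


# Hypothetical sample: entries as they might look after merging three wordlists
sample = ["admin", "123456", "root", "admin", "123456", "admin", "test"]

print(dedupe_lines(sample))  # ['admin', '123456', 'root', 'test']
print(top_lines(sample, 2))  # ['admin', '123456']
```

Note that the frequency count has to run on the lines before deduplication, otherwise every entry would have a count of 1.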

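The script is fairly simple. Because it treats every line as one opaque entry, it works with wordlists of usernames, passwords, username & password pairs, and web directories.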

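Feel free to add features as you need them. For example, I once thought about maintaining one large wordlist and converting it into an MD5 wordlist, which would give me a locally maintained rainbow table (strictly speaking, a plain hash lookup table). I built one a while back with roughly 2.7 million entries, but I have no idea where it ended up...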
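
The MD5 conversion could look roughly like the following. This is a sketch under assumptions and not part of the original script: the function names `md5_line` and `build_md5_table`, the `md5hex:word` line format, and the file names in the usage note are all chosen here for illustration.

```python
import hashlib


def md5_line(word):
    """Return b'md5hex:word' for one wordlist entry given as bytes."""
    return hashlib.md5(word).hexdigest().encode("ascii") + b":" + word


def build_md5_table(src_path, dst_path):
    """Write one 'md5hex:word' line per non-empty line of src_path.

    Returns the number of entries written.
    """
    count = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            word = line.rstrip(b"\r\n")  # tolerate both \n and \r\n endings
            if word:
                dst.write(md5_line(word) + b"\n")
                count += 1
    return count
```

It could be called as `build_md5_table("all_in_one_deduplication.txt", "all_in_one_md5.txt")`, where the output file name is hypothetical. Reading the file line by line keeps memory use flat even for a wordlist with millions of entries.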

If you need a packaged exe version, send the keyword “字典整理” to the official account to get it.

------------------------------------------------------------------------------

Feel free to share and repost.

（信安随笔）

posted @ 2020-02-07 18:48 rebootORZ
