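Recursively finding files whose contents differ between two folders with Python
Comparing two copies of a source tree
- Use Python to walk two folders recursively and find every pair of same-named files whose contents differ.
- A typical use case is code auditing: given two copies of the same project, one cracked and one genuine, diffing them shows which code the cracked copy changed, which helps uncover backdoor code.
The code is as follows: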
import difflib
import filecmp
import os

# Directory paths
base_dir = "/home/viadmin/finddiffiles"
wordfencedaoban_dir = os.path.join(base_dir, "wordfencedaoban")      # cracked copy
wordfencezhengban_dir = os.path.join(base_dir, "wordfencezhengban")  # genuine copy

# Image files are skipped when collecting files
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.bmp')


def get_files(directory):
    """Return the paths of all non-image files under directory, recursively."""
    files = []
    for root, _, filenames in os.walk(directory):
        for filename in filenames:
            if not filename.lower().endswith(IMAGE_EXTENSIONS):
                files.append(os.path.join(root, filename))
    return files


# Collect the file lists
daoban_files = get_files(wordfencedaoban_dir)
zhengban_files = get_files(wordfencezhengban_dir)

# Print the collected file names (for testing only)
print("Files in wordfencedaoban:")
for file in daoban_files:
    print(file)
print("\nFiles in wordfencezhengban:")
for file in zhengban_files:
    print(file)


def compare_files(file1, file2):
    """Return a unified diff of two files as a single string."""
    # errors="replace" keeps non-UTF-8 or binary files from raising UnicodeDecodeError
    with open(file1, 'r', encoding='utf-8', errors='replace') as f1, \
            open(file2, 'r', encoding='utf-8', errors='replace') as f2:
        lines1 = f1.readlines()
        lines2 = f2.readlines()
    diff = difflib.unified_diff(lines1, lines2, fromfile=file1, tofile=file2)
    return ''.join(diff)


# Record the differences. Only files present in the cracked copy are checked,
# so files that exist only in the genuine copy are not reported.
diff_files = []
for daoban_file in daoban_files:
    relative_path = os.path.relpath(daoban_file, wordfencedaoban_dir)
    zhengban_file = os.path.join(wordfencezhengban_dir, relative_path)
    if not os.path.exists(zhengban_file):
        diff_files.append((daoban_file, "Not found in zhengban"))
    elif not filecmp.cmp(daoban_file, zhengban_file, shallow=False):
        # shallow=False compares file contents rather than os.stat() signatures
        diff_content = compare_files(daoban_file, zhengban_file)
        diff_files.append((daoban_file, zhengban_file, diff_content))

# Print the results to the screen
for diff in diff_files:
    print(f"Difference found: {diff[0]} vs {diff[1]}")
    if len(diff) > 2:
        print(diff[2])
    print("-" * 80)

# Write the results to a text file
output_file = os.path.join(base_dir, "diff_results.txt")
with open(output_file, 'w', encoding='utf-8') as f:
    for diff in diff_files:
        f.write(f"Difference found: {diff[0]} vs {diff[1]}\n")
        if len(diff) > 2:
            f.write(diff[2])
        f.write("-" * 80 + "\n")

print(f"Comparison completed. Results saved to {output_file}")
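The script above only walks the cracked copy, so a file that exists only in the genuine copy never shows up in the results. As a possible complement, the standard library's `filecmp.dircmp` can report both directions. The following is a minimal sketch, not part of the original script: the function name `collect_differences` is hypothetical, files that could not be compared are simply lumped in with the changed ones, and `dircmp` skips the names in `filecmp.DEFAULT_IGNORES` (such as `__pycache__`) by default.

```python
import filecmp
import os


def collect_differences(left_dir, right_dir):
    """Return (changed, left_only, right_only) as sorted lists of relative paths.

    Hypothetical helper for illustration; not part of the original script.
    """
    changed, left_only, right_only = [], [], []

    def walk(dcmp, prefix=""):
        # Compare file contents (shallow=False) for names present on both sides.
        _, mismatch, errors = filecmp.cmpfiles(
            dcmp.left, dcmp.right, dcmp.common_files, shallow=False
        )
        # Assumption: files that could not be compared are treated as changed.
        changed.extend(os.path.join(prefix, name) for name in mismatch + errors)
        # left_only / right_only may contain directory names as well as files.
        left_only.extend(os.path.join(prefix, name) for name in dcmp.left_only)
        right_only.extend(os.path.join(prefix, name) for name in dcmp.right_only)
        # Recurse into subdirectories that exist on both sides.
        for name, sub in dcmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left_dir, right_dir))
    return sorted(changed), sorted(left_only), sorted(right_only)
```

Calling `collect_differences(wordfencedaoban_dir, wordfencezhengban_dir)` would return the changed files plus the entries unique to each copy. In an audit, files that exist only in the cracked copy are often worth reading first.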
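How to run it and what to expect
- Save the code above to a file such as compare.py and run it with Python 3.
- The script uses only standard-library modules (os, filecmp, difflib), so nothing extra needs to be installed. For testing, it prints the name of every file it walks; comment those print statements out if you do not need them.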
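Instead of commenting the print statements in and out, the file listing could be put behind a command-line switch. The sketch below is one way to do that with `argparse`. The names `list_files`, `parse_args`, `main`, and the `-v/--verbose` flag are assumptions made for illustration and do not appear in the original script.

```python
import argparse
import os


def list_files(directory, verbose=False):
    """Return every file path under directory; print each one only if verbose."""
    paths = []
    for root, _, filenames in os.walk(directory):
        for filename in filenames:
            path = os.path.join(root, filename)
            paths.append(path)
            if verbose:
                print(path)
    return paths


def parse_args(argv=None):
    """Parse two directory arguments and an optional verbose flag."""
    parser = argparse.ArgumentParser(description="List the files in two directories.")
    parser.add_argument("left", help="first directory, e.g. the cracked copy")
    parser.add_argument("right", help="second directory, e.g. the genuine copy")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print every file that is visited")
    return parser.parse_args(argv)


def main(argv=None):
    """Entry point; call main() at the bottom of the script to use it."""
    args = parse_args(argv)
    left_files = list_files(args.left, args.verbose)
    right_files = list_files(args.right, args.verbose)
    print(f"{len(left_files)} files in {args.left}, {len(right_files)} in {args.right}")
```

With a `main()` call added at the end of compare.py, running `python3 compare.py wordfencedaoban wordfencezhengban -v` would print every visited file, and omitting `-v` would print only the summary line.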