Proj. CDeepFuzz Paper Reading: Checker Bug Detection and Repair in Deep Learning Libraries

3. TensorGuard: A RAG-Based Multi-Agent Framework to Detect and Fix DL Checker Bugs

RAG Design

Purpose: retrieve relevant contextual information from a large corpus of code changes
Input: the root cause of the checker bug, used as the query
Output: code changes
Based on:
Sentence-transformers with all-MiniLM-L6-v2 as the embedding model, which maps each document to a 384-dimensional dense vector space
Batch size: 50; vector database: chromadb
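
The retrieval step described above can be sketched as follows. This is only a minimal illustration using the Python standard library: `toy_embed` is a hashed bag-of-tokens stand-in and is NOT all-MiniLM-L6-v2, and `ToyVectorStore` is a hypothetical in-memory substitute for chromadb. Only the 384 dimensions and the batch size of 50 come from the notes; the two code changes and the query are made up.

```python
import math
import re
import zlib

DIM = 384         # same dimensionality as the embedding space named in the notes
BATCH_SIZE = 50   # batch size named in the notes


def toy_embed(text):
    """Stand-in for the real embedding model: hashed bag of tokens, L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z_]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyVectorStore:
    """Hypothetical in-memory substitute for a vector database such as chromadb."""

    def __init__(self):
        self.items = []  # list of (embedding, document) pairs

    def add_in_batches(self, documents, batch_size=BATCH_SIZE):
        # Embed and store the code changes one batch at a time.
        for start in range(0, len(documents), batch_size):
            batch = documents[start:start + batch_size]
            self.items.extend((toy_embed(doc), doc) for doc in batch)

    def query(self, text, k=1):
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = toy_embed(text)
        scored = [(sum(a * b for a, b in zip(q, emb)), doc) for emb, doc in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]


# Made-up code changes, for illustration only.
docs = [
    '- return input[index];\n+ TORCH_CHECK(index < size, "index out of range");\n+ return input[index];',
    "- optimizer = SGD(lr=0.1)\n+ optimizer = Adam(lr=0.01)",
]
store = ToyVectorStore()
store.add_in_batches(docs)

# The root cause produced by the analysis agent plays the role of the query.
root_cause = "missing boundary check: index is not validated against size before indexing input"
top = store.query(root_cause, k=1)
```

In a real setup the toy pieces would be replaced by the actual embedding model and vector database; the query/insert flow stays the same.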

Checker Bug Detection Agent

COT, Zero-Shot, Few-Shot (with two randomly selected examples)

TABLE V: Prompt template for bug detection agent (COT).

“prompt”: You are an AI trained to detect bugs in a deep-learning
library based on commit messages and code changes. Your task
is to determine whether a given commit introduces a bug or not.
Follow the steps below to reason through the problem and arrive at
a conclusion.
1. Understand the commit message: Analyze the commit message
to understand the context and purpose of the code change.
{commit message}
2. Review the code change: Examine the deleted and
added lines of code to identify the modifications made.
{code removed}{code added}
3. Identify potential issues: Look for any missing, improper, or
insufficient checkers within the code change. Checkers may include
error handling, input validation, boundary checks, or other safety
mechanisms.
4. Analyze the impact: Consider the impact of the identified issues
on the functionality and reliability of the deep learning libraries.
5. Make a decision: Based on the above analysis, decide whether
the commit introduces a bug or not.
6. Output the conclusion: Generate a clear output of “YES” if the
commit introduces a bug, or “NO” if it does not.
“output”: {Decision}

TABLE VI: Prompt template for bug detection agent (Zero Shot).

“prompt”: You are an AI trained to detect bugs in a deep-learning
library based on commit messages and code changes. Your task
is to determine whether a given commit introduces a bug or not.
Follow the steps below to reason through the problem and arrive at
a conclusion.
Commit message: {commit message}
Code change: {code removed}{code added}
“output”: {Decision}

TABLE VII: Prompt template for bug detection agent (Few Shot).

“prompt”: You are an AI trained to detect bugs in a deep-learning
library based on commit messages and code changes. Your task
is to determine whether a given commit introduces a bug or not.
Follow the steps below to reason through the problem and arrive at
a conclusion.
Example Checker Bug One:
Commit message: {commit message}
Code change: {code removed}{code added}
Example Checker Bug Two:
Commit message: {commit message}
Code change: {code removed}{code added}
Task:
Commit message: {commit message}
Code change: {code removed}{code added}
“output”: {Decision}
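
As an illustration of the Few Shot setting, the sketch below assembles a prompt in the layout of TABLE VII from two randomly sampled example commits. The function names, the dictionary keys (`message`, `removed`, `added`), and the example commits are hypothetical; only the instruction text and the section labels follow the template.

```python
import random

# Instruction text taken from the Few Shot template above.
INSTRUCTION = (
    "You are an AI trained to detect bugs in a deep-learning library based on "
    "commit messages and code changes. Your task is to determine whether a given "
    "commit introduces a bug or not. Follow the steps below to reason through "
    "the problem and arrive at a conclusion."
)
ORDINALS = ["One", "Two"]  # the template uses exactly two examples


def format_commit(commit):
    """Render one commit in the 'Commit message / Code change' layout."""
    return (
        f"Commit message: {commit['message']}\n"
        f"Code change: {commit['removed']}{commit['added']}"
    )


def build_few_shot_prompt(example_pool, target, rng):
    """Sample two checker-bug examples at random and place the target commit last."""
    examples = rng.sample(example_pool, len(ORDINALS))
    parts = [INSTRUCTION]
    for ordinal, example in zip(ORDINALS, examples):
        parts.append(f"Example Checker Bug {ordinal}:")
        parts.append(format_commit(example))
    parts.append("Task:")
    parts.append(format_commit(target))
    return "\n".join(parts)


# Made-up commits, for illustration only.
pool = [
    {"message": "Add bounds check", "removed": "- x = a[i]\n", "added": "+ assert i < len(a)\n+ x = a[i]"},
    {"message": "Validate dtype", "removed": "- run(t)\n", "added": "+ check_dtype(t)\n+ run(t)"},
    {"message": "Guard against empty input", "removed": "- m = max(v)\n", "added": "+ if v:\n+     m = max(v)"},
]
target = {"message": "Refactor shape inference", "removed": "- n = dims[0]\n", "added": "+ n = dims[-1]"}

prompt = build_few_shot_prompt(pool, target, random.Random(0))
```

Passing a seeded `random.Random` keeps the sampled examples reproducible across runs, which is a design choice of this sketch rather than something stated in the notes.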

Root Cause Analysis Agent

TABLE VIII: Prompt template for root cause analysis agent.

“prompt”: Please describe the root cause of the bug based on the
following commit message:{commit message}
“output”: {Root causes}

Patch Generation Agent

TABLE IX: Prompt template for patch generation agent.

“prompt”: You are given a bug explanation and an external context
for fixing a checker bug. Please think step by step and generate a
patch to fix the bug in the code snippet. Please neglect any issues
related to the indentation in the code snippet. Fixing indentation
is not the goal of this task. If you think the given pattern can be
applied, generate the patch.
Example One: {code removed} {code added}
Example Two: {code removed} {code added}
Bug explanation: {bug explanation}
Retrieved context: {retrieved knowledge}
Code snippet: {code snippet}
“output”: {Think steps}{Patch}
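
Putting the three agents together, the flow can be sketched roughly as below. This is a sketch under assumptions: `llm` and `retrieve` are caller-supplied callables (stubs here, not a real model or database), the templates are shortened versions of the tables above, and stopping after a "NO" decision is an assumed control flow rather than something the notes state.

```python
DETECT_TEMPLATE = (
    "Determine whether the commit introduces a bug. Answer YES or NO.\n"
    "Commit message: {message}\nCode change: {removed}{added}"
)
ROOT_CAUSE_TEMPLATE = (
    "Please describe the root cause of the bug based on the "
    "following commit message:{message}"
)
PATCH_TEMPLATE = (
    "Bug explanation: {explanation}\n"
    "Retrieved context: {context}\n"
    "Code snippet: {snippet}"
)


def run_pipeline(commit, llm, retrieve):
    """Detection -> root cause analysis -> retrieval -> patch generation."""
    answer = llm(DETECT_TEMPLATE.format(**commit)).strip().upper()
    if not answer.startswith("YES"):
        # Assumed gating: the later agents only run for commits flagged as buggy.
        return {"decision": "NO", "root_cause": None, "context": None, "patch": None}
    root_cause = llm(ROOT_CAUSE_TEMPLATE.format(message=commit["message"]))
    context = retrieve(root_cause)  # the root cause is the RAG query
    patch = llm(PATCH_TEMPLATE.format(
        explanation=root_cause, context=context, snippet=commit["snippet"]))
    return {"decision": "YES", "root_cause": root_cause, "context": context, "patch": patch}


def stub_llm(prompt):
    """Hypothetical stand-in for the language model; returns canned answers."""
    if prompt.startswith("Determine"):
        return "YES"
    if prompt.startswith("Please describe"):
        return "The tensor is dereferenced without checking that it is defined."
    return '+ TORCH_CHECK(t.defined(), "tensor is undefined");'


def stub_retrieve(query):
    """Hypothetical stand-in for the RAG lookup; returns a canned code change."""
    return "+ TORCH_CHECK(x.defined());"


# Made-up commit, for illustration only.
commit = {
    "message": "Use tensor data directly",
    "removed": "- auto p = checked_data(t);\n",
    "added": "+ auto p = t.data_ptr();",
    "snippet": "auto p = t.data_ptr();",
}
result = run_pipeline(commit, stub_llm, stub_retrieve)
```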

Data for RAG and TensorGuard Evaluation

  • RAG data: all commits, not only checker-related commits; about 1.3M code changes in total
    • 61,453 commits for PyTorch and 150,352 commits for TensorFlow
    • 391,571 code changes for PyTorch and 920,108 code changes for TensorFlow
  • Test Dataset:
    • Checker-bug-related commits of PyTorch and TensorFlow from January 1, 2024 to July 20, 2024; among these, the commits with larger changes (more than 10 modified files; more than 15 modified lines of code)
    • From these, 92 buggy and 135 clean DL checker-related changes were selected.
  • Metrics for patch generation: Precision, Recall, F1 score, and the number of correctly generated patches
  • Model: GPT-3.5-turbo with temperature = 0; each experiment is run 5 times and the average is reported
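
For reference, the metrics listed above can be computed as in the sketch below. It assumes that "use average" means averaging each metric separately over the runs; the labels and predictions are toy values, not results from the paper.

```python
def precision_recall_f1(labels, predictions):
    """Binary precision, recall, and F1, with True meaning 'buggy'."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_over_runs(labels, runs):
    """Average each metric over repeated runs (assumed reading of 'use average')."""
    scores = [precision_recall_f1(labels, run) for run in runs]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Toy data: two buggy and two clean changes, predictions from two runs.
labels = [True, True, False, False]
runs = [
    [True, False, True, False],   # 1 TP, 1 FP, 1 FN
    [True, True, False, False],   # 2 TP, 0 FP, 0 FN
]
avg_precision, avg_recall, avg_f1 = average_over_runs(labels, runs)
```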
posted @ 雪溯