Proj CJI Paper Reading: OffsetBias: Leveraging Debiased Data for Tuning Evaluators

Goal: reduce the biases of LLM evaluators (length, concreteness, empty reference, content continuation, nested instruction, familiar knowledge)
Tool:

  • OffsetBias: pairwise preference dataset
  • EvalBiasBench: meta-evaluation benchmark

Method:

  1. OffsetBias (pairwise preference data construction)
    1. Use GPT-4 to generate an off-topic instruction (a completely unrelated topic) for each original instruction.
    2. Use GPT-3.5 to generate a bad response that follows the off-topic instruction.
    3. Fine-tune the judge model on the resulting (good response, bad response) pairs to reduce bias (a sketch of this pipeline follows the note below).
  2. EvalBiasBench (meta-evaluation benchmark construction)
    1. Analyze errors in existing meta-evaluation results and group them into bias types.
    2. Manually construct prompts for each bias type, then verify via testing that these prompts indeed trigger evaluation errors at a noticeably higher rate.
    3. Used to verify whether each bias exists in a judge model.

Note: the off-topic data here is not used as data for preventing injection.
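
A minimal sketch of the OffsetBias data-construction steps above (off-topic instruction via GPT-4, bad response via GPT-3.5), assuming the OpenAI Python SDK; the prompt wording, model identifiers, and field names are illustrative placeholders, not the paper's exact templates.

```python
# Sketch of the off-topic bad-response generation described in the Method list.
# Assumptions: OpenAI Python SDK (v1+); prompt wording and model identifiers
# are illustrative, not the paper's exact templates.
from openai import OpenAI

client = OpenAI()

def make_off_topic_instruction(instruction: str) -> str:
    """Ask GPT-4 for an instruction on a completely unrelated topic."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": ("Write one new instruction whose topic is completely "
                        f"unrelated to the following instruction:\n\n{instruction}"),
        }],
    )
    return resp.choices[0].message.content

def make_bad_response(off_topic_instruction: str) -> str:
    """Ask GPT-3.5 to answer the off-topic instruction; the answer serves as
    the bad response Rb for the original instruction."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": off_topic_instruction}],
    )
    return resp.choices[0].message.content

def build_sample(instruction: str, good_response: str) -> dict:
    """Assemble one (I, Rg, Rb) preference triplet for fine-tuning."""
    off_topic = make_off_topic_instruction(instruction)
    return {"instruction": instruction,
            "chosen": good_response,                    # Rg: on-topic good response
            "rejected": make_bad_response(off_topic)}   # Rb: off-topic bad response
```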

Abstract

GitHub: https://github.com/ncsoft/offsetbias

5. Experimental Setup

5.1 Model Description

  • Judge models
    1. Base-data
    2. Base-data + OffsetBias
  • Base-data: 268k human preference dataset
    • Ultrafeedback (Cui et al., 2023): single scoring task
    • Helpsteer (Wang et al., 2023b): single scoring task
    • HH-RLHF-Helpful-Online, HH-RLHF-Harmless-Base (Bai et al., 2022): pairwise comparison
    • a subset of PKU-SafeRLHF (Dai et al., 2024): pairwise comparison
    • Adding single-scoring tasks significantly improves pairwise-comparison performance
    • Augmenting data to address position bias: swap the order of each pair, doubling the size of the dataset (see the position-swap sketch after this list)
  • Test OffsetBias on reward model training
    • Purpose
      • Eliminate the influence of prompting and feedback generation on judge model performance, leaving only the impact of the (I, Rg, Rb) triplets in the data.
    • Challenge
      • Directly training already fine-tuned reward models with new data causes catastrophic forgetting
        • Solution:
          1. Follow WARM ("On the Benefits of Weight Averaged Reward Models")
          2. Select FsfairX-LLaMA3-RM-v0.1 as the original model
          3. Train an intermediate reward model on part of the original model's training data together with OffsetBias
          4. Merge the intermediate reward model and the original model via SLERP to obtain the final model (see the SLERP sketch after this list)
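
A minimal sketch of the position-swap augmentation mentioned under Base-data above: each pairwise sample is duplicated with the two responses in reversed order and the label flipped, doubling the dataset so the judge cannot rely on response position. Field names are hypothetical.

```python
# Position-swap augmentation sketch (field names are hypothetical).
def swap_positions(sample: dict) -> dict:
    """Return a copy of the sample with response positions and label swapped."""
    return {
        "instruction": sample["instruction"],
        "response_a": sample["response_b"],
        "response_b": sample["response_a"],
        "label": "A" if sample["label"] == "B" else "B",
    }

def augment_with_swaps(dataset: list[dict]) -> list[dict]:
    """Original samples plus their position-swapped copies (2x the data)."""
    return dataset + [swap_positions(s) for s in dataset]
```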
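
A minimal sketch of the SLERP merge in step 4 of the solution above, interpolating each parameter tensor between the original reward model and the intermediate reward model; this is a generic per-tensor formulation with an assumed interpolation factor t, not the authors' exact merging code.

```python
# Generic per-tensor SLERP (spherical linear interpolation) merge sketch;
# the interpolation factor t is an assumption, not taken from the paper.
import torch

def slerp(w0: torch.Tensor, w1: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    # Angle between the two flattened weight vectors.
    cos_omega = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    omega = torch.arccos(cos_omega.clamp(-1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel weights: fall back to plain linear interpolation.
        merged = (1 - t) * v0 + t * v1
    else:
        merged = (torch.sin((1 - t) * omega) * v0 + torch.sin(t * omega) * v1) / torch.sin(omega)
    return merged.reshape(w0.shape).to(w0.dtype)

def merge_reward_models(original_state: dict, intermediate_state: dict, t: float = 0.5) -> dict:
    """Merge the original RM and the intermediate RM (trained on a subset of
    the original data plus OffsetBias) into the final reward model."""
    return {name: slerp(original_state[name], intermediate_state[name], t)
            for name in original_state}
```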

5.2 Benchmarks

  • Generative models
    • LLMBar: a Natural subset and four Adversarial subsets, named Neighbor, GPTInst, GPTOut and Manual based on their construction methods.
    • HHH-Alignment: helpfulness, honesty, harmlessness, plus an "other" subset
    • MT-Bench Human Judge: 80 prompts from the MT-Bench. Human annotators labeled 3.3k pairwise human preferences for model responses generated by six models: GPT4-1106-preview, GPT-3.5-turbo-0125, Claude-v1, Vicuna-13B, Alpaca-13B, and LLaMA-13B.
  • Reward model
    • RewardBench: Chat, Chat Hard, Safety, and Reasoning.

5.3 Baselines

  • Generative model baselines:
    • OpenAI’s GPT-4o-2024-05-13 and GPT-3.5-turbo-0125 as proprietary baselines, PandaLM (Wang et al., 2024), AutoJ (Li et al., 2024) and Prometheus2 (Kim et al., 2024b) as state-of-the-art evaluator models, and LLaMA-3-8B-Instruct (AI@Meta, 2024) as a baseline model.
    • We adopt the original prompt templates of the models for fair comparison.
  • EvalBiasBench + Generative model baselines
    • Phi-3-medium (Microsoft, 2024), Mixtral-8x7B-Instruct (MistralAI, 2024), LLaMA2-Chat-70B (GenAI@Meta, 2023) and LLaMA3-70B-Instruct (AI@Meta, 2024)
  • Reward model baselines
    • Eurus-RM-7B (Yuan et al., 2024), Starling-RM-34B (Zhu et al., 2023a), RM-Mistral-7B and FsfairX-LLaMA3-RM (Xiong et al., 2024)

6 Experiment Results

  1. Uses macro average: scores are averaged directly across categories (sum of per-category scores divided by the number of categories) rather than weighted by the number of samples per category; see the sketch after this list.
  2. Uses the metric: positional agreement rate.
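
A small sketch contrasting the macro average in point 1 with a sample-weighted average; the per-category numbers below are hypothetical, not results from the paper.

```python
# Macro average vs. sample-weighted average (hypothetical numbers).
per_category = {            # category -> (num_correct, num_samples)
    "Natural":  (90, 100),  # accuracy 0.90
    "Neighbor": (30, 50),   # accuracy 0.60
    "Manual":   (8, 10),    # accuracy 0.80
}

accuracies = {k: c / n for k, (c, n) in per_category.items()}

# Macro average: each category counts equally regardless of its size.
macro_avg = sum(accuracies.values()) / len(accuracies)          # (0.9 + 0.6 + 0.8) / 3 ≈ 0.767
# Weighted average: each sample counts equally.
weighted_avg = (sum(c for c, _ in per_category.values())
                / sum(n for _, n in per_category.values()))     # 128 / 160 = 0.800

print(f"macro average:    {macro_avg:.3f}")
print(f"weighted average: {weighted_avg:.3f}")
```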




