Paper Breakdown: GPT-RE

Paper information:

Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi:
GPT-RE: In-context Learning for Relation Extraction using Large Language Models. EMNLP 2023: 3534-3547

Abstract

% Paragraph 1
% Research significance + mainstream approaches
In spite of the potential for ground-breaking achievements offered by large language models (LLMs) (e.g., GPT-3) via in-context learning (ICL), they still lag significantly behind fully-supervised baselines (e.g., fine-tuned BERT) in relation extraction (RE).
% Shortcomings of prior work
This is due to the two major shortcomings of ICL for RE:
(1) low relevance regarding entity and relation in existing sentence-level demonstration retrieval approaches for ICL; and
(2) the lack of explaining input-label mappings of demonstrations leading to poor ICL effectiveness.

% Paragraph 2
% Our method
In this paper, we propose GPT-RE to successfully address the aforementioned issues by
(1) incorporating task-aware representations in demonstration retrieval; and
(2) enriching the demonstrations with gold label-induced reasoning logic.
% Experimental setup + results: overview
We evaluate GPT-RE on four widely-used RE datasets and observe that GPT-RE achieves improvements over not only existing GPT-3 baselines, but also fully-supervised baselines as in Figure 1.
% Results: specifics
Specifically, GPT-RE achieves SOTA performances on the Semeval and SciERC datasets, and competitive performances on the TACRED and ACE05 datasets.

% Paragraph 3
% Scientific findings
Additionally, a critical issue of LLMs revealed by previous work, the strong inclination to wrongly classify NULL examples into other pre-defined labels, is substantially alleviated by our method.
We support this finding with an empirical analysis.

Introduction

Paragraph 1: Research background: GPT-3 and ICL

% The NLP frontier: GPT-3
The emergence of large language models (LLMs) such as GPT-3 (Brown et al., 2020; Thoppilan et al., 2022; Chowdhery et al., 2022; Rae et al., 2021; Hoffmann et al., 2022) represents a significant advancement in natural language processing (NLP).

% From fine-tuning to ICL
Instead of following a pretraining-and-finetuning pipeline (Devlin et al., 2019; Beltagy et al., 2019; Raffel et al., 2019; Lan et al., 2019; Zhuang et al., 2021), which finetunes a pre-trained model on a task-specific dataset in a fully-supervised manner, LLMs employ a new paradigm known as in-context learning (ICL) (Brown et al., 2020; Min et al., 2022a) which formulates an NLP task under the paradigm of language generation and makes predictions by learning from a few demonstrations.

% ICL vs. fine-tuning
Under the framework of ICL, LLMs achieve remarkable performance rivaling previous fully-supervised methods even with only a limited number of demonstrations provided in various tasks such as solving math problems, commonsense reasoning, text classification, fact retrieval, natural language inference, and semantic parsing (Brown et al., 2020; Min et al., 2022b; Zhao et al., 2021; Liu et al., 2022b; Shin et al., 2021).

Paragraph 2: Prior work

% Prior work: ICL + RE
Despite the overall promising performance of LLMs, the utilization of ICL for relation extraction (RE) is still suboptimal.

% Background: RE
RE is the central task for knowledge retrieval, requiring a deep understanding of natural language: it seeks to identify a pre-defined relation between a specific entity pair mentioned in the input sentence, or NULL if no relation is found.

% Background: ICL + RE
Given a test input, ICL for RE constructs the LLM prompt from the task instruction, a few demonstrations retrieved from the training data, and the test input itself.
The LLM then generates the corresponding relation.
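
To make this concrete, below is a minimal Python sketch of how such a prompt might be assembled. The instruction wording, the Context/Question/Answer format, and the example data are illustrative assumptions, not the paper's verbatim template.

```python
# Minimal sketch of ICL prompt construction for RE.
# Instruction wording and demonstration format are illustrative assumptions.

def build_prompt(instruction, demonstrations, test_input):
    """Concatenate the task instruction, retrieved demonstrations, and the test input."""
    parts = [instruction]
    for context, subj, obj, relation in demonstrations:
        parts.append(
            f"Context: {context}\n"
            f'Question: what is the relation between "{subj}" and "{obj}"?\n'
            f"Answer: {relation}"
        )
    context, subj, obj = test_input
    parts.append(
        f"Context: {context}\n"
        f'Question: what is the relation between "{subj}" and "{obj}"?\n'
        "Answer:"  # the LLM completes this line with a relation label
    )
    return "\n\n".join(parts)

prompt = build_prompt(
    instruction="Classify the relation between the two entities in each context.",
    demonstrations=[
        ("Steve Jobs co-founded Apple in 1976.", "Steve Jobs", "Apple", "org:founded_by"),
    ],
    test_input=("Sam Altman leads OpenAI.", "Sam Altman", "OpenAI"),
)
```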

% Prior work: ICL + RE
Recent research (Gutiérrez et al., 2022) has sought to apply GPT-3 ICL to biomedical RE, but the results are relatively negative and suggest that GPT-3 ICL still significantly underperforms fine-tuned models.

Paragraph 3.1: Shortcomings of prior work

% Overview: shortcomings
The reasons that cause the pitfalls of GPT-3 ICL in RE are twofold:

% Shortcoming 1: low relevance of entities and relations
(1) The low relevance regarding entity and relation in the retrieved demonstrations for ICL.

% Shortcoming 1: low relevance: only sentence embeddings are considered
Demonstrations are selected randomly or via k-nearest neighbor (kNN) search based on sentence embedding (Liu et al., 2022b; Gutiérrez et al., 2022).

% Shortcoming 1: low relevance: only sentence embeddings: entities and relations ignored
Regrettably, kNN-retrieval based on sentence embedding is more concerned with the relevance of the overall sentence semantics and not as much with the specific entities and relations it contains, which leads to low-quality demonstrations.

% Shortcoming 1: low relevance: entities and relations ignored: an illustrative example
As shown in Figure 2, the test input retrieves a semantically similar sentence that is nonetheless undesirable in terms of its entities and relations.
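
For reference, here is a minimal sketch of this baseline retrieval, assuming the sentence-transformers library as the encoder; the model name is only an example, and the paper's baselines rely on encoders such as SimCSE or Sentence-BERT.

```python
# Baseline kNN demonstration retrieval over sentence embeddings
# (the approach GPT-RE improves upon). Encoder choice is an assumption.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def knn_retrieve(test_sentence, train_sentences, k=5):
    """Return the indices of the k training sentences closest to the test input."""
    embeddings = encoder.encode(train_sentences + [test_sentence])
    train_embs, test_emb = embeddings[:-1], embeddings[-1]
    # Cosine similarity between the test input and every training sentence.
    sims = train_embs @ test_emb / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(test_emb)
    )
    return np.argsort(-sims)[:k]
```

Because the similarity is computed over whole-sentence semantics, a topically similar sentence with a different entity pair or relation can still rank first, which is exactly the low-relevance failure described above.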

Paragraph 3.2: Shortcomings of prior work

% Shortcoming 2: missing explanations of input-label mappings
(2) The lack of explaining input-label mappings in demonstrations leads to poor ICL effectiveness: A vanilla form of ICL lists all demonstrations as input-label pairs without any explanations.

% Shortcoming 2: missing explanations: LLMs learn only shallow surface clues
This may mislead LLMs to learn shallow clues from surface words, while a relation can be presented in diverse forms due to language complexity.

% Shortcoming 2: missing explanations: raise the quality of each demonstration
Especially since ICL is constrained by a maximum input length, optimizing the learning efficiency of each single demonstration becomes extremely important.

Paragraph 4.1: Our work

% Motivation
To this end, we propose GPT-RE for the RE task.

% Overview: retrieval + reasoning
GPT-RE employs two strategies to resolve the issues above: (1) task-aware retrieval and (2) gold label-induced reasoning.

% Method 1: task-aware retrieval: overview
For (1) task-aware retrieval, the core idea is to use representations that deliberately encode and emphasize entity and relation information, rather than sentence embeddings, for kNN search.

% Method 1: task-aware retrieval: details
We achieve this by two different retrieval approaches: (a) entity-prompted sentence embedding; (b) fine-tuned relation representation, which naturally places emphasis on entities and relations.

% Method 1: task-aware retrieval: advantages
Both methods contain more RE-specific information than sentence semantics, thus effectively addressing the problem of low relevance.
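
The sketch below illustrates both retrieval keys under simplifying assumptions: the entity-prompted template wording in (a) is hypothetical, and the encoder in (b) is a stand-in for a PLM fine-tuned on the target RE task, with an entity-marker readout that follows common RE practice rather than the paper's exact configuration.

```python
# Sketch of the two task-aware retrieval keys; details are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

def entity_prompted_text(context, subj, obj):
    # (a) Entity-prompted sentence embedding: restate the entity pair so the
    # sentence encoder must attend to it. Template wording is hypothetical.
    return f'{context} The relation between "{subj}" and "{obj}" in the context.'

# (b) Fine-tuned relation representation: read out entity-marker hidden states.
# "bert-base-uncased" stands in for an encoder fine-tuned on the RE task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]}
)
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.resize_token_embeddings(len(tokenizer))

@torch.no_grad()
def relation_representation(marked_text):
    """marked_text wraps the pair, e.g. '[E1] Jobs [/E1] founded [E2] Apple [/E2].'"""
    inputs = tokenizer(marked_text, return_tensors="pt")
    hidden = encoder(**inputs).last_hidden_state[0]  # (seq_len, dim)
    ids = inputs["input_ids"][0]
    e1 = (ids == tokenizer.convert_tokens_to_ids("[E1]")).nonzero()[0, 0]
    e2 = (ids == tokenizer.convert_tokens_to_ids("[E2]")).nonzero()[0, 0]
    # The concatenated marker states serve as the kNN retrieval key.
    return torch.cat([hidden[e1], hidden[e2]])
```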

Paragraph 4.2: Our work

% Method 2: input-label reasoning: overview
For (2) gold label-induced reasoning, we propose to inject the reasoning logic into the demonstration to provide more evidence to align an input and the label, a strategy akin to the Chain-of-Thought (CoT) research (Wei et al., 2022; Wang et al., 2022b; Kojima et al., 2022).

% Method 2: input-label reasoning: details and differences
But different from previous work, we allow LLMs to elicit the reasoning process to explain not only why a given sentence should be classified under a particular label but also why a NULL example should not be assigned to any of the pre-defined categories.

% Method 2: input-label reasoning: advantages
This process significantly improves the ability of LLMs to align the relations with diverse expression forms.
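
As a sketch of how such demonstrations might be built: the LLM is prompted with the gold label to elicit an explanation, which is then attached to the demonstration. The prompt wording and the "Reasoning:" field are assumptions for illustration, not the paper's verbatim format.

```python
# Sketch of gold label-induced reasoning; wording is an assumption.

def reasoning_elicitation_prompt(context, subj, obj, gold_label):
    """Ask the LLM to explain the gold label (including why NULL applies)."""
    if gold_label == "NULL":
        # For NULL examples, elicit why NO pre-defined relation holds.
        question = (f'Explain why no pre-defined relation holds between '
                    f'"{subj}" and "{obj}" in the context.')
    else:
        question = (f'Explain why the relation between "{subj}" and '
                    f'"{obj}" is "{gold_label}" in the context.')
    return f"Context: {context}\n{question}"

def enriched_demonstration(context, subj, obj, gold_label, explanation):
    """Demonstration with the elicited reasoning inserted before the label."""
    return (f"Context: {context}\n"
            f"Reasoning: {explanation}\n"
            f"Relation: {gold_label}")
```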

Paragraph 5.1: Experimental results

% Problem raised: relation hallucination
Recent work reveals another crucial problem named "overpredicting", as shown in Figure 3: we observe that LLMs have a strong inclination to wrongly classify NULL examples into other pre-defined labels.

% Relation hallucination: related work
A similar phenomenon has also been observed in other tasks such as NER (Gutiérrez et al., 2022; Blevins et al., 2022).

% Our method: experimental results
In this paper, we show that this issue can be alleviated if the representations for retrieval can be supervised with the whole set of NULL in the training data.

Paragraph 5.2: Experimental results

% Experimental setup: RE
We evaluate our proposed method on three popular general-domain RE datasets, Semeval 2010 Task 8, TACRED, and ACE05, and one scientific-domain dataset, SciERC.

% Results: overview: surpasses both GPT-3 baselines and traditional fine-tuned models
We observe that GPT-RE achieves improvements over not only existing GPT-3 baselines, but also fully-supervised baselines.

% Results: specifics: SOTA + competitive results
Specifically, GPT-RE achieves SOTA performances on the Semeval and SciERC datasets, and competitive performances on the TACRED and ACE05 datasets.

Methodology

Task Definition

Let \(C\) denote the input context and \(e_{sub} \in C\), \(e_{obj} \in C\) denote the pair of subject and object entity.
Given a set of pre-defined relation classes \(R\), relation extraction aims to predict the relation \(y \in R\) between the pair of entities \((e_{sub}, e_{obj})\) within the context \(C\), or, if there is no pre-defined relation between them, predict \(y = \text{NULL}\).
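
Rendered as a typed interface (a schematic for orientation, not code from the paper), the task definition looks like this:

```python
# Schematic rendering of the task definition; relation labels are placeholders.
from dataclasses import dataclass
from typing import Optional

@dataclass
class REInstance:
    context: str  # the input context C
    subject: str  # e_sub, an entity mention occurring in C
    object: str   # e_obj, an entity mention occurring in C

# The pre-defined relation set R; these labels are illustrative.
RELATIONS = {"org:founded_by", "per:employee_of"}

def extract_relation(x: REInstance) -> Optional[str]:
    """Predict y in R for (e_sub, e_obj) within C, or None to represent NULL."""
    ...
```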

Related Work

In-context Learning

% Lead-in: ICL
Recent work shows that ICL with GPT-3 (Brown et al., 2020) can perform numerous tasks when provided with a few examples in a natural language prompt.
% ICL: mainstream work: prompt design, calibration
Existing work focuses on various aspects to effectively utilize the advantages of GPT-3, from prompt design (Perez et al., 2021) for proper input to coherence calibration (Malkin et al., 2022) for tackling the diverse generated output.
% ICL: mainstream work: demonstration ordering, demonstration retrieval
Another line of research focuses on the demonstrations themselves, including ordered prompts (Lu et al., 2022) and retrieval-based demonstrations (Rubin et al., 2022; Liu et al., 2022b; Shin et al., 2021).
% Our distinction
To the best of our knowledge, there is no previous work exploring the potential of GPT-3 on general domain RE tasks.
% Similar work
A recent work (Gutiérrez et al., 2022) attempts to leverage GPT-3 for biomedical information extraction (NER and RE), and reveals issues of ICL that may be detrimental to IE tasks in general.
% Our distinction and contributions
Our work succeeds in overcoming these issues to some extent and confirms the potential of GPT-3 in both general-domain and scientific-domain RE.

Retrieval-based Demonstrations

% Retrieval: effectiveness
Several studies have demonstrated that dynamically selecting few-shot demonstrations for each test example, instead of utilizing a fixed set, leads to significant improvement in GPT-3 ICL (Liu et al., 2022b; Shin et al., 2021; Rubin et al., 2022).
% Retrieval: supporting evidence (nearest vs. farthest)
They also show that nearest neighbor in-context examples yield much better results than the farthest ones.
% Research significance
This underscores the importance of better retrieval modules for demonstrations.
% Mainstream approach: sentence embeddings
Existing attempts rely on sentence embeddings for retrieval, including sentence encoders of PLMs such as BERT (Devlin et al., 2019) and RoBERTa (Zhuang et al., 2021), as well as KATE (Liu et al., 2022b), SimCSE (Gao et al., 2021), and Sentence-BERT (Reimers and Gurevych, 2019; Wolf et al., 2020).
% Our distinction: fine-tuned encoders
Unlike these sentence embeddings, we propose to fine-tune PLMs on our target RE tasks to produce more task-specific and robust representations for retrieval.

Conclusion

% Our work: overview
This work explores the potential of GPT-3 ICL on RE for bridging the performance gap to the fine-tuning baselines via two strategies:
% Overview: method 1
(1) task-aware demonstration retrieval emphasizes entity and relation information for improving the accuracy of searching demonstrations;
% Overview: method 2
(2) gold label-induced reasoning enriches the reasoning evidence of each demonstration.
% Effectiveness + contributions
To the best of our knowledge, GPT-RE is the first GPT-3 ICL research that significantly outperforms the fine-tuning baseline on three datasets and achieves SOTA on Semeval and SciERC.
% Analysis: scientific findings
We conduct detailed studies to explore how GPT-3 overcomes difficulties such as the influence of NULL examples.
