Paper Breakdown: GPT-RE

Paper information:

Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi:
GPT-RE: In-context Learning for Relation Extraction using Large Language Models. EMNLP 2023: 3534-3547

Abstract

% Paragraph 1
% Research significance + mainstream approaches
In spite of the potential for ground-breaking achievements offered by large language models (LLMs) (e.g., GPT-3) via in-context learning (ICL), they still lag significantly behind fully-supervised baselines (e.g., fine-tuned BERT) in relation extraction (RE).
% Shortcomings of prior work
This is due to the two major shortcomings of ICL for RE:
(1) low relevance regarding entity and relation in existing sentence-level demonstration retrieval approaches for ICL; and
(2) the lack of explanation of input-label mappings in demonstrations, which leads to poor ICL effectiveness.

% Paragraph 2
% Our method
In this paper, we propose GPT-RE to successfully address the aforementioned issues by
(1) incorporating task-aware representations in demonstration retrieval; and
(2) enriching the demonstrations with gold label-induced reasoning logic.
% Experimental setup + results: overview
We evaluate GPT-RE on four widely-used RE datasets and observe that GPT-RE achieves improvements over not only existing GPT-3 baselines, but also fully-supervised baselines as in Figure 1.
% Results: specifics
Specifically, GPT-RE achieves SOTA performances on the Semeval and SciERC datasets, and competitive performances on the TACRED and ACE05 datasets.

% Paragraph 3
% Scientific finding
Additionally, a critical issue of LLMs revealed by previous work, the strong inclination to wrongly classify NULL examples into other pre-defined labels, is substantially alleviated by our method.
We provide an empirical analysis of this effect.

Introduction

Paragraph 1: Research background: GPT-3 and ICL

% The NLP frontier: GPT-3
The emergence of large language models (LLMs) such as GPT-3 (Brown et al., 2020; Thoppilan et al., 2022; Chowdhery et al., 2022; Rae et al., 2021; Hoffmann et al., 2022) represents a significant advancement in natural language processing (NLP).

% From fine-tuning to ICL
Instead of following a pretraining-and-finetuning pipeline (Devlin et al., 2019; Beltagy et al., 2019; Raffel et al., 2019; Lan et al., 2019; Zhuang et al., 2021), which finetunes a pre-trained model on a task-specific dataset in a fully-supervised manner, LLMs employ a new paradigm known as in-context learning (ICL) (Brown et al., 2020; Min et al., 2022a) which formulates an NLP task under the paradigm of language generation and makes predictions by learning from a few demonstrations.

% ICL vs. fine-tuning
Under the framework of ICL, LLMs achieve remarkable performance rivaling previous fully-supervised methods even with only a limited number of demonstrations provided in various tasks such as solving math problems, commonsense reasoning, text classification, fact retrieval, natural language inference, and semantic parsing (Brown et al., 2020; Min et al., 2022b; Zhao et al., 2021; Liu et al., 2022b; Shin et al., 2021).

Paragraph 2: Prior work

% Prior work: ICL + RE
Despite the overall promising performance of LLMs, the utilization of ICL for relation extraction (RE) is still suboptimal.

% Background: RE
RE is the central task for knowledge retrieval requiring a deep understanding of natural language, which seeks to identify a predefined relation between a specific entity pair mentioned in the input sentence or NULL if no relation is found.

% Background: ICL + RE
Given a test input, ICL for RE constructs the LLM input from the task instruction, a few demonstrations retrieved from the training data, and the test input itself.
The LLM then generates the corresponding relation.
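To make this prompt structure concrete, below is a minimal sketch of how such an ICL prompt for RE could be assembled. The template, relation label, and demonstration content are illustrative assumptions, not the paper's verbatim prompt format.

```python
# A minimal sketch of assembling an ICL prompt for RE.
def build_re_prompt(instruction, demonstrations, test_input):
    """Concatenate the task instruction, retrieved demonstrations,
    and the test input into a single prompt."""
    parts = [instruction]
    for demo in demonstrations:
        parts.append(
            f"Context: {demo['context']}\n"
            f"Entity pair: ({demo['subject']}, {demo['object']})\n"
            f"Relation: {demo['relation']}"
        )
    parts.append(
        f"Context: {test_input['context']}\n"
        f"Entity pair: ({test_input['subject']}, {test_input['object']})\n"
        "Relation:"  # the LLM completes this line with a label
    )
    return "\n\n".join(parts)

prompt = build_re_prompt(
    "Classify the relation between the entity pair in each context.",
    [{"context": "Steve Jobs co-founded Apple in 1976.",
      "subject": "Steve Jobs", "object": "Apple",
      "relation": "org:founded_by"}],
    {"context": "Sundar Pichai is the CEO of Google.",
     "subject": "Sundar Pichai", "object": "Google"},
)
```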

% Prior work: ICL + RE
Recent research (Gutiérrez et al., 2022) has sought to apply GPT-3 ICL to biomedical RE, but the results are relatively negative and suggest that GPT-3 ICL still significantly underperforms fine-tuned models.

Paragraph 3.1: Shortcomings of prior work

% Overview: the shortcomings
The pitfalls of GPT-3 ICL in RE are twofold:

% Shortcoming 1: low relevance regarding entities and relations
(1) The low relevance regarding entity and relation in the retrieved demonstrations for ICL.

% Shortcoming 1: low relevance: only sentence embeddings considered
Demonstrations are selected randomly or via k-nearest neighbor (kNN) search based on sentence embedding (Liu et al., 2022b; Gutiérrez et al., 2022).

% Shortcoming 1: low relevance: only sentence embeddings: entities and relations ignored
Regrettably, kNN retrieval based on sentence embeddings is concerned more with the relevance of overall sentence semantics than with the specific entities and relations a sentence contains, which leads to low-quality demonstrations.

% Shortcoming 1: low relevance: only sentence embeddings: entities and relations ignored: an example
As shown in Figure 2, the test input retrieves a semantically similar sentence that is nevertheless undesirable in terms of its entities and relation.
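The criticized baseline is easy to picture in code. Below is a minimal sketch of sentence-embedding kNN retrieval, assuming the sentence-transformers package (any off-the-shelf sentence encoder would serve); note that the entity pair and the target relation play no special role in the similarity.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative encoder choice

def knn_demonstrations(test_sentence, train_sentences, k=3):
    """Return indices of the k training sentences closest to the test
    sentence in overall sentence-embedding space."""
    embs = encoder.encode([test_sentence] + train_sentences)
    test_emb, train_embs = embs[0], embs[1:]
    # Cosine similarity between the test sentence and every training sentence.
    sims = train_embs @ test_emb / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(test_emb)
    )
    return np.argsort(-sims)[:k]
```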

Paragraph 3.2: Shortcomings of prior work

% Shortcoming 2: missing explanation of input-label mappings
(2) The lack of explaining input-label mappings in demonstrations leads to poor ICL effectiveness: A vanilla form of ICL lists all demonstrations as input-label pairs without any explanations.

% Shortcoming 2: missing explanations: LLMs learn only shallow surface clues
This may mislead LLMs to learn shallow clues from surface words, while a relation can be presented in diverse forms due to language complexity.

% Shortcoming 2: missing explanations: surface clues: improve the quality of each demonstration
Especially since ICL is constrained by a maximal input length, optimizing the learning efficiency of each single demonstration becomes extremely important.

Paragraph 4.1: This work

% Motivation
To this end, we propose GPT-RE for the RE task.

% Overview: retrieval + reasoning
GPT-RE employs two strategies to resolve the issues above: (1) task-aware retrieval and (2) gold label-induced reasoning.

% Method 1: task-aware retrieval: overview
For (1) task-aware retrieval, its core is to use representations that deliberately encode and emphasize entity and relation information rather than sentence embedding for kNN search.

% Method 1: task-aware retrieval: specifics
We achieve this by two different retrieval approaches: (a) entity-prompted sentence embedding; (b) fine-tuned relation representation, which naturally places emphasis on entities and relations.

% Method 1: task-aware retrieval: advantages
Both methods contain more RE-specific information than sentence semantics, thus effectively addressing the problem of low relevance.
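A sketch of how the two representations could be realized is given below; the prompt template, the marker tokens, and the finetuned_encoder callable are illustrative assumptions rather than the paper's exact implementation.

```python
def entity_prompted_text(context, subject, obj):
    """(a) Entity-prompted sentence embedding: fold the entity pair into the
    text before encoding, so that kNN similarity reflects the pair rather
    than only the overall sentence semantics."""
    return (f'The relation between "{subject}" and "{obj}" '
            f'in the context: {context}')

def relation_representation(context, subject, obj, finetuned_encoder):
    """(b) Fine-tuned relation representation: wrap the entities in marker
    tokens and encode with a PLM fine-tuned on the RE training data (so the
    vector is supervised by relation labels, including NULL); kNN search
    then runs in this space instead of raw sentence-embedding space."""
    marked = context.replace(subject, f"[SUB] {subject} [/SUB]")
    marked = marked.replace(obj, f"[OBJ] {obj} [/OBJ]")
    return finetuned_encoder(marked)  # e.g., pooled hidden states at the markers
```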

Paragraph 4.2: This work

% Method 2: "input-label" reasoning: overview
For (2) gold label-induced reasoning, we propose to inject the reasoning logic into the demonstration to provide more evidence to align an input and the label, a strategy akin to the Chain-of-Thought (CoT) research (Wei et al., 2022; Wang et al., 2022b; Kojima et al., 2022).

% Method 2: "input-label" reasoning: specifics and differences from prior work
But different from previous work, we allow LLMs to elicit the reasoning process to explain not only why a given sentence should be classified under a particular label but also why a NULL example should not be assigned to any of the pre-defined categories.

% Method 2: "input-label" reasoning: advantages
This process significantly improves the ability of LLMs to align the relations with diverse expression forms.
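One plausible way to collect such reasoning is sketched below, assuming a generic llm callable; the prompt wording is an assumption, not the paper's verbatim instruction. The enriched demonstration then presents context, reasoning, and gold label rather than a bare input-label pair.

```python
def induce_reasoning(llm, demo):
    """Ask the LLM to justify the gold label, which is given in the prompt;
    for a NULL example, the question instead asks why none of the
    pre-defined relations applies."""
    if demo["relation"] == "NULL":
        question = (
            f"Context: {demo['context']}\n"
            f'Explain why no pre-defined relation holds between '
            f'"{demo["subject"]}" and "{demo["object"]}".'
        )
    else:
        question = (
            f"Context: {demo['context']}\n"
            f'What evidence shows that the relation between '
            f'"{demo["subject"]}" and "{demo["object"]}" is "{demo["relation"]}"?'
        )
    demo["reasoning"] = llm(question)  # stored and injected into the demonstration
    return demo
```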

Paragraph 5.1: Experimental results

% Problem raised: relation hallucination
Recent work reveals another crucial problem, termed “overpredicting”, as shown in Figure 3: we observe that LLMs have a strong inclination to wrongly classify NULL examples into other pre-defined labels.

% Relation hallucination: related work
A similar phenomenon has also been observed in other tasks such as NER (Gutiérrez et al., 2022; Blevins et al., 2022).

% Our method: effect on this issue
In this paper, we show that this issue can be alleviated if the representations for retrieval can be supervised with the whole set of NULL in the training data.

Paragraph 5.2: Experimental results

% Experimental setup: RE datasets
We evaluate our proposed method on three popular general-domain RE datasets, Semeval 2010 Task 8, TACRED, and ACE05, as well as one scientific-domain dataset, SciERC.

% Results: overview: surpasses both GPT-3 baselines and conventional fine-tuned models
We observe that GPT-RE achieves improvements over not only existing GPT-3 baselines, but also fully-supervised baselines.

% Results: specifics: SOTA + competitive results
Specifically, GPT-RE achieves SOTA performances on the Semeval and SciERC datasets, and competitive performances on the TACRED and ACE05 datasets.

Methodology

Task Definition

Let \(C\) denote the input context and \(e_{sub} \in C\), \(e_{obj} \in C\) denote the pair of subject and object entity.
Given a set of pre-defined relation classes \(R\), relation extraction aims to predict the relation \(y \in R\) between the pair of entities (\(e_{sub}\), \(e_{obj}\)) within the context \(C\), or, if no pre-defined relation holds between them, to predict \(y = \text{NULL}\).
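A concrete instance of this definition, with an illustrative (not dataset-specific) label set:

```python
# One RE instance under the task definition above.
C = "Steve Jobs co-founded Apple in 1976."           # input context
e_sub, e_obj = "Steve Jobs", "Apple"                 # subject / object entities
R = {"org:founded_by", "per:employee_of", "per:schools_attended"}

y = "org:founded_by"  # the gold relation for this entity pair
# If no relation in R held between the pair, the gold answer would be "NULL".
```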

Related Work

In-context Learning

% Lead-in: ICL
Recent work shows that ICL of GPT-3 (Brown et al., 2020) can perform numerous tasks when provided a few examples in a natural language prompt.
% ICL: mainstream work: prompt design, calibration
Existing work focuses on various aspects to effectively utilize the advantages of GPT-3, from prompt design (Perez et al., 2021) for proper input to coherence calibration (Malkin et al., 2022) for tackling the diverse generated output.
% ICL: mainstream work: demonstration ordering, demonstration retrieval
Another line of research focuses on the demonstration component, including ordered prompts (Lu et al., 2022) and retrieval-based demonstrations (Rubin et al., 2022; Liu et al., 2022b; Shin et al., 2021).
% Distinction of this work
To the best of our knowledge, there is no previous work exploring the potential of GPT-3 on general domain RE tasks.
% Similar research
A recent work (Gutiérrez et al., 2022) attempts to leverage GPT-3 for biomedical information extraction (NER and RE) and reveals issues of ICL that may be detrimental to IE tasks in general.
% Distinction and contributions of this work
Our work succeeds in overcoming these issues to some extent and confirms the potential of GPT-3 in both general-domain and scientific-domain RE.

Retrieval-based Demonstrations

% Retrieval: effectiveness
Several studies have demonstrated that dynamically selecting few-shot demonstrations for each test example, instead of utilizing a fixed set, leads to significant improvement in GPT-3 ICL (Liu et al., 2022b; Shin et al., 2021; Rubin et al., 2022).
% Retrieval: falsifiability (nearest vs. farthest)
They also show that nearest neighbor in-context examples yield much better results than the farthest ones.
% Research significance
This highlights the importance of better retrieval modules for demonstrations.
% Mainstream approach: sentence embeddings
Existing attempts rely on sentence embeddings for retrieval, including the sentence encoders of PLMs such as BERT (Devlin et al., 2019), RoBERTa (Zhuang et al., 2021), KATE (Liu et al., 2022b), SimCSE (Gao et al., 2021), and Sentence-BERT (Reimers and Gurevych, 2019; Wolf et al., 2020).
% Distinction of this work: a fine-tuned encoder
Unlike these sentence embeddings, we propose to fine-tune PLMs on our target RE tasks to produce more task-specific and robust representations for retrieval.

Conclusion

% This work: overview
This work explores the potential of GPT-3 ICL on RE for bridging the performance gap to the fine-tuning baselines via two strategies:
% Overview: method 1
(1) task-aware demonstration retrieval emphasizes entity and relation information to improve the accuracy of demonstration search;
% Overview: method 2
(2) gold label-induced reasoning enriches the reasoning evidence of each demonstration.
% Effectiveness + contributions
To the best of our knowledge, GPT-RE is the first GPT-3 ICL research that significantly outperforms the fine-tuning baseline on three datasets and achieves SOTA on Semeval and SciERC.
% Analysis: scientific findings
We conduct detailed studies to explore how GPT-3 overcomes difficulties such as the influence of NULL examples.
