paper_reading

 

1、

A Simple Theoretical Model of Importance for Summarization

conference: ACL 2019

abstract: 

Research on summarization has mainly been driven by empirical approaches, crafting systems to perform well on standard datasets with the notion of information Importance remaining latent. We argue that establishing theoretical models of Importance will advance our understanding of the task and help to further improve summarization systems. To this end, we propose simple but rigorous definitions of several concepts that were previously used only intuitively in summarization: Redundancy, Relevance, and Informativeness. Importance arises as a single quantity naturally unifying these concepts. Additionally, we provide intuitions to interpret the proposed quantities and experiments to demonstrate the potential of the framework to inform and guide subsequent works. 

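To make the three notions concrete, here is a toy sketch (my own illustration, not the paper's formal definitions) that treats the summary and the document as unigram word distributions: low entropy of the summary signals redundancy, and low cross-entropy against the document signals relevance.

```python
# Toy illustration only: summary S and document D as unigram word distributions.
from collections import Counter
import math

def unigram_dist(text):
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def entropy(p):
    return -sum(prob * math.log(prob) for prob in p.values())

def cross_entropy(p, q, eps=1e-12):
    # H(P, Q) = -sum_w P(w) log Q(w); eps guards against words unseen in q
    return -sum(prob * math.log(q.get(w, eps)) for w, prob in p.items())

doc = "the cat sat on the mat while the dog slept on the rug"
summary = "the cat sat on the mat"

p_s, p_d = unigram_dist(summary), unigram_dist(doc)
print("H(S)    (lower ~ more redundant):", round(entropy(p_s), 3))
print("CE(S,D) (lower ~ more relevant): ", round(cross_entropy(p_s, p_d), 3))
```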

Analysis (in Chinese): https://zhuanlan.zhihu.com/p/76492696

Key points:

 

2、

Simple Unsupervised Summarization by Contextual Matching

code:https://github.com/jzhou316/Unsupervised-Sentence-Summarization

conference: ACL2019

Analysis (in Chinese): https://zhuanlan.zhihu.com/p/112869739

abstract:

We propose an unsupervised method for sentence summarization using only language modeling. The approach employs two language models, one that is generic (i.e. pretrained), and the other that is specific to the target domain. We show that by using a product-of-experts criteria these are enough for maintaining continuous contextual matching while maintaining output fluency. Experiments on both abstractive and extractive sentence summarization data sets show promising results of our method without being exposed to any paired data.

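A minimal sketch of the product-of-experts idea as I read it (the weight lam and the toy distributions below are made up; the actual system scores candidate subsequences with two full language models):

```python
# Combine a generic "fluency" LM and a domain/contextual-matching model when picking
# the next summary token: p ∝ p_flu^lam * p_ctx^(1-lam).
import numpy as np

def product_of_experts(p_fluency, p_contextual, lam=0.6):
    combined = (p_fluency ** lam) * (p_contextual ** (1 - lam))
    return combined / combined.sum()

vocab = ["the", "cat", "sat", "mat", "<eos>"]
p_flu = np.array([0.4, 0.2, 0.2, 0.1, 0.1])   # from a pretrained LM (hypothetical values)
p_ctx = np.array([0.1, 0.5, 0.1, 0.2, 0.1])   # from contextual matching (hypothetical values)

p = product_of_experts(p_flu, p_ctx)
print(vocab[int(np.argmax(p))])  # the token both experts agree on
```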

 

3、

Sentence Centrality Revisited for Unsupervised Summarization

conference: ACL 2019

code:https://github.com/mswellhao/PacSum

abstract:

Single document summarization has enjoyed renewed interest in recent years thanks to the popularity of neural network models and the availability of large-scale datasets. In this paper we develop an unsupervised approach arguing that it is unrealistic to expect large-scale and high-quality training data to be available or created for different types of summaries, domains, or languages. We revisit a popular graph-based ranking algorithm and modify how node (aka sentence) centrality is computed in two ways: (a) we employ BERT, a state-of-the-art neural representation learning model to better capture sentential meaning and (b) we build graphs with directed edges arguing that the contribution of any two nodes to their respective centrality is influenced by their relative position in a document. Experimental results on three news summarization datasets representative of different languages and writing styles show that our approach outperforms strong baselines by a wide margin.

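The modified centrality can be illustrated roughly as follows (a sketch with assumed weights; PacSum tunes these hyperparameters and computes the pairwise similarities with BERT):

```python
# Position-aware centrality: edges from earlier and later sentences count differently.
import numpy as np

def directed_centrality(sim, lambda_back=-0.3, lambda_fwd=1.0):
    """sim[i, j]: similarity between sentences i and j (i != j).
    lambda_back / lambda_fwd are hypothetical weights for edges from preceding
    vs. following sentences."""
    n = sim.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        back = sum(sim[i, j] for j in range(0, i))       # edges from preceding sentences
        fwd = sum(sim[i, j] for j in range(i + 1, n))    # edges to following sentences
        scores[i] = lambda_back * back + lambda_fwd * fwd
    return scores

sim = np.array([[0.0, 0.6, 0.3],
                [0.6, 0.0, 0.5],
                [0.3, 0.5, 0.0]])
print(directed_centrality(sim))  # higher score = more likely to be extracted
```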

 

4、

BottleSum: Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle

conference:

code:

Analysis (in Chinese): https://zhuanlan.zhihu.com/p/84730122

abstract:

The principle of the Information Bottleneck (Tishby et al., 1999) is to produce a summary of information X optimized to predict some other relevant information Y . In this paper, we propose a novel approach to unsupervised sentence summarization by mapping the Information Bottleneck principle to a conditional language modelling objective: given a sentence, our approach seeks a compressed sentence that can best predict the next sentence. Our iterative algorithm under the Information Bottleneck objective searches gradually shorter subsequences of the given sentence while maximizing the probability of the next sentence conditioned on the summary. Using only pretrained language models with no direct supervision, our approach can efficiently perform extractive sentence summarization over a large corpus.
Building on our unsupervised extractive summarization (BottleSumEx), we then present a new approach to self-supervised abstractive summarization (BottleSumSelf), where a transformer-based language model is trained on the output summaries of our unsupervised method. Empirical results demonstrate that our extractive method outperforms other unsupervised models on multiple automatic metrics. In addition, we find that our self-supervised abstractive model outperforms unsupervised baselines (including our own) by human evaluation along multiple attributes.


 

5、

A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising

conference:

code:

abstract:

Text summarization aims to distill essential information from a piece of text and transform it into a concise version. Existing unsupervised abstractive summarization models use recurrent neural networks framework and ignore abundant unlabeled corpora resources. In order to address these issues, we propose TED, a transformer-based unsupervised summarization system with pretraining on large-scale data. We first leverage the lead bias in news articles to pretrain the model on large-scale corpora. Then, we finetune TED on target domains through theme modeling and a denoising autoencoder to enhance the quality of summaries. Notably, TED outperforms all unsupervised abstractive baselines on NYT, CNN/DM and English Gigaword datasets with various document styles. Further analysis shows that the summaries generated by TED are abstractive and containing even higher proportions of novel tokens than those from supervised models.

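The lead-bias pretraining step can be pictured like this (a sketch under the assumption that the first few sentences of a news article serve as the pseudo-summary target and the remaining sentences as the source):

```python
# Construct a (source, target) pretraining pair from an unlabeled news article
# using lead bias.
def lead_bias_pair(article_sentences, k=3):
    target = " ".join(article_sentences[:k])    # pseudo-summary: the lead sentences
    source = " ".join(article_sentences[k:])    # the model must reconstruct the lead
    return source, target

article = [
    "The central bank cut rates by half a point on Tuesday.",
    "Officials cited slowing growth and weak exports.",
    "Markets rallied on the news.",
    "Analysts had expected a smaller cut.",
    "The decision was not unanimous.",
]
src, tgt = lead_bias_pair(article)
print("SOURCE:", src)
print("TARGET:", tgt)
```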

 

6、

Extractive multi-document text summarization based on graph independent sets

conference:

abstract:

We propose a novel methodology for extractive, generic summarization of text documents. The Maximum Independent Set, which has not been used previously in any summarization study, has been utilized within the context of this study. In addition, a text processing tool, which we named KUSH, is suggested in order to preserve the semantic cohesion between sentences in the representation stage of introductory texts. Our anticipation was that the set of sentences corresponding to the nodes in the independent set should be excluded from the summary. Based on this anticipation, the nodes forming the Independent Set on the graphs are identified and removed from the graph. Thus, prior to quantification of the effect of the nodes on the global graph, a limitation is applied on the documents to be summarized. This limitation prevents repetition of word groups to be included in the summary. Performance of the proposed approach on the Document Understanding Conference (DUC-2002 and DUC-2004) datasets was calculated using ROUGE evaluation metrics. The developed model achieved a 0.38072 ROUGE performance value for 100-word summaries, 0.51954 for 200-word summaries, and 0.59208 for 400-word summaries. The values reported throughout the experimental processes of the study reveal the contribution of this innovative method.

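A sketch of the independent-set step (a greedy maximal independent set on a sentence graph; the paper's exact graph construction, similarity threshold, and the KUSH preprocessing are not reproduced here):

```python
# Nodes in the independent set are mutually non-adjacent sentences; per the abstract,
# they are identified and removed from the graph before ranking.
def greedy_independent_set(num_nodes, edges):
    """edges: set of (i, j) pairs meaning sentences i and j are 'similar enough'
    to be connected. Returns a maximal set of mutually non-adjacent nodes."""
    adjacent = {i: set() for i in range(num_nodes)}
    for i, j in edges:
        adjacent[i].add(j)
        adjacent[j].add(i)
    independent, excluded = [], set()
    for node in range(num_nodes):
        if node not in excluded:
            independent.append(node)
            excluded |= adjacent[node]   # neighbours can no longer be chosen
            excluded.add(node)
    return independent

edges = {(0, 1), (1, 2), (2, 3)}
print("independent set:", greedy_independent_set(5, edges))
```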

 

7、

Text Summarization with Pretrained Encoders

conference:

abstract:

Bidirectional Encoder Representations from Transformers (BERT; Devlin et al. 2019) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several intersentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings.

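The encoder/decoder mismatch fix mentioned in the abstract boils down to two optimizers with different learning rates; here is a PyTorch sketch with stand-in modules and hypothetical learning rates (the paper also uses separate warmup schedules, omitted here):

```python
import torch
import torch.nn as nn

encoder = nn.Linear(768, 768)   # stand-in for the pretrained BERT encoder
decoder = nn.Linear(768, 768)   # stand-in for the randomly initialized decoder

# Pretrained encoder: small learning rate so fine-tuning does not destroy its weights.
opt_enc = torch.optim.Adam(encoder.parameters(), lr=2e-5)
# Fresh decoder: larger learning rate so it can catch up during training.
opt_dec = torch.optim.Adam(decoder.parameters(), lr=1e-3)

x = torch.randn(4, 768)
loss = decoder(encoder(x)).pow(2).mean()   # dummy loss just to show the update flow
loss.backward()
opt_enc.step(); opt_dec.step()
opt_enc.zero_grad(); opt_dec.zero_grad()
```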

 

8、

Cross-Task Knowledge Transfer for Query-Based Text Summarization

conference:ACL 2019

abstract:

We demonstrate the viability of knowledge transfer between two related tasks: machine reading comprehension (MRC) and query-based text summarization. Using an MRC model trained on the SQuAD1.1 dataset as a core system component, we first build an extractive query-based summarizer. For better precision, this summarizer also compresses the output of the MRC model using a novel sentence compression technique. We further leverage pre-trained machine translation systems to abstract our extracted summaries. Our models achieve state-of-the-art results on the publicly available CNN/Daily Mail and Debatepedia datasets, and can serve as simple yet powerful baselines for future systems. We also hope that these results will encourage research on transfer learning from large MRC corpora to query-based summarization. 


 

9、

Conditional Self-Attention for Query-based Summarization

conference: 2020 (unpublished preprint)

abstract:

Self-attention mechanisms have achieved great success on a variety of NLP tasks due to its flexibility of capturing dependency between arbitrary positions in a sequence. For problems such as query-based summarization (Qsumm) and knowledge graph reasoning where each input sequence is associated with an extra query, explicitly modeling such conditional contextual dependencies can lead to a more accurate solution, which however cannot be captured by existing self-attention mechanisms. In this paper, we propose conditional self-attention (CSA), a neural network module designed for conditional dependency modeling. CSA works by adjusting the pairwise attention between input tokens in a self-attention module with the matching score of the inputs to the given query. Thereby, the contextual dependencies modeled by CSA will be highly relevant to the query. We further studied variants of CSA defined by different types of attention. Experiments on Debatepedia and HotpotQA benchmark datasets show CSA consistently outperforms vanilla Transformer and previous models for the Qsumm problem.

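My reading of conditional self-attention, as a simplified sketch (the additive conditioning below is an assumption on my part; the paper defines its own scoring functions and studies several attention variants):

```python
# Reweight pairwise self-attention by how well each token matches the query.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conditional_self_attention(tokens, query_vec):
    """tokens: (n, d) token vectors; query_vec: (d,) query representation."""
    raw = tokens @ tokens.T                       # ordinary pairwise attention logits
    match = tokens @ query_vec                    # how relevant each token is to the query
    cond = raw + match[:, None] + match[None, :]  # boost pairs of query-relevant tokens
    weights = softmax(cond, axis=-1)
    return weights @ tokens                       # query-conditioned contextual vectors

tokens = np.random.randn(5, 8)
query = np.random.randn(8)
print(conditional_self_attention(tokens, query).shape)  # (5, 8)
```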

 

10、

Convolutional Hierarchical Attention Network for Query-Focused Video Summarization

conference:

abstract:

Previous approaches for video summarization mainly concentrate on finding the most diverse and representative visual contents as video summary without considering the users preference. This paper addresses the task of query-focused video summarization, which takes users query and a long video as inputs and aims to generate a query-focused video summary. In this paper, we consider the task as a problem of computing similarity between video shots and query. To this end, we propose a method, named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: feature encoding network and query-relevance computing module. In the encoding network, we employ a convolutional network with local self-attention mechanism and query-aware global attention mechanism to learns visual information of each shot. The encoded features will be sent to query-relevance computing module to generate query-focused video summary. Extensive experiments on the benchmark dataset demonstrate the competitive performance and show the effectiveness of our approach.


 

11、

Unsupervised Dual-Cascade Learning with Pseudo-Feedback Distillation for Query-Focused Extractive Summarization

conference:WWW 2020

abstract:

We propose Dual-CES a novel unsupervised, query-focused, multi-document extractive summarizer. Dual-CES builds on top of the Cross Entropy Summarizer (CES) and is designed to better handle the tradeoff between saliency and focus in summarization. To this end, Dual-CES employs a two-step dual-cascade optimization approach with saliency-based pseudo-feedback distillation. Overall, Dual-CES significantly outperforms all other state-of-the-art unsupervised alternatives. Dual-CES is even shown to be able to outperform strong supervised summarizers.


 

12、

Diversity driven Attention Model for Query-based Abstractive Summarization

conference:2018

abstract:

Abstractive summarization aims to generate a shorter version of the document covering all the salient points in a compact and coherent fashion. On the other hand, query-based summarization highlights those points that are relevant in the context of a given query. The encode-attend-decode paradigm has achieved notable success in machine translation, extractive summarization, dialog systems, etc. But it suffers from the drawback of generation of repeated phrases. In this work we propose a model for the query-based summarization task based on the encode-attend-decode paradigm with two key additions (i) a query attention model (in addition to document attention model) which learns to focus on different portions of the query at different time steps (instead of using a static representation for the query) and (ii) a new diversity based attention model which aims to alleviate the problem of repeating phrases in the summary. In order to enable the testing of this model we introduce a new query-based summarization dataset building on debatepedia. Our experiments show that with these two additions the proposed model clearly outperforms vanilla encode-attend-decode models with a gain of 28% (absolute) in ROUGE-L scores.

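One way to picture the diversity-based attention (a sketch of successively orthogonalizing each decoding step's context vector against the previous one, so the decoder does not attend to the same content twice; the paper proposes several variants, including an LSTM-based one):

```python
import numpy as np

def diversify(context_vectors):
    diversified, prev = [], None
    for c in context_vectors:
        if prev is not None and np.dot(prev, prev) > 0:
            # Remove the component already covered by the previous context vector.
            c = c - (np.dot(c, prev) / np.dot(prev, prev)) * prev
        diversified.append(c)
        prev = c
    return diversified

ctx = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.1, 0.9])]
for step, d in enumerate(diversify(ctx)):
    print(f"step {step}:", np.round(d, 3))
```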

 

13、

Neural Document Summarization by Jointly Learning to Score and Select Sentences

conference:ACL2018

code:https://github.com/magic282/NeuSum

Analysis (in Chinese): https://zhuanlan.zhihu.com/p/85677258

abstract:

Sentence scoring and sentence selection are two main steps in extractive document summarization systems. However, previous works treat them as two separated subtasks. In this paper, we present a novel end-to-end neural network framework for extractive document summarization by jointly learning to score and select sentences. It first reads the document sentences with a hierarchical encoder to obtain the representation of sentences. Then it builds the output summary by extracting sentences one by one. Different from previous methods, our approach integrates the selection strategy into the scoring model, which directly predicts the relative importance given previously selected sentences. Experiments on the CNN/Daily Mail dataset show that the proposed framework significantly outperforms the state-of-the-art extractive summarization models.

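The joint score-and-select loop can be sketched as a greedy procedure (the scorer below is a hypothetical stand-in; in the paper it is a learned network that scores each remaining sentence conditioned on the previously selected ones):

```python
def score(sentence, selected):
    # Hypothetical scorer: reward new words, penalize overlap with already-chosen sentences.
    words = set(sentence.split())
    chosen = set(w for s in selected for w in s.split())
    return len(words - chosen) - len(words & chosen)

def select_summary(sentences, k=2):
    selected = []
    remaining = list(sentences)
    for _ in range(k):
        best = max(remaining, key=lambda s: score(s, selected))
        selected.append(best)
        remaining.remove(best)
    return selected

doc = [
    "the storm closed the airport for two days",
    "the airport closed as the storm hit",
    "flights resumed after crews cleared the runways",
]
print(select_summary(doc))
```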

 

14、

Searching for Effective Neural Extractive Summarization: What Works and What's Next

conference:

abstract: 

The recent years have seen remarkable success in the use of deep neural networks on text summarization. However, there is no clear understanding of why they perform so well, or how they might be improved. In this paper, we seek to better understand how neural extractive summarization systems could benefit from different types of model architectures, transferable knowledge and learning schemas. Additionally, we find an effective way to improve current frameworks and achieve the state-of-the-art result on CNN/DailyMail by a large margin based on our observations and analyses. Hopefully, our work could provide more clues for future research on extractive summarization. Source code will be available on Github and our project homepage.


 

15、

Reading Like HER: Human Reading Inspired Extractive Summarization

conference:

abstract:

In this work, we re-examine the problem of extractive text summarization for long documents. We observe that the process of extracting summarization of human can be divided into two stages: 1) a rough reading stage to look for sketched information, and 2) a subsequent careful reading stage to select key sentences to form the summary. By simulating such a two-stage process, we propose a novel approach for extractive summarization. We formulate the problem as a contextual-bandit problem and solve it with policy gradient. We adopt a convolutional neural network to encode gist of paragraphs for rough reading, and a decision making policy with an adapted termination mechanism for careful reading. Experiments on the CNN and Daily-Mail datasets show that our proposed method can provide high-quality summaries with varied length, and significantly outperform the state-of-the-art extractive methods in terms of ROUGE metrics.


 

16、

Self-Supervised Learning for Contextualized Extractive Summarization

conference: 

abstract:

Existing models for extractive summarization are usually trained from scratch with a cross-entropy loss, which does not explicitly capture the global context at the document level. In this paper, we aim to improve this task by introducing three auxiliary pre-training tasks that learn to capture the document-level context in a self-supervised fashion. Experiments on the widely-used CNN/DM dataset validate the effectiveness of the proposed auxiliary tasks. Furthermore, we show that after pretraining, a clean model with simple building blocks is able to outperform previous state-of-the-art that are carefully designed.


 

17、

Heterogeneous Graph Neural Networks for Extractive Document Summarization

conference:ACL2020

code:https://github.com/brxx122/HeterSUMGraph

Analysis (in Chinese): https://zhuanlan.zhihu.com/p/139252407

https://zhuanlan.zhihu.com/p/138600416

abstract:

As a crucial step in extractive document summarization, learning cross-sentence relations has been explored by a plethora of approaches. An intuitive way is to put them in the graph based neural network, which has a more complex structure for capturing inter-sentence relationships. In this paper, we present a heterogeneous graph-based neural network for extractive summarization (HETERSUMGRAPH), which contains semantic nodes of different granularity levels apart from sentences. These additional nodes act as the intermediary between sentences and enrich the cross-sentence relations. Besides, our graph structure is flexible in natural extension from a single document setting to multi-document via introducing document nodes. To our knowledge, we are the first one to introduce different types of nodes into graph-based neural networks for extractive document summarization and perform a comprehensive qualitative analysis to investigate their benefits. The code will be released on Github.

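The heterogeneous graph construction can be sketched as a bipartite sentence-word structure (TF-IDF edge weights and the GAT-style message passing from the paper are omitted here):

```python
# Word nodes act as intermediaries: two sentences are connected only indirectly,
# through the word nodes they share.
from collections import defaultdict

def build_hetero_graph(sentences):
    word_to_sents = defaultdict(set)
    for idx, sent in enumerate(sentences):
        for word in set(sent.lower().split()):
            word_to_sents[word].add(idx)   # sentence node -- word node edge
    return word_to_sents

sents = ["The bill passed the senate", "The senate debated the bill for hours"]
graph = build_hetero_graph(sents)
for word, linked in sorted(graph.items()):
    print(f"word node '{word}' -> sentence nodes {sorted(linked)}")
```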

 

18、

Single Document Summarization as Tree Induction

conference:NAACL2019

code:https://github.com/nlpyang/SUMO

Analysis (in Chinese): https://zhuanlan.zhihu.com/p/94424862

abstract:

This paper frames single-document extractive summarization as a tree induction problem. Whereas previous approaches rely on linguistically motivated document representations to generate summaries, our model induces a multi-root dependency tree while predicting the output summary. Each root node in the tree is a summary sentence, and the subtree attached to it consists of sentences whose content relates to or explains the summary sentence. We design a new iterative refinement algorithm that builds the trees gradually by repeatedly refining the structures predicted in previous iterations. Experiments on two benchmark datasets demonstrate that our summarizer performs competitively against state-of-the-art methods.

 

  

posted @ 2020-06-10 14:10  Joyce_song94