SCROLLS benchmark

Long-document datasets

SCROLLS: Standardized CompaRison Over Long Language Sequences

What is SCROLLS?

SCROLLS is a suite of datasets that require synthesizing information over long texts. The benchmark includes seven natural language tasks across multiple domains, including summarization, question answering, and natural language inference.

https://www.scrolls-benchmark.com/

https://www.scrolls-benchmark.com/tasks

GovReport
Huang et al., 2021
Summarization of long reports from the Congressional Research Service and the U.S. Government Accountability Office.
jsonl.zip

SummScreenFD
Chen et al., 2021
Summarizing episodes of TV shows from their scripts.
jsonl.zip

QMSum
Zhong et al., 2021
Query-based summarization over meeting transcripts.
jsonl.zip

NarrativeQA
Kočiský et al., 2018
Question answering about entire books and movie scripts.
jsonl.zip

Qasper
Dasigi et al., 2021
Question answering over research papers.
jsonl.zip

QuALITY
Pang et al., 2021
Multiple-choice questions over long articles and stories.
jsonl.zip

Contract NLI
Koreeda and Manning, 2021
Natural language inference over non-disclosure agreements.
jsonl.zip
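
Each task above is listed with a jsonl.zip download. As a minimal sketch of how such an archive could be inspected, the Python snippet below streams JSON Lines records out of a zip file and counts words per record. The helper names (iter_jsonl_from_zip, summarize_record) are hypothetical, and the field names "input" and "output" are assumptions about the record layout, so check them against the actual files before relying on them.

```python
import io
import json
import zipfile


def iter_jsonl_from_zip(archive, suffix=".jsonl"):
    """Yield (member_name, record) for each JSON line in matching zip members.

    `archive` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.endswith(suffix):
                continue  # skip READMEs and other non-JSONL members
            with zf.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if line:  # tolerate blank lines between records
                        yield name, json.loads(line)


def summarize_record(record, input_key="input", output_key="output"):
    """Return whitespace-token counts for the source text and the target.

    The default key names are assumptions, not confirmed by this post.
    """
    return {
        "input_words": len(str(record.get(input_key, "")).split()),
        "output_words": len(str(record.get(output_key, "")).split()),
    }


if __name__ == "__main__":
    # Build a tiny in-memory archive so the sketch runs without a download.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "toy/train.jsonl",
            '{"id": "a", "input": "a long report text", "output": "a summary"}\n',
        )
    buf.seek(0)
    for member, rec in iter_jsonl_from_zip(buf):
        print(member, rec.get("id"), summarize_record(rec))
```

To inspect a real download, pass the path of the archive instead of the in-memory buffer. Streaming line by line avoids loading a whole split into memory, which matters when single inputs are entire books or reports.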

Data samples:

Extractive summarization

Posted on 2022-12-31 19:52 by 宋岳庭