Repost: Pretraining needs no attention, scales to 4096 tokens without trouble, and matches BERT

https://mp.weixin.qq.com/s/rDwUK0xDXXzRQVMUAM1DFQ

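The biggest highlight of this paper is that it uses no attention at all, yet it still reaches BERT-level results.
Foundations: state space models (SSMs) and a multiplicative gating structure.
It uses a stacked architecture (STACK) and a multiplicative gating architecture (GATED), which are quite similar to N-BEATS and TCN.
- STACK with self-attention is equivalent to BERT/transformer.
- GATED is a bidirectional adaptation of the gated architecture.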
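To make the two building blocks concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes a plain linear recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k for the SSM, runs it in both directions, and multiplies the result elementwise by a gate. The function names (`ssm_scan`, `bidirectional_ssm`, `gated_mix`) and the toy matrices are hypothetical; the real model uses learned, structured SSM parameters and learned projections.

```python
import numpy as np


def ssm_scan(u, A, B, C):
    """Run a discrete linear SSM over a 1-D sequence u.

    x_k = A x_{k-1} + B u_k,   y_k = C x_k
    A has shape (N, N); B and C have shape (N,). All are toy assumptions.
    """
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A @ x + B * u_k   # state update
        y[k] = C @ x          # scalar readout
    return y


def bidirectional_ssm(u, A, B, C):
    """Sum a left-to-right pass and a right-to-left pass (shared parameters here)."""
    forward = ssm_scan(u, A, B, C)
    backward = ssm_scan(u[::-1], A, B, C)[::-1]
    return forward + backward


def gated_mix(u, gate, A, B, C):
    """Multiplicative gating: scale the sequence-mixing output elementwise."""
    return gate * bidirectional_ssm(u, A, B, C)


# Toy parameters, chosen only for illustration.
A = 0.5 * np.eye(2)
B = np.ones(2)
C = np.array([1.0, 0.0])
u = np.array([1.0, 0.0, 0.0])
gate = np.array([1.0, 0.0, 1.0])

print(ssm_scan(u, A, B, C))
print(gated_mix(u, gate, A, B, C))
```

The loop costs one state update per token, so the cost grows linearly with sequence length. That is the intuition for why this family of models extends to long inputs more easily than quadratic attention.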
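Experiments cover both NLU and long documents:
Understanding, GLUE: NLU, classification.
Long documents, SCROLLS: summarization, question answering, inference.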

The two benchmarks are available at:
（1）https://gluebenchmark.com/
https://github.com/nyu-mll/GLUE-baselines/blob/master/download_glue_data.py

（2）https://www.scrolls-benchmark.com/tasks

The SCROLLS datasets used in the paper's experiments are QALT (QuALITY) and CNLI (Contract NLI).

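(1) Contract NLI (Koreeda and Manning, 2021): natural language inference over non-disclosure agreements.
The documents are very long, but the expected output is short: in SCROLLS it is a single label such as Entailment. The original ContractNLI dataset additionally annotates extractive evidence spans.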

(2) QuALITY (Pang et al., 2021): multiple-choice questions over long articles and stories.
This is a question answering task.

long-range language modeling benchmark

https://readpaper.com/pdf-annotate/note?pdfId=4703485462833004545&noteId=1588376739795717888

The paper runs experiments on two SCROLLS datasets, Contract NLI and QuALITY.

Contract NLI: natural language inference over non-disclosure agreements. The output is a single word, e.g. Entailment.
QuALITY: multiple-choice questions over long articles and stories.
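Since both tasks produce a short answer (a label for Contract NLI, a chosen option for QuALITY), a natural way to score them is exact match. The snippet below is a rough sketch under that assumption, not the official SCROLLS scorer. The function name `exact_match`, the trim-and-lowercase normalization, and the label strings other than Entailment are my own assumptions.

```python
def exact_match(predictions, references):
    """Return the percentage of predictions that equal their reference.

    Normalization (strip + lowercase) is an assumption made for this sketch;
    an official scorer may normalize differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * hits / len(references)


# Hypothetical model outputs and gold labels for three Contract NLI examples.
# "Contradiction" and "Not mentioned" are assumed label strings.
preds = ["Entailment", "contradiction ", "Not mentioned"]
golds = ["Entailment", "Contradiction", "Entailment"]
print(exact_match(preds, golds))
```

With this kind of metric the length of the input document does not affect scoring at all; the difficulty lies entirely in reading thousands of tokens to produce one short answer.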

posted on 2022-12-31 19:29 by 宋岳庭