Post archive for August 6, 2023 - Peg_Wu

August 6, 2023

Summary: 1. How to make self-attention efficient? When the input sequence is very long, self-attention dominates the computation of the whole network. Approach 1: Local Attention / Truncated Attention. Approach 2: Stride Attention. Approach 3: Glob...
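To make the first two approaches concrete, here is a minimal sketch of the attention patterns as boolean masks, where `mask[i][j]` is `True` when query position `i` may attend to key position `j`. The function names, the `window` and `stride` parameters, and the exact reading of stride attention (each position sees itself and the positions exactly `stride` steps away) are assumptions made for illustration, not definitions taken from the post.

```python
def local_attention_mask(seq_len, window=1):
    """Local / truncated attention: position i sees only neighbors
    within `window` steps (assumed definition)."""
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]


def stride_attention_mask(seq_len, stride=2):
    """Stride attention: position i sees itself and the positions
    exactly `stride` steps away (assumed definition)."""
    return [[abs(i - j) in (0, stride) for j in range(seq_len)]
            for i in range(seq_len)]


def count_allowed(mask):
    """Number of query-key pairs that still need an attention score."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    n = 6
    print("full attention pairs:  ", n * n)
    print("local attention pairs: ", count_allowed(local_attention_mask(n, 1)))
    print("stride attention pairs:", count_allowed(stride_attention_mask(n, 2)))
```

Full attention scores all `seq_len * seq_len` pairs, while both masks keep a number of pairs that grows only linearly with sequence length, which is where the savings on long inputs would come from.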

posted @ 2023-08-06 17:49 Peg_Wu

Lecture 5 -- Transformer

Summary: 1. Seq2seq: the Transformer is a Seq2seq model. 2. Model Architecture: A. Encoder B. Decoder (AT & NAT). Because the decoder outputs its vectors one at a time, self-attention becomes masked self-atte
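The masking idea in an autoregressive decoder can be sketched as follows: position `i` may only attend to positions `0..i`, because later outputs do not exist yet when position `i` is generated. This is a plain-Python illustration under that assumption; `causal_mask` and `masked_softmax` are hypothetical helper names, not code from the post.

```python
import math


def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores; masked-out entries get weight 0."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        kept = [s for s, m in zip(row_scores, row_mask) if m]
        peak = max(kept)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    # With equal scores, each row spreads its weight evenly over visible positions.
    scores = [[0.0] * 3 for _ in range(3)]
    for row in masked_softmax(scores, causal_mask(3)):
        print([round(w, 3) for w in row])
```

With equal scores, the first position puts all of its weight on itself and each later position splits its weight evenly over the positions it can see.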

posted @ 2023-08-06 01:06 Peg_Wu
