Bookmarking two articles here rather than writing this up myself:
详解Transformer (Attention Is All You Need) - 知乎 (zhihu.com)
Self-Attention和Transformer - machine-learning-notes (gitbook.io)
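For quick reference, below is a minimal NumPy sketch of the scaled dot-product self-attention that both bookmarked articles walk through. It is not taken from either article; the function name, toy dimensions, and random projections are illustrative assumptions only.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative, not from the linked articles).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key dimension
    return weights @ V                               # weighted sum of value vectors

# Toy usage: 4 tokens, model dimension 8, randomly initialized projections.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                                     # (4, 8)
```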