Attention Is All You Need: some good resources

The encoders are all identical in structure (yet they do not share weights). Each one is broken down into two sub-layers: a self-attention layer and a position-wise feed-forward network, as sketched in the code below.
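
This description is taken from Jay Alammar's Illustrated Transformer (linked below). To make the "two sub-layers" concrete, here is a minimal sketch of one encoder layer and the stacked encoder in PyTorch. It is illustrative code of my own, not taken from any of the linked resources; the hyperparameters follow the paper's base model (d_model=512, 8 heads, d_ff=2048), and all class and variable names are assumptions:

```python
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    """One Transformer encoder layer: a self-attention sub-layer followed by a
    position-wise feed-forward sub-layer, each wrapped in residual + layer norm."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        # Sub-layer 1: multi-head self-attention.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout, batch_first=True)
        # Sub-layer 2: position-wise feed-forward network.
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        # Queries, keys, and values all come from x (self-attention).
        attn_out, _ = self.self_attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.dropout(attn_out))    # residual + layer norm
        x = self.norm2(x + self.dropout(self.ff(x)))  # residual + layer norm
        return x


# The encoder is a stack of N = 6 layers with identical structure; each layer
# has its own parameters (no weight sharing).
encoder = nn.ModuleList([EncoderLayer() for _ in range(6)])

x = torch.randn(2, 10, 512)  # (batch, sequence length, d_model)
for layer in encoder:
    x = layer(x)
print(x.shape)  # torch.Size([2, 10, 512])
```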

https://kexue.fm/archives/4765 (Su Jianlin's reading notes on the paper, on his Scientific Spaces blog)

https://jalammar.github.io/illustrated-transformer/ (Jay Alammar, "The Illustrated Transformer")

http://nlp.seas.harvard.edu/2018/04/03/attention.html (Harvard NLP, "The Annotated Transformer")

https://colab.research.google.com/github/tensorflow/tensor2tensor/blob/master/tensor2tensor/notebooks/hello_t2t.ipynb#scrollTo=r6GPPFy1fL2N (Tensor2Tensor "Hello T2T" Colab notebook)
