
Transformer Block Breakdown


Basic structure

[Figure: basic structure of a transformer block]

Basic parameters

$n_{layers}$ or $L$: total number of transformer blocks

$d_{model}$ or $H$: number of units in each bottleneck layer, and number of units of each Q/K/V input

$n_{heads}$ or $A$: number of heads in each transformer block

$s$ or $n_{ctx}$: input sequence length

Derived parameters

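$d_{head}$: dimension of each attention head, $d_{head} = d_{model} / n_{heads}$ (hidden size divided by the number of heads)

$d_{ff}$: number of units in the intermediate layer of the feed forward layer, typically $d_{ff} = 4 \times d_{model}$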

The diagram below shows in detail where each parameter appears inside a transformer block:

[Figure: detailed diagram of the parameters inside a transformer block]
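As a rough companion to the diagram, the sketch below traces tensor shapes through multi-head self-attention in NumPy. It is a minimal illustration, not the exact layout of any of the models discussed here: the function name `multi_head_self_attention`, the weight names `w_q`, `w_k`, `w_v`, `w_o`, and the omission of masking, biases, and dropout are all assumptions made for brevity.

```python
import numpy as np


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Shape-level sketch of multi-head self-attention (no mask, bias, or dropout)."""
    s, d_model = x.shape
    d_head = d_model // n_heads  # derived parameter: per-head dimension

    def split_heads(t):
        # (s, d_model) -> (n_heads, s, d_head)
        return t.reshape(s, n_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product scores per head: (n_heads, s, s)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys

    # Merge heads back: (n_heads, s, d_head) -> (s, d_model)
    context = (weights @ v).transpose(1, 0, 2).reshape(s, d_model)
    return context @ w_o


# Illustrative sizes only (hypothetical example values).
rng = np.random.default_rng(0)
s, d_model, n_heads = 20, 128, 8
x = rng.standard_normal((s, d_model))
w_q, w_k, w_v, w_o = (
    rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(4)
)
out = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
print(out.shape)
```

The input and output both have shape (sequence length, hidden size); the head dimension only appears inside the attention computation.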

Zooming in on the Feed Forward submodule

[Figure: Feed Forward submodule]
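The Feed Forward submodule can be sketched as two dense layers that expand the hidden size to the intermediate size and project it back. The snippet below is a minimal NumPy illustration under stated assumptions: the names `feed_forward`, `w1`, `b1`, `w2`, `b2` are hypothetical, the intermediate size is taken as 4 times the hidden size, and ReLU is used as in the original Transformer (BERT and GPT models use GELU instead).

```python
import numpy as np


def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed forward: expand to d_ff, apply ReLU, project back."""
    hidden = np.maximum(0.0, x @ w1 + b1)  # (s, d_ff)
    return hidden @ w2 + b2                # (s, d_model)


# Illustrative sizes only; d_ff = 4 * d_model is an assumption.
rng = np.random.default_rng(0)
s, d_model = 128, 768
d_ff = 4 * d_model

x = rng.standard_normal((s, d_model))
w1 = rng.standard_normal((d_model, d_ff)) * 0.02
b1 = np.zeros(d_ff)
w2 = rng.standard_normal((d_ff, d_model)) * 0.02
b2 = np.zeros(d_model)

y = feed_forward(x, w1, b1, w2, b2)
print(y.shape)
```

Each position in the sequence is transformed independently, so the sequence length only affects the leading dimension.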

Basic parameters of typical models

Application	Model	Blocks	Hidden size	Heads	Sequence length
NLP	GPT-3	96	12288	96	2048
NLP	BERT_Base	12	768	12	128/512
NLP	BERT_Large	24	1024	16	128/512
RecSys	BST	1	128(max)	8	20

BST: Behavior Sequence Transformer
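The derived parameters for the models in the table can be computed from the basic ones. The sketch below is illustrative: `TransformerConfig`, its field names, and `FFN_MULTIPLIER` are hypothetical names, and the feed forward size assumes the common 4x multiplier, which may not hold for every model (for BST in particular it is only an assumption). The BST hidden size is taken as its listed maximum of 128.

```python
from dataclasses import dataclass

# Assumption: intermediate feed forward size is 4x the hidden size.
FFN_MULTIPLIER = 4


@dataclass(frozen=True)
class TransformerConfig:
    name: str
    n_layers: int  # total number of transformer blocks
    d_model: int   # hidden size
    n_heads: int   # attention heads per block

    @property
    def d_head(self) -> int:
        """Dimension of each attention head."""
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    @property
    def d_ff(self) -> int:
        """Intermediate feed forward size under the 4x assumption."""
        return FFN_MULTIPLIER * self.d_model


configs = [
    TransformerConfig("GPT-3", 96, 12288, 96),
    TransformerConfig("BERT_Base", 12, 768, 12),
    TransformerConfig("BERT_Large", 24, 1024, 16),
    TransformerConfig("BST", 1, 128, 8),  # 128 is the listed maximum
]

for cfg in configs:
    print(f"{cfg.name}: d_head={cfg.d_head}, d_ff={cfg.d_ff}")
```

Under these assumptions GPT-3 has a head dimension of 128, while both BERT variants have 64.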


Posted on 2021-07-26 18:54 by 姚伟峰
