Review 2

BERT architecture: BERT-Base, Uncased: 12 layers, hidden size 768, 12 attention heads, 110M parameters; vocabulary size 30522.
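To keep these numbers in one place, here is a minimal sketch that records the BERT-Base hyperparameters as a Python dataclass. The class and field names are hypothetical (they are not the Hugging Face `BertConfig` field names). `intermediate_size = 3072` is taken from the per-tensor listing further down, and `head_dim` shows how the 12 heads split the 768-dimensional hidden state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertBaseConfig:
    # Hypothetical field names; values are the BERT-Base, Uncased numbers
    vocab_size: int = 30522
    hidden_size: int = 768
    num_layers: int = 12
    num_heads: int = 12
    max_position: int = 512
    type_vocab_size: int = 2
    intermediate_size: int = 3072  # feed-forward width, 4 * hidden_size

    @property
    def head_dim(self) -> int:
        # Each head works in a 768 / 12 = 64-dimensional subspace
        assert self.hidden_size % self.num_heads == 0
        return self.hidden_size // self.num_heads


cfg = BertBaseConfig()
print(cfg.head_dim)  # 64
```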

Embedding layer:

Token embedding: 30522*768, looks up the initial vector for each token.

Position embedding: 512*768

Token type (segment) embedding: 2*768

LayerNorm: weight + bias, 768*2
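The embedding-layer sizes above add up as shown in this plain-Python sketch (the function name is hypothetical). In BERT the token, position, and token-type vectors are summed element-wise and the sum goes through LayerNorm; the dropout that follows has no parameters.

```python
def embedding_params(vocab=30522, max_pos=512, type_vocab=2, hidden=768):
    token = vocab * hidden            # one 768-d vector per vocabulary entry
    position = max_pos * hidden       # one vector per position 0..511
    token_type = type_vocab * hidden  # segment A / segment B
    layer_norm = 2 * hidden           # weight (gamma) + bias (beta)
    return token + position + token_type + layer_norm


print(embedding_params())  # 23837184
```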

Self-attention layer:

Query, key, value projections: (768*768+768)*3

Output dense: 768*768+768

LayerNorm: weight + bias, 768*2
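Summing the self-attention sizes gives the count per layer. The 12 heads add no extra parameters: each head simply works on a 64-dimensional slice of the 768-dimensional Q/K/V outputs. A sketch with a hypothetical function name, plain arithmetic only:

```python
def attention_params(hidden=768):
    qkv = (hidden * hidden + hidden) * 3  # Q, K, V projections: weight + bias each
    out_dense = hidden * hidden + hidden  # maps the concatenated heads back to 768
    layer_norm = 2 * hidden               # weight + bias
    return qkv + out_dense + layer_norm


print(attention_params())  # 2363904
```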

Feed-forward:

Two linear layers: first expand 768 → 3072, then project back 3072 → 768.

LayerNorm: weight + bias, 768*2
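The feed-forward block expands 768 → 3072, applies GELU (no parameters), projects back to 768, and then applies a residual connection plus LayerNorm. Its parameter count, as a sketch with a hypothetical function name:

```python
def ffn_params(hidden=768, intermediate=3072):
    up = hidden * intermediate + intermediate  # intermediate.dense: 768 -> 3072
    down = intermediate * hidden + hidden      # output.dense: 3072 -> 768
    layer_norm = 2 * hidden                    # weight + bias
    return up + down + layer_norm


print(ffn_params())  # 4723968
```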

Actual tensors (embeddings plus encoder layer 0; layers 1 to 11 repeat the same shapes):

bert.embeddings.word_embeddings.weight torch.Size([30522, 768]) parameter count: 23440896
bert.embeddings.position_embeddings.weight torch.Size([512, 768]) parameter count: 393216
bert.embeddings.token_type_embeddings.weight torch.Size([2, 768]) parameter count: 1536
bert.embeddings.LayerNorm.weight torch.Size([768]) parameter count: 768
bert.embeddings.LayerNorm.bias torch.Size([768]) parameter count: 768
bert.encoder.layer.0.attention.self.query.weight torch.Size([768, 768]) parameter count: 589824
bert.encoder.layer.0.attention.self.query.bias torch.Size([768]) parameter count: 768
bert.encoder.layer.0.attention.self.key.weight torch.Size([768, 768]) parameter count: 589824
bert.encoder.layer.0.attention.self.key.bias torch.Size([768]) parameter count: 768
bert.encoder.layer.0.attention.self.value.weight torch.Size([768, 768]) parameter count: 589824
bert.encoder.layer.0.attention.self.value.bias torch.Size([768]) parameter count: 768
bert.encoder.layer.0.attention.output.dense.weight torch.Size([768, 768]) parameter count: 589824
bert.encoder.layer.0.attention.output.dense.bias torch.Size([768]) parameter count: 768
bert.encoder.layer.0.attention.output.LayerNorm.weight torch.Size([768]) parameter count: 768
bert.encoder.layer.0.attention.output.LayerNorm.bias torch.Size([768]) parameter count: 768
bert.encoder.layer.0.intermediate.dense.weight torch.Size([3072, 768]) parameter count: 2359296
bert.encoder.layer.0.intermediate.dense.bias torch.Size([3072]) parameter count: 3072
bert.encoder.layer.0.output.dense.weight torch.Size([768, 3072]) parameter count: 2359296
bert.encoder.layer.0.output.dense.bias torch.Size([768]) parameter count: 768
bert.encoder.layer.0.output.LayerNorm.weight torch.Size([768]) parameter count: 768
bert.encoder.layer.0.output.LayerNorm.bias torch.Size([768]) parameter count: 768
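As a cross-check against the 110M figure, the sketch below regenerates the listed tensor names and shapes for all 12 layers in plain Python and sums them. The listing itself was most likely printed by looping over `model.named_parameters()` and calling `p.numel()` in PyTorch; this sketch avoids that dependency by hard-coding the naming scheme seen in the listing (the generator function is hypothetical). The pooler, a 768*768 dense layer plus bias in Hugging Face's `BertModel`, does not appear in the listing and is added separately as an assumption.

```python
from math import prod


def bert_param_shapes(vocab=30522, max_pos=512, type_vocab=2,
                      hidden=768, intermediate=3072, layers=12):
    """Yield (name, shape) pairs using the same names as the listing."""
    e = "bert.embeddings"
    yield f"{e}.word_embeddings.weight", (vocab, hidden)
    yield f"{e}.position_embeddings.weight", (max_pos, hidden)
    yield f"{e}.token_type_embeddings.weight", (type_vocab, hidden)
    yield f"{e}.LayerNorm.weight", (hidden,)
    yield f"{e}.LayerNorm.bias", (hidden,)
    for i in range(layers):
        p = f"bert.encoder.layer.{i}"
        for proj in ("query", "key", "value"):
            yield f"{p}.attention.self.{proj}.weight", (hidden, hidden)
            yield f"{p}.attention.self.{proj}.bias", (hidden,)
        yield f"{p}.attention.output.dense.weight", (hidden, hidden)
        yield f"{p}.attention.output.dense.bias", (hidden,)
        yield f"{p}.attention.output.LayerNorm.weight", (hidden,)
        yield f"{p}.attention.output.LayerNorm.bias", (hidden,)
        # PyTorch nn.Linear stores weight as (out_features, in_features)
        yield f"{p}.intermediate.dense.weight", (intermediate, hidden)
        yield f"{p}.intermediate.dense.bias", (intermediate,)
        yield f"{p}.output.dense.weight", (hidden, intermediate)
        yield f"{p}.output.dense.bias", (hidden,)
        yield f"{p}.output.LayerNorm.weight", (hidden,)
        yield f"{p}.output.LayerNorm.bias", (hidden,)


shapes = dict(bert_param_shapes())
embeddings_and_encoder = sum(prod(s) for s in shapes.values())
pooler = 768 * 768 + 768  # assumption: BertModel's pooler, not in the listing
print(embeddings_and_encoder)           # 108891648
print(embeddings_and_encoder + pooler)  # 109482240, i.e. about 110M
```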

__init__ __new__ __call__:

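__new__ receives cls as its first parameter, whereas __init__ receives self. __new__ runs first and creates the instance; __init__ then initializes it. When you define your own __new__, it must return an instance, typically via super(Class, cls).__new__(cls). If __new__ returns nothing (i.e. None), Python never calls __init__(), and the constructor expression evaluates to None, so any later attribute access on it fails.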
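A minimal demonstration of the call order, with hypothetical class names: `WithNew` returns an instance, so `__init__` runs; `ForgotReturn`'s `__new__` falls off the end and returns None, so `__init__` is skipped and the variable ends up as None.

```python
class WithNew:
    def __new__(cls, *args, **kwargs):
        print("__new__ called, cls =", cls.__name__)
        # Same as super().__new__(cls) in Python 3; must return the instance
        return super(WithNew, cls).__new__(cls)

    def __init__(self, value):
        print("__init__ called")
        self.value = value


class ForgotReturn:
    def __new__(cls, *args, **kwargs):
        print("__new__ called but returns nothing")
        # Implicit `return None`: Python will NOT call __init__

    def __init__(self, value):
        print("__init__ never runs")
        self.value = value


a = WithNew(1)       # prints the __new__ line, then the __init__ line
b = ForgotReturn(1)  # prints only the __new__ line
print(a.value, b)    # 1 None
```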
__call__ is another special but commonly used method. When a class defines __call__, its instances become callable: obj(args) runs obj.__call__(args).
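A small sketch of `__call__`. The class names are hypothetical; `TinyModule` only imitates the pattern PyTorch uses, where `nn.Module` defines `__call__` and dispatches to `forward()` (the real one also runs hooks), which is why a BERT model is invoked as `model(inputs)` rather than `model.forward(inputs)`.

```python
class TinyModule:
    """Hypothetical stand-in mimicking how torch.nn.Module routes calls to forward()."""

    def __call__(self, *args, **kwargs):
        # The real nn.Module also runs hooks here; this sketch only dispatches
        return self.forward(*args, **kwargs)

    def forward(self, x):
        raise NotImplementedError


class Doubler(TinyModule):
    def forward(self, x):
        return 2 * x


class Plain:
    pass  # no __call__, so its instances are not callable


m = Doubler()
print(callable(m), m(21))  # True 42
print(callable(Plain()))   # False
```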

posted @ 2023-08-23 14:53 15375357604
