transformers

Model categories in transformers:

BERT (autoencoding), GPT (autoregressive), BART (encoder-decoder)
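A quick way to see the difference between the first two categories is the self-attention mask. In an autoencoding model (BERT-style) every token may attend to every other token. In an autoregressive model (GPT-style) token i may only attend to tokens 0..i. BART combines the two: a bidirectional encoder and a causal decoder. The minimal NumPy sketch below assumes a single unpadded sequence. `attention_mask` is a hypothetical helper, not a transformers API.

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Hypothetical helper: (seq_len, seq_len) 0/1 matrix, entry [i, j] = 1 means position i may attend to j."""
    full = np.ones((seq_len, seq_len), dtype=int)
    # Autoregressive (GPT-style): keep only the lower triangle, so i sees positions 0..i.
    # Autoencoding (BERT-style): keep the full matrix, so every position sees every position.
    return np.tril(full) if causal else full

print(attention_mask(4, causal=False))
print(attention_mask(4, causal=True))
```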

hidden_size (d) = num_attention_heads (m) * attention_head_size (a), i.e. d = m*a.

d is the output (hidden) dimension of the transformer model; it is normally the number of attention heads multiplied by attention_head_size.
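The relation d = m*a can be checked with a few lines of plain Python. `head_size` is a hypothetical helper; the two configurations listed are the published BERT-base (768 hidden, 12 heads) and BERT-large (1024 hidden, 16 heads) settings, both of which give a = 64.

```python
def head_size(hidden_size: int, num_heads: int) -> int:
    """Hypothetical helper: attention_head_size a = d / m; d must be divisible by m."""
    if hidden_size % num_heads != 0:
        raise ValueError(f"hidden_size {hidden_size} is not divisible by num_heads {num_heads}")
    return hidden_size // num_heads

# (name, hidden_size d, num_attention_heads m)
for name, d, m in [("bert-base", 768, 12), ("bert-large", 1024, 16)]:
    print(name, d, m, head_size(d, m))
```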

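Multi-head attention in the transformer:

d_size = 512 (hidden dimension), n = 200 (sequence length), head_size = 64 (dimension of each attention head), head_nums = 512/64 = 8 (number of attention heads)

Input = 200*512. It is multiplied by three weight matrices W (W_Q, W_K, W_V), each 512*64 per head, producing the three matrices Q, K and V; each has shape 200*64 per head, i.e. 200*64*8 across the 8 heads.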
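The shape bookkeeping above can be sketched in NumPy. This assumes random inputs and weights (scaled by 1/sqrt(d) only to keep values moderate), stores each of W_Q, W_K, W_V as one 512*512 matrix whose 64-column blocks are the per-head 512*64 matrices, and omits the final output projection W_O that follows the concatenation in a full transformer layer.

```python
import numpy as np

n, d, num_heads = 200, 512, 8          # sequence length, hidden size, number of heads (from the text)
head_size = d // num_heads             # 64

rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))        # input: 200 x 512

# Full 512 x 512 projections; columns h*64:(h+1)*64 form head h's 512 x 64 matrix
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def split_heads(M: np.ndarray) -> np.ndarray:
    """(n, d) -> (num_heads, n, head_size)"""
    return M.reshape(n, num_heads, head_size).transpose(1, 0, 2)

Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)

# Head 0 equals projecting X with head 0's own 512 x 64 slice of W_q
assert np.allclose(Q[0], X @ W_q[:, :head_size])

scores = Q @ K.transpose(0, 2, 1) / np.sqrt(head_size)     # (8, 200, 200)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)              # softmax over the keys
heads = weights @ V                                         # (8, 200, 64)
out = heads.transpose(1, 0, 2).reshape(n, d)                # concatenate heads -> (200, 512)
print(Q.shape, scores.shape, out.shape)
```

Splitting one 512-wide projection into 8 slices is the usual way to implement this, and it is equivalent to keeping 8 separate 512*64 matrices per Q, K and V.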

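import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed: the ids below match bert-base-uncased
sentence = "Hello, my son is cuting."
input_ids_method1 = torch.tensor(tokenizer.encode(sentence, add_special_tokens=True))  # tokenize and map to ids in one step; 1-D tensor (wrap as [ids] for a batch of size 1)
# tensor([ 101, 7592, 1010, 2026, 2365, 2003, 3013, 2075, 1012, 102])

input_token2 = tokenizer.tokenize(sentence)  # WordPiece tokenization only
# ['hello', ',', 'my', 'son', 'is', 'cut', '##ing', '.']

input_ids_method2 = tokenizer.convert_tokens_to_ids(input_token2)  # map the tokens to their ids (returns a Python list)
# [7592, 1010, 2026, 2365, 2003, 3013, 2075, 1012]
# no start/end markers: [CLS], [SEP]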
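The split of "cuting" into 'cut' and '##ing' comes from WordPiece's greedy longest-match-first rule: take the longest prefix found in the vocabulary, then repeat on the rest, marking continuation pieces with "##". The sketch below handles a single word only (BERT's tokenizer first lower-cases and splits on whitespace and punctuation, which is how "Hello," becomes 'hello' and ','). `wordpiece` and `toy_vocab` are hypothetical; the real bert-base-uncased vocabulary has about 30k entries.

```python
from typing import List, Set

def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Hypothetical helper: greedy longest-match-first WordPiece split of one lower-cased word."""
    tokens: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking it until it is in the vocab
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate   # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]                       # no valid split: the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens

toy_vocab = {"hello", "my", "son", "is", "cut", "##ing", "##s"}
print(wordpiece("cuting", toy_vocab))
```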

(When add_special_tokens is set to False in tokenizer.encode, the start and end markers [CLS] and [SEP] are likewise not added.)
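For a single BERT sentence, add_special_tokens=True amounts to wrapping the word ids as [CLS] ... [SEP] (a sentence pair becomes [CLS] A [SEP] B [SEP]). The sketch below rebuilds the `encode` output from the `convert_tokens_to_ids` output, assuming the bert-base-uncased ids 101 for [CLS] and 102 for [SEP] seen in the outputs above; `add_special_tokens` here is a hypothetical function, not the tokenizer's own method.

```python
from typing import List

CLS_ID, SEP_ID = 101, 102   # [CLS] and [SEP] in bert-base-uncased

def add_special_tokens(ids: List[int]) -> List[int]:
    """Hypothetical helper: single-sentence BERT format [CLS] tokens [SEP]."""
    return [CLS_ID] + ids + [SEP_ID]

word_ids = [7592, 1010, 2026, 2365, 2003, 3013, 2075, 1012]   # convert_tokens_to_ids output from above
print(add_special_tokens(word_ids))
```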

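print(tokenizer.encode_plus(sentence))  # besides input_ids, encode_plus also returns token_type_ids and attention_mask, as a dict with three keys

Output (this run has 9 ids, with a single id 5870 where the earlier output had 3013, 2075 for 'cut', '##ing', so it appears to come from a slightly different input sentence):
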
[101, 7592, 1010, 2026, 2365, 2003, 5870, 1012, 102]
{'input_ids': [101, 7592, 1010, 2026, 2365, 2003, 5870, 1012, 102], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
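For a single unpadded sentence the two extra fields are simple: token_type_ids is all 0 (every token belongs to segment A) and attention_mask is all 1 (every position is a real token). They only become interesting with sentence pairs or padding. The sketch below assumes BERT's padding conventions ([PAD] id 0, token type 0 and attention mask 0 on padding) and ignores truncation; `build_model_inputs` is a hypothetical helper, not part of transformers.

```python
from typing import Dict, List, Optional

PAD_ID = 0  # [PAD] id in bert-base-uncased

def build_model_inputs(input_ids: List[int], max_length: Optional[int] = None) -> Dict[str, List[int]]:
    """Hypothetical helper: encode_plus-style fields for ONE already-encoded sentence.

    input_ids must already include [CLS]/[SEP]; truncation is not handled.
    """
    n_pad = 0 if max_length is None else max(0, max_length - len(input_ids))
    return {
        "input_ids": input_ids + [PAD_ID] * n_pad,
        "token_type_ids": [0] * (len(input_ids) + n_pad),     # single sentence: every position is segment 0
        "attention_mask": [1] * len(input_ids) + [0] * n_pad,  # 1 = real token, 0 = padding
    }

ids = [101, 7592, 1010, 2026, 2365, 2003, 5870, 1012, 102]
print(build_model_inputs(ids))                  # same dict as the encode_plus output above
print(build_model_inputs(ids, max_length=12))   # three [PAD] positions, masked out with 0
```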

Posted @ 2023-06-16 09:58 by 15375357604