Pytorch学习笔记06---- torch.nn.Embedding 词嵌入层的理解

1.word Embedding的概念理解

首先，我们先理解一下什么是Embedding。Word Embedding翻译过来的意思就是词嵌入，通俗来讲就是将文字转换为一串数字。因为数字是计算机更容易识别的一种表达形式。我们词嵌入的过程，就相当于是我们在给计算机制造出一本字典的过程。计算机可以通过这个字典来间接地识别文字。词嵌入向量的意思也可以理解成：词在神经网络中的向量表示。

2.Pytorch中的Embedding

官方文档的定义：

A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices.
The input to the module is a list of indices, and the output is the corresponding word embeddings.

一个简单的存储固定大小的词典的嵌入向量的查找表，意思就是说，给一个编号，嵌入层就能返回这个编号对应的嵌入向量，嵌入向量反映了各个编号代表的符号之间的语义关系。该模块通常用于存储单词嵌入并使用索引检索它们。

模块的输入是索引列表，输出是相应的词嵌入。

官方文档参数说明：

def __init__(self, num_embeddings, embedding_dim, padding_idx=None,
                 max_norm=None, norm_type=2., scale_grad_by_freq=False,
                 sparse=False, _weight=None)

Args:
        num_embeddings (int): size of the dictionary of embeddings
        embedding_dim (int): the size of each embedding vector
        padding_idx (int, optional): If given, pads the output with the embedding vector at :attr:`padding_idx`
                                         (initialized to zeros) whenever it encounters the index.
        max_norm (float, optional): If given, each embedding vector with norm larger than :attr:`max_norm`
                                    is renormalized to have norm :attr:`max_norm`.
        norm_type (float, optional): The p of the p-norm to compute for the :attr:`max_norm` option. Default ``2``.
        scale_grad_by_freq (boolean, optional): If given, this will scale gradients by the inverse of frequency of
                                                the words in the mini-batch. Default ``False``.
        sparse (bool, optional): If ``True``, gradient w.r.t. :attr:`weight` matrix will be a sparse tensor.
                                 See Notes for more details regarding sparse gradients.

参数理解说明：

num_embeddings (python:int) – 词典的大小尺寸，即一个词典里要有多少个词，比如总共出现5000个词，那就输入5000。此时index为（0-4999）
embedding_dim (python:int) – 嵌入向量的维度，即用多少维来表示一个符号。
padding_idx (python:int, optional) – 填充id，比如，输入长度为100，但是每次的句子长度并不一样，后面就需要用统一的数字填充，而这里就是指定这个数字，这样，网络在遇到填充id时，就不会计算其与其它符号的相关性。（初始化为0）
max_norm (python:float, optional) – 最大范数，如果嵌入向量的范数超过了这个界限，就要进行再归一化。
norm_type (python:float, optional) – 指定利用什么范数计算，并用于对比max_norm，默认为2范数。
scale_grad_by_freq (boolean, optional) – 根据单词在mini-batch中出现的频率，对梯度进行放缩。默认为False.
sparse (bool, optional) – 若为True,则与权重矩阵相关的梯度转变为稀疏张量

输入： LongTensor (N, W), N = mini-batch, W = 每个mini-batch中提取的下标数
输出： (N, W, embedding_dim)

这个语句是创建一个词嵌入模型，num_embeddings代表一共有多少个词，embedding_dim代表你想要为每个词创建一个多少维的向量来表示它

案例解释：

import torch
from torch import nn

embedding = nn.Embedding(5, 4) # 假定字典中只有5个词，词向量维度为4
word = [[1, 2, 3],
        [2, 3, 4]] # 每个数字代表一个词，例如 {'!':0,'how':1, 'are':2, 'you':3,  'ok':4}
                    #而且这些数字的范围只能在0～4之间，因为上面定义了只有5个词
embed = embedding(torch.LongTensor(word))
print(embed) 
print(embed.size())

输出：

tensor([[[-0.4093, -1.0110,  0.6731,  0.0790],
         [-0.6557, -0.9846, -0.1647,  2.2633],
         [-0.5706, -1.1936, -0.2704,  0.0708]],

        [[-0.6557, -0.9846, -0.1647,  2.2633],
         [-0.5706, -1.1936, -0.2704,  0.0708],
         [ 0.2242, -0.5989,  0.4237,  2.2405]]], grad_fn=<EmbeddingBackward>)
torch.Size([2, 3, 4])

embed输出的维度是[2, 3, 4]，这就代表对于输入的[2,3]维的词，每一个词都被映射成了一个4维的向量。

posted @ 2020-07-25 18:12 雨后观山色阅读(4198) 评论(0) 编辑收藏举报

刷新页面返回顶部

登录后才能查看或发表评论，立即登录或者逛逛博客园首页

阅读排行：
· 分享4款.NET开源、免费、实用的商城系统
· 全程不用写代码，我用AI程序员写了一个飞机大战
· MongoDB 8.0这个新功能碉堡了，比商业数据库还牛
· 白话解读 Dapr 1.15：你的「微服务管家」又秀新绝活了
· 上周热点回顾（2.24-3.2）

历史上的今天：
2019-07-25 16 JQuery---JavaScript框架
2019-07-25 15 Filter过滤器和Listener监听器

公告

昵称：雨后观山色
园龄： 6年4个月
粉丝： 94
关注： 13

+加关注

2025年3月

日

一

二

三

四

五

六

雨后观山色

Pytorch学习笔记06---- torch.nn.Embedding 词嵌入层的理解

公告

搜索

常用链接

最新随笔

随笔分类

随笔档案

阅读排行榜

评论排行榜

推荐排行榜

最新评论