Python Word2Vec: Generating Word Vectors with a Pre-trained Model

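# Text files must be UTF-8 without a BOM.
# Note: gensim.models.deprecated is only available in gensim 3.x (it was removed in 4.0).
from gensim.models.deprecated.word2vec import Word2Vec

# Keep these 3 files in the same directory:
#   Word60.model   Word60.model.syn0.npy   Word60.model.syn1neg.npy
model = Word2Vec.load('./model/Word60.model')
print("model loaded successfully")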

word_list = ['了',
            '不存在的词',
            '的',
            '我',
            '你',
            '他',
            '个',
            '1',
            '完成',
            '吃',
            '苹果',
            '香蕉',
            '词汇',
            '物理',
            '地球',
            '黑死病',
            '瘟疫',
            '', ]

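# Build the vocabulary set once; membership tests on a set are far faster than scanning the list.
vocab = set(model.index2word)

for word in word_list:
    if word in vocab:
        vec = model[word]
        print(word, vec)
    else:
        print(word + '\t\t\t-- not in vocabulary' + '\n\n')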
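Once the vectors have been looked up, a common next step is to compare them. The following is a minimal sketch, assuming each vector is a plain sequence of floats. The `lookup` and `cosine_similarity` helpers and the toy 3-dimensional vectors are hypothetical stand-ins for illustration; they are not part of gensim and are not values from the Word60 model.

```python
import math

def lookup(vectors, word):
    """Return the vector for `word`, or None if it is out of vocabulary."""
    return vectors.get(word)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors, made up for illustration only.
toy_vectors = {
    '苹果': [1.0, 0.0, 0.0],
    '香蕉': [1.0, 1.0, 0.0],
}

apple = lookup(toy_vectors, '苹果')
banana = lookup(toy_vectors, '香蕉')
missing = lookup(toy_vectors, '不存在的词')  # out of vocabulary, so None
similarity = cosine_similarity(apple, banana)
```

Returning `None` for unknown words keeps the out-of-vocabulary check in one place, so callers can decide whether to skip the word or substitute a default vector.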

You can also download Chinese word vectors here: https://github.com/Embedding/Chinese-Word-Vectors

To read the vocabulary from a downloaded vector file:

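def load_wv_vocab(file_path):
    """Collect the first space-separated field of every line in a text-format vector file."""
    wv_vocab = set()
    with open(file_path, encoding='utf-8') as f:
        for line in f:
            # The token is the first field; the remaining fields are the vector values.
            token = line.rstrip().split(' ')[0]
            wv_vocab.add(token)
    return wv_vocab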
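The snippet above collects only the tokens. To get the vectors as well, the same file can be parsed in full. The following is a rough sketch, assuming the file uses the word2vec text format: an optional `count dim` header line, then one `token v1 v2 ...` line per word with single-space separators. The `read_text_vectors` name, the header heuristic, and the sample data are assumptions made for illustration. In practice, gensim's `KeyedVectors.load_word2vec_format` is the usual way to load this format.

```python
import io

def read_text_vectors(lines):
    """Parse word2vec text format into a dict mapping token -> list of floats."""
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(' ')
        # Assumption: a first line made of exactly two integers is the "count dim" header.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # skip blank lines and lines with no vector values
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# Made-up sample standing in for a real vector file.
sample = io.StringIO("2 3\n苹果 0.1 0.2 0.3\n香蕉 0.4 0.5 0.6\n")
vectors = read_text_vectors(sample)
```

For a real file, pass the open file object instead of the sample: `with open(file_path, encoding='utf-8') as f: vectors = read_text_vectors(f)`.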

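word2vec is fairly dated now, so I recommend using pre-trained models instead:

How to use pre-trained models: https://github.com/huggingface/transformers

Chinese pre-trained models (BERT-wwm) download: https://github.com/ymcui/Chinese-BERT-wwm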

posted @ 2018-04-23 00:57 致林
