nlp gensim fasttext word2vec
2022-05-09 17:03 brookin
gensim train model error: assert vocab_n == len(model.wv.vocab)
https://github.com/RaRe-Technologies/gensim/issues/2853
This is fixed in newer gensim releases. Upgrade with:
pip install gensim -U
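To confirm that the upgrade took effect, the installed version can be read from the package metadata. This is a minimal sketch using only the standard library. The helper name installed_version is made up for illustration, and the exact release that contains the fix is not stated here, so check the linked issue for that.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="gensim"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when gensim is not installed.
print("gensim version:", installed_version())
```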
Fine-tuning a pretrained fastText model with gensim
https://radimrehurek.com/gensim/models/word2vec.html
https://radimrehurek.com/gensim/models/fasttext.html?highlight=fasttext#module-gensim.models.fasttext
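import json
import gensim

# Load the tokenized sentences (a list of token lists).
with open("train_voc.json", "r", encoding="utf-8") as file:
    sents = json.load(file)

# Load the pretrained fastText model in Facebook's binary format.
model = gensim.models.fasttext.load_facebook_model("cc.de.300.bin")
# Add words from the new corpus to the existing vocabulary.
model.build_vocab(sents, update=True)
# Continue training on the new sentences.
model.train(corpus_iterable=sents, total_examples=len(sents), epochs=2)
# Save the tuned model, again in Facebook's binary format.
gensim.models.fasttext.save_facebook_model(model, "cc.de.300.tuned.bin")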
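Once the tuned file exists, it can be loaded again to look up word vectors. The sketch below assumes gensim 4.x (it relies on key_to_index) and uses a hypothetical helper name, lookup_vector. It loads only the vectors with load_facebook_vectors, which is enough for lookups when no further training is planned.

```python
def lookup_vector(model_path, word):
    """Load vectors from a Facebook-format .bin file and look up one word.

    Returns (in_vocabulary, vector). For a word outside the vocabulary,
    fastText builds the vector from the word's character n-grams.
    """
    # Imported here so the function can be defined without loading gensim.
    from gensim.models.fasttext import load_facebook_vectors

    wv = load_facebook_vectors(model_path)
    # key_to_index holds only full words, so this tells known from unseen.
    in_vocabulary = word in wv.key_to_index
    return in_vocabulary, wv[word]


# Example call (needs the tuned file written by the script above;
# loading a full cc.*.300 model takes a while and several GB of RAM):
# known, vec = lookup_vector("cc.de.300.tuned.bin", "Donaudampfschiff")
# print(known, vec.shape)
```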
train_voc.json format (a list of sentences, each a list of string tokens)
[
  [
    "This",
    "module",
    "allows",
    "training",
    "word",
    "embeddings",
    "from",
    "a",
    "training",
    "corpus"
  ],
  [
    "The",
    "additional",
    "ability",
    "to",
    "obtain",
    "word",
    "vectors",
    "for",
    "out-of-vocabulary",
    "words"
  ]
]
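A file in this shape can be produced from raw text with a few lines of standard-library code. The sketch below is one possible way, not the author's method: the function names tokenize and write_training_file and the regex-based tokenizer are assumptions made for illustration, and a real corpus may need a language-aware tokenizer instead.

```python
import json
import re


def tokenize(line):
    """Split a line into word tokens, keeping inner hyphens and apostrophes."""
    return re.findall(r"\w+(?:[-']\w+)*", line)


def write_training_file(lines, path="train_voc.json"):
    """Tokenize each non-empty line and save the result as a list of lists."""
    sents = [tokens for tokens in (tokenize(line) for line in lines) if tokens]
    with open(path, "w", encoding="utf-8") as file:
        # ensure_ascii=False keeps non-ASCII words (e.g. umlauts) readable.
        json.dump(sents, file, ensure_ascii=False, indent=2)
    return sents
```

Calling write_training_file with an iterable of text lines (one sentence per line) writes the JSON file and returns the same list that json.load would give back.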
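Author: brookin
Source: http://www.cnblogs.com/brookin/
This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China Mainland license. Reposting is welcome, but unless the author agrees otherwise, this notice must be kept and a link to the original article must be placed prominently on the page.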