Text Representation: One-Hot, BOW, N-grams, TF-IDF, Word2Vec, GloVe, FastText, ELMo, BERT, SBERT
1 Statistical Model
1.1 One-Hot
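One-hot encoding maps every word in the vocabulary to a |V|-dimensional vector that is all zeros except for a single 1, so every pair of distinct words is equally far apart and no similarity is captured. A minimal sketch (the toy corpus and words are illustrative only):

```python
import numpy as np

# Toy corpus; the vocabulary is illustrative only.
corpus = ["i love nlp", "i love deep learning"]
vocab = sorted({word for sent in corpus for word in sent.split()})
word2idx = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab), dtype=int)
    vec[word2idx[word]] = 1
    return vec

print(vocab)            # ['deep', 'i', 'learning', 'love', 'nlp']
print(one_hot("nlp"))   # [0 0 0 0 1]
```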
1.2 Bag of Words (BOW)
https://web.stanford.edu/class/datasci112/lectures/lecture8.pdf
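Bag of Words ignores word order and represents each document as a vector of word counts over the corpus vocabulary. A minimal sketch with scikit-learn's CountVectorizer (scikit-learn ≥ 1.0 assumed for get_feature_names_out; the documents are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

vectorizer = CountVectorizer()       # unigram counts, lowercased by default
X = vectorizer.fit_transform(docs)   # sparse document-term count matrix

print(vectorizer.get_feature_names_out())
print(X.toarray())   # each row: how often each vocabulary word occurs in a document
```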
1.3 N-grams
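N-grams keep some local word order by using contiguous sequences of n tokens as features instead of (or in addition to) single words. A minimal sketch reusing CountVectorizer, with an illustrative sentence:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat"]

# ngram_range=(1, 2): use both unigrams and bigrams as features.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
# ['cat' 'cat sat' 'mat' 'on' 'on the' 'sat' 'sat on' 'the' 'the cat' 'the mat']
```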
1.4 TF-IDF
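TF-IDF down-weights words that occur in many documents: a common variant scores a term t in document d as tf(t, d) · log(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. A minimal sketch with scikit-learn, which uses a smoothed idf and L2-normalizes each document row by default:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

# scikit-learn's idf is ln((1 + n) / (1 + df(t))) + 1, so terms that appear in
# both documents ("the", "sat", "on") get a lower idf than terms unique to one.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))   # L2-normalized tf-idf weights per document
```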
2 Word Embedding (Neural Network Model)
2.1 Word2Vec
https://projector.tensorflow.org/
Continuous Bag of Words (CBOW)
Skip-Gram
The prediction task (CBOW or Skip-Gram) is only a pretext; the real goal is the learned word vectors.
The trainable weights are the input (embedding) weight matrix and the output weight matrix; after training, the rows of the input matrix are used as the word vectors.
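A minimal training sketch with gensim (gensim 4.x assumed; the sentences and hyperparameters are toy values for illustration). sg=0 selects CBOW (predict the center word from its context window), sg=1 selects Skip-Gram (predict context words from the center word); either way, the learned input weight matrix is what gets kept as the word vectors:

```python
from gensim.models import Word2Vec

# Tiny tokenized corpus; real training needs far more text.
sentences = [
    ["i", "love", "natural", "language", "processing"],
    ["word2vec", "learns", "dense", "word", "vectors"],
    ["i", "love", "deep", "learning"],
]

# sg=0 -> CBOW, sg=1 -> Skip-Gram.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

vec = model.wv["love"]                 # a row of the learned input weight matrix
print(vec.shape)                       # (50,)
print(model.wv.most_similar("love"))   # nearest neighbours by cosine similarity
```

Vectors trained this way (or pretrained ones) can be explored visually in the TensorFlow Embedding Projector linked above.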
2.2 GloVe
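GloVe vectors are trained on global word co-occurrence counts and are usually consumed as pretrained files rather than retrained. A minimal loading sketch, assuming a file such as glove.6B.50d.txt has been downloaded from the Stanford GloVe project (the path is a placeholder):

```python
import numpy as np

def load_glove(path):
    """Parse the plain-text GloVe format: one word followed by its vector per line."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

glove = load_glove("glove.6B.50d.txt")   # placeholder path
print(glove["king"].shape)               # (50,)
```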
2.3 FastText
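FastText represents a word as the sum of its character n-gram vectors, so it can also embed misspelled or unseen words. A minimal gensim sketch (gensim 4.x assumed; corpus and hyperparameters are illustrative):

```python
from gensim.models import FastText

sentences = [
    ["fasttext", "builds", "word", "vectors", "from", "character", "ngrams"],
    ["it", "can", "embed", "misspelled", "or", "unseen", "words"],
]

# min_n / max_n control the character n-gram lengths used for subwords.
model = FastText(sentences, vector_size=50, window=3, min_count=1,
                 min_n=3, max_n=5, epochs=50)

# Unlike Word2Vec, an out-of-vocabulary word still gets a vector,
# built from the vectors of its character n-grams.
print(model.wv["fasttexts"].shape)   # (50,) even though "fasttexts" was never seen
```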
3 ELMo (2018.02)
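ELMo replaces static word vectors with contextual ones: a deep bidirectional language model (a character CNN plus two biLSTM layers) is pretrained on a large corpus, and each token's embedding is a combination of the layer outputs, so the same word gets different vectors in different sentences. Pretrained ELMo is normally loaded from AllenNLP or TensorFlow Hub; the sketch below is only a conceptual stand-in (a tiny word-level biLSTM in PyTorch with a made-up vocabulary) to show the context dependence:

```python
import torch
import torch.nn as nn

# Conceptual sketch only, not the real ELMo architecture or weights.
vocab = {"i": 1, "deposit": 2, "money": 3, "at": 4, "the": 5, "bank": 6, "river": 7}

class TinyBiLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.bilstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        # Each position's output depends on the whole sentence (both directions).
        hidden, _ = self.bilstm(self.embed(token_ids))
        return hidden                  # (batch, seq_len, 2 * dim)

model = TinyBiLM(len(vocab) + 1)
sent1 = torch.tensor([[1, 2, 3, 4, 5, 6]])   # "i deposit money at the bank"
sent2 = torch.tensor([[5, 7, 6]])            # "the river bank"

with torch.no_grad():
    v1 = model(sent1)[0, 5]   # contextual vector for "bank" in sentence 1
    v2 = model(sent2)[0, 2]   # contextual vector for "bank" in sentence 2
print(torch.allclose(v1, v2))   # False: same word, different vectors in different contexts
```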
4 BERT (2018.10)
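BERT is a Transformer encoder pretrained with masked language modeling (plus next-sentence prediction); it is typically used by taking its hidden states as contextual token embeddings or by fine-tuning on a downstream task. A minimal feature-extraction sketch with Hugging Face transformers, using bert-base-uncased as an illustrative checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("text representation with bert", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

token_vectors = outputs.last_hidden_state   # (1, seq_len, 768): one contextual vector per token
cls_vector = token_vectors[:, 0]            # the [CLS] position, a rough sentence-level vector
print(token_vectors.shape, cls_vector.shape)
```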
5 SBERT (Sentence Embedding)
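Raw BERT token or [CLS] vectors are not ideal for comparing whole sentences; Sentence-BERT fine-tunes a siamese BERT-style network so that cosine similarity between pooled sentence vectors reflects semantic similarity. A minimal sketch with the sentence-transformers library, using all-MiniLM-L6-v2 as an illustrative checkpoint:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A man is playing a guitar.",
    "Someone is playing an instrument.",
    "The weather is cold today.",
]
embeddings = model.encode(sentences)                  # one fixed-size vector per sentence
scores = util.cos_sim(embeddings[0], embeddings[1:])  # cosine similarity vs. the first sentence

print(embeddings.shape)   # (3, 384)
print(scores)             # expected: the paraphrase scores higher than the unrelated sentence
```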
References
https://deysusovan93.medium.com/from-traditional-to-modern-a-comprehensive-guide-to-text-representation-techniques-in-nlp-369946f67497
https://github.com/sawyerbutton/NLP-Funda-2023-Spring
https://github.com/sawyerbutton/LM-Funda-2024-Spring/blob/main/示例代码/Lesson3/LM_Lesson3_Embedding_demo.ipynb