Tokenizer: BPE, WordPiece, and SentencePiece

1 Word-based Tokenizer

2 Character-based Tokenizer

3 Subword-based Tokenizer

3.1 Byte-Pair Encoding (BPE)
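As a rough illustration of this heading, the BPE training loop can be sketched as follows: count adjacent symbol pairs across a word-frequency table, merge the most frequent pair, and repeat. The function name `learn_bpe`, the toy corpus, and the tie-breaking rule (first pair seen wins) are assumptions made for this example, not part of any particular library.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merge rules from a {word: frequency} table.

    Hypothetical helper for illustration; real tokenizers also handle
    pre-tokenization, end-of-word markers, and special tokens.
    """
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of the pair with the merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(toy_corpus, 3))
```

On this toy corpus the frequent suffix "est" is assembled over the first two merges, which is the behavior BPE relies on to build subwords from characters.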

Byte-Level BPE

3.2 WordPiece
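As a minimal sketch for this heading, the snippet below shows only the tokenization (inference) side of WordPiece as used in BERT-style models: greedy longest-match-first lookup, with continuation pieces marked by a "##" prefix. It does not cover how the vocabulary is trained. The function name `wordpiece_tokenize` and the tiny vocabulary are assumptions made for illustration.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest-match-first.

    Hypothetical helper for illustration; `vocab` is any set of known pieces,
    where non-initial pieces carry the "##" prefix.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark as a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position: the whole word becomes unknown.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens


if __name__ == "__main__":
    toy_vocab = {"un", "##aff", "##able"}
    print(wordpiece_tokenize("unaffable", toy_vocab))
```

If no vocabulary piece matches at some position, this sketch maps the entire word to the unknown token rather than emitting a partial split.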

3.3 Unigram

3.4 SentencePiece

posted @ 2024-05-15 00:15  ForHHeart