Natural Language Processing Terminology

Posted on 2014-01-30 01:00 by wintor12

The definitions below are taken from Wikipedia.

Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining.
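
As a rough illustration (my own sketch, not part of the Wikipedia definition), a minimal regex-based tokenizer might look like this in Python; real tokenizers handle contractions, URLs, and language-specific rules far more carefully:

```python
import re

def tokenize(text):
    """Split a stream of text into word and punctuation tokens.

    A minimal sketch: \\w+ matches runs of letters/digits/underscore,
    [^\\w\\s] matches single punctuation symbols, whitespace is discarded.
    """
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Tokenization breaks text into tokens, i.e. words."))
# ['Tokenization', 'breaks', 'text', 'into', 'tokens', ',', 'i', '.', 'e', '.', 'words', '.']
```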


Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. It is related to, but not the same as, part-of-speech (POS) tagging: POS tagging labels each word with its grammatical category, while parsing recovers the syntactic structure of the whole sentence or expression.
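
As a sketch of parsing against a formal grammar (my own illustration, using a toy arithmetic grammar rather than natural language), a tiny recursive-descent parser in Python:

```python
# Grammar:
#   Expr   -> Term (('+' | '-') Term)*
#   Term   -> Factor (('*' | '/') Factor)*
#   Factor -> NUMBER | '(' Expr ')'
# Each function returns (parse_tree, next_token_index).

def parse_expr(tokens, i=0):
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in ("+", "-"):
        op = tokens[i]
        right, i = parse_term(tokens, i + 1)
        node = (op, node, right)
    return node, i

def parse_term(tokens, i):
    node, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] in ("*", "/"):
        op = tokens[i]
        right, i = parse_factor(tokens, i + 1)
        node = (op, node, right)
    return node, i

def parse_factor(tokens, i):
    if tokens[i] == "(":
        node, i = parse_expr(tokens, i + 1)
        return node, i + 1          # skip the closing ')'
    return ("num", tokens[i]), i + 1

tree, _ = parse_expr(["1", "+", "2", "*", "(", "3", "-", "4", ")"])
print(tree)
# ('+', ('num', '1'), ('*', ('num', '2'), ('-', ('num', '3'), ('num', '4'))))
```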


Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.
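
For the sentence case, a naive sketch in Python (my own example): split after `.`, `!`, or `?` followed by whitespace. Abbreviations like "e.g." break this rule, which is why practical segmenters use trained models.

```python
import re

def split_sentences(text):
    """Naive sentence segmentation: split at whitespace preceded by . ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

print(split_sentences("Text segmentation divides text into units. Sentences are one such unit! Topics are another."))
# ['Text segmentation divides text into units.', 'Sentences are one such unit!', 'Topics are another.']
```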


In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
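
A compiler-style lexer differs from NLP tokenization mainly in that each token also carries a type. A minimal sketch (my own illustration, for a toy expression language):

```python
import re

# Token specification: each pattern is tried in order, and the matched
# text is paired with its token type.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source):
    """Convert a sequence of characters into a sequence of (type, text) tokens."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        if match.lastgroup != "SKIP":      # drop whitespace
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(lex("count = 2 * (price + 10)"))
# [('IDENT', 'count'), ('OP', '='), ('NUMBER', '2'), ('OP', '*'), ('OP', '('),
#  ('IDENT', 'price'), ('OP', '+'), ('NUMBER', '10'), ('OP', ')')]
```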
