The Processing Pipeline: We open a URL and read its HTML content, remove the markup, and select a slice of characters. This slice is then tokenized and optionally converted into an nltk.Text object. We can also lowercase all the words and extract the vocabulary.
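
The stages described above can be sketched roughly as follows. This is a minimal illustration rather than the original code: it uses only the Python standard library so that it runs offline, an inline `sample_html` string stands in for the downloaded page, and the regex-based `tokenize` is an assumed stand-in for NLTK's tokenizer. The names `strip_markup`, `tokenize`, `run_pipeline` and `sample_html` are hypothetical helpers introduced here for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def strip_markup(html):
    """Remove the markup and return only the text content."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def tokenize(raw):
    """Simplified stand-in tokenizer (an assumption, not NLTK's algorithm):
    runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", raw)


def run_pipeline(html, start=0, end=None):
    raw = strip_markup(html)            # remove the markup
    raw = raw[start:end]                # select a slice of characters
    tokens = tokenize(raw)              # tokenize
    words = [w.lower() for w in tokens] # lowercase all the words
    vocab = sorted(set(words))          # extract the vocabulary
    return tokens, words, vocab


# In a real run the HTML would be read from a URL, for example:
#   from urllib.request import urlopen
#   html = urlopen(url).read().decode("utf8")
sample_html = "<html><body><h1>Title</h1><p>The cat saw the Cat.</p></body></html>"

# start=5 drops the heading text "Title" from the extracted string.
tokens, words, vocab = run_pipeline(sample_html, start=5)
print(tokens)
print(vocab)
```

With NLTK installed, `nltk.word_tokenize(raw)` could replace the stand-in tokenizer, and `nltk.Text(tokens)` would give the optional Text object mentioned in the pipeline.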

 

posted on 2019-05-15 11:11 by 不同的日子丶看不同的云