Post category - Natural Language Processing with Python

Not limited to Python NLTK
Summary: Chapter 6 Learning to Classify Text. Detecting patterns is a central part of Natural Language Processing. Words ending in -ed tend to be past tense verbs (Chapter 5). Frequent use of will is indicative of news text (Chapter 3). These observable patterns, word structure and wo...
posted @ 2011-08-31 14:13 牛皮糖NewPtone Views (3437) Comments (0) Recommendations (1)
Summary: 5.10 Exercises. ☼ Search the web for "spoof newspaper headlines", to find such gems as: British Left Waffles on Falkland Islands, and Juvenile Court to Try Shooting Defendant. Manually tag these headlines to see if knowledge of the part-of-speech tags removes the ambiguity. ☼...
posted @ 2011-08-30 22:51 牛皮糖NewPtone Views (1693) Comments (0) Recommendations (0)
Summary: 5.9 Further Reading. Extra materials for this chapter are posted at http://www.nltk.org/, including links to freely available resources on the web. For more examples of tagging with NLTK, please see the Tagging HOWTO at http://www.nltk.org/howto. Chapters 4 and 5 of (Jurafsky & Martin, 2008)...
posted @ 2011-08-30 22:49 牛皮糖NewPtone Views (550) Comments (0) Recommendations (0)
Summary: 5.8 Summary. • Words can be grouped into classes, such as nouns, verbs, adjectives, and adverbs. These classes are known as lexical categories or parts-of-speech. Parts-of-speech are assigned short labels, or tags, such as NN and VB...
posted @ 2011-08-30 22:46 牛皮糖NewPtone Views (581) Comments (0) Recommendations (0)
Summary: 5.7 How to Determine the Category of a Word. Now that we have examined word classes in detail, we turn to a more basic question: how do we decide what category a word belongs to in the first place? In general, linguists use morphological, syntactic, and semantic clues to determine...
posted @ 2011-08-30 22:45 牛皮糖NewPtone Views (1963) Comments (0) Recommendations (0)
Summary: 5.6 Transformation-Based Tagging. A potential issue with n-gram taggers is the size of their n-gram table (or language model). If tagging is to be employed in a variety of language technologies deployed on mobile computing devices, it is important to strike a balance between model...
posted @ 2011-08-30 22:40 牛皮糖NewPtone Views (925) Comments (0) Recommendations (0)
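Summary: About Natural Language Processing with Python. About the book: Natural Language Processing with Python offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. In Natural Language Processing with Python (reprint edition), you will learn to write Python programs that work with large collections of unstructured text. You will also access richly annotated datasets using a comprehensive range of linguistic data structures, and come to understand the main algorithms for analyzing the content and structure of written communication. Natural Language Processing with Python is packed with examples and exercises that can help you: extract information from unstructured text, even guessing the topic or identifying "named entities"; analyze linguistic structure in text, including parsing and semantic analysis; access popular linguistic databases, including Word...
posted @ 2011-08-29 10:44 牛皮糖NewPtone Views (20570) Comments (12) Recommendations (6)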
Summary: 5.5 N-Gram Tagging. Unigram Tagging. Unigram taggers are based on a simple statistical algorithm: for each token, assign the tag that is most likely for that particular token. For example, it will assign the tag JJ to any occurrence of the word frequent, since frequent is used as an adjective (...
posted @ 2011-08-28 21:54 牛皮糖NewPtone Views (5669) Comments (0) Recommendations (0)
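The unigram idea in the excerpt above (give each token its most likely tag) can be illustrated with a minimal standard-library sketch. This is not NLTK's implementation; the function names `train_unigram_tagger` and `tag_sentence` and the toy training data are hypothetical, invented only for illustration.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Map each token to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            counts[token][tag] += 1
    # Keep only the single most frequent tag per token.
    return {token: tag_counts.most_common(1)[0][0]
            for token, tag_counts in counts.items()}


def tag_sentence(model, tokens, default=None):
    """Tag each token with its most likely tag; unseen tokens get `default`."""
    return [(token, model.get(token, default)) for token in tokens]


# Toy training data (invented for illustration, not from a real corpus).
training = [
    [("frequent", "JJ"), ("updates", "NNS")],
    [("a", "DT"), ("frequent", "JJ"), ("visitor", "NN")],
    [("they", "PRP"), ("frequent", "VB"), ("cafes", "NNS")],
]

model = train_unigram_tagger(training)
tagged = tag_sentence(model, ["a", "frequent", "guest"])
print(tagged)  # [('a', 'DT'), ('frequent', 'JJ'), ('guest', None)]
```

In this toy data, frequent appears twice as JJ and once as VB, so the sketch always tags it JJ, regardless of context. The unseen word guest receives no tag at all.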
Summary: 5.4 Automatic Tagging. In the rest of this chapter we will explore various ways to automatically add part-of-speech tags to text. We will see that the tag of a word depends on the word and its context within a sentence. For this reason, we will be working with data at the level of (tagged) sentenc...
posted @ 2011-08-26 22:05 牛皮糖NewPtone Views (1376) Comments (2) Recommendations (1)
posted @ 2011-08-25 22:13 牛皮糖NewPtone Views (3435) Comments (0) Recommendations (0)
Summary: 5.2 Tagged Corpora. Representing Tagged Tokens. By convention in NLTK, a tagged token is represented using a tuple consisting of the token and the tag. We can create one of these special tuples from the standard string representation of a tagged token, using the function str2tuple(): ...
posted @ 2011-08-24 23:22 牛皮糖NewPtone Views (3533) Comments (0) Recommendations (0)
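As a rough illustration of the (token, tag) tuple convention mentioned in the excerpt above, here is a small standard-library sketch. `split_tagged_token` is a hypothetical stand-in written for this page, not NLTK's `str2tuple()` itself. It assumes the usual `token/TAG` string form, splits at the last separator, and assumes tags are written in upper case.

```python
def split_tagged_token(text, sep="/"):
    """Split a 'token/TAG' string into a (token, tag) tuple.

    Splits at the LAST separator so tokens that contain the separator
    themselves (e.g. '1/2/CD') keep their full text.
    """
    token, found, tag = text.rpartition(sep)
    if not found:
        # No separator present: treat the whole string as an untagged token.
        return (text, None)
    return (token, tag.upper())


tagged_token = split_tagged_token("fly/NN")
print(tagged_token)                  # ('fly', 'NN')
print(split_tagged_token("1/2/CD"))  # ('1/2', 'CD')

# A whole tagged sentence becomes a list of tuples.
sentence = "The/AT grand/JJ jury/NN commented/VBD"
print([split_tagged_token(t) for t in sentence.split()])
```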
Summary: CHAPTER 5 Categorizing and Tagging Words. Back in elementary school you learned the difference between nouns, verbs, adjectives, and adverbs. These "word classes" are not just the idle invention of grammarians, but are useful categories for many language processing tasks. As we will s...
posted @ 2011-08-21 15:23 牛皮糖NewPtone Views (5030) Comments (0) Recommendations (0)
Summary: 4.11 Exercises. ☼ Find out more about sequence objects using Python's help facility. In the interpreter, type help(str), help(list), and help(tuple). This will give you a full list of the functions supported by each type. Some functions have special names flanked with underscore...
posted @ 2011-08-21 15:13 牛皮糖NewPtone Views (1280) Comments (0) Recommendations (1)
Summary: 4.10 Further Reading. This chapter has touched on many topics in programming, some specific to Python, and some quite general. We've just scratched the surface, and you may want to read more about these topics, starting with the further materials for this chapter available at http...
posted @ 2011-08-21 15:11 牛皮糖NewPtone Views (488) Comments (0) Recommendations (0)
Summary: 4.9 Summary. Python's assignment and parameter passing use object references; e.g. if a is a list and we assign b = a, then any operation on a will modify b, and vice versa. The is operation tests if two objects are i...
posted @ 2011-08-21 15:09 牛皮糖NewPtone Views (428) Comments (0) Recommendations (0)
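The point about object references in the excerpt above can be checked with a short sketch in plain Python. The variable names and list contents are chosen only for illustration. Note that the behavior applies to in-place operations such as `append`; rebinding `a` to a new object with `a = ...` would leave `b` untouched.

```python
a = ["colorless", "green"]
b = a                 # b refers to the same list object as a; nothing is copied

a.append("ideas")     # an in-place change made through a ...
print(b)              # ... is visible through b: ['colorless', 'green', 'ideas']
print(b is a)         # True: both names refer to one object

c = list(a)           # list(a) builds a new list with the same items
c.append("sleep")     # changing the copy leaves the original alone
print(a)              # ['colorless', 'green', 'ideas']
print(c is a)         # False: c is a separate object
```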
Summary: 4.8 A Sample of Python Libraries. Python has hundreds of third-party libraries, specialized software packages that extend the functionality of Python. NLTK is one such library. To realize the full power of Python programming, you should become familiar with several other libraries. Most of...
posted @ 2011-08-21 15:05 牛皮糖NewPtone Views (5231) Comments (0) Recommendations (0)
Summary: 4.7 Algorithm Design. This section discusses more advanced concepts, which you may prefer to skip on the first time through this chapter. A major part of algorithmic problem solving is selecting or adapting an appropriate algorithm for the problem at hand. Sometimes there are several alternatives, ...
posted @ 2011-08-19 23:41 牛皮糖NewPtone Views (2215) Comments (0) Recommendations (0)
Summary: 4.6 Program Development. Programming is a skill that is acquired over several years of experience with a variety of programming languages and tasks. Key high-level abilities are algorithm design and its manifestation in structured programming. Key low-level abilities include...
posted @ 2011-08-16 23:46 牛皮糖NewPtone Views (1970) Comments (0) Recommendations (0)
Summary: 4.5 Doing More with Functions. This section discusses more advanced features, which you may prefer to skip on the first time through this chapter. Functions as Arguments. So far the arguments we have passed into functions have been simple objects like strings, or structured objects like...
posted @ 2011-08-16 23:36 牛皮糖NewPtone Views (1301) Comments (0) Recommendations (0)
Summary: 4.4 Functions: The Foundation of Structured Programming. Functions provide an effective way to package and re-use program code, as already explained in Section 2.3. For example, suppose we find that we often want to read text from an HTML file. This involves several steps: opening the file, ...
posted @ 2011-08-13 23:59 牛皮糖NewPtone Views (1638) Comments (0) Recommendations (0)