Post category - Python Natural Language Processing
Not limited to Python NLTK
Summary: 8.2 What's the Use of Syntax? Beyond n-grams. We gave an example in Chapter 2 of how to use the frequency information in bigrams to generate text that seems perfectly acceptable for ...
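The bigram idea mentioned in this summary can be sketched in plain Python. This is a minimal illustration rather than the book's own code: the toy corpus and the helper names `build_bigram_model` and `generate` are assumptions made for the example.

```python
from collections import Counter, defaultdict

def build_bigram_model(words):
    """Map each word to a Counter of the words observed right after it."""
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def generate(followers, start, length=6):
    """Greedily follow the most frequent successor of each word."""
    output = [start]
    word = start
    for _ in range(length - 1):
        if word not in followers:
            break  # no continuation was observed for this word
        word = followers[word].most_common(1)[0][0]
        output.append(word)
    return output

# Hypothetical toy corpus, used only for illustration.
corpus = "the dog saw the cat and the dog saw the bird".split()
model = build_bigram_model(corpus)
print(" ".join(generate(model, "the", length=6)))
```

Each adjacent pair in the generated sequence was seen in the corpus, but the greedy choice quickly starts to loop, because bigram frequencies carry no information about the structure of the whole sentence.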
Summary: Chapter 8 Analyzing Sentence Structure. Earlier chapters focused on words: how to identify them, analyze their structure, assign them to lexical categories, and access their meanings. We have also seen how to identify patterns in word sequences or n-grams. However, these methods only scratch ...
Summary: 7.8 Further Reading. Extra materials for this chapter are posted at http://www.nltk.org/, including links to freely available resources on the web. For more examples of chunking with NLTK, please see the Chunking HOWTO at http://www.nltk.org/howto. The popularity of chunking is due in great part to ...
Summary: 7.9 Exercises. ☼ The IOB format categorizes tagged tokens as I, O and B. Why are three tags necessary? What problem would be caused if we used I and O tags exclusively? ☼ Write a tag pattern to match noun phrases containing plural head nouns, e.g. "many/JJ researchers/NNS", ...
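One way to explore the first exercise is to decode IOB tags into chunks and compare the result with and without B tags. The sketch below is a plain-Python illustration under assumptions: the function name `iob_to_chunks`, the bare I/O/B tags without chunk types, and the example sentence are all made up for this purpose.

```python
def iob_to_chunks(tagged):
    """Group (word, iob_tag) pairs into chunks.

    "B" begins a chunk, "I" continues the current chunk, "O" is outside.
    An "I" with no open chunk is treated as the start of a new chunk.
    """
    chunks, current = [], []
    for word, tag in tagged:
        if tag == "B" or (tag == "I" and not current):
            if current:
                chunks.append(current)
            current = [word]
        elif tag == "I":
            current.append(word)
        else:  # "O": close any open chunk
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks

# Two adjacent noun phrases, tagged with and without B (hypothetical data).
with_b = [("gave", "O"), ("the", "B"), ("dog", "I"), ("a", "B"), ("bone", "I")]
io_only = [("gave", "O"), ("the", "I"), ("dog", "I"), ("a", "I"), ("bone", "I")]

print(iob_to_chunks(with_b))
print(iob_to_chunks(io_only))
```

With B tags the decoder recovers two chunks; with I and O alone, the two adjacent chunks are indistinguishable from a single four-word chunk.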
Summary: 7.7 Summary. Information extraction systems search large bodies of unrestricted text for specific types of entities and relations, and use them to populate well-organized databases. These databases can then be used to find answers for specific questions. The typical architecture...
Summary: 7.6 Relation Extraction. Once named entities have been identified in a text, we then want to extract the relations that exist between them. As indicated earlier, we will typically be looking for relations between specified types of named entity. One way of approaching this task is to initially l...
Summary: 7.5 Named Entity Recognition. At the start of this chapter, we briefly introduced named entities (NEs). Named entities are definite noun phrases that refer to specific types of individuals, such as organizations, persons, dates, and so on. Table 7.4 l...
Summary: 7.4 Recursion in Linguistic Structure. Building Nested Structure with Cascaded Chunkers. So far, our chunk structures have been relatively flat. Trees consist of tagged tokens, optionally grouped under a chunk node such as NP. However, it is possible to build chunk structures of ar...
Summary: 7.3 Developing and Evaluating Chunkers. Now you have a taste of what chunking does, but we haven't explained how to evaluate chunkers. As usual, this requires a suitably annotated corpus. We begin by looking at the mechanics of converting IOB format into an NLTK tree, then at how this is ...
Summary: 7.2 Chunking. The basic technique we will use for entity detection is chunking, which segments and labels multi-token sequences as illustrated in Figure 7.2. The smaller boxes show the word-level tokenization and part-of-speech tagging, while the large boxes show higher-level chunking. Each of thes...
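To make the idea of grouping tagged tokens into multi-token chunks concrete, here is a small stdlib-only sketch. It does not use NLTK's own chunker; the function `np_chunk`, the tag pattern (optional determiner, any adjectives, one noun), and the example sentence are assumptions chosen for illustration.

```python
import re

# Assumed noun-phrase pattern over POS tags: optional DT, any number of JJ, one NN.
NP_PATTERN = re.compile(r"(<DT>)?(<JJ>)*<NN>")

def np_chunk(tagged):
    """Return the word sequences whose POS tags match NP_PATTERN."""
    # Encode the tag sequence as one string, e.g. "<DT><JJ><NN>".
    tags = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for match in NP_PATTERN.finditer(tags):
        # Each "<" marks one token, so counting them maps back to token indices.
        start = tags[:match.start()].count("<")
        end = start + match.group().count("<")
        chunks.append([word for word, _ in tagged[start:end]])
    return chunks

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(np_chunk(sentence))
```

The word-level tokens and tags stay untouched; chunking only adds a layer that says which consecutive tokens belong together.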
Summary: Chapter 7 Extracting Information from Text. For any given question, it's likely that someone has written the answer down somewhere. The amount of natural language text that is available in electronic form is truly staggering, and is increasing every day. However, the complexity of n...
Summary: 6.10 Exercises. ☼ Read up on one of the language technologies mentioned in this section, such as word sense disambiguation, semantic role labeling, question answering, machine translation, named entity detection. Find out what type and quantity of annotated data is required for...
Summary: 6.9 Further Reading. Please consult http://www.nltk.org/ for further materials on this chapter and on how to install external machine learning packages, such as Weka, Mallet, TADM, and MEGAM. For more examples of classification and machine learning with NLTK, please see the classification HOWTOs ...
Summary: 6.8 Summary. Modeling the linguistic data found in corpora can help us to understand linguistic patterns, and can be used to make predictions about new language data. Supervised classifiers use labeled training corpora to build models tha...
Summary: 6.7 Modeling Linguistic Patterns. Classifiers can help us to understand the linguistic patterns that occur in natural language, by allowing us to create explicit models that capture those patterns. Typically, these models are using supervised classification techniques, but it is also possible ...
Summary: 6.6 Maximum Entropy Classifiers. The Maximum Entropy classifier uses a model that is very similar to the model employed by the naive Bayes classifier. But rather than using probabilities to set the model's parameters, it uses search techniques to find a set of parameters that will maximize t...
Summary: 6.5 Naive Bayes Classifiers. In naive Bayes classifiers, every feature gets a say in determining which label should be assigned to a given input value. To choose a label for an input value, the naive Bayes classifier begins by calculating the prior probability of each label, which is de...
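The first step named in this summary, computing a prior probability for each label, can be sketched as follows. The summary is cut off before it says how the prior is obtained, so this example assumes the common choice of estimating it from label frequencies in the training set; the function name `label_priors` and the toy labels are hypothetical.

```python
from collections import Counter

def label_priors(labels):
    """Estimate P(label) as the relative frequency of each label.

    Assumption: the prior comes from training-set label counts.
    """
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Hypothetical training labels, used only for illustration.
training_labels = ["sports", "sports", "sports", "automotive"]
priors = label_priors(training_labels)
print(priors)
```

In a full naive Bayes classifier, each feature's contribution would then adjust these priors before a label is chosen; that step is not shown here.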
Summary: 6.4 Decision Trees. In the next three sections, we'll take a closer look at three machine learning methods that can be used to automatically build classification models: decision trees, naive Bayes classifiers, and Maximum Entropy classifiers. As we've seen, it's possible to treat thes...
Summary: 6.3 Evaluation. In order to decide whether a classification model is accurately capturing a pattern, we must evaluate that model. The result of this evaluation is important for deciding how trustworthy the model is, and for what purposes we can use it. Evaluation can also be an effective tool for ...
Summary: 6.2 Further Examples of Supervised Classification. Sentence Segmentation. Sentence segmentation can be viewed as a classification task for punctuation: whenever we encounter a symbol that could possibly end a sentence, such as a period or a question mark, we have to decide whether it ...
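Treating sentence segmentation as classification means turning each candidate punctuation token into a set of features that a classifier can label as boundary or non-boundary. The sketch below covers only that feature-extraction step. The feature names, the helper functions, and the token list are assumptions for illustration, and no classifier is trained here.

```python
SENTENCE_END_CANDIDATES = {".", "?", "!"}

def boundary_candidates(tokens):
    """Indices of tokens that could possibly end a sentence."""
    return [i for i, tok in enumerate(tokens) if tok in SENTENCE_END_CANDIDATES]

def punct_features(tokens, i):
    """Describe the context of the punctuation token at index i."""
    prev_word = tokens[i - 1] if i > 0 else ""
    next_word = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "punct": tokens[i],
        "prev_word": prev_word.lower(),
        "prev_word_is_one_char": len(prev_word) == 1,
        "next_word_capitalized": next_word[:1].isupper(),
    }

# Hypothetical tokenized text: the first period belongs to an abbreviation.
tokens = ["Mr", ".", "Smith", "left", ".", "He", "returned", "."]
for i in boundary_candidates(tokens):
    print(i, punct_features(tokens, i))
```

A supervised classifier trained on labeled examples of such feature dictionaries would then decide, for each candidate, whether it actually ends a sentence.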