Archive: September 2011
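Summary: Note: besides this article, you can refer to the concise Swift installation guide carefully prepared by zzcase and use its pre-packaged configuration files, as well as "Installing and Configuring Swift on Ubuntu" from the 笨笨 blog. In my usual style, I introduce the Swift SAIO installation through annotations and partial translation. Alpha: created 2011.9.24. Updates: 1st, 2011.9.27, corrected the debugging results; 2nd, 2011.9.29, added some comments. Instructions for setting up a development VM: This documents setting up a virtual machine for .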
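Summary: The doctest module searches for pieces of text that look like interactive Python sessions, then executes those sessions to verify that they work exactly as shown. There are several common ways to use doctest: (1) checking that a module's docstrings are up to date by verifying that all of their interactive examples still produce the documented output; (2) performing regression testing by verifying that interactive examples from a test file or test object produce the expected output; (3) writing tutorial documentation for a package, liberally illustrated with input-output examples. Depending on whether the examples or the expository text are emphasized, this has the flavor of "literate testing" or "executable documentation". The following is an example from the official documentation: """This is the "example" m...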
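As a minimal sketch of the first usage pattern above (checking that docstring examples still produce the documented output), the snippet below runs the examples embedded in one function's docstring. The `factorial` function and the `run_examples` helper are illustrative names chosen here, not taken from the post; only `doctest.DocTestParser` and `doctest.DocTestRunner` come from the standard library.

```python
import doctest


def factorial(n):
    """Return n! for a non-negative integer n.

    >>> factorial(0)
    1
    >>> factorial(5)
    120
    >>> [factorial(i) for i in range(4)]
    [1, 1, 2, 6]
    """
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def run_examples(func):
    """Run the interactive examples in func's docstring.

    Returns a (failed, attempted) tuple. run_examples is a hypothetical
    helper written for this sketch.
    """
    parser = doctest.DocTestParser()
    # The examples only need to see the function under test.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = run_examples(factorial)
print(f"{attempted} examples run, {failed} failed")
```

In a script, calling `doctest.testmod()` is the more common entry point; the explicit parser and runner are used here only so the sketch does not depend on how the file is executed.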
Summary: 7.3 Developing and Evaluating Chunkers. Now you have a taste of what chunking does, but we haven't explained how to evaluate chunkers. As usual, this requires a suitably annotated corpus. We begin by looking at the mechanics of converting IOB format into an NLTK tree, then at how this is
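The idea of turning IOB tags into a chunk structure can be sketched in plain Python. This is an illustrative stand-in, not NLTK's own conversion code: `iob_to_chunks` is a hypothetical helper, and chunks are represented as `(chunk_type, tokens)` pairs rather than NLTK tree objects.

```python
def iob_to_chunks(tagged):
    """Group (word, pos, iob) triples into chunks.

    Each item of the result is either a (word, pos) pair for a token
    outside any chunk, or a (chunk_type, [(word, pos), ...]) pair.
    """
    result = []
    current = None  # the chunk currently being built, if any
    for word, pos, iob in tagged:
        if iob == "O":
            # Token outside any chunk: close the open chunk and emit the token.
            current = None
            result.append((word, pos))
            continue
        prefix, chunk_type = iob.split("-", 1)
        # "B-" always starts a new chunk; a stray "I-" with no matching
        # open chunk is treated leniently as a chunk start too.
        if prefix == "B" or current is None or current[0] != chunk_type:
            current = (chunk_type, [])
            result.append(current)
        current[1].append((word, pos))
    return result


sentence = [
    ("We", "PRP", "B-NP"),
    ("saw", "VBD", "O"),
    ("the", "DT", "B-NP"),
    ("yellow", "JJ", "I-NP"),
    ("dog", "NN", "I-NP"),
]
for item in iob_to_chunks(sentence):
    print(item)
```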
Summary: 7.2 Chunking. The basic technique we will use for entity detection is chunking, which segments and labels multi-token sequences as illustrated in Figure 7.2. The smaller boxes show the word-level tokenization and part-of-speech tagging, while the large boxes show higher-level chunking. Each of thes.
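To make "segments and labels multi-token sequences" concrete, here is a rough sketch of a noun-phrase chunker over part-of-speech-tagged tokens. It assumes a single hand-written pattern (an optional determiner, any number of adjectives, then a noun) and Penn Treebank style tags; `np_chunk` is a hypothetical helper for illustration and does not use NLTK.

```python
def np_chunk(tagged):
    """Greedy NP chunker for the pattern: optional DT, any number of JJ, one NN.

    tagged is a list of (word, pos) pairs. Matched spans are returned as
    ("NP", [tokens]) pairs; all other tokens are passed through unchanged.
    """
    chunks = []
    i = 0
    n = len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":  # zero or more adjectives
            j += 1
        if j < n and tagged[j][1] == "NN":  # the noun closes the chunk
            chunks.append(("NP", tagged[i:j + 1]))
            i = j + 1
        else:
            # No match starting at i: emit the token and move on.
            chunks.append(tagged[i])
            i += 1
    return chunks


sentence = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]
for item in np_chunk(sentence):
    print(item)
```

The word-level tokens and tags correspond to the "smaller boxes" in the description, and each `("NP", [...])` pair corresponds to a "large box".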
Summary: Chapter 7 Extracting Information from Text. For any given question, it's likely that someone has written the answer down somewhere. The amount of natural language text that is available in electronic form is truly staggering, and is increasing every day. However, the complexity of n
Summary: 6.10 Exercises. ☼ Read up on one of the language technologies mentioned in this section, such as word sense disambiguation, semantic role labeling, question answering, machine translation, named entity detection. Find out what type and quantity of annotated data is required for...
Summary: 6.9 Further Reading. Please consult http://www.nltk.org/ for further materials on this chapter and on how to install external machine learning packages, such as Weka, Mallet, TADM, and MEGAM. For more examples of classification and machine learning with NLTK, please see the classification HOWTOs .
Summary: 6.8 Summary. Modeling the linguistic data found in corpora can help us to understand linguistic patterns, and can be used to make predictions about new language data. Supervised classifiers use labeled training corpora to build models tha...
Summary: 6.7 Modeling Linguistic Patterns. Classifiers can help us to understand the linguistic patterns that occur in natural language, by allowing us to create explicit models that capture those patterns. Typically, these models are using supervised classification techniques, but it is also possible .
Summary: 6.6 Maximum Entropy Classifiers. The Maximum Entropy classifier uses a model that is very similar to the model employed by the naive Bayes classifier. But rather than using probabilities to set the model's parameters, it uses search techniques to find a set of parameters that will maximize t
Summary: 6.5 Naive Bayes Classifiers. In naive Bayes classifiers, every feature gets a say in determining which label should be assigned to a given input value. To choose a label for an input value, the naive Bayes classifier begins by calculating the prior probability of each label, which is de.
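The "prior first, then every feature gets a say" procedure can be sketched as follows. This is a simplified illustration rather than NLTK's implementation: it assumes binary feature values, estimates the prior from label frequencies in the training data, and uses add-one smoothing. The function names and the toy training set are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples is a list of (features_dict, label) pairs."""
    label_counts = Counter(label for _, label in examples)
    # feature_counts[label][(name, value)] = times that feature value was
    # seen together with that label.
    feature_counts = defaultdict(Counter)
    for features, label in examples:
        for item in features.items():
            feature_counts[label][item] += 1
    return label_counts, feature_counts


def classify(model, features):
    """Return the label with the highest log prior plus log likelihoods."""
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # prior: how frequent the label is
        for item in features.items():
            # Each feature contributes its likelihood; add-one smoothing
            # assumes each feature has exactly two possible values.
            score += math.log((feature_counts[label][item] + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train = [
    ({"contains(great)": True, "contains(boring)": False}, "pos"),
    ({"contains(great)": True, "contains(boring)": False}, "pos"),
    ({"contains(great)": False, "contains(boring)": True}, "neg"),
    ({"contains(great)": False, "contains(boring)": True}, "neg"),
    ({"contains(great)": True, "contains(boring)": True}, "neg"),
]
model = train_naive_bayes(train)
print(classify(model, {"contains(great)": True, "contains(boring)": False}))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer feature sets.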
Summary: 6.4 Decision Trees. In the next three sections, we'll take a closer look at three machine learning methods that can be used to automatically build classification models: decision trees, naive Bayes classifiers, and Maximum Entropy classifiers. As we've seen, it's possible to treat thes
Summary: 6.3 Evaluation. In order to decide whether a classification model is accurately capturing a pattern, we must evaluate that model. The result of this evaluation is important for deciding how trustworthy the model is, and for what purposes we can use it. Evaluation can also be an effective tool for .
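One simple way to evaluate a classifier is to measure its accuracy on held-out labeled data. The sketch below assumes that metric; the excerpt above does not say which measures the section covers. Both `accuracy` and the toy rule-based `guess_gender` classifier are hypothetical names used only for illustration.

```python
def accuracy(classifier, gold):
    """Fraction of (features, label) pairs that classifier labels correctly.

    classifier is any callable mapping a feature dict to a label.
    """
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(1 for features, label in gold if classifier(features) == label)
    return correct / len(gold)


def guess_gender(features):
    """Toy rule: names ending in a, e, i or y are guessed to be female."""
    return "female" if features["last_letter"] in "aeiy" else "male"


# Held-out examples that were not used to design the rule.
test_set = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "e"}, "male"),
    ({"last_letter": "n"}, "male"),
]
print(accuracy(guess_gender, test_set))
```

Keeping the test set separate from the data used to build the model matters here: a score measured on the training data says little about how the model handles new input.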