Post Archive for August 2011 - 牛皮糖NewPtone

Natural Language Processing with Python Study Notes (51): Further Examples of Supervised Classification

Summary: 6.2 Further Examples of Supervised Classification. Sentence Segmentation. Sentence segmentation can be viewed as a classification task for punctuation: whenever we encounter a symbol that could possibly end a sentence, such as a period or a question mark, we have to decide whether it ...

posted @ 2011-08-31 23:16 牛皮糖NewPtone Views(1634) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (50): Supervised Classification

Summary: Chapter 6 Learning to Classify Text. Detecting patterns is a central part of Natural Language Processing. Words ending in -ed tend to be past tense verbs (Chapter 5). Frequent use of will is indicative of news text (Chapter 3). These observable patterns - word structure and wo...

posted @ 2011-08-31 14:13 牛皮糖NewPtone Views(3474) Comments(0) Recommended(1)

Natural Language Processing with Python Study Notes (49): Exercises

Summary: 5.10 Exercises. ☼ Search the web for "spoof newspaper headlines", to find such gems as: British Left Waffles on Falkland Islands, and Juvenile Court to Try Shooting Defendant. Manually tag these headlines to see if knowledge of the part-of-speech tags removes the ambiguity. ☼ ...

posted @ 2011-08-30 22:51 牛皮糖NewPtone Views(1727) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (48): Further Reading

Summary: 5.9 Further Reading. Extra materials for this chapter are posted at http://www.nltk.org/, including links to freely available resources on the web. For more examples of tagging with NLTK, please see the Tagging HOWTO at http://www.nltk.org/howto. Chapters 4 and 5 of (Jurafsky & Martin, 2008) ...

posted @ 2011-08-30 22:49 牛皮糖NewPtone Views(554) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (47): 5.8 Summary

Summary: 5.8 Summary. • Words can be grouped into classes, such as nouns, verbs, adjectives, and adverbs. These classes are known as lexical categories or parts-of-speech. Parts-of-speech are assigned short labels, or tags, such as NN and VB. ...

posted @ 2011-08-30 22:46 牛皮糖NewPtone Views(586) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (46): 5.7 How to Determine the Category of a Word

Summary: 5.7 How to Determine the Category of a Word. Now that we have examined word classes in detail, we turn to a more basic question: how do we decide what category a word belongs to in the first place? In general, linguists use morphological, syntactic, and semantic clues to determine ...

posted @ 2011-08-30 22:45 牛皮糖NewPtone Views(1974) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (45): 5.6 Transformation-Based Tagging

Summary: 5.6 Transformation-Based Tagging. A potential issue with n-gram taggers is the size of their n-gram table (or language model). If tagging is to be employed in a variety of language technologies deployed on mobile computing devices, it is important to strike a balance between model ...

posted @ 2011-08-30 22:40 牛皮糖NewPtone Views(929) Comments(0) Recommended(0)

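Parsing HTML Pages with the HTMLParser Module

Summary: HTMLParser is Python's module for parsing HTML and XHTML files. It can pick out the tags, data, and so on inside HTML, and is a simple way to process HTML. HTMLParser is event-driven: when it finds a particular piece of markup, it calls a user-defined function to notify the program. Its main callback methods all have names beginning with handle_ and are all member functions of HTMLParser. To use it, derive a new class from HTMLParser and override the handle_ methods you need. Unlike the parser in htmllib, this parser is not based on the SGML parser in the sgmllib module. The htmllib module and sgm...

posted @ 2011-08-30 13:32 牛皮糖NewPtone Views(5782) Comments(0) Recommended(0)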
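
The event-driven pattern described in the HTMLParser summary above can be sketched roughly as follows. This is a minimal sketch written for Python 3, where the module lives at html.parser (the post itself targets Python 2's HTMLParser module). The LinkCollector class and the collect_links helper are hypothetical names introduced only for illustration.

```python
from html.parser import HTMLParser  # Python 3 location; an assumption, the post uses Python 2


class LinkCollector(HTMLParser):
    """Hypothetical subclass that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called by the parser for each opening tag; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def collect_links(html):
    """Feed an HTML string to the parser and return the collected links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling collect_links on a page's text returns the link targets in document order; other handle_ methods such as handle_data can be overridden in the same way.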

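Index of Study Notes for Natural Language Processing with Python

Summary: About Natural Language Processing with Python. About the book: Natural Language Processing with Python offers a highly accessible introduction to natural language processing, the field that covers a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. In Natural Language Processing with Python (reprint edition), you will learn to write Python programs that work with large collections of unstructured text. You will also access richly annotated datasets using a comprehensive range of linguistic data structures, and understand the main algorithms for analyzing the content and structure of written communication. The book provides plenty of examples and exercises that help you: extract information from unstructured text, either to guess the topic or identify "named entities"; analyze linguistic structure in text, including parsing and semantic analysis; access popular linguistic databases, including Word...

posted @ 2011-08-29 10:44 牛皮糖NewPtone Views(20601) Comments(12) Recommended(6)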

Natural Language Processing with Python Study Notes (44): 5.5 N-Gram Tagging

Summary: 5.5 N-Gram Tagging. Unigram Tagging. Unigram taggers are based on a simple statistical algorithm: for each token, assign the tag that is most likely for that particular token. For example, it will assign the tag JJ to any occurrence of the word frequent, since frequent is used as an adjective (...

posted @ 2011-08-28 21:54 牛皮糖NewPtone Views(5685) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (43): 5.4 Automatic Tagging

Summary: 5.4 Automatic Tagging. In the rest of this chapter we will explore various ways to automatically add part-of-speech tags to text. We will see that the tag of a word depends on the word and its context within a sentence. For this reason, we will be working with data at the level of (tagged) sentenc...

posted @ 2011-08-26 22:05 牛皮糖NewPtone Views(1396) Comments(2) Recommended(1)

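The random Module, by Way of Computing Pi with the Monte Carlo Method

Summary: Computer simulation often needs randomly chosen numbers. Starting from a simple application of random numbers, this post gives a brief introduction to Python's random module. Computing pi with the Monte Carlo method. Links: this problem comes from problem set 2 of the Python course at Purdue University. Monte Carlo methods are used to simulate complex physical and mathematical systems by repeated random sampling. In simple terms, given a probability, p, that an event will occu...

posted @ 2011-08-26 11:14 牛皮糖NewPtone Views(7653) Comments(1) Recommended(1)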
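
As a rough illustration of the Monte Carlo idea in the summary above, pi can be estimated by sampling random points in the unit square and counting how many fall inside the quarter circle of radius 1. This is a sketch under stated assumptions, not the post's own code: it is written for Python 3, and the function name estimate_pi and its seed parameter are hypothetical.

```python
import random


def estimate_pi(trials, seed=None):
    """Estimate pi from the fraction of random points that land inside a quarter circle."""
    rng = random.Random(seed)  # a private generator, so the seed does not affect global state
    inside = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi/4, so the hit ratio approximates pi/4.
    return 4.0 * inside / trials
```

The estimate tightens slowly: the error shrinks roughly with the square root of the number of trials, so ten times the accuracy needs about a hundred times the samples.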

Natural Language Processing with Python Study Notes (42): 5.3 Mapping Words to Properties Using Python Dictionaries

posted @ 2011-08-25 22:13 牛皮糖NewPtone Views(3468) Comments(0) Recommended(0)

What's New in Python 2.7

Summary: What's New in Python 2.7. Author: A.M. Kuchling (amk at amk.ca). Release: 2.7.2. Date: August 25, 2011. This article explains the new features in Python 2.7. Python 2.7 was released on July 3, 2010. Numeric handling has been improved in many ways, for both floating-point n...

posted @ 2011-08-25 21:26 牛皮糖NewPtone Views(2333) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (41): 5.2 Tagged Corpora

Summary: 5.2 Tagged Corpora. Representing Tagged Tokens. By convention in NLTK, a tagged token is represented using a tuple consisting of the token and the tag. We can create one of these special tuples from the standard string representation of a tagged token, using the function str2tuple(): ...

posted @ 2011-08-24 23:22 牛皮糖NewPtone Views(3562) Comments(0) Recommended(0)

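Installing the MySQLdb Driver on Windows with Python 2.6

Summary: Download and installation. Using MySQL from Python requires the MySQLdb driver, which can be downloaded from the official site: http://sourceforge.net/projects/mysql-python/ The highest Python version currently supported is 2.6 and the highest MySQL version is 5.1. The detailed description reads: MySQL support for Python. MySQL versions 3.23-5.1; and Python versions 2.3-2.6 are supported. MySQLdb is the Python DB API-2.0 interface. _mysql is a low-level API similia...

posted @ 2011-08-24 18:09 牛皮糖NewPtone Views(6087) Comments(0) Recommended(0)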

Natural Language Processing with Python Study Notes (40): 5.1 Using a Tagger

Summary: CHAPTER 5 Categorizing and Tagging Words. Back in elementary school you learned the difference between nouns, verbs, adjectives, and adverbs. These "word classes" are not just the idle invention of grammarians, but are useful categories for many language processing tasks. As we will s...

posted @ 2011-08-21 15:23 牛皮糖NewPtone Views(5045) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (39): 4.11 Exercises

Summary: 4.11 Exercises. ☼ Find out more about sequence objects using Python's help facility. In the interpreter, type help(str), help(list), and help(tuple). This will give you a full list of the functions supported by each type. Some functions have special names flanked with underscore...

posted @ 2011-08-21 15:13 牛皮糖NewPtone Views(1286) Comments(0) Recommended(1)

Natural Language Processing with Python Study Notes (38): 4.10 Further Reading

Summary: 4.10 Further Reading. This chapter has touched on many topics in programming, some specific to Python, and some quite general. We've just scratched the surface, and you may want to read more about these topics, starting with the further materials for this chapter available at http...

posted @ 2011-08-21 15:11 牛皮糖NewPtone Views(498) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (37): 4.9 Summary

Summary: 4.9 Summary. Python's assignment and parameter passing use object references; e.g. if a is a list and we assign b = a, then any operation on a will modify b, and vice versa. The is operation tests if two objects are i...

posted @ 2011-08-21 15:09 牛皮糖NewPtone Views(436) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (36): 4.8 A Sample of Python Libraries

Summary: 4.8 A Sample of Python Libraries. Python has hundreds of third-party libraries, specialized software packages that extend the functionality of Python. NLTK is one such library. To realize the full power of Python programming, you should become familiar with several other libraries. Most of ...

posted @ 2011-08-21 15:05 牛皮糖NewPtone Views(5243) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (35): 4.7 Algorithm Design

Summary: 4.7 Algorithm Design. This section discusses more advanced concepts, which you may prefer to skip on the first time through this chapter. A major part of algorithmic problem solving is selecting or adapting an appropriate algorithm for the problem at hand. Sometimes there are several alternatives, ...

posted @ 2011-08-19 23:41 牛皮糖NewPtone Views(2227) Comments(0) Recommended(0)

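Working with SQLite Databases in Python

Summary: A brief introduction. SQLite is a very small, embedded, open-source database engine; that is, it has no separate server process, and all database work is done by the host program itself. It is an ACID-compliant relational database management system designed for embedding, and it is already used in many embedded products. Its resource footprint is very low: on an embedded device, a few hundred KB of memory may be enough. It supports mainstream operating systems such as Windows, Linux, and Unix, works with many programming languages such as Tcl, C#, PHP, and Java, and also has an ODBC interface. Compared with MySQL and PostgreSQL, the two best-known open-source database management systems, the post describes it as faster than both. The first alpha version of SQLite appeared in 2000...

posted @ 2011-08-18 16:13 牛皮糖NewPtone Views(89907) Comments(1) Recommended(6)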
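
To make the "no separate server process" point in the SQLite summary above concrete, here is a minimal sketch using the standard library's sqlite3 module with an in-memory database. It is written for Python 3 as an assumption; the posts table and the demo_posts_table function are hypothetical and exist only for this example.

```python
import sqlite3


def demo_posts_table():
    """Create a throwaway in-memory database, insert two rows, and read them back."""
    conn = sqlite3.connect(":memory:")  # the whole database lives inside this process
    try:
        conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
        # Parameter placeholders (?) keep values separate from the SQL text.
        conn.executemany(
            "INSERT INTO posts (title) VALUES (?)",
            [("first",), ("second",)],
        )
        conn.commit()
        return conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    finally:
        conn.close()
```

Replacing ":memory:" with a file path would store the same database in a single file on disk.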

Natural Language Processing with Python Study Notes (34): 4.6 Program Development

Summary: 4.6 Program Development. Programming is a skill that is acquired over several years of experience with a variety of programming languages and tasks. Key high-level abilities are algorithm design and its manifestation in structured programming. Key low-level abilities include ...

posted @ 2011-08-16 23:46 牛皮糖NewPtone Views(1986) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (33): 4.5 Doing More with Functions

Summary: 4.5 Doing More with Functions. This section discusses more advanced features, which you may prefer to skip on the first time through this chapter. Functions as Arguments. So far the arguments we have passed into functions have been simple objects like strings, or structured objects like ...

posted @ 2011-08-16 23:36 牛皮糖NewPtone Views(1316) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (32): 4.4 Functions: The Foundation of Structured Programming

Summary: 4.4 Functions: The Foundation of Structured Programming. Functions provide an effective way to package and re-use program code, as already explained in Section 2.3. For example, suppose we find that we often want to read text from an HTML file. This involves several steps: opening the file, ...

posted @ 2011-08-13 23:59 牛皮糖NewPtone Views(1654) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (31): 4.3 Questions of Style

Summary: 4.3 Questions of Style. Programming is as much an art as a science. The undisputed "bible" of programming, a 2,500 page multi-volume work by Donald Knuth, is called The Art of Computer Programming. Many books have been written on Literate Programming, recognizing that huma...

posted @ 2011-08-12 23:12 牛皮糖NewPtone Views(857) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (30): 4.2 Sequences

Summary: 4.2 Sequences. So far, we have seen two kinds of sequence object: strings and lists. Another kind of sequence is called a tuple. Tuples are formed with the comma operator, and typically enclosed using parentheses. We've actually seen them in the previous chapters, and sometimes referred to them a...

posted @ 2011-08-12 23:07 牛皮糖NewPtone Views(1314) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (29): 4.1 Back to the Basics

Summary: Chapter 4 Writing Structured Programs. By now you will have a sense of the capabilities of the Python programming language for processing natural language. However, if you're new to Python or to programming, you may still be wrestling with Python and not feel like you are in full con...

posted @ 2011-08-11 22:34 牛皮糖NewPtone Views(912) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (28): 3.12 Exercises

Summary: 3.12 Exercises. ☼ Define a string s = 'colorless'. Write a Python statement that changes this to "colourless" using only the slice and concatenation operations. ☼ We can use the slice notation to remove morphological endings on words. For example, 'dogs'[:-1] removes the l...

posted @ 2011-08-11 22:25 牛皮糖NewPtone Views(2491) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (27): 3.11 Further Reading

Summary: 3.11 Further Reading. Extra materials for this chapter are posted at http://www.nltk.org/ , including links to freely available resources on the Web. Remember to consult the Python reference materials at http://docs.python.org/ . (For example, this documentation covers "universal newline support," ...

posted @ 2011-08-11 22:21 牛皮糖NewPtone Views(608) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (26): 3.10 Summary

Summary: 3.10 Summary. • In this book we view a text as a list of words. A "raw text" is a potentially long string containing words and whitespace formatting, and is how we typically store and visualize a text. • A string is specified in Python using single or double quotes: 'Monty Python', "Mont...

posted @ 2011-08-11 22:20 牛皮糖NewPtone Views(728) Comments(0) Recommended(0)

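A Fun Puzzle: Reversing a Positive Integer

Summary: I came across a fun puzzle: reverse a positive integer, for example turn 1234 into 4321. In C this can be done with a do..while loop: as long as temp is not 0, each iteration multiplies rebmun by 10 and adds the rightmost digit of what remains of the original integer. The code is as follows: #include <stdio.h> void main() { int number = 0; int rebmun = 0; int temp = 0; printf("\nEnter a number:"); scanf("%d", &number); temp = number; do { rebmun = 10 * rebmun + temp % 10; temp = temp / 10; } wh...

posted @ 2011-08-10 18:04 牛皮糖NewPtone Views(1031) Comments(1) Recommended(0)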
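
The digit-peeling loop described in the summary above carries over directly to Python. The following is a small sketch for Python 3, not the post's own code (the post uses C); the name reverse_digits is hypothetical.

```python
def reverse_digits(number):
    """Reverse the decimal digits of a non-negative integer, e.g. 1234 -> 4321."""
    reversed_number = 0
    remaining = number
    while remaining > 0:
        # Shift the result left by one decimal place and append the last digit.
        reversed_number = 10 * reversed_number + remaining % 10
        remaining //= 10  # drop the digit that was just consumed
    return reversed_number
```

Trailing zeros of the input disappear in the result, since an integer has no leading zeros: reverse_digits(1200) gives 21.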

Part 2: Comparing the Performance of Two Percolation Functions

Summary: For the project description of this part, see Project2 Percolation in Grids. The required test scenario is: · set n=75 · consider values of p from 0 to 1 in increments of 0.05 (or smaller) · for each value of p, generate 10 random grids and record for each algorithm the average running time on the ten grids. However, I ran into some trouble when writing the test code for this part; at first ...

posted @ 2011-08-10 14:22 牛皮糖NewPtone Views(402) Comments(0) Recommended(0)

The Effect of Grid Size on the Percolation Probability of a Grid

Summary: This experiment comes from the specific requirements in Part 1: Next, consider grid sizes n = 10, 25, 50, and 75 and determine the percolation probabilities (you already know it for n=25). One way to visualize the performance for the different values of n is to make the same curve as above and show all three in one plot. Discuss how the size o...

posted @ 2011-08-10 13:28 牛皮糖NewPtone Views(815) Comments(0) Recommended(0)

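Implementing the Recursive Exploration Algorithm

Summary: A reading guide for the confused: for the detailed project description and the specifics of the algorithms, see the previous post, Project2 Percolation in Grids. I have written up the functions in percolation_provided.py, and the wave exploration algorithm is already implemented. The key to this algorithm is the recursion inside the explore function and the initial call that percolation_recursive makes to explore. Step 1: as usual, import the provided functions: from percolation_provided import * First consider the percolation_recursive function. Its parameter is the same as for the wave algorithm function above, so it is defined as: percolation_recursive(input...

posted @ 2011-08-09 22:13 牛皮糖NewPtone Views(531) Comments(0) Recommended(0)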

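Python Tasty Recipes: 1.8 Checking Whether a String Contains Any Character from a Set

Summary: Goal: check whether a string contains any character from a given set of characters. Method: the most concise approach is as follows; it is clear, general, fast, and works for any sequence and container: def containAny(seq, aset): for c in seq: if c in aset: return True return False. The second approach uses the itertools module and can improve performance a little; it is essentially the same method as the first (although it goes against Python's core values of simplicity and clarity). Documentation for itertools.ifilter(predicate, iterable): Make an iterator that filters elements from iterable returning only those ...

posted @ 2011-08-09 17:03 牛皮糖NewPtone Views(1746) Comments(0) Recommended(0)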
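
The loop-based check quoted in the summary above can also be written with the built-in any(), which stops at the first match just as the explicit loop does. This is an alternative sketch for Python 3, not the recipe's own code; the snake_case name contains_any is introduced here for illustration.

```python
def contains_any(seq, aset):
    """Return True if at least one item of seq is also in aset."""
    # any() short-circuits, so iteration ends at the first item found in aset.
    return any(item in aset for item in seq)
```

Passing aset as a set rather than a string or list makes each membership test constant-time on average, which matters only when the character collection is large.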

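Python Tasty Recipes: 1.7 Reversing a String by Characters or by Words

Summary: Goal: reverse a string character by character or word by word, which is rather fun. Method: start with character-by-character reversal. The first way is a slice with a step of -1: revchars = astring[::-1] In [65]: x = 'abcd' In [66]: x[::-1] Out[66]: 'dcba' The second way uses reversed(); note that it returns an iterator, which can be used in a loop or passed to another "accumulator", rather than a finished string: revchars = ''.join(reversed(astring)) In [56]: y = reversed(x) In [57]: y Out[57]: <reversed object at 0x...

posted @ 2011-08-09 15:43 牛皮糖NewPtone Views(661) Comments(0) Recommended(0)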
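
The summary above is cut off before it reaches word-by-word reversal, so here is one common way to do both, as a sketch rather than the recipe's own code. It assumes Python 3 and that words are separated by whitespace; the function names are hypothetical.

```python
def reverse_chars(text):
    """Reverse a string character by character using a slice with step -1."""
    return text[::-1]


def reverse_words(text):
    """Reverse the order of whitespace-separated words, keeping each word intact."""
    # split() with no argument splits on any run of whitespace;
    # the words are then rejoined with single spaces.
    return " ".join(reversed(text.split()))
```

Because split() collapses runs of whitespace, the original spacing is not preserved; keeping it would need a different approach, such as splitting with a regular expression that captures the separators.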

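Python Tasty Recipes: 1.6 Joining Strings

Summary: Goal: combine several small strings into one large string, with performance as the main concern. Method: the common approaches are as follows. 1. Use the += operator: BigString = small1 + small2 + small3 + ... + smalln. For example, given the list pieces = ['Today', 'is', 'really', 'a', 'good', 'day'], we want to join it up: BigString = '' for e in pieces: BigString += e + '' or use import operator BigString = reduce(ope...

posted @ 2011-08-09 14:23 牛皮糖NewPtone Views(961) Comments(0) Recommended(0)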
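
The summary above breaks off before it lists every approach. One widely used option for the performance concern it raises is str.join, which builds the result in a single pass instead of creating an intermediate string on every +=. The sketch below assumes Python 3, and join_pieces is a hypothetical helper name.

```python
def join_pieces(pieces, sep=" "):
    """Join a list of strings into one string, separated by sep."""
    # str.join computes the final size once and copies each piece a single time.
    return sep.join(pieces)
```

With the list from the summary, join_pieces(['Today', 'is', 'really', 'a', 'good', 'day']) yields 'Today is really a good day'.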

Python Tasty Recipes: 1.5 Stripping Whitespace from Both Ends of a String

Summary: Goal: obtain a string with no extra whitespace at the start or end. Method: the following string functions can be used: string.lstrip(s[, chars]) Return a copy of the string with leading characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the beginning. ...

posted @ 2011-08-09 12:37 牛皮糖NewPtone Views(1510) Comments(0) Recommended(0)

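Python Tasty Recipes: 1.4 Aligning Strings

Summary: Goal: left-align, right-align, and center a string. Method: strings provide the following built-in methods, where width is the total width including the string s itself, and fillchar defaults to a space but can be set to another fill character: string.ljust(s, width[, fillchar]) string.rjust(s, width[, fillchar]) string.center(s, width[, fillchar]) In [6]: a = 'Hello!' In [7]: print a.ljust(10, '+') Hello!++++ In [8]: print a.rjust(10, '+') ++++Hello! In...

posted @ 2011-08-09 12:23 牛皮糖NewPtone Views(907) Comments(0) Recommended(0)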
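
The three alignment methods named in the summary above can be tried side by side. This is a short sketch for Python 3 (the post's session uses the Python 2 print statement); the helper align_all is a hypothetical name used only to bundle the three calls.

```python
def align_all(text, width, fillchar=" "):
    """Return the left-aligned, right-aligned, and centered forms of text."""
    return (
        text.ljust(width, fillchar),   # padding added on the right
        text.rjust(width, fillchar),   # padding added on the left
        text.center(width, fillchar),  # padding split between both sides
    )
```

For 'Hello!' with a width of 10 and '+' as the fill character, the results are 'Hello!++++', '++++Hello!', and '++Hello!++'. If width is not larger than the string's length, all three methods return the string unchanged.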

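Python Tasty Recipes: 1.3 Testing Whether an Object Is String-Like

Summary: Goal: test whether an object is a string. Method: the base class of Python's strings is basestring, which covers both the str and unicode types. In general the following works: def isAString(anobj): return isinstance(anobj, basestring) However, this approach does not work for instances of the UserString class. In [30]: b = UserString.UserString('abc') In [31]: isAString(b) Out[31]: False In [32]: type(b) Out[32]: <class 'UserString.UserString'> Py...

posted @ 2011-08-09 11:07 牛皮糖NewPtone Views(1221) Comments(0) Recommended(0)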

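Python Tasty Recipes: Series Navigation

Summary: Under construction; mind your safety and watch out for falling bricks and tiles. Python Tasty Recipes, Chapter 1: Text. 1.1 Processing a String One Character at a Time 1.2 Converting Between Characters and Character Values 1.3 Testing Whether an Object Is String-Like 1.4 Aligning Strings 1.5 Stripping Whitespace from Both Ends of a String 1.6 Joining Strings 1.7 Reversing a String by Characters or by Words ...

posted @ 2011-08-09 10:24 牛皮糖NewPtone Views(660) Comments(0) Recommended(0)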

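Python Tasty Recipes: 1.2 Converting Between Characters and Character Values

Summary: Goal: convert a character to its ASCII or Unicode code, or the reverse. Method: for single-byte codes (the range 0~255; ASCII proper covers 0~127): >>> print ord('A') 65 >>> print chr(65) A For Unicode characters, note that only a Unicode string of length 1 is accepted: >>> print ord(u'\u54c8') 21704 >>> print unichr(21704) 哈 >>> print repr(unichr(21704)) u'\u54c8' The difference between chr() and str(): one only accepts 0~255 ...

posted @ 2011-08-08 21:32 牛皮糖NewPtone Views(524) Comments(0) Recommended(0)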
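
The session in the summary above is Python 2, where chr() handles 0~255 and unichr() handles Unicode. In Python 3, which this sketch assumes, str is Unicode throughout and chr() covers the whole range, so a single pair of helpers is enough. The helper names are hypothetical.

```python
def to_code_point(char):
    """Return the integer code point of a single character."""
    return ord(char)  # raises TypeError if char is not exactly one character


def from_code_point(code_point):
    """Return the one-character string for an integer code point."""
    return chr(code_point)  # in Python 3 this also covers what Python 2 called unichr()
```

The two functions are inverses of each other: from_code_point(to_code_point(c)) gives back c for any single character c.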

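Python Tasty Recipes: 1.1 Processing a String One Character at a Time

Summary: About Python Tasty Recipes: the main purpose of this category is to summarize the knowledge and techniques in the Python Cookbook, and to consolidate my own Python knowledge. It will of course differ somewhat from the book, aiming to be brief and to the point. Goal: process each character of a string; each character (char) is in fact a string of length 1. Method: 1. Use the built-in function list(): >>> A_string = 'Python' >>> char_list = list(A_string) >>> char_list ['P', 'y', 't', 'h', 'o', ...

posted @ 2011-08-08 16:39 牛皮糖NewPtone Views(848) Comments(1) Recommended(0)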

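Measuring Grid Percolation Probability with the Wave Percolation Algorithm

Summary: In this post we use the wave percolation algorithm implemented earlier to measure, for a grid of fixed size, the probability that percolation occurs at different open-site probabilities p. The specification for this part is: How many trials are needed to make a prediction on whether a grid generated with probability p percolates? How many different values of p should be considered to determine the percolation probability q? ...

posted @ 2011-08-08 14:09 牛皮糖NewPtone Views(619) Comments(0) Recommended(0)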

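Implementing the Wave Exploration Algorithm

Summary: A reading guide for the confused: for the detailed project description and the specifics of the algorithms, see the previous post, Project2 Percolation in Grids. I have also written up the functions in percolation_provided.py. As the saying goes, everything is easier said than done. Without thinking it through carefully, I opened the IDE and typed away, and then kept fixing one bug after another. Below I discuss the implementation of the algorithm, the problems I ran into, and how I solved them. Step 1: as usual, import the provided functions: from percolation_provided import * Then settle on the function name: percolation_wave(input_grid). With the name chosen, the work can begin; for now the parameter is provisionally a single ...

posted @ 2011-08-07 23:10 牛皮糖NewPtone Views(736) Comments(0) Recommended(0)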

Natural Language Processing with Python Study Notes (25): 3.9 Formatting: From Lists to Strings

Summary: 3.9 Formatting: From Lists to Strings. Often we write a program to report a single data item, such as a particular element in a corpus that meets some complicated criterion, or a single summary statistic such as a word-count or the performance of a tagger. More often, we write a program to ...

posted @ 2011-08-07 20:19 牛皮糖NewPtone Views(2611) Comments(0) Recommended(1)

Natural Language Processing with Python Study Notes (24): 3.8 Segmentation

Summary: 3.8 Segmentation. This section discusses more advanced concepts, which you may prefer to skip on the first time through this chapter. Tokenization is an instance of a more general problem of segmentation. In this section, we will look at two other instances of this problem, which use radically ...

posted @ 2011-08-06 22:46 牛皮糖NewPtone Views(1697) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (23): 3.7 Regular Expressions for Tokenizing Text

Summary: 3.7 Regular Expressions for Tokenizing Text. Tokenization is the task of cutting a string into identifiable linguistic units that constitute a piece of language data. Although it is a fundamental task, we have been able to delay it until now because many corpora are already tokenized, and ...

posted @ 2011-08-06 22:36 牛皮糖NewPtone Views(3708) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (22): 3.6 Normalizing Text

Summary: 3.6 Normalizing Text. In earlier program examples we have often converted text to lowercase before doing anything with its words, e.g., set(w.lower() for w in text). By using lower(), we have normalized the text to lowercase so that the distinction between The and the is ignored. Often we want t...

posted @ 2011-08-06 22:27 牛皮糖NewPtone Views(2162) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (21): 3.5 Useful Applications of Regular Expressions

Summary: 3.5 Useful Applications of Regular Expressions. The previous examples all involved searching for words w that match some regular expression regexp using re.search(regexp, w). Apart from checking whether a regular expression matches a word, we can use regular expressions to extract material ...

posted @ 2011-08-06 16:08 牛皮糖NewPtone Views(2003) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (20): 3.4 Regular Expressions for Detecting Word Patterns

Summary: When reposting, please credit the source, "一块努力的牛皮糖": http://www.cnblogs.com/yuxc/ I am new to this, so please point out any poor translations, with my thanks. Update log. 3.4 Regular Expressions for Detecting Word Patterns. Many linguistic processing tasks involve pattern matching. For example, we can find words ending with ed using endswith('ed'). We saw a variety o...

posted @ 2011-08-06 15:32 牛皮糖NewPtone Views(2220) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (19): 3.3 Text Processing with Unicode

Summary: 3.3 Text Processing with Unicode. Our programs will often need to deal with different languages, and different character sets. The concept of "plain text" is a fiction. If you live in the English-speaking world you probably use ASCII, possibly without realizing it. If you live in Eu...

posted @ 2011-08-06 14:39 牛皮糖NewPtone Views(2820) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (18): 3.2 Strings: Text Processing at the Lowest Level

Summary: Update log: 1st, 2011.8.6. 3.2 Strings: Text Processing at the Lowest Level. PS: I think this part is very important, because string processing is the most basic part of NLP. Read it carefully, everyone; veterans can skip it... It's time to study a fundamental data type that we've been studiously avoiding so far. In earlier ...

posted @ 2011-08-05 23:13 牛皮糖NewPtone Views(2553) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (17): 3.1 Accessing Text from the Web and from Disk

Summary: CHAPTER 3 Processing Raw Text. The most important source of texts is undoubtedly the Web. It's convenient to have existing text collections to explore, such as the corpora we saw in the previous chapters. However, you probably have your own text sources in mind, and need to learn how to access t...

posted @ 2011-08-05 21:51 牛皮糖NewPtone Views(2687) Comments(1) Recommended(0)

Natural Language Processing with Python Study Notes (16): 2.8 Exercises

Summary: The author is lazy... the exercises are not finished and will be completed next time... 2.8 Exercises. 1. ○ Create a variable phrase containing a list of words. Experiment with the operations described in this chapter, including addition, multiplication, indexing, slicing, and sorting. List_practice = ['Hello', 'World!'] List_practice + ['Pythoner' ...

posted @ 2011-08-05 21:36 牛皮糖NewPtone Views(2408) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (15): 2.7 Further Reading

Summary: 2.7 Further Reading. Extra materials for this chapter are posted at http://www.nltk.org/ , including links to freely available resources on the Web. The corpus methods are summarized in the Corpus HOWTO, at http://www.nltk.org/howt...

posted @ 2011-08-05 21:26 牛皮糖NewPtone Views(713) Comments(0) Recommended(0)

Natural Language Processing with Python Study Notes (14): 2.6 Summary

Summary: 2.6 Summary. • A text corpus is a large, structured collection of texts. NLTK comes with many corpora, e.g., the Brown Corpus, nltk.corpus.brown. • Some text corp...

posted @ 2011-08-05 21:24 牛皮糖NewPtone Views(2047) Comments(1) Recommended(0)

Natural Language Processing with Python Study Notes (4): 1.2 A Closer Look at Python: Texts as Lists of Words

Summary: Update log: 1st: 2011/8/6; 2nd: replaced the icons, since I really did not like the original ones, and I trust quite a few readers will like the new ones. 1.2 A Closer Look at Python: Texts as Lists of Words. You've seen some important elements of the Python programming language. Let's take a few moments to review them systematically. Lists. What is a text? ...

posted @ 2011-08-05 16:40 牛皮糖NewPtone Views(1783) Comments(0) Recommended(0)

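An Array Partitioning Problem

Summary: The problem goes like this: given an unsorted array arr of n positive integers (n is even), how can it be split into two subarrays of n/2 elements each so that the sums of the two subarrays are as close as possible? Source of the problem: http://hi.baidu.com/hell74111/blog/item/b6155d94f46717067bf48024.html My approach: (1) split the array into two subarrays A and B; (2) compare each element of A with each element of B, and swap the pair if doing so makes the absolute difference between the two sums smaller than before. It is not actually hard; the trouble was that I had a silly moment... I wrote a test array a = [1, 2, 3, 4, 5, 6] and took it for granted that the two resulting arrays should have equal sums. As a result I struggled with it for ages, wanting ...

posted @ 2011-08-02 18:14 牛皮糖NewPtone Views(2067) Comments(1) Recommended(1)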

Computing the Base-b Expansion of an Integer n

Summary: Given any decimal integer n, compute its base-b expansion: from __future__ import division import math def baseb(b, q): aList = [] while q != 0: a = int(math.fmod(q, b)) q = math.floor(q / b) aList.append(str(a)) expansion = ''.join(aList) print expansion The result of a run is as follows: >>> baseb(2, 100) 0010011 ...

posted @ 2011-08-02 11:53 牛皮糖NewPtone Views(551) Comments(0) Recommended(0)
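
The repeated-division method in the summary above produces digits from least significant to most significant, which is why the quoted run prints 0010011 for 100 in base 2 (the conventional reading is 1100100). Below is a variant sketch, not the post's own code, that reverses the digits before joining them. It assumes Python 3 and restricts b to 2 through 10 so that every digit is a single character; the name base_b_expansion is hypothetical.

```python
def base_b_expansion(n, b):
    """Return the base-b digits of a non-negative integer n, most significant first."""
    if not 2 <= b <= 10:
        raise ValueError("this sketch only handles bases 2 through 10")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, b)  # integer quotient and remainder in one step
        digits.append(str(remainder))
    # Remainders come out least significant first, so reverse before joining.
    return "".join(reversed(digits))
```

Using divmod keeps everything in integer arithmetic, which avoids the float conversions that math.fmod and math.floor introduce for large values.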

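Finding the Last Two Digits of the 200th Power of Any Integer

Summary: A secondary-school problem puzzled me for a long time, from dinner until now... Find the last two digits of x to the power n. Let y = x**n; then the last two digits of y depend on the last two digits of x. The pattern is that, starting from 2, the values repeat in cycles of 20. for x in range(2, 100): # x is the base y = x % 100 # y holds the last two digits of the power s = "" for n in range(2, 201): # n is the exponent init = y # init keeps the previous value of y y = y * x # multiply by x each time y = y % 100 # take y modulo 100; the value is the last two digits if y == init: # if this value equals the previous one, there is no need to continue break if y= ...

posted @ 2011-08-01 19:20 牛皮糖NewPtone Views(753) Comments(0) Recommended(0)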
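
The loop in the summary above reduces modulo 100 after every multiplication. Python's built-in pow accepts a third argument that performs the same modular exponentiation directly, so the last two digits can be read off in one call. This is an alternative sketch for Python 3, not the post's own code; last_two_digits is a hypothetical name.

```python
def last_two_digits(x, n):
    """Return x**n modulo 100, i.e. the last two decimal digits of the power."""
    # Three-argument pow reduces modulo 100 at every step,
    # so the full power x**n is never built.
    return pow(x, n, 100)
```

For example, 2**20 = 1048576 ends in 76, and 76 times 76 again ends in 76, so last_two_digits(2, 200) is 76.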

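I Fell for It: Solving the Soda Bottle Problem

Summary: This is a fun puzzle I saw on someone else's blog: [Problem] A bottle of soda costs 1 yuan, and two empty bottles can be exchanged for one bottle of soda. With 20 yuan, what is the largest number of bottles you can drink? I took it for granted that the answer was 20 -> 10 -> 5 -> 2 -> 1 and missed a bottle... So let's write a program to solve it. Suppose we buy one bottle at a time and exchange for a soda as soon as we have 2 empty bottles: def qishui1(m): s = 0 # bottles drunk k = 0 # empty bottles while m > 0: m = m - 1 # buy 1 bottle s = s + 1 k = k + 1 while k == 2: k = 0 s = s + 1 # exchange for a soda and drink it k = k + 1 # one more empty bottle return s, k m = 20 s, k = qishui1(m) pr...

posted @ 2011-08-01 17:55 牛皮糖NewPtone Views(1511) Comments(1) Recommended(0)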
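
The bottle-by-bottle simulation in the summary above can also be done in rounds, exchanging all available empties at once. The sketch below assumes Python 3 and that no bottles can be borrowed; the function name max_sodas and its price and empties_per_soda parameters are hypothetical generalizations of the puzzle's 1 yuan and 2 bottles.

```python
def max_sodas(money, price=1, empties_per_soda=2):
    """Return (bottles drunk, empties left over) without borrowing any bottles."""
    drunk = money // price  # bottles bought outright
    empties = drunk
    while empties >= empties_per_soda:
        exchanged = empties // empties_per_soda
        drunk += exchanged
        # Unexchanged empties plus the empties from the sodas just drunk.
        empties = empties % empties_per_soda + exchanged
    return drunk, empties
```

With 20 yuan the rounds run 20, 10, 5, 2, 1, 1, for a total of 39 bottles with one empty left over. That final extra round, fed by two leftover empties, is the bottle the 20 -> 10 -> 5 -> 2 -> 1 count misses.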

Python: The Difference Between urllib and urllib2

Summary: As a Python beginner, I had always been vague about urllib and urllib2 and assumed that 2 was an upgraded version of 1. Only today, after reading an article titled "Python: difference between urllib and urllib2", did I understand the difference. You might be intrigued by the existence of two separate URL modules in Python - urllib and urllib2. Even more intriguing: they are not alternatives for each other. So what is the difference be...

posted @ 2011-08-01 17:10 牛皮糖NewPtone Views(61179) Comments(0) Recommended(2)

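Working with Folders and Files in Python

Summary: The program I have been writing lately deals with file operations constantly, which is a weak area for me. Luckily I found a good article through Baidu (this is the link to the original), and I have made a few small changes to it. Functions for finding, deleting, and otherwise handling folders and files are implemented in the os module. The module must be imported before use: import os 1. Getting the current directory: s = os.getcwd() # s holds the current directory (i.e. folder). For example, when abc.py is run from its own folder, this call returns the location of the folder containing abc.py (strictly speaking, it returns the current working directory, which matches the script's folder only when the script is launched from there). As a simple example, we put abc.py in folder A and want to be able to create a new folder inside A no matter where on the disk A is placed, with the new folder's name generated automatically from the time. import os import time folder = tim...

posted @ 2011-08-01 16:32 牛皮糖NewPtone Views(85046) Comments(4) Recommended(5)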

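Usage Details of the Python Standard Library Module urllib2

Summary: I happened to need this, and the article is well written, so I am reposting it here for reference. Reposted from 道可道, "Usage Details of the Python Standard Library urllib2". The Python standard library contains many useful utility classes, but the standard library documentation is not always clear about the details of using them; the HTTP client library urllib2 is one example. This post summarizes some details of using urllib2. 1 Proxy settings 2 Timeout settings 3 Adding specific headers to an HTTP request 4 Redirect 5 Cookie 6 Using the HTTP PUT and DELETE methods 7 Getting the HTTP status code 8 Debug log 1 Proxy settings urllib2 ...

posted @ 2011-08-01 16:23 牛皮糖NewPtone Views(140840) Comments(0) Recommended(9)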
