Archive: 08 2011

Summary: 6.2 Further Examples of Supervised Classification. Sentence Segmentation. Sentence segmentation can be viewed as a classification task for punctuation: whenever we encounter a symbol that could possibly end a sentence, such as a period or a question mark, we have to decide whether it ... Read more
posted @ 2011-08-31 23:16 牛皮糖NewPtone Views (1634) Comments (0) Recommended (0)
Summary: Chapter 6: Learning to Classify Text. Detecting patterns is a central part of Natural Language Processing. Words ending in -ed tend to be past tense verbs (Chapter 5). Frequent use of will is indicative of news text (Chapter 3). These observable patterns, word structure and wo... Read more
posted @ 2011-08-31 14:13 牛皮糖NewPtone Views (3474) Comments (0) Recommended (1)
Summary: 5.10 Exercises. ☼ Search the web for "spoof newspaper headlines", to find such gems as: British Left Waffles on Falkland Islands, and Juvenile Court to Try Shooting Defendant. Manually tag these headlines to see if knowledge of the part-of-speech tags removes the ambiguity. ☼ ... Read more
posted @ 2011-08-30 22:51 牛皮糖NewPtone Views (1727) Comments (0) Recommended (0)
Summary: 5.9 Further Reading. Extra materials for this chapter are posted at http://www.nltk.org/, including links to freely available resources on the web. For more examples of tagging with NLTK, please see the Tagging HOWTO at http://www.nltk.org/howto. Chapters 4 and 5 of (Jurafsky & Martin, 2008) ... Read more
posted @ 2011-08-30 22:49 牛皮糖NewPtone Views (554) Comments (0) Recommended (0)
Summary: 5.8 Summary. • Words can be grouped into classes, such as nouns, verbs, adjectives, and adverbs. These classes are known as lexical categories or parts-of-speech. Parts-of-speech are assigned short labels, or tags, such as NN and VB. ... Read more
posted @ 2011-08-30 22:46 牛皮糖NewPtone Views (586) Comments (0) Recommended (0)
Summary: 5.7 How to Determine the Category of a Word. Now that we have examined word classes in detail, we turn to a more basic question: how do we decide what category a word belongs to in the first place? In general, linguists use morphological, syntactic, and semantic clues to determine ... Read more
posted @ 2011-08-30 22:45 牛皮糖NewPtone Views (1974) Comments (0) Recommended (0)
Summary: 5.6 Transformation-Based Tagging. A potential issue with n-gram taggers is the size of their n-gram table (or language model). If tagging is to be employed in a variety of language technologies deployed on mobile computing devices, it is important to strike a balance between model ... Read more
posted @ 2011-08-30 22:40 牛皮糖NewPtone Views (929) Comments (0) Recommended (0)
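Summary: HTMLParser is a Python module for parsing HTML and XHTML files. It can pick out the tags, data, and so on in an HTML document, and is a convenient way to process HTML. HTMLParser is event-driven: when it finds a particular piece of markup, it calls a user-defined function to notify the program. Its main callbacks all have names beginning with handle_ and are methods of HTMLParser. To use it, derive a new class from HTMLParser and override those handle_ methods. Unlike the parser in htmllib, this parser is not based on the SGML parser in the sgmllib module. The htmllib module and sgm... Read more
posted @ 2011-08-30 13:32 牛皮糖NewPtone Views (5782) Comments (0) Recommended (0)
Summary: About Natural Language Processing with Python. About the book: Natural Language Processing with Python offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. In Natural Language Processing with Python (reprint edition) you will learn to write Python programs that work with large collections of unstructured text. You will also access richly annotated datasets using a comprehensive range of linguistic data structures, and understand the main algorithms for analyzing the content and structure of written communication. With plenty of examples and exercises, the book will help you: extract information from unstructured text, either to guess the topic or identify "named entities"; analyze linguistic structure in text, including parsing and semantic analysis; access popular linguistic databases, including Word... Read more
posted @ 2011-08-29 10:44 牛皮糖NewPtone Views (20601) Comments (12) Recommended (6)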
Summary: 5.5 N-Gram Tagging. Unigram Tagging. Unigram taggers are based on a simple statistical algorithm: for each token, assign the tag that is most likely for that particular token. For example, it will assign the tag JJ to any occurrence of the word frequent, since frequent is used as an adjective (... Read more
posted @ 2011-08-28 21:54 牛皮糖NewPtone Views (5685) Comments (0) Recommended (0)
Summary: 5.4 Automatic Tagging. In the rest of this chapter we will explore various ways to automatically add part-of-speech tags to text. We will see that the tag of a word depends on the word and its context within a sentence. For this reason, we will be working with data at the level of (tagged) sentenc... Read more
posted @ 2011-08-26 22:05 牛皮糖NewPtone Views (1396) Comments (2) Recommended (1)
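Summary: Computer simulations often need randomly chosen numbers. Starting from a simple application of random numbers, this post briefly introduces Python's random module. Computing the value of pi with the Monte Carlo method. Links: this problem comes from problem set 2 of the Python course at Purdue University. Monte Carlo methods are used to simulate complex physical and mathematical systems by repeated random sampling. In simple terms, given a probability, p, that an event will occu... Read more
posted @ 2011-08-26 11:14 牛皮糖NewPtone Views (7653) Comments (1) Recommended (1)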
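The excerpt stops before any code appears, so here is a minimal sketch of the Monte Carlo estimate of pi it describes, using the random module. It is written for Python 3 (the 2011 posts used Python 2). The function name estimate_pi and the seed parameter are illustrative assumptions, not taken from the post.

```python
import random

def estimate_pi(trials, seed=None):
    """Estimate pi by sampling points uniformly in the unit square."""
    rng = random.Random(seed)  # private generator so a seed gives repeatable runs
    inside = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        # The point falls inside the quarter circle of radius 1
        # with probability pi / 4.
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / trials

if __name__ == "__main__":
    print(estimate_pi(200000, seed=1))
```

The error shrinks roughly with the square root of the number of trials, so each extra digit of accuracy costs about 100 times more samples.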
Summary: ... Read more
posted @ 2011-08-25 22:13 牛皮糖NewPtone Views (3468) Comments (0) Recommended (0)
Summary: What's New in Python 2.7. Author: A.M. Kuchling (amk at amk.ca). Release: 2.7.2. Date: August 25, 2011. This article explains the new features in Python 2.7. Python 2.7 was released on July 3, 2010. Numeric handling has been improved in many ways, for both floating-point n... Read more
posted @ 2011-08-25 21:26 牛皮糖NewPtone Views (2333) Comments (0) Recommended (0)
Summary: 5.2 Tagged Corpora. Representing Tagged Tokens. By convention in NLTK, a tagged token is represented using a tuple consisting of the token and the tag. We can create one of these special tuples from the standard string representation of a tagged token, using the function str2tuple(): ... Read more
posted @ 2011-08-24 23:22 牛皮糖NewPtone Views (3562) Comments (0) Recommended (0)
Summary: Download and installation. Using MySQL from Python requires the MySQLdb driver, which can be downloaded from the official site: http://sourceforge.net/projects/mysql-python/ . At the time of writing, the highest supported Python version is 2.6 and the highest supported MySQL version is 5.1. The project describes itself as follows: MySQL support for Python. MySQL versions 3.23-5.1; and Python versions 2.3-2.6 are supported. MySQLdb is the Python DB API-2.0 interface. _mysql is a low-level API similia... Read more
posted @ 2011-08-24 18:09 牛皮糖NewPtone Views (6087) Comments (0) Recommended (0)
Summary: Chapter 5: Categorizing and Tagging Words. Back in elementary school you learned the difference between nouns, verbs, adjectives, and adverbs. These "word classes" are not just the idle invention of grammarians, but are useful categories for many language processing tasks. As we will s... Read more
posted @ 2011-08-21 15:23 牛皮糖NewPtone Views (5045) Comments (0) Recommended (0)
Summary: 4.11 Exercises. ☼ Find out more about sequence objects using Python's help facility. In the interpreter, type help(str), help(list), and help(tuple). This will give you a full list of the functions supported by each type. Some functions have special names flanked with underscore... Read more
posted @ 2011-08-21 15:13 牛皮糖NewPtone Views (1286) Comments (0) Recommended (1)
Summary: 4.10 Further Reading. This chapter has touched on many topics in programming, some specific to Python, and some quite general. We've just scratched the surface, and you may want to read more about these topics, starting with the further materials for this chapter available at http... Read more
posted @ 2011-08-21 15:11 牛皮糖NewPtone Views (498) Comments (0) Recommended (0)
Summary: 4.9 Summary. Python's assignment and parameter passing use object references; e.g. if a is a list and we assign b = a, then any operation on a will modify b, and vice versa. The is operation tests if two objects are i... Read more
posted @ 2011-08-21 15:09 牛皮糖NewPtone Views (436) Comments (0) Recommended (0)
Summary: 4.8 A Sample of Python Libraries. Python has hundreds of third-party libraries, specialized software packages that extend the functionality of Python. NLTK is one such library. To realize the full power of Python programming, you should become familiar with several other libraries. Most of ... Read more
posted @ 2011-08-21 15:05 牛皮糖NewPtone Views (5243) Comments (0) Recommended (0)
Summary: 4.7 Algorithm Design. This section discusses more advanced concepts, which you may prefer to skip on the first time through this chapter. A major part of algorithmic problem solving is selecting or adapting an appropriate algorithm for the problem at hand. Sometimes there are several alternatives, ... Read more
posted @ 2011-08-19 23:41 牛皮糖NewPtone Views (2227) Comments (0) Recommended (0)
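Summary: A brief introduction. SQLite is a very compact, embedded, open-source database: it has no separate server process, and all maintenance is done by the host program itself. It is an ACID-compliant relational database management system designed for embedded use, and it is already used in many embedded products. Its resource footprint is very low; on an embedded device a few hundred KB of memory may be enough. It supports mainstream operating systems such as Windows, Linux, and Unix, and works with many programming languages, such as Tcl, C#, PHP, and Java, as well as through an ODBC interface. Compared with MySQL and PostgreSQL, two well-known open-source database management systems, it is also reported to process data faster. The first alpha version of SQLite appeared in 2000... Read more
posted @ 2011-08-18 16:13 牛皮糖NewPtone Views (89907) Comments (1) Recommended (6)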
Summary: 4.6 Program Development. Programming is a skill that is acquired over several years of experience with a variety of programming languages and tasks. Key high-level abilities are algorithm design and its manifestation in structured programming. Key low-level abilities include... Read more
posted @ 2011-08-16 23:46 牛皮糖NewPtone Views (1986) Comments (0) Recommended (0)
Summary: 4.5 Doing More with Functions. This section discusses more advanced features, which you may prefer to skip on the first time through this chapter. Functions as Arguments. So far the arguments we have passed into functions have been simple objects like strings, or structured objects like... Read more
posted @ 2011-08-16 23:36 牛皮糖NewPtone Views (1316) Comments (0) Recommended (0)
Summary: 4.4 Functions: The Foundation of Structured Programming. Functions provide an effective way to package and re-use program code, as already explained in Section 2.3. For example, suppose we find that we often want to read text from an HTML file. This involves several steps: opening the file, ... Read more
posted @ 2011-08-13 23:59 牛皮糖NewPtone Views (1654) Comments (0) Recommended (0)
Summary: 4.3 Questions of Style. Programming is as much an art as a science. The undisputed "bible" of programming, a 2,500 page multi-volume work by Donald Knuth, is called The Art of Computer Programming. Many books have been written on Literate Programming, recognizing that huma... Read more
posted @ 2011-08-12 23:12 牛皮糖NewPtone Views (857) Comments (0) Recommended (0)
Summary: 4.2 Sequences. So far, we have seen two kinds of sequence object: strings and lists. Another kind of sequence is called a tuple. Tuples are formed with the comma operator, and typically enclosed using parentheses. We've actually seen them in the previous chapters, and sometimes referred to them a... Read more
posted @ 2011-08-12 23:07 牛皮糖NewPtone Views (1314) Comments (0) Recommended (0)
Summary: Chapter 4: Writing Structured Programs. By now you will have a sense of the capabilities of the Python programming language for processing natural language. However, if you're new to Python or to programming, you may still be wrestling with Python and not feel like you are in full con... Read more
posted @ 2011-08-11 22:34 牛皮糖NewPtone Views (912) Comments (0) Recommended (0)
Summary: 3.12 Exercises. ☼ Define a string s = 'colorless'. Write a Python statement that changes this to "colourless" using only the slice and concatenation operations. ☼ We can use the slice notation to remove morphological endings on words. For example, 'dogs'[:-1] removes the l... Read more
posted @ 2011-08-11 22:25 牛皮糖NewPtone Views (2491) Comments (0) Recommended (0)
Summary: 3.11 Further Reading. Extra materials for this chapter are posted at http://www.nltk.org/ , including links to freely available resources on the Web. Remember to consult the Python reference materials at http://docs.python.org/ . (For example, this documentation covers "universal newline support," ... Read more
posted @ 2011-08-11 22:21 牛皮糖NewPtone Views (608) Comments (0) Recommended (0)
Summary: 3.10 Summary. • In this book we view a text as a list of words. A "raw text" is a potentially long string containing words and whitespace formatting, and is how we typically store and visualize a text. • A string is specified in Python using single or double quotes: 'Monty Python', "Mont... Read more
posted @ 2011-08-11 22:20 牛皮糖NewPtone Views (728) Comments (0) Recommended (0)
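Summary: I came across a fun puzzle: reverse a positive integer, for example turning 1234 into 4321. In C this can be done with a do..while loop: as long as temp is not 0, each iteration multiplies rebmun by 10 and adds the rightmost digit of what remains of the original integer. The code is: #include <stdio.h> void main() { int number = 0; int rebmun = 0; int temp = 0; printf("\nEnter a number:"); scanf("%d", &number); temp = number; do { rebmun = 10 * rebmun + temp % 10; temp = temp / 10; } wh... Read more
posted @ 2011-08-10 18:04 牛皮糖NewPtone Views (1031) Comments (1) Recommended (0)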
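The C listing in the excerpt is cut off, so the same digit-peeling loop is sketched here in Python 3 for comparison. The function name reverse_integer is an illustrative assumption. The sketch assumes a non-negative input and, like the C version, drops trailing zeros of the original number.

```python
def reverse_integer(number):
    """Reverse the decimal digits of a non-negative integer."""
    reversed_number = 0
    remaining = number
    while remaining > 0:
        # Shift the digits collected so far left by one place,
        # then append the rightmost digit of what remains.
        reversed_number = 10 * reversed_number + remaining % 10
        remaining //= 10
    return reversed_number

if __name__ == "__main__":
    print(reverse_integer(1234))
```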
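Summary: For the project description of this part, see Project2 Percolation in Grids. The test scenario is specified as follows: · set n=75 · consider values of p from 0 to 1 in increments of 0.05 (or smaller) · for each value of p, generate 10 random grids and record for each algorithm the average running time on the ten grids. However, I ran into some trouble while writing the test code for this part; at first ... Read more
posted @ 2011-08-10 14:22 牛皮糖NewPtone Views (402) Comments (0) Recommended (0)
Summary: This experiment comes from the specific requirements in part 1: Next, consider grid sizes n = 10, 25, 50, and 75 and determine the percolation probabilities (you already know it for n=25). One way to visualize the performance for the different values of n is to make the same curve as above and show all three in one plot. Discuss how the size o... Read more
posted @ 2011-08-10 13:28 牛皮糖NewPtone Views (815) Comments (0) Recommended (0)
Summary: A reading guide for the confused: for a detailed description of this project and of the algorithms, see the previous post, Project2 Percolation in Grids. I have written up the functions in percolation_provided.py, and the wave detection algorithm is already implemented. The key to this algorithm is the recursion in the explore function and the initial call to explore from percolation_recursive. Step 1: as usual, import the provided functions: from percolation_provided import * . First consider the percolation_recursive function. Its parameters are the same as those of the wave algorithm function, so it is defined as: percolation_recursive(input... Read more
posted @ 2011-08-09 22:13 牛皮糖NewPtone Views (531) Comments (0) Recommended (0)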
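Summary: Goal: check whether a string contains any character from a given set of characters. Method: the simplest approach is the following, which is clear, general, and fast, and works for any sequence and container: def containAny(seq, aset): for c in seq: if c in aset: return True return False. A second approach uses the itertools module to gain a little performance; it is essentially the same method (though it goes against Python's core values of simplicity and clarity). Documentation for itertools.ifilter(predicate, iterable): Make an iterator that filters elements from iterable returning only those ... Read more
posted @ 2011-08-09 17:03 牛皮糖NewPtone Views (1746) Comments (0) Recommended (0)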
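The one-line listing above loses its indentation, so here is the same loop laid out properly as a Python 3 sketch, with a shorter variant built on any(). The names contains_any and contains_any_short are illustrative. itertools.ifilter exists only in Python 2; in Python 3 the built-in filter() is already lazy.

```python
def contains_any(seq, aset):
    """Return True if any item of seq is also in aset."""
    for c in seq:
        if c in aset:
            return True  # stop at the first match
    return False

def contains_any_short(seq, aset):
    """Same check, written as a generator expression passed to any()."""
    return any(c in aset for c in seq)

if __name__ == "__main__":
    print(contains_any("hello world", set("xyz")))
    print(contains_any_short("hello world", set("aeiou")))
```

Passing a set rather than a string or list as aset keeps each membership test constant-time on average.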
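Summary: Goal: reverse a string character by character or word by word, which is rather fun. Method: start with character-by-character reversal. The first way is to take a slice with a step of -1: revchars = astring[::-1]. In [65]: x = 'abcd' In [66]: x[::-1] Out[66]: 'dcba'. The second way uses reversed(); note that it returns an iterator, which can be used in a loop or passed to some other "accumulator", not a finished string: revchars = ''.join(reversed(astring)). In [56]: y = reversed(x) In [57]: y Out[57]: <reversed object at 0x... Read more
posted @ 2011-08-09 15:43 牛皮糖NewPtone Views (661) Comments (0) Recommended (0)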
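The excerpt covers character reversal and is cut off before word reversal, so the sketch below shows both in Python 3. The function names are illustrative, and the word-by-word version assumes that words are separated by whitespace and that runs of whitespace may be collapsed to single spaces.

```python
def reverse_chars(text):
    """Reverse a string character by character using a slice with step -1."""
    return text[::-1]

def reverse_words(text):
    """Reverse the order of whitespace-separated words."""
    # split() with no argument splits on any run of whitespace;
    # reversed() yields the words last to first for join() to consume.
    return " ".join(reversed(text.split()))

if __name__ == "__main__":
    print(reverse_chars("abcd"))
    print(reverse_words("hello brave new world"))
```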
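Summary: Goal: merge several small strings into one large string, with performance as the main concern. Method: the common approaches are: 1. Use the + or += operator: BigString = small1 + small2 + small3 + ... + smalln. For example, given the list pieces = ['Today', 'is', 'really', 'a', 'good', 'day'], we want to join its items: BigString = '' for e in pieces: BigString += e + '' or, with import operator, BigString = reduce(ope... Read more
posted @ 2011-08-09 14:23 牛皮糖NewPtone Views (961) Comments (0) Recommended (0)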
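The excerpt ends before it reaches the alternative to repeated +=, so this Python 3 sketch contrasts the loop with str.join, which is the usual recommendation when many pieces are involved. The function names are illustrative, and the separator (a single space) is an assumption, since the spacing in the excerpt's listing is not recoverable.

```python
pieces = ["Today", "is", "really", "a", "good", "day"]

def join_with_plus(parts, sep=" "):
    """Build the result with +=; each step may copy the string built so far."""
    result = ""
    for i, part in enumerate(parts):
        if i > 0:
            result += sep
        result += part
    return result

def join_with_join(parts, sep=" "):
    """Build the result in one pass with str.join."""
    return sep.join(parts)

if __name__ == "__main__":
    print(join_with_plus(pieces))
    print(join_with_join(pieces))
```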
Summary: Goal: get a string with no extra whitespace at either end. Method: the following string functions can be used: string.lstrip(s[, chars]) Return a copy of the string with leading characters removed. If chars is omitted or None, whitespace characters are removed. If given and not None, chars must be a string; the characters in the string will be stripped from the beginning... Read more
posted @ 2011-08-09 12:37 牛皮糖NewPtone Views (1510) Comments (0) Recommended (0)
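The excerpt quotes the Python 2 string module functions. The equivalent str methods are sketched below in Python 3, where lstrip, rstrip, and strip are called on the string itself. The sample values are illustrative.

```python
padded = "   hello world \t\n"

left_trimmed = padded.lstrip()    # leading whitespace removed
right_trimmed = padded.rstrip()   # trailing whitespace removed
both_trimmed = padded.strip()     # both ends trimmed

# With an argument, strip() removes any of the listed characters from
# both ends; the argument is a set of characters, not a prefix or suffix.
custom_trimmed = "xxhixyx".strip("xy")

if __name__ == "__main__":
    print(repr(left_trimmed), repr(right_trimmed))
    print(repr(both_trimmed), repr(custom_trimmed))
```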
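Summary: Goal: left-align, right-align, and center a string. Method: the following functions are available, where width is the total width including the string s, and fillchar defaults to a space but can be set to another fill character: string.ljust(s, width[, fillchar]) string.rjust(s, width[, fillchar]) string.center(s, width[, fillchar]). In [6]: a = 'Hello!' In [7]: print a.ljust(10, '+') Hello!++++ In [8]: print a.rjust(10, '+') ++++Hello! In... Read more
posted @ 2011-08-09 12:23 牛皮糖NewPtone Views (907) Comments (0) Recommended (0)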
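The interpreter session above is cut off before the centering call, so here are all three alignments as a Python 3 sketch using the str methods with the same sample string and fill character.

```python
a = "Hello!"

left = a.ljust(10, "+")     # pad on the right up to a total width of 10
right = a.rjust(10, "+")    # pad on the left
middle = a.center(10, "+")  # pad on both sides

# When width is not larger than len(a), the string is returned unchanged.
unchanged = a.ljust(3, "+")

if __name__ == "__main__":
    print(left)
    print(right)
    print(middle)
```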
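Summary: Goal: test whether an object is a string. Method: in Python 2, basestring is the base class of the string types str and unicode. The usual approach is: def isAString(anobj): return isinstance(anobj, basestring). However, this approach does not work for instances of the UserString class. In [30]: b = UserString.UserString('abc') In [31]: isAString(b) Out[31]: False In [32]: type(b) Out[32]: <class 'UserString.UserString'> Py... Read more
posted @ 2011-08-09 11:07 牛皮糖NewPtone Views (1221) Comments (0) Recommended (0)
Summary: Under construction; mind your safety and watch out for falling bricks and tiles. Tasty Python Recipes (Python美味食谱), Chapter 1: Text. 1.1 Processing a string one character at a time. 1.2 Converting between characters and character values. 1.3 Testing whether an object is string-like. 1.4 Aligning strings. 1.5 Trimming whitespace from both ends of a string. 1.6 Combining strings. 1.7 Reversing a string by characters or by words. Read more
posted @ 2011-08-09 10:24 牛皮糖NewPtone Views (660) Comments (0) Recommended (0)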
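Summary: Goal: convert a character to its ASCII or Unicode code, and back. Method: for single-byte codes (chr() accepts 0~255; ASCII proper covers 0~127): >>> print ord('A') 65 >>> print chr(65) A. For Unicode characters (note that only a Unicode string of length 1 is accepted): >>> print ord(u'\u54c8') 21704 >>> print unichr(21704) 哈 >>> print repr(unichr(21704)) u'\u54c8'. The difference between chr() and str(): one accepts only 0~255... Read more
posted @ 2011-08-08 21:32 牛皮糖NewPtone Views (524) Comments (0) Recommended (0)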
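The session above is Python 2, where chr() and unichr() are separate functions. In Python 3 every str is Unicode, so chr() covers both cases and unichr() no longer exists. A short sketch with the same values:

```python
# ord() maps a one-character string to its code point;
# chr() maps a code point back to a one-character string.
code_a = ord("A")
char_a = chr(65)

code_ha = ord("\u54c8")   # the character 哈
char_ha = chr(21704)

# ord() rejects strings whose length is not exactly 1.
try:
    ord("ab")
    rejected = False
except TypeError:
    rejected = True

if __name__ == "__main__":
    print(code_a, char_a, code_ha, char_ha, rejected)
```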
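Summary: About Tasty Python Recipes (Python美味食谱): I started this category mainly to summarize the knowledge and techniques in the Python Cookbook, and to consolidate my own Python knowledge. It will of course differ somewhat from the book, and aims to be concise. Goal: process each character of a string; each character (char) is really just a string of length 1. Method: 1. Use the built-in function list(): >>> A_string = 'Python' >>> char_list = list(A_string) >>> char_list ['P', 'y', 't', 'h', 'o', ... Read more
posted @ 2011-08-08 16:39 牛皮糖NewPtone Views (848) Comments (1) Recommended (0)
Summary: In this post we use the wave percolation algorithm implemented earlier to measure, for grids of a fixed size, the probability that percolation occurs at different open probabilities p. This part is specified as follows: How many trials are needed to make a prediction on whether a grid generated with probability p percolates? How many different values of p should be considered to determine the percolation probability q? ... Read more
posted @ 2011-08-08 14:09 牛皮糖NewPtone Views (619) Comments (0) Recommended (0)
Summary: Implementing the wave detection algorithm. A reading guide for the confused: for a detailed description of this project and of the algorithms, see the previous post, Project2 Percolation in Grids. I have also written up the functions in percolation_provided.py. As the saying goes, everything is easier said than done. Without thinking it through, I opened the IDE and started typing away, and then spent my time fixing one bug after another. Below I discuss the implementation of the algorithm, the problems I ran into, and how I solved them. Step 1: as usual, import the provided functions: from percolation_provided import * . Then settle on the function name: percolation_wave(input_grid). With the name chosen, the work can begin. For the parameters, start with a single ... Read more
posted @ 2011-08-07 23:10 牛皮糖NewPtone Views (736) Comments (0) Recommended (0)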
Summary: 3.9 Formatting: From Lists to Strings. Often we write a program to report a single data item, such as a particular element in a corpus that meets some complicated criterion, or a single summary statistic such as a word-count or the performance of a tagger. More often, we write a program to... Read more
posted @ 2011-08-07 20:19 牛皮糖NewPtone Views (2611) Comments (0) Recommended (1)
Summary: 3.8 Segmentation. This section discusses more advanced concepts, which you may prefer to skip on the first time through this chapter. Tokenization is an instance of a more general problem of segmentation. In this section, we will look at two other instances of this problem, which use radically... Read more
posted @ 2011-08-06 22:46 牛皮糖NewPtone Views (1697) Comments (0) Recommended (0)
Summary: 3.7 Regular Expressions for Tokenizing Text. Tokenization is the task of cutting a string into identifiable linguistic units that constitute a piece of language data. Although it is a fundamental task, we have been able to delay it until now because many corpora are already tokenized, and ... Read more
posted @ 2011-08-06 22:36 牛皮糖NewPtone Views (3708) Comments (0) Recommended (0)
Summary: 3.6 Normalizing Text. In earlier program examples we have often converted text to lowercase before doing anything with its words, e.g., set(w.lower() for w in text). By using lower(), we have normalized the text to lowercase so that the distinction between The and the is ignored. Often we want t... Read more
posted @ 2011-08-06 22:27 牛皮糖NewPtone Views (2162) Comments (0) Recommended (0)
Summary: 3.5 Useful Applications of Regular Expressions. The previous examples all involved searching for words w that match some regular expression regexp using re.search(regexp, w). Apart from checking whether a regular expression matches a word, we can use regular expressions to extract material... Read more
posted @ 2011-08-06 16:08 牛皮糖NewPtone Views (2003) Comments (0) Recommended (0)
Summary: When reposting, please credit the source, "一块努力的牛皮糖": http://www.cnblogs.com/yuxc/ . I am new to this; please point out any poor translations, with my thanks. Updated log. 3.4 Regular Expressions for Detecting Word Patterns. Many linguistic processing tasks involve pattern matching. For example, we can find words ending with ed using endswith('ed'). We saw a variety o... Read more
posted @ 2011-08-06 15:32 牛皮糖NewPtone Views (2220) Comments (0) Recommended (0)
Summary: 3.3 Text Processing with Unicode. Our programs will often need to deal with different languages, and different character sets. The concept of "plain text" is a fiction. If you live in the English-speaking world you probably use ASCII, possibly without realizing it. If you live in Eu... Read more
posted @ 2011-08-06 14:39 牛皮糖NewPtone Views (2820) Comments (0) Recommended (0)
Summary: Updated log: 1st 2011.8.6. 3.2 Strings: Text Processing at the Lowest Level. PS: I think this part is very important; string processing is the most basic part of NLP, so newcomers should read it carefully, and veterans can skip it... It's time to study a fundamental data type that we've been studiously avoiding so far. In earlier... Read more
posted @ 2011-08-05 23:13 牛皮糖NewPtone Views (2553) Comments (0) Recommended (0)
Summary: Chapter 3: Processing Raw Text. The most important source of texts is undoubtedly the Web. It's convenient to have existing text collections to explore, such as the corpora we saw in the previous chapters. However, you probably have your own text sources in mind, and need to learn how to access t... Read more
posted @ 2011-08-05 21:51 牛皮糖NewPtone Views (2687) Comments (1) Recommended (0)
Summary: I am lazy... the exercises are not finished, and I will complete them next time... 2.8 Exercises. 1. ○ Create a variable phrase containing a list of words. Experiment with the operations described in this chapter, including addition, multiplication, indexing, slicing, and sorting. List_practice = ['Hello', 'World!'] List_practice + ['Pythoner' ... Read more
posted @ 2011-08-05 21:36 牛皮糖NewPtone Views (2408) Comments (0) Recommended (0)
Summary: 2.7 Further Reading. Extra materials for this chapter are posted at http://www.nltk.org/ , including links to freely available resources on the Web. The corpus methods are summarized in the Corpus HOWTO, at http://www.nltk.org/howt... Read more
posted @ 2011-08-05 21:26 牛皮糖NewPtone Views (713) Comments (0) Recommended (0)
Summary: 2.6 Summary. • A text corpus is a large, structured collection of texts. NLTK comes with many corpora, e.g., the Brown Corpus, nltk.corpus.brown. • Some text corp... Read more
posted @ 2011-08-05 21:24 牛皮糖NewPtone Views (2047) Comments (1) Recommended (0)
Summary: Updated log: 1st: 2011/8/6; 2nd: replaced the icon with a new one, since I really disliked the original, and I expect many readers will like the new one. 1.2 A Closer Look at Python: Texts as Lists of Words. You've seen some important elements of the Python programming language. Let's take a few moments to review them systematically. Lists. What is a text? ... Read more
posted @ 2011-08-05 16:40 牛皮糖NewPtone Views (1783) Comments (0) Recommended (0)
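Summary: The problem goes like this: given an unsorted array arr of n positive integers (n even), how can it be split into two subarrays of n/2 elements each so that the sums of the two subarrays are as close as possible? Source: http://hi.baidu.com/hell74111/blog/item/b6155d94f46717067bf48024.html . My idea is: (1) split the array into two subarrays A and B; (2) compare each element of A with each element of B, and swap the pair if doing so makes the absolute difference between the two sums smaller than before. It is not actually hard; the trouble was that I had a silly moment... I wrote a test array a = [1, 2, 3, 4, 5, 6] and took it for granted that the two subarrays should have equal sums. As a result I struggled for ages, thinking... Read more
posted @ 2011-08-02 18:14 牛皮糖NewPtone Views (2067) Comments (1) Recommended (1)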
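Summary: Given any decimal integer n, compute its base-b expansion: from __future__ import division import math def baseb(b, q): aList = [] while q != 0: a = int(math.fmod(q, b)) q = math.floor(q / b) aList.append(str(a)) expansion = ''.join(aList) print expansion. Running it gives: >>> baseb(2, 100) 0010011 (the digits are printed least-significant first; 100 in binary is 1100100) Read more
posted @ 2011-08-02 11:53 牛皮糖NewPtone Views (551) Comments (0) Recommended (0)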
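The listing above collects remainders from the least significant digit upward and joins them without reversing. A Python 3 sketch that returns the digits in the usual most-significant-first order is shown below. It uses integer divmod instead of math.fmod and math.floor, takes the arguments in a different order from the post, and assumes a non-negative n and a base from 2 to 10; the name base_b_digits is illustrative.

```python
def base_b_digits(n, b):
    """Return the base-b expansion of a non-negative integer n, for 2 <= b <= 10."""
    if not 2 <= b <= 10:
        raise ValueError("this sketch only handles bases 2 to 10")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, b)   # peel off the lowest digit
        digits.append(str(remainder))
    # Remainders come out lowest digit first, so reverse before joining.
    return "".join(reversed(digits))

if __name__ == "__main__":
    print(base_b_digits(100, 2))
```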
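Summary: A middle-school problem puzzled me for a long time, from dinner time until now... Find the last two digits of x to the power n. Let y = x**n; then the last two digits of y depend only on the last two digits of x. The pattern is that, starting from n = 2, the values repeat in a cycle of 20. for x in range(2, 100): # x is the base y = x % 100 # y holds the last two digits of the power s = "" for n in range(2, 201): # n is the exponent init = y # init keeps the previous value of y y = y * x # multiply by x each time y = y % 100 # take y modulo 100 to keep the last two digits if y == init: # if this value equals the previous one, stop computing break if y=... Read more
posted @ 2011-08-01 19:20 牛皮糖NewPtone Views (753) Comments (0) Recommended (0)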
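The loop in the excerpt multiplies step by step and is cut off. As a compact alternative (not the post's method), Python's built-in three-argument pow performs modular exponentiation directly, and the claimed cycle of 20 can be checked by brute force. The function names and the range of exponents checked are illustrative assumptions.

```python
def last_two_digits(x, n):
    """Return x**n modulo 100 without building the full power."""
    return pow(x, n, 100)

def cycle_of_20_holds(max_exponent=60):
    """Check that x**n and x**(n + 20) agree modulo 100 for every n >= 2."""
    return all(
        last_two_digits(x, n) == last_two_digits(x, n + 20)
        for x in range(2, 100)
        for n in range(2, max_exponent)
    )

if __name__ == "__main__":
    print(last_two_digits(7, 4))
    print(cycle_of_20_holds())
```

For a particular base the actual cycle can be shorter than 20, but it always divides 20 once n is at least 2.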
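Summary: This is a fun puzzle I saw on someone else's blog. [Problem] A bottle of soda costs 1 yuan, and two empty bottles can be exchanged for one new bottle. With 20 yuan, what is the largest number of bottles you can drink? I took it for granted that the answer was 20 -> 10 -> 5 -> 2 -> 1 and missed a bottle... So let me write a program to solve it. Suppose we buy one bottle at a time and exchange for a new bottle as soon as we have 2 empties: def qishui1(m): s = 0 # bottles drunk k = 0 # empty bottles while m > 0: m = m - 1 # buy 1 bottle s = s + 1 k = k + 1 while k == 2: k = 0 s = s + 1 # exchange for a bottle and drink it k = k + 1 # one more empty bottle return s, k m = 20 s, k = qishui1(m) pr... Read more
posted @ 2011-08-01 17:55 牛皮糖NewPtone Views (1511) Comments (1) Recommended (0)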
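The listing above buys one bottle at a time. An equivalent Python 3 sketch that exchanges empties in batches is given below. The function name max_bottles and its parameters are illustrative, and the sketch assumes no borrowing of empty bottles.

```python
def max_bottles(money, price=1, empties_per_bottle=2):
    """Return (bottles drunk, empties left over) for the soda puzzle."""
    drunk = money // price   # bottles bought with cash
    empties = drunk          # every bottle drunk leaves one empty
    while empties >= empties_per_bottle:
        exchanged = empties // empties_per_bottle
        drunk += exchanged
        # Empties that could not be exchanged, plus the new empties.
        empties = empties % empties_per_bottle + exchanged
    return drunk, empties

if __name__ == "__main__":
    print(max_bottles(20))
```

The naive chain 20 -> 10 -> 5 -> 2 -> 1 adds up to 38 because it discards the odd empty left at the 5 -> 2 step; carrying that bottle forward allows one more exchange.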
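Summary: As a Python beginner, I had always been hazy about urllib and urllib2, assuming that 2 was an upgraded version of 1. Today I read an article by a foreign author, "Python: difference between urllib and urllib2", and finally understood the difference. You might be intrigued by the existence of two separate URL modules in Python - urllib and urllib2. Even more intriguing: they are not alternatives for each other. So what is the difference be... Read more
posted @ 2011-08-01 17:10 牛皮糖NewPtone Views (61179) Comments (0) Recommended (2)
Summary: The program I have been writing lately deals with file operations all the time, an area where I am weak. Luckily I found a good article on Baidu (this links to the original), and I made a few small changes to it. Functions for finding and deleting folders and files are implemented in the os module. To use them, first import the module: import os. 1. Get the current directory: s = os.getcwd() # s holds the current directory (i.e. folder). For example, when abc.py is run from its own folder, this call returns the location of that folder (os.getcwd() returns the working directory of the process, which is not necessarily where the script is stored). A simple example: we put abc.py in folder A, and we want to create a new folder inside folder A no matter where on the disk folder A is placed, with the folder name generated automatically from the time. import os import time folder = tim... Read more
posted @ 2011-08-01 16:32 牛皮糖NewPtone Views (85046) Comments (4) Recommended (5)
Summary: I happened to need this, and the article is well written, so I am reposting it here to keep. Reposted from 道可道 | Usage details of the Python standard library urllib2. The Python standard library has many useful utility classes, but the standard library documentation does not describe the details of using them clearly; urllib2, the HTTP client library, is one example. This post summarizes some usage details of urllib2. 1 Setting a proxy 2 Setting a timeout 3 Adding specific headers to an HTTP request 4 Redirect 5 Cookie 6 Using the HTTP PUT and DELETE methods 7 Getting the HTTP status code 8 Debug log. 1 Setting a proxy: urllib2 ... Read more
posted @ 2011-08-01 16:23 牛皮糖NewPtone Views (140840) Comments (0) Recommended (9)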
