Python Natural Language Processing Study Notes (11): 2.3 Reusing Code
2.3 More Python: Reusing Code
By this time you’ve probably typed and retyped a lot of code in the Python interactive interpreter. If you mess up when retyping a complex example, you have to enter it again. Using the arrow keys to access and modify previous commands is helpful but only goes so far. In this section, we see two important ways to reuse code: text editors and Python functions.
Creating Programs with a Text Editor
The Python interactive interpreter performs your instructions as soon as you type them. Often, it is better to compose a multiline program using a text editor, then ask Python to run the whole program at once. Using IDLE, you can do this by going to the File menu and opening a new window. Try this now, and enter the following one-line program:
print('Monty Python')
Save this program in a file called monty.py, then go to the Run menu and select the command Run Module. (We’ll learn what modules are shortly.) The result in the main IDLE window should look like this:
>>>
Monty Python
>>>
You can also type from monty import * and it will do the same thing.
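Why does the import do the same thing? Importing a module executes its top-level statements. The minimal sketch below shows this, assuming Python 3. It writes a one-line monty.py into a throwaway temporary directory, which stands in for wherever you saved the file. It then puts that directory on sys.path and imports the module while capturing standard output.

```python
import contextlib
import importlib
import io
import os
import sys
import tempfile

# Write a throwaway monty.py; the temporary directory stands in for the
# folder where you saved the file from IDLE.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'monty.py'), 'w') as f:
    f.write("print('Monty Python')\n")

sys.path.insert(0, tmpdir)     # make that folder searchable by import
importlib.invalidate_caches()  # let the import system notice the new file

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import monty  # first import executes monty.py's top-level code
    import monty  # already in sys.modules, so nothing runs or prints

output = captured.getvalue()
print(repr(output))
```

The text is printed only once. A module's code runs on its first import in a session, and later imports reuse the already-loaded module. To re-run an edited file in the same session, importlib.reload(monty) is needed.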
From now on, you have a choice of using the interactive interpreter or a text editor to create your programs. It is often convenient to test your ideas using the interpreter, revising a line of code until it does what you expect. Once you’re ready, you can paste the code (minus any >>> or ... prompts) into the text editor, continue to expand it, and finally save the program in a file so that you don’t have to type it in again later. Give the file a short but descriptive name, using all lowercase letters, separating words with underscores, and using the .py filename extension, e.g., monty_python.py.
Important: Our inline code examples include the >>> and ... prompts as if we were interacting directly with the interpreter. As they get more complicated, you should instead type them into the editor, without the prompts, and run them from the editor as shown earlier. When we provide longer programs in this book, we will leave out the prompts to remind you to type them into a file rather than using the interpreter. You can see this already in Example 2-1. Note that the example still includes a couple of lines with the Python prompt; this is the interactive part of the task where you inspect some data and invoke a function. Remember that all code samples like Example 2-1 are downloadable from http://www.nltk.org/ .
Functions
Suppose that you work on analyzing text that involves different forms of the same word, and that part of your program needs to work out the plural form of a given singular noun. Suppose it needs to do this work in two places, once when it is processing some texts and again when it is processing user input. Rather than repeating the same code several times over, it is more efficient and reliable to localize this work inside a function. A function is just a named block of code that performs some well-defined task, as we saw in Section 1.1. A function is usually defined to take some inputs, using special variables known as parameters, and it may produce a result, also known as a return value. We define a function using the keyword def followed by the function name and any input parameters, followed by the body of the function. Here’s the function we saw in Section 1.1 (including the import statement that makes division behave as expected):
>>> from __future__ import division
>>> def lexical_diversity(text):
...     return len(text) / len(set(text))
...
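Here is a minimal sketch of calling the function. The input is a short, made-up token list that stands in for one of the book's Text objects, which also support len() and set(). The sketch assumes Python 3, where / already performs true division. Under Python 2 you would keep the from __future__ import division line, because otherwise 6 / 5 would truncate to 1.

```python
def lexical_diversity(text):
    # Average number of times each distinct token (type) is used.
    return len(text) / len(set(text))

# Hypothetical sample data: 6 tokens, 5 distinct types ('the' repeats).
tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat']
score = lexical_diversity(tokens)
print(score)  # 1.2
```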
We use the keyword return to indicate the value that is produced as output by the function. In this example, all the work of the function is done in the return statement. Here’s an equivalent definition that does the same work using multiple lines of code. We’ll change the parameter name from text to my_text_data to remind you that this is an arbitrary choice:
>>> def lexical_diversity(my_text_data):
...     word_count = len(my_text_data)
...     vocab_size = len(set(my_text_data))
...     diversity_score = word_count / vocab_size
...     return diversity_score
...
Notice that we’ve created some new variables inside the body of the function. These are local variables and are not accessible outside the function. So now we have defined a function with the name lexical_diversity. But just defining it won’t produce any output! Functions do nothing until they are “called” (or “invoked”).
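The claim about local variables is easy to check. The sketch below assumes Python 3 and uses a sample list of our own. It calls the multi-line version and then tries to read word_count from outside the function. The lookup fails with a NameError, because that name existed only while the call was running.

```python
def lexical_diversity(my_text_data):
    word_count = len(my_text_data)            # local to this call
    vocab_size = len(set(my_text_data))       # local to this call
    diversity_score = word_count / vocab_size
    return diversity_score

# Calling the function is what actually runs its body.
result = lexical_diversity(['a', 'rose', 'is', 'a', 'rose'])  # 5 tokens / 3 types
print(result)

# The local names vanished when the call returned.
try:
    word_count
    leaked = True
except NameError:
    leaked = False
print('word_count visible outside the function?', leaked)
```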
Let’s return to our earlier scenario, and actually define a simple function to work out English plurals. The function plural() in Example 2-2 takes a singular noun and generates a plural form, though it is not always correct. (We’ll discuss functions at greater length in Section 4.4.)
Example 2-2. A Python function: This function tries to work out the plural form of any English noun; the keyword def (define) is followed by the function name, then a parameter inside parentheses, and a colon; the body of the function is the indented block of code; it tries to recognize patterns within the word and process the word accordingly; e.g., if the word ends with y, delete the y and add ies.
def plural(word):
    if word.endswith('y'):
        return word[:-1] + 'ies'
    elif word[-1] in 'sx' or word[-2:] in ['sh', 'ch']:
        return word + 'es'
    elif word.endswith('an'):
        return word[:-2] + 'en'
    else:
        return word + 's'

>>> plural('fairy')
'fairies'
>>> plural('woman')
'women'
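One quick way to probe a heuristic like plural() is to run it over a handful of nouns and read the results. The word list below is our own choice, not the book's. The words fairy, wish, box and church follow the rules as intended. The words fan and day show where the simple suffix tests break down: fan becomes fen instead of fans, and day becomes daies instead of days.

```python
def plural(word):
    # Same rules as Example 2-2, checked in order.
    if word.endswith('y'):
        return word[:-1] + 'ies'
    elif word[-1] in 'sx' or word[-2:] in ['sh', 'ch']:
        return word + 'es'
    elif word.endswith('an'):
        return word[:-2] + 'en'
    else:
        return word + 's'

# Hypothetical test words chosen to exercise each branch.
for noun in ['fairy', 'woman', 'wish', 'box', 'church', 'dog', 'fan', 'day']:
    print(noun, '->', plural(noun))
```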
The endswith() function is always associated with a string object (e.g., word in Example 2-2). To call such functions, we give the name of the object, a period, and then the name of the function. These functions are usually known as methods.
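The short illustration below uses only standard str behavior. Calling word.endswith('y') is equivalent to looking the method up on the str type and passing the object explicitly. The tuple form in the last call is also a standard feature of str.endswith. It could, for instance, replace the word[-2:] in ['sh', 'ch'] test in plural(); that substitution is our suggestion, not the book's.

```python
word = 'fairy'

# Method-call syntax: the object, a period, then the method name.
as_method = word.endswith('y')

# The same method looked up on the str type, with the object passed in.
via_type = str.endswith(word, 'y')

# str.endswith also accepts a tuple of candidate suffixes.
any_suffix = word.endswith(('sh', 'ch'))

print(as_method, via_type, any_suffix)  # True True False
```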
Modules
Over time you will find that you create a variety of useful little text-processing functions, and you end up copying them from old programs to new ones. Which file contains the latest version of the function you want to use? It makes life a lot easier if you can collect your work into a single place, and access previously defined functions without making copies.
To do this, save your function(s) in a file called (say) textproc.py. Now, you can access your work simply by importing it from the file:
>>> from textproc import plural
>>> plural('wish')
'wishes'
>>> plural('fan')
'fen'
Our plural function obviously has an error, since the plural of fan is fans: the rule for words ending in an, meant for cases like woman and women, also fires on fan. Instead of typing in a new version of the function, we can simply edit the existing one in textproc.py. Thus, at every stage, there is only one version of our plural function, and no confusion about which one is being used.
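Below is a minimal, self-contained sketch of that edit-in-one-place workflow, assuming Python 3. It writes the buggy plural() into a throwaway textproc.py and imports it. It then overwrites the same file with a fix and reloads the module with importlib.reload. The fix shown, matching man rather than an, is our own hypothetical choice. It repairs fan and keeps woman correct, but it is still only a heuristic.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale .pyc files between edits

BUGGY = '''\
def plural(word):
    if word.endswith('y'):
        return word[:-1] + 'ies'
    elif word[-1] in 'sx' or word[-2:] in ['sh', 'ch']:
        return word + 'es'
    elif word.endswith('an'):
        return word[:-2] + 'en'
    else:
        return word + 's'
'''

# Hypothetical fix: only rewrite -man words (woman -> women), so fan -> fans.
FIXED = BUGGY.replace("word.endswith('an')", "word.endswith('man')")

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'textproc.py')

def save(source):
    # Overwrite the single copy of textproc.py.
    with open(path, 'w') as f:
        f.write(source)

save(BUGGY)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
import textproc
before = textproc.plural('fan')          # buggy rule gives 'fen'

save(FIXED)                              # edit the existing file in place
textproc = importlib.reload(textproc)    # re-execute the edited file
after = textproc.plural('fan')           # fixed rule gives 'fans'
print(before, after)
```

Note that importlib.reload updates only the module object. If you earlier ran from textproc import plural, that name still refers to the old function. Run the from ... import line again after reloading, or simply restart the interpreter.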
A collection of variable and function definitions in a file is called a Python module. A collection of related modules is called a package. NLTK’s code for processing the Brown Corpus is an example of a module, and its collection of code for processing all the different corpora is an example of a package. NLTK itself is a set of packages, sometimes called a library.
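Python itself marks the difference between the two. A package is a module that has a __path__ attribute listing where its submodules live. The sketch below checks a few standard-library names as stand-ins; they are our examples, not the book's. With NLTK installed, nltk and nltk.corpus would likewise report package.

```python
import importlib

def kind_of(name):
    """Return 'package' or 'module' for an importable dotted name."""
    mod = importlib.import_module(name)
    # Packages carry __path__ (where their submodules are found);
    # plain single-file modules do not.
    return 'package' if hasattr(mod, '__path__') else 'module'

for name in ['string', 'email', 'email.utils', 'json']:
    print(name, '->', kind_of(name))
```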
Caution!
If you are creating a file to contain some of your Python code, do not name your file nltk.py: it may get imported in place of the “real” NLTK package. When it imports modules, Python first looks in the directory containing the program being run (or, in the interactive interpreter, the current directory).
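The sketch below reproduces the problem safely. It assumes Python 3 and that NLTK has not already been imported in the session. It creates a stray nltk.py in a throwaway directory and puts that directory at the front of sys.path, where Python places the script's own directory. It then asks importlib.util.find_spec where import nltk would load from, without actually importing anything. The reported path is the stray file, whether or not the real NLTK is installed.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Throwaway directory playing the role of your project folder.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, 'nltk.py'), 'w') as f:
    f.write("# my own scratch notes, NOT the real NLTK\n")

sys.path.insert(0, workdir)    # searched before site-packages
importlib.invalidate_caches()

spec = importlib.util.find_spec('nltk')  # locate, but do not import
print(spec.origin)                       # the file `import nltk` would load
shadowed = os.path.samefile(os.path.dirname(spec.origin), workdir)
print('stray nltk.py wins:', shadowed)

sys.path.remove(workdir)       # undo the experiment
```

Checking spec.origin, or nltk.__file__ after a real import, is a quick way to confirm which file you are actually using.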