Natural Language Processing with Python Study Notes (29): 4.1 Back to the Basics
Chapter 4
Writing Structured Programs
By now you will have a sense of the capabilities of the Python programming language for processing natural language. However, if you're new to Python or to programming, you may still be wrestling with Python and not feel like you are in full control yet. In this chapter we'll address the following questions:
- How can you write well-structured, readable programs that you and others will be able to re-use easily?
- How do the fundamental building blocks work, such as loops, functions and assignment?
- What are some of the pitfalls with Python programming and how can you avoid them?
Along the way, you will consolidate your knowledge of fundamental programming constructs, learn more about using features of the Python language in a natural and concise way, and learn some useful techniques for visualizing natural language data. As before, this chapter contains many examples and exercises (and as before, some exercises introduce new material). Readers new to programming should work through them carefully and consult other introductions to programming if necessary; experienced programmers can quickly skim this chapter.
In the other chapters of this book, we have organized the programming concepts as dictated by the needs of NLP. Here we revert to a more conventional approach where the material is more closely tied to the structure of the programming language. There isn't room for a complete presentation of the language, so we'll just focus on the language constructs and idioms that are most important for NLP.
4.1 Back to the Basics
Assignment
Assignment would seem to be the most elementary programming concept, not deserving a separate discussion. However, there are some surprising subtleties here. Consider the following code fragment:
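A minimal sketch of such a fragment, assuming the variable names foo and bar and the strings 'Monty' and 'Python' used in the discussion that follows (written for Python 3):

```python
foo = 'Monty'
bar = foo        # bar now holds the value 'Monty'
foo = 'Python'   # rebinding foo does not touch bar
print(bar)       # Monty
```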
This behaves exactly as expected. When we write bar = foo in the above code, the value of foo (the string 'Monty') is assigned to bar. That is, bar is a copy of foo, so when we overwrite foo with a new string 'Python', the value of bar is not affected.
However, assignment statements do not always involve making copies in this way. Assignment always copies the value of an expression, but a value is not always what you might expect it to be. In particular, the "value" of a structured object such as a list is actually just a reference to the object. In the following example, bar = foo assigns the reference of foo to the new variable bar. Now when we modify something inside foo, we can see that the contents of bar have also been changed.
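The behavior just described can be sketched as follows. The list contents and the replacement string 'Bodkin' are illustrative assumptions; only the names foo and bar come from the text.

```python
foo = ['Monty', 'Python']
bar = foo            # copies the reference, not the list itself
foo[1] = 'Bodkin'    # modify the shared list through foo
print(bar)           # ['Monty', 'Bodkin']
print(bar is foo)    # True
```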
Figure 4.1: List Assignment and Computer Memory: Two list objects foo and bar reference the same location in the computer's memory; updating foo will also modify bar, and vice versa.
The line bar = foo does not copy the contents of the variable, only its "object reference". To understand what is going on here, we need to know how lists are stored in the computer's memory. In Figure 4.1, we see that a list foo is a reference to an object stored at location 3133 (which is itself a series of pointers to other locations holding strings). When we assign bar = foo, it is just the object reference 3133 that gets copied. This behavior extends to other aspects of the language, such as parameter passing (Section 4.4).
Let's experiment some more, by creating a variable empty holding the empty list, then using it three times on the next line.
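A sketch of that experiment, assuming the nested list is called nested (the name used in the exercise further on) and that the item appended is the string 'Python':

```python
empty = []
nested = [empty, empty, empty]   # three references to one and the same list
print(nested)                    # [[], [], []]
nested[1].append('Python')       # modify the shared list through one reference
print(nested)                    # [['Python'], ['Python'], ['Python']]
```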
Observe that changing one of the items inside our nested list of lists changed them all. This is because each of the three elements is actually just a reference to one and the same list in memory.
Note
Your Turn: Use multiplication to create a list of lists: nested = [[]] * 3. Now modify one of the elements of the list, and observe that all the elements are changed. Use Python's id() function to find out the numerical identifier for any object, and verify that id(nested[0]), id(nested[1]), and id(nested[2]) are all the same. Now, notice that when we assign a new value to one of the elements of the list, it does not propagate to the others:
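One way to work through these steps is sketched below. The values 'Python' and ['Monty'] follow the explanation given afterwards; the helper name same_ids is a hypothetical addition for checking the identifiers.

```python
nested = [[]] * 3
nested[1].append('Python')    # modifies the one list that all three slots share

# All three slots have the same identifier at this point.
same_ids = id(nested[0]) == id(nested[1]) == id(nested[2])
print(same_ids)               # True

nested[1] = ['Monty']         # overwrite one reference with a new object
print(nested)                 # [['Python'], ['Monty'], ['Python']]
```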
We began with a list containing three references to a single empty list object. Then we modified that object by appending 'Python' to it, resulting in a list containing three references to a single list object ['Python']. Next, we overwrote one of those references with a reference to a new object ['Monty']. This last step modified one of the three object references inside the nested list. However, the ['Python'] object wasn't changed, and is still referenced from two places in our nested list of lists. It is crucial to appreciate this difference between modifying an object via an object reference, and overwriting an object reference.
Note
Important: To copy the items from a list foo to a new list bar, you can write bar = foo[:]. This copies the object references inside the list. To copy a structure without copying any object references, use copy.deepcopy(). (These are a shallow copy and a deep copy, respectively.)
Equality
Python provides two ways to check that a pair of items are the same. The is operator tests for object identity. We can use it to verify our earlier observations about objects. First we create a list containing several copies of the same object, and demonstrate that they are not only identical according to ==, but also that they are one and the same object:
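The difference between the two kinds of copy can be seen in a small sketch. The list contents and the names shallow and deep are assumptions made for illustration.

```python
import copy

foo = [['Monty'], ['Python']]
shallow = foo[:]             # new outer list, but the same inner lists
deep = copy.deepcopy(foo)    # new outer list and new inner lists

foo[0].append('Bodkin')      # modify an inner list through foo
print(shallow[0])            # ['Monty', 'Bodkin']
print(deep[0])               # ['Monty']
```

The slice copy still shares its inner lists with foo, so the change shows through; the deep copy does not.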
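A sketch of this check, assuming a list named snake_nest built from a single list object named python (both names, and the size of three, are illustrative):

```python
size = 3
python = ['Python']
snake_nest = [python] * size   # three references to the same list object

print(snake_nest[0] == snake_nest[1] == snake_nest[2])   # True
print(snake_nest[0] is snake_nest[1] is snake_nest[2])   # True
```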
Now let's put a new python in this nest. We can easily show that the objects are not all identical:
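Continuing with the same assumed names, the sketch below replaces one randomly chosen slot with a new list that is equal to the others but is a different object. The names still_equal and still_same are hypothetical.

```python
import random

size = 3
python = ['Python']
snake_nest = [python] * size

position = random.choice(range(size))
snake_nest[position] = ['Python']   # equal in value, but a distinct object

still_equal = snake_nest[0] == snake_nest[1] == snake_nest[2]
still_same = snake_nest[0] is snake_nest[1] is snake_nest[2]
print(still_equal, still_same)      # True False
```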
You can do several pairwise tests to discover which position contains the interloper (the one object that is not the same as the others), but the id() function makes detection easier:
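A sketch of the id() approach under the same assumptions as before. The final step, which picks out the identifier that occurs only once, is an extra convenience and relies on the nest having three items.

```python
import random

python = ['Python']
snake_nest = [python] * 3
position = random.randrange(3)
snake_nest[position] = ['Python']          # the interloper

ids = [id(snake) for snake in snake_nest]
print(ids)                                 # the numbers differ from run to run

# The interloper is the item whose identifier appears only once.
odd_one_out = [i for i, ident in enumerate(ids) if ids.count(ident) == 1]
print(odd_one_out)
```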
This reveals that the second item of the list has a distinct identifier. If you try running this code snippet yourself, expect to see different numbers in the resulting list, and also the interloper may be in a different position.
Having two kinds of equality might seem strange. However, it's really just the type-token distinction, familiar from natural language, here showing up in a programming language.
Conditionals
In the condition part of an if statement, a nonempty string or list evaluates as true, while an empty string or list evaluates as false.
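A short sketch of this rule, assuming an illustrative list named mixed and a loop variable named element (the name used in the next sentence):

```python
mixed = ['cat', '', ['dog'], []]
kept = []
for element in mixed:
    if element:               # true only for nonempty strings and lists
        kept.append(element)
print(kept)                   # ['cat', ['dog']]
```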
That is, we don't need to say if len(element) > 0: in the condition.
What's the difference between using if...elif as opposed to using a couple of if statements in a row? Well, consider the following situation:
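Such a situation might look like the sketch below, where both conditions are true. The list animals is an assumed example, and the results are collected in a list named printed so they can be inspected afterwards.

```python
animals = ['cat', 'dog']
printed = []
if 'cat' in animals:
    printed.append(1)
elif 'dog' in animals:        # skipped, because the if branch already ran
    printed.append(2)
print(printed)                # [1]
```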
Since the if clause of the statement is satisfied, Python never tries to evaluate the elif clause, so we never get to print out 2. By contrast, if we replaced the elif by an if, then we would print out both 1 and 2. So an elif clause potentially gives us more information than a bare if clause; when it evaluates to true, it tells us not only that the condition is satisfied, but also that the condition of the main if clause was not satisfied.
The functions all() and any() can be applied to a list (or other sequence) to check whether all or any items meet some condition (a handy technique that was new to me):
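A sketch of both functions on an assumed example sentence, testing whether words are longer than four characters:

```python
sent = ['No', 'good', 'fish', 'goes', 'anywhere', 'without', 'a', 'porpoise', '.']

all_long = all(len(w) > 4 for w in sent)   # is every word longer than 4 characters?
any_long = any(len(w) > 4 for w in sent)   # is at least one word that long?
print(all_long, any_long)                  # False True
```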