[Repost] Using UTF-8 Encoding in Python & Defining Python Source Code Encodings

Using UTF-8 Encoding in Python

Posted: January 28, 2010 | Category: Python | Tags: utf8

Copyright: This article may be reposted freely, provided the original source is credited with a hyperlink, as stated below.

Original source: http://blog.chenlb.com/2010/01/python-use-utf-8.html

I generally prefer UTF-8 encoding. So how do you use it in Python?

1. Using UTF-8 text in a Python source file. Without an encoding declaration this usually fails with an error like the following:

File "F:\workspace\psh\src\test.py", line 2
SyntaxError: Non-ASCII character '\xe4' in file F:\workspace\psh\src\test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Contents of test.py:

print "你好"
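The '\xe4' in the error message is the first byte of the UTF-8 encoding of "你". The sketch below is a Python 3 illustration (the post itself uses Python 2 syntax) of why those bytes cannot be read as ASCII; the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: inspect the bytes behind the string used in test.py.
text = "你好"
utf8_bytes = text.encode("utf-8")

# The first byte is the one named in the SyntaxError message.
first_byte = utf8_bytes[:1]

# ASCII only covers bytes 0x00-0x7f, so decoding these bytes as ASCII fails.
try:
    utf8_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(utf8_bytes))  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(ascii_ok)          # False
```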

To make it run correctly, add an encoding comment at the top of test.py, for example:

#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
print "你好"



http://www.python.org/dev/peps/pep-0263/

Defining Python Source Code Encodings

Defining the Encoding

Python will default to ASCII as standard encoding if no other
encoding hints are given.

To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:

# coding=<encoding name>

or (using formats recognized by popular editors)

#!/usr/bin/python
# -*- coding: <encoding name> -*-

or

#!/usr/bin/python
# vim: set fileencoding=<encoding name> :

More precisely, the first or second line must match the regular
expression "coding[:=]\s*([-\w.]+)". The first group of this
expression is then interpreted as encoding name. If the encoding
is unknown to Python, an error is raised during compilation. There
must not be any Python statement on the line that contains the
encoding declaration.
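As a rough illustration of the rule just quoted, the following sketch applies the regular expression to the first two lines of a source text. The helper name declared_encoding is hypothetical, and the check that the line is a comment is a simplification; the interpreter's own tokenizer is more involved than this.

```python
import re

# The pattern exactly as quoted in the text above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_text):
    """Return the encoding named on line 1 or 2, or None if there is none.

    Hypothetical helper for illustration only.
    """
    for line in source_text.splitlines()[:2]:
        # Simplification: only comment lines may carry the declaration.
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


emacs_style = "#!/usr/bin/python\n# -*- coding: latin-1 -*-\nimport os, sys\n"
vim_style = "#!/usr/bin/python\n# vim: set fileencoding=iso-8859-15 :\n"
too_late = "#!/usr/local/bin/python\n#\n# -*- coding: latin-1 -*-\n"

print(declared_encoding(emacs_style))  # latin-1
print(declared_encoding(vim_style))    # iso-8859-15
print(declared_encoding(too_late))     # None
```

Note that "fileencoding=" and "encoding:" both match because they contain "coding=" or "coding:" as a substring.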

To aid with platforms such as Windows, which add Unicode BOM marks
to the beginning of Unicode files, the UTF-8 signature
'\xef\xbb\xbf' will be interpreted as 'utf-8' encoding as well
(even if no magic encoding comment is given).

If a source file uses both the UTF-8 BOM mark signature and a
magic encoding comment, the only allowed encoding for the comment
is 'utf-8'. Any other encoding will cause an error.
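The BOM rules can be tried out with the standard library's tokenize.detect_encoding, which in Python 3 reports the encoding it finds for a source file. This is a sketch under the assumption of a Python 3 interpreter; there the function reports 'utf-8-sig' whenever a BOM is present, and the helper name detect is made up for this example.

```python
import codecs
import io
import tokenize


def detect(source_bytes):
    """Return the encoding tokenize detects for the given source bytes."""
    return tokenize.detect_encoding(io.BytesIO(source_bytes).readline)[0]


# codecs.BOM_UTF8 is the signature b'\xef\xbb\xbf' mentioned above.
bom_only = codecs.BOM_UTF8 + b"import os, sys\n"
bom_and_utf8 = codecs.BOM_UTF8 + b"# -*- coding: utf-8 -*-\nimport os, sys\n"
bom_and_latin1 = codecs.BOM_UTF8 + b"# -*- coding: latin-1 -*-\nimport os, sys\n"

print(detect(bom_only))      # utf-8-sig
print(detect(bom_and_utf8))  # utf-8-sig

# A BOM combined with a non-UTF-8 magic comment is rejected.
try:
    detect(bom_and_latin1)
    conflict_rejected = False
except SyntaxError:
    conflict_rejected = True

print(conflict_rejected)     # True
```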

Examples

These are some examples to clarify the different styles for
defining the source code encoding at the top of a Python source
file:

1. With interpreter binary and using Emacs style file encoding
comment:

#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys
...

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys
...

#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys
...

2. Without interpreter line, using plain text:

# This Python file uses the following encoding: utf-8
import os, sys
...

3. Text editors might have different ways of defining the file's
encoding, e.g.

#!/usr/local/bin/python
# coding: latin-1
import os, sys
...

4. Without encoding comment, Python's parser will assume ASCII
text:

#!/usr/local/bin/python
import os, sys
...

5. Encoding comments which don't work:

Missing "coding:" prefix:

#!/usr/local/bin/python
# latin-1
import os, sys
...

Encoding comment not on line 1 or 2:

#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys
...

Unsupported encoding:

#!/usr/local/bin/python
# -*- coding: utf-42 -*-
import os, sys
...
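To see a working declaration next to the unsupported one, the sketch below passes source bytes to the built-in compile(). It assumes Python 3, where compile() honours the magic comment when given bytes; the file names and the variable s are made up for the example, and the exact wording of the error message may differ between versions.

```python
# A latin-1 declaration on line 2: byte 0xe9 is decoded as U+00E9.
working = b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\ns = '\xe9'\n"
namespace = {}
exec(compile(working, "latin1_example.py", "exec"), namespace)
print(namespace["s"] == "\u00e9")  # True

# An encoding Python does not know: compilation fails.
unsupported = (
    b"#!/usr/local/bin/python\n"
    b"# -*- coding: utf-42 -*-\n"
    b"import os, sys\n"
)
try:
    compile(unsupported, "utf42_example.py", "exec")
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

print(error_message is not None)   # True
```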

posted @ 2011-03-21 12:39 by bettermanlu