[Repost] Using UTF-8 Encoding in Python & Defining Python Source Code Encodings

Using UTF-8 Encoding in Python

I generally prefer UTF-8 encoding. So how do you use it in Python?

1. If you put UTF-8 text in a Python 2 source file, you will usually get an error like this:

File "F:\workspace\psh\src\test.py", line 2
SyntaxError: Non-ASCII character '\xe4' in file F:\workspace\psh\src\test.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Contents of test.py:

print "你好"

To make it run, add an encoding comment at the top of test.py, for example:

#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
print "你好"
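To see what happens underneath, here is a minimal Python 3 sketch (the article's own snippets are Python 2, hence `print "你好"`). It shows that the `'\xe4'` in the error message is simply the first UTF-8 byte of 你, and that `compile()` decodes source bytes according to the declared coding comment. The helper `run_source` and the variable `greeting` are hypothetical names used only for illustration.

```python
# Python 3 sketch; the article's own examples use Python 2 syntax.

# Bytes that test.py contains for the literal "你好" when saved as UTF-8.
raw = "你好".encode("utf-8")
print(raw)  # b'\xe4\xbd\xa0\xe5\xa5\xbd': the first byte is the '\xe4' in the error

# ASCII only covers bytes 0x00-0x7f, so decoding these bytes as ASCII
# fails on the very first byte.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ASCII cannot decode byte", hex(raw[exc.start]))  # 0xe4


def run_source(source_bytes):
    """Compile source bytes (honoring any coding comment) and return its globals."""
    namespace = {}
    exec(compile(source_bytes, "test.py", "exec"), namespace)
    return namespace


# The fixed test.py, adapted to Python 3: the literal is assigned, not printed.
fixed = b'#!/usr/bin/python\n# -*- coding: utf-8 -*-\ngreeting = "' + raw + b'"\n'
decoded_ok = run_source(fixed)["greeting"]
print(decoded_ok == "你好")  # True

# The same bytes under a wrong declaration still compile, but each byte is
# read as one Latin-1 character, so the string comes out garbled.
wrong = fixed.replace(b"coding: utf-8", b"coding: latin-1")
decoded_wrong = run_source(wrong)["greeting"]
print(len(decoded_wrong))  # 6 characters instead of 2
```

The second half is the practical lesson: the coding comment must describe the bytes the editor actually saved, otherwise the file compiles but the strings are wrong.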


Defining Python Source Code Encodings

Defining the Encoding

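Python will default to ASCII as the standard encoding if no other
encoding hints are given. (This is the Python 2 behavior; since
Python 3.0 the default source encoding is UTF-8, per PEP 3120.)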

To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:

# coding=<encoding name>

or (using formats recognized by popular editors)

# -*- coding: <encoding name> -*-

or

# vim: set fileencoding=<encoding name> :

More precisely, the first or second line must match the regular
expression "coding[:=]\s*([-\w.]+)". The first group of this
expression is then interpreted as the encoding name. If the encoding
is unknown to Python, an error is raised during compilation. There
must not be any Python statement on the line that contains the
encoding declaration.
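The rule above can be approximated in a few lines of Python 3. This is a simplified sketch, not CPython's tokenizer: it applies the quoted regular expression to the first two lines, checks the name with `codecs.lookup`, and ignores the rule about statements on the same line. The function name `declared_encoding` is hypothetical.

```python
import codecs
import re

# The pattern quoted in the text above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding declared on line 1 or 2, or None if there is none.

    Raises SyntaxError for an unknown encoding, mirroring the rule that an
    unknown encoding is an error at compile time.
    """
    for line in source_lines[:2]:  # only the first two lines count
        match = CODING_RE.search(line)
        if match:
            name = match.group(1)
            try:
                codecs.lookup(name)  # is this codec known to Python?
            except LookupError:
                raise SyntaxError("unknown encoding: " + name) from None
            return name
    return None


print(declared_encoding(["# coding=latin-1", "import os, sys"]))            # latin-1
print(declared_encoding(["#!/usr/bin/python", "# -*- coding: utf-8 -*-"]))  # utf-8
print(declared_encoding(["# vim: set fileencoding=iso-8859-15 :"]))        # iso-8859-15
print(declared_encoding(["#!/usr/bin/python", "#", "# coding: latin-1"]))   # None
```

All three declaration styles match the same pattern because it only looks for `coding:` or `coding=` followed by a name; the Emacs and vim decorations around it are irrelevant.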

To aid with platforms such as Windows, which add Unicode BOM marks
to the beginning of Unicode files, the UTF-8 signature
'\xef\xbb\xbf' will be interpreted as 'utf-8' encoding as well
(even if no magic encoding comment is given).

If a source file uses both the UTF-8 BOM mark signature and a
magic encoding comment, the only allowed encoding for the comment
is 'utf-8'. Any other encoding will cause an error.
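Python 3's standard library exposes the interpreter's detection logic as `tokenize.detect_encoding()`, which makes the BOM rules easy to try. In this sketch `detect` is just a hypothetical convenience wrapper; note that a BOM-marked file is reported as `utf-8-sig`, the codec name that also strips the signature.

```python
import io
import tokenize

BOM = b"\xef\xbb\xbf"  # the UTF-8 signature


def detect(source_bytes):
    """Return the encoding Python would use for the given source bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# BOM alone: treated as UTF-8 even without a coding comment.
print(detect(BOM + b"import os, sys\n"))                           # utf-8-sig
# BOM plus a matching utf-8 comment: allowed.
print(detect(BOM + b"# -*- coding: utf-8 -*-\nimport os, sys\n"))  # utf-8-sig

# BOM plus any other declared encoding: rejected with a SyntaxError.
try:
    detect(BOM + b"# -*- coding: latin-1 -*-\nimport os, sys\n")
except SyntaxError as exc:
    print("rejected:", exc)
```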


These are some examples to clarify the different styles for
defining the source code encoding at the top of a Python source
file:

1. With interpreter binary and using Emacs style file encoding comment:

#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys

#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys

2. Without interpreter line, using plain text:

# This Python file uses the following encoding: utf-8
import os, sys

3. Text editors might have different ways of defining the file's
encoding, e.g.

# coding: latin-1
import os, sys

4. Without an encoding comment, Python's parser will assume ASCII text:

import os, sys

5. Encoding comments which don't work:

Missing "coding:" prefix:

# latin-1
import os, sys

Encoding comment not on line 1 or 2:

#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys

Unsupported encoding:

# -*- coding: utf-42 -*-
import os, sys
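To check these examples against a real parser, the following Python 3 sketch feeds each one to `tokenize.detect_encoding()`. Two things differ from the Python 2 wording above: when there is no usable declaration, Python 3 falls back to `utf-8` rather than ASCII, and the function normalizes some names (for instance, `latin-1` is reported as `iso-8859-1`). The `EXAMPLES` table and the `detected` helper are illustrative; in the "not on line 1 or 2" case, an interpreter line and an empty comment line push the declaration down to line 3, as in PEP 263.

```python
import io
import tokenize

# The snippets from the list above, as raw source bytes.
EXAMPLES = {
    "1 latin-1": b"#!/usr/bin/python\n# -*- coding: latin-1 -*-\nimport os, sys\n",
    "1 iso-8859-15": b"#!/usr/bin/python\n# -*- coding: iso-8859-15 -*-\nimport os, sys\n",
    "1 ascii": b"#!/usr/bin/python\n# -*- coding: ascii -*-\nimport os, sys\n",
    "2 plain text": b"# This Python file uses the following encoding: utf-8\nimport os, sys\n",
    "3 editor style": b"# coding: latin-1\nimport os, sys\n",
    "4 no comment": b"import os, sys\n",
    "5 missing prefix": b"# latin-1\nimport os, sys\n",
    "5 not line 1-2": b"#!/usr/local/bin/python\n#\n# -*- coding: latin-1 -*-\nimport os, sys\n",
    "5 unsupported": b"# -*- coding: utf-42 -*-\nimport os, sys\n",
}


def detected(source_bytes):
    """Return the encoding Python 3 picks, or the error text if it refuses."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
        return encoding
    except SyntaxError as exc:
        return "SyntaxError: " + str(exc)


results = {label: detected(source) for label, source in EXAMPLES.items()}
for label, outcome in results.items():
    print(f"{label:18} -> {outcome}")
```

The "missing prefix" and "not on line 1 or 2" cases do not raise anything: the comment is silently ignored and the default applies, which is why such mistakes are easy to miss. Only the unknown encoding is a hard error.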

