An Introduction to YAML and Its Use in Python

2009-05-13 javaeye http://angeloce.iteye.com/admin/blogs/385976

==================================================

 

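YAML is an intuitive, machine-readable data serialization format that is easy for people to read and easy to use from scripting languages. YAML is similar to XML, but its syntax is far simpler, and it maps simply and efficiently onto arrays and hashes.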

 

YAML syntax rules:

  http://www.ibm.com/developerworks/cn/xml/x-cn-yamlintro/

  http://www.yaml.org/

 

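Many people see YAML as a file format that can surpass XML and JSON. Compared with XML, it keeps many of XML's advantages while being simple enough to use comfortably. Compared with JSON, YAML can be written as a well-structured configuration file (which I consider a big advantage over JSON; writing configuration files in JSON will drive you crazy).

  YAML uses the host language's data types, which may cause compatibility problems when the data is passed between languages.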

 

How do you write YAML? (a borrowed example)

name: Tom Smith
age: 37
spouse:
    name: Jane Smith
    age: 25
children:
 - name: Jimmy Smith
   age: 15
 - name1: Jenny Smith
   age1: 12

 

For the full syntax, see the YAML syntax rules linked above.

 

--------------------------------------------------------------------------------------------

 

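A concrete Python implementation of YAML: PyYAML

 

Save the YAML above as a configuration file, test.yaml. The following shows how to read and write YAML configuration.

 

Use PyYAML, the YAML library for Python. http://pyyaml.org/

 

Once it is installed into Python's lib directory, it is ready to use.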

 

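# Import the yaml module
import yaml

# Open the file
f = open('test.yaml')

# Load it
# (PyYAML 6 and later require an explicit Loader, e.g. yaml.load(f, Loader=yaml.SafeLoader))
x = yaml.load(f)

print x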

 

You should get something like the following:

{'age': 37, 'spouse': {'age': 25, 'name': 'Jane Smith'}, 'name': 'Tom Smith', 'children': [{'age': 15, 'name': 'Jimmy Smith'}, {'age1': 12, 'name1': 'Jenny Smith'}]}

 

Using the yaml library in Python is simple; you mostly need just two functions:

 

yaml.load

 

yaml.dump

 

For those of you who have used pickle, I don't need to spell out what this means, do I?

 

Warning: It is not safe to call yaml.load with any data received from an untrusted source! yaml.load is as powerful as pickle.load and so may call any Python function.

 

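When reading YAML, the hardest part is writing correctly formatted YAML data. A small slip will usually make load raise an exception, but sometimes no exception is raised and no data is read at all.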
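
Both failure modes can be seen in a short sketch. It assumes Python 3 and a recent PyYAML, uses yaml.safe_load rather than yaml.load, and the helper name try_load is made up for illustration.

```python
import yaml


def try_load(text):
    """Parse text as YAML and return (data, error_message)."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        # Scanner and parser errors both derive from yaml.YAMLError.
        return None, str(exc)


# Malformed input: the flow sequence is never closed, so loading fails loudly.
data, error = try_load("children: [Jimmy, Jenny")
print(data, error is not None)

# Empty input: no exception is raised, and the result is simply None.
empty, empty_error = try_load("")
print(empty, empty_error)
```

Checking the result for None after loading is a cheap way to catch the silent case.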

 

PyYAML is implemented entirely in Python and is said to be even more impressive than pickle. (Who knows?)

 

yaml.load accepts a byte string, a Unicode string, an open binary file object, or an open text file object. A byte string or a file must be encoded with utf-8, utf-16-be or utf-16-le encoding. yaml.load detects the encoding by checking the BOM (byte order mark) sequence at the beginning of the string/file. If no BOM is present, the utf-8 encoding is assumed.

 

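Baidu's explanation of the BOM:
    UCS has a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE is not a character in UCS, so it should never appear in a real transmission. The UCS specification recommends sending the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself. If the receiver sees FEFF, the byte stream is big-endian; if it sees FFFE, the byte stream is little-endian. For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
    UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to mark the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF, so a receiver that sees a byte stream starting with EF BB BF knows it is UTF-8. Windows uses the BOM in exactly this way to mark the encoding of text files.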
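
The BOM check described above can be sketched with the standard codecs module. This is only an illustration of the idea under Python 3, not PyYAML's actual implementation, and the function name sniff_encoding is hypothetical.

```python
import codecs


def sniff_encoding(data):
    """Guess the encoding of a byte string from its BOM, defaulting to UTF-8."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return 'utf-8'
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return 'utf-16-be'
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return 'utf-16-le'
    # No BOM at all: assume UTF-8, as the PyYAML documentation describes.
    return 'utf-8'


print(sniff_encoding(b'\xfe\xff\x00a'))
print(sniff_encoding(b'\xff\xfea\x00'))
print(sniff_encoding(b'name: Tom Smith'))
```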

 

 yaml.load returns a Python object. What kind of object? That depends on your data.

 

 

If a string or a file contains several documents, you may load them all with the yaml.load_all function.

 

yaml.load(stream, Loader=<class 'yaml.loader.Loader'>)
    Parse the first YAML document in a stream  # only the first document is parsed
    and produce the corresponding Python object.

yaml.load_all(stream, Loader=<class 'yaml.loader.Loader'>)
    Parse all YAML documents in a stream
    and produce corresponding Python objects.

 

yaml.load_all returns an iterator; all you need to do is loop over it with for.

 

documents = """
name: The Set of Gauntlets 'Pauraegen'
description: >
  A set of handgear with sparks that crackle
  across its knuckleguards.
---
name: The Set of Gauntlets 'Paurnen'
description: >
  A set of gauntlets that gives off a foul,
  acrid odour yet remains untarnished.
---
name: The Set of Gauntlets 'Paurnimmen'
description: >
  A set of handgear, freezing with unnatural cold.
"""


for data in yaml.load_all(documents):
    print data

#{'description': 'A set of handgear with sparks that crackle across its knuckleguards.\n',
#'name': "The Set of Gauntlets 'Pauraegen'"}
#{'description': 'A set of gauntlets that gives off a foul, acrid odour yet remains untarnished.\n',
#'name': "The Set of Gauntlets 'Paurnen'"}
#{'description': 'A set of handgear, freezing with unnatural cold.\n',
#'name': "The Set of Gauntlets 'Paurnimmen'"}

 

PyYAML allows you to construct a Python object of any type.

Even instances of Python classes can be constructed using the !!python/object tag.

More on this later; it is a very useful feature.

 

Note that the ability to construct an arbitrary Python object may be dangerous if you receive a YAML document from an untrusted source such as the Internet. The function yaml.safe_load limits this ability to simple Python objects like integers or lists, so stick to simple objects whenever you can.
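
A minimal sketch of that difference, assuming Python 3 and a recent PyYAML. The tagged document below is a made-up example that asks for a harmless function (os.getcwd); safe_load refuses to construct it at all.

```python
import yaml

# Plain data loads fine with safe_load.
plain = yaml.safe_load("traits: [ONE_HAND, ONE_EYE]")
print(plain)

# A document that asks for a Python-specific tag is rejected by safe_load.
suspicious = "!!python/object/apply:os.getcwd []"
try:
    yaml.safe_load(suspicious)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor for python/* tags, so it raises here.
    rejected = True
    print("rejected:", type(exc).__name__)
```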

 

 ---------------------------------------

Dumping YAML

 

The yaml.dump function accepts a Python object and produces a YAML document. It is the counterpart of yaml.load.

dump(data, stream=None, Dumper=<class 'yaml.dumper.Dumper'>, **kwds)

    Serialize a Python object into a YAML stream.
    If stream is None, return the produced string instead.
    # Nice: if stream is left as None, the YAML document is returned to you as a string

 

 

aproject = {'name': 'Silenthand Olleander',
            'race': 'Human',
            'traits': ['ONE_HAND', 'ONE_EYE']
            }


print yaml.dump(aproject)

# Output:
# name: Silenthand Olleander
# race: Human
# traits: [ONE_HAND, ONE_EYE]

 

 

 

 

yaml.dump accepts the second optional argument, which must be an open text or binary file. In this case, yaml.dump will write the produced YAML document into the file. Otherwise, yaml.dump returns the produced document.
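
The two calling styles can be compared side by side. This sketch assumes Python 3 and a recent PyYAML, and uses an in-memory io.StringIO in place of a real file so that it leaves nothing on disk.

```python
import io

import yaml

data = {'name': 'Tom Smith', 'age': 37}

# With a stream: the document is written to the stream and None is returned.
buf = io.StringIO()
result = yaml.dump(data, buf)

# Without a stream: the document comes back as a string.
text = yaml.dump(data)

print(result is None)
print(buf.getvalue() == text)
print(yaml.safe_load(text) == data)
```

A real file opened with open('test.yaml', 'w') can be passed in the same way as buf.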

 

If you need to dump several YAML documents to a single stream, use the function yaml.dump_all. yaml.dump_all accepts a list or a generator producing Python objects to be serialized into a YAML document. The second optional argument is an open file. This is the exact counterpart of yaml.load_all.
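
A short round trip shows how the two functions mirror each other. It assumes Python 3 and a recent PyYAML, and reads the stream back with yaml.safe_load_all; the sample documents are invented.

```python
import yaml

docs = [{'name': 'first'}, {'name': 'second'}, {'name': 'third'}]

# dump_all writes every object as its own document in a single stream.
stream = yaml.dump_all(docs)
print(stream)

# safe_load_all yields one Python object per document in that stream.
loaded = list(yaml.safe_load_all(stream))

# A generator works as the input too.
from_generator = yaml.dump_all({'name': n} for n in ('first', 'second', 'third'))
```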

 

You may even dump instances of Python classes.

 

yaml.dump supports a number of keyword arguments that specify formatting details for the emitter. For instance, you may set the preferred indentation and width, use the canonical YAML format or force preferred style for scalars and collections.

 

dump_all(documents, stream=None, Dumper=<class 'yaml.dumper.Dumper'>, default_style=None, default_flow_style=None, canonical=None, indent=None, width=None, allow_unicode=None, line_break=None, encoding='utf-8', explicit_start=None, explicit_end=None, version=None, tags=None)


# The parameters mentioned above can all be seen in the function signature:
# canonical
# indent
# width
# and so on

 

Examples:

>>> print yaml.dump(range(50))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
  23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
  43, 44, 45, 46, 47, 48, 49]

>>> print yaml.dump(range(50), width=50, indent=4)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27,
    28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
    40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

>>> print yaml.dump(range(5), canonical=True)
---
!!seq [
  !!int "0",
  !!int "1",
  !!int "2",
  !!int "3",
  !!int "4",
]

>>> print yaml.dump(range(5), default_flow_style=False)
- 0
- 1
- 2
- 3
- 4

>>> print yaml.dump(range(5), default_flow_style=True, default_style='"')
[!!int "0", !!int "1", !!int "2", !!int "3", !!int "4"]

 

The key is all in those trailing keyword arguments.
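
The transcripts above come from Python 2. As a rough Python 3 equivalent, assuming a recent PyYAML, range has to be turned into a list before it is dumped; the nested mapping is just sample data.

```python
import yaml

numbers = list(range(5))

# Block style: one item per line.
block = yaml.dump(numbers, default_flow_style=False)
print(block)

# Flow style: everything inline, as in the older default for short lists.
flow = yaml.dump(numbers, default_flow_style=True)
print(flow)

# indent controls how far nested block collections are indented.
nested = yaml.dump({'spouse': {'name': 'Jane Smith', 'age': 25}},
                   default_flow_style=False, indent=4)
print(nested)
```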

 

------------------------------------------------------

 

Constructors, representers, resolvers

 

You may define your own application-specific tags. The easiest way to do it is to define a subclass of yaml.YAMLObject:

 

 

class Monster(yaml.YAMLObject):
    yaml_tag = u'!Monster'
    def __init__(self, name, hp, ac, attacks):
        self.name = name
        self.hp = hp
        self.ac = ac
        self.attacks = attacks
    def __repr__(self):
        return "%s(name=%r, hp=%r, ac=%r, attacks=%r)" % (
            self.__class__.__name__, self.name, self.hp, self.ac,self.attacks)

 

 

The above definition is enough to automatically load and dump Monster objects:

 

>>> yaml.load("""
... --- !Monster
... name: Cave spider
... hp: [2,6]    # 2d6
... ac: 16
... attacks: [BITE, HURT]
... """)

Monster(name='Cave spider', hp=[2, 6], ac=16, attacks=['BITE', 'HURT'])

>>> print yaml.dump(Monster(
...     name='Cave lizard', hp=[3,6], ac=16, attacks=['BITE','HURT']))

!Monster
ac: 16
attacks: [BITE, HURT]
hp: [3, 6]
name: Cave lizard

 

 

 

yaml.YAMLObject uses metaclass magic to register a constructor, which transforms a YAML node to a class instance, and a representer, which serializes a class instance to a YAML node.

 

If you don't want to use metaclasses, you may register your constructors and representers using the functions yaml.add_constructor and yaml.add_representer. For instance, you may want to add a constructor and a representer for the following Dice class:

 

>>> class Dice(tuple):
...     def __new__(cls, a, b):
...         return tuple.__new__(cls, [a, b])
...     def __repr__(self):
...         return "Dice(%s,%s)" % self

>>> print Dice(3,6)
Dice(3,6)

 

 

The default representation for Dice objects is not nice:

 

>>> print yaml.dump(Dice(3,6))

!!python/object/new:__main__.Dice
- !!python/tuple [3, 6]

 

 

Suppose you want a Dice object to be represented as AdB in YAML (where A and B are variables):

 

First we define a representer that converts a Dice object to a scalar node with the tag !dice, and register it.

 

>>> def dice_representer(dumper, data):
...     return dumper.represent_scalar(u'!dice', u'%sd%s' % data)

>>> yaml.add_representer(Dice, dice_representer)

 

 

Now you may dump an instance of the Dice object:

 

>>> print yaml.dump({'gold': Dice(10,6)})
{gold: !dice '10d6'}

 

Let us add the code to construct a Dice object:

 

>>> def dice_constructor(loader, node):
...     value = loader.construct_scalar(node)
...     a, b = map(int, value.split('d'))
...     return Dice(a, b)

>>> yaml.add_constructor(u'!dice', dice_constructor)

 

 

Then you may load a Dice object as well:

 

>>> print yaml.load("""
... initial hit points: !dice 8d4
... """)

{'initial hit points': Dice(8,4)}

 

 

From this you can see that the constructor and the representer are counterparts: one serves load, the other serves dump.
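
Put together as one script, the Dice pieces above might look like this under Python 3. This is a sketch rather than the documentation's own listing: it assumes a recent PyYAML and registers the handlers on yaml.SafeDumper and yaml.SafeLoader, a choice made here so that the example does not depend on the unrestricted loader.

```python
import yaml


class Dice(tuple):
    def __new__(cls, a, b):
        return tuple.__new__(cls, [a, b])

    def __repr__(self):
        return "Dice(%s,%s)" % self


def dice_representer(dumper, data):
    # Used on dump: Dice(10, 6) becomes the scalar '10d6' tagged !dice.
    return dumper.represent_scalar('!dice', '%sd%s' % data)


def dice_constructor(loader, node):
    # Used on load: the scalar '10d6' tagged !dice becomes Dice(10, 6).
    value = loader.construct_scalar(node)
    a, b = map(int, value.split('d'))
    return Dice(a, b)


yaml.add_representer(Dice, dice_representer, Dumper=yaml.SafeDumper)
yaml.add_constructor('!dice', dice_constructor, Loader=yaml.SafeLoader)

text = yaml.dump({'gold': Dice(10, 6)}, Dumper=yaml.SafeDumper)
print(text)

restored = yaml.load(text, Loader=yaml.SafeLoader)
print(restored)
```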

 

 

-------------------------------------------------------

 

Most of the above comes from http://pyyaml.org/wiki/PyYAMLDocumentation

 

posted on 2013-01-05 11:55  I'm morning