XML::Simple: an easy way to read and write XML (especially configuration files)

Synopsis:

    use XML::Simple qw(:strict);

    my $ref = XMLin([<xml file or string>] [, <options>]);

    my $xml = XMLout($hashref [, <options>]);

Quick start:

Suppose you have a script called foo and a configuration file called foo.xml:

  <config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
    <server name="sahara" osname="solaris" osversion="2.6">
      <address>10.0.0.101</address>
      <address>10.0.1.101</address>
    </server>
    <server name="gobi" osname="irix" osversion="6.5">
      <address>10.0.0.102</address>
    </server>
    <server name="kalahari" osname="linux" osversion="2.0.34">
      <address>10.0.0.103</address>
      <address>10.0.1.103</address>
    </server>
  </config>

The foo script contains the following code:

  use XML::Simple qw(:strict);

  my $config = XMLin(undef, KeyAttr => { server => 'name' }, ForceArray => [ 'server', 'address' ]);

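The configuration file is parsed into a data structure and a hashref to it is stored in $config. Dumped with Data::Dumper, the structure looks like this: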

  {
      'logdir'        => '/var/log/foo/',
      'debugfile'     => '/tmp/foo.debug',
      'server'        => {
          'sahara'        => {
              'osversion'     => '2.6',
              'osname'        => 'solaris',
              'address'       => [ '10.0.0.101', '10.0.1.101' ]
          },
          'gobi'          => {
              'osversion'     => '6.5',
              'osname'        => 'irix',
              'address'       => [ '10.0.0.102' ]
          },
          'kalahari'      => {
              'osversion'     => '2.0.34',
              'osname'        => 'linux',
              'address'       => [ '10.0.0.103', '10.0.1.103' ]
          }
      }
  }

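In this example, the ForceArray option lists the elements that may occur more than once, so they are always represented as arrayrefs in the result (even when there is only one of them).

The KeyAttr option states that every <server> element has a unique 'name' attribute, which lets you reach each server record by using that name as a hash key.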
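With the structure above, individual values can be reached through ordinary hash and array dereferencing. The following is a minimal sketch; it assumes the XML is passed inline as a string (trimmed to two servers) instead of being read from foo.xml.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# Inline, shortened copy of foo.xml (assumption: the real script reads the file).
my $xml = <<'XML';
<config logdir="/var/log/foo/" debugfile="/tmp/foo.debug">
  <server name="sahara" osname="solaris" osversion="2.6">
    <address>10.0.0.101</address>
    <address>10.0.1.101</address>
  </server>
  <server name="gobi" osname="irix" osversion="6.5">
    <address>10.0.0.102</address>
  </server>
</config>
XML

my $config = XMLin($xml,
    KeyAttr    => { server => 'name' },
    ForceArray => [ 'server', 'address' ],
);

# Attributes of the root element become plain hash entries.
print "Log directory: $config->{logdir}\n";

# KeyAttr lets us look a server up by the value of its name attribute.
print "sahara runs $config->{server}{sahara}{osname}\n";

# ForceArray guarantees an arrayref, even for gobi's single address.
for my $name (sort keys %{ $config->{server} }) {
    my @addresses = @{ $config->{server}{$name}{address} };
    print "$name: @addresses\n";
}
```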

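Description:

XML::Simple provides a simple API layer on top of an underlying XML parsing module. It offers two functions: XMLin() and XMLout().

Tip: you can also explicitly request the lower-case versions of the function names: xml_in() and xml_out().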

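The XMLin() function:

The first argument to XMLin() can be one of the following:

A filename: the name of the XML file to parse

undef: XMLin() looks for an XML file with the same name as the script

A string of XML

An IO::Handle object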
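Two of these input forms can be compared side by side. This is only a sketch: the temporary file created with File::Temp stands in for a real configuration file, and the XML content is made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple qw(:strict);

my $xml = '<config logdir="/var/log/foo/" />';

# 1. A string of XML is parsed directly.
my $from_string = XMLin($xml, ForceArray => 0, KeyAttr => {});

# 2. A filename: write the same XML to a temporary file, then parse the file.
my ($fh, $filename) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} $xml;
close $fh or die "Cannot close $filename: $!";
my $from_file = XMLin($filename, ForceArray => 0, KeyAttr => {});

print "from string: $from_string->{logdir}\n";
print "from file:   $from_file->{logdir}\n";
```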

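The XMLout() function:

Takes a data structure (generally a hashref) and returns an XML encoding of that structure.

XMLout() can also emit the XML as a stream of SAX events instead of returning a string.

Hash keys that start with '-' are silently skipped.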
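A short sketch of XMLout() in use, including a key with a leading '-' that is left out of the output. The hash contents and the '-password' key are invented for the example.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my %server = (
    name        => 'sahara',
    osname      => 'solaris',
    address     => [ '10.0.0.101', '10.0.1.101' ],
    '-password' => 'not written out',   # leading '-': skipped by XMLout()
);

# Strict mode requires KeyAttr to be given explicitly; [] disables folding.
my $xml = XMLout(\%server, RootName => 'server', KeyAttr => []);
print $xml;
```

Scalar values are written as attributes and the arrayref as repeated nested elements.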

Options:

Because there are so many options, it is hard for newcomers to know which ones matter. These are the two you must understand:

ForceArray

KeyAttr

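Each option heading in the list below ends with a comment. It says whether the option is recognised by XMLin() ('in'), by XMLout() ('out') or by both, followed by one of these importance labels:

 'important'   # don't use the module until you understand this one
 'handy'       # you can skip this on the first time through
 'advanced'    # you can skip this on the second time through
 'SAX only'    # don't worry about this unless you're using SAX (or
               # alternatively if you need this, you also need SAX)
 'seldom used' # you'll probably never use this unless you were the
               # person that requested the feature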

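Option list:

AttrIndent => 1 # out - handy

When this option is set, XMLout() prints each attribute on its own line with indentation instead of putting all attributes on one line.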

Cache => [ cache schemes ] # in - advanced

Because loading the XML::Parser module and parsing an XML file can consume a significant number of CPU cycles, it is often desirable to cache the output of XMLin() for later reuse.

ContentKey => 'keyname' # in+out - seldom used

When text content is parsed to a hash value, this option lets you specify a name for the hash key to override the default 'content'.

DataHandler => code_ref # in - SAX only
When you use an XML::Simple object as a SAX handler, it will return a 'simple tree' data structure in the same format as XMLin() would return. If this option is set (to a subroutine reference), then when the tree is built the subroutine will be called and passed two arguments: a reference to the XML::Simple object and a reference to the data tree. The return value from the subroutine will be returned to the SAX driver. (See "SAX SUPPORT" for more details).

 

ForceArray => 1 # in - important

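Setting this option to 1 forces nested elements to be represented as arrays, even when there is only one.

ForceArray => [ names ] # in - important

This form lets you list the specific element names that should be forced into arrays, instead of choosing between 'all' and 'none'.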
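The difference is easiest to see with an element that occurs only once. A minimal sketch, with a made-up one-address server record:

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<server name="gobi"><address>10.0.0.102</address></server>';

# Without forcing, a single <address> collapses to a plain scalar.
my $plain = XMLin($xml, ForceArray => 0, KeyAttr => {});

# Naming 'address' in ForceArray always yields an arrayref for that element.
my $forced = XMLin($xml, ForceArray => [ 'address' ], KeyAttr => {});

print "plain:  $plain->{address}\n";
print "forced: $forced->{address}[0]\n";
```

Code that loops over addresses can then dereference the value as an array without first checking whether it is a scalar or a reference.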

 

ForceContent => 1 # in - seldom used
When XMLin() parses elements which have text content as well as attributes, the text content must be represented as a hash value rather than a simple scalar. This option allows you to force text content to always parse to a hash value even when there are no attributes.
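As a rough illustration, the sketch below parses the same XML with and without ForceContent. The element names x and y are placeholders.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><x>text1</x><y a="2">text2</y></opt>';

# Default: <x> has no attributes, so its text becomes a plain scalar,
# while <y> needs a hash to hold both the attribute and the text.
my $default = XMLin($xml, ForceArray => 0, KeyAttr => {});

# ForceContent: text always goes under the 'content' key of a hash.
my $forced = XMLin($xml, ForceContent => 1, ForceArray => 0, KeyAttr => {});

print "default x: $default->{x}\n";
print "default y: $default->{y}{content}\n";
print "forced x:  $forced->{x}{content}\n";
```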

 

GroupTags => { grouping tag => grouped tag } # in+out - handy

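Removes an extra level of nesting from the Perl data structure when the XML wraps a list of elements in a grouping element.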
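A sketch of the effect, assuming a <searchpath> element that does nothing except group <dir> elements:

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = <<'XML';
<opt>
  <searchpath>
    <dir>/usr/bin</dir>
    <dir>/usr/local/bin</dir>
  </searchpath>
</opt>
XML

# Without GroupTags the grouping element adds a level: searchpath -> dir -> [...]
my $nested = XMLin($xml, ForceArray => [ 'dir' ], KeyAttr => {});

# With GroupTags the 'dir' level disappears: searchpath -> [...]
my $grouped = XMLin($xml,
    ForceArray => [ 'dir' ],
    KeyAttr    => {},
    GroupTags  => { searchpath => 'dir' },
);

print "nested:  $nested->{searchpath}{dir}[0]\n";
print "grouped: $grouped->{searchpath}[0]\n";
```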

 

Handler => object_ref # out - SAX only

Use the 'Handler' option to have XMLout() generate SAX events rather than returning a string of XML. For more details see "SAX SUPPORT" below.

 

 

KeepRoot => 1 # in+out - handy

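In its attempt to return a data structure free of superfluous detail and unnecessary levels of indirection, XMLin() normally discards the root element name. Setting the 'KeepRoot' option to '1' will cause the root element name to be retained. For example, after parsing '<config tempdir="/tmp" />' with KeepRoot => 1, the tempdir value is reached as $config->{config}->{tempdir} rather than $config->{tempdir}.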
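A small sketch of KeepRoot on input, using a one-element document invented for the purpose:

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<config tempdir="/tmp" />';

# Default: the root element name is discarded.
my $without = XMLin($xml, ForceArray => 0, KeyAttr => {});

# KeepRoot: the root element name becomes the top-level hash key.
my $with = XMLin($xml, KeepRoot => 1, ForceArray => 0, KeyAttr => {});

print "without KeepRoot: $without->{tempdir}\n";
print "with KeepRoot:    $with->{config}{tempdir}\n";
```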

 

KeyAttr => [ list ] # in+out - important

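Folds nested elements that would otherwise form an array into a hash, keyed on the value of one of the listed attributes.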

 

KeyAttr => { list } # in+out - important

This alternative (and preferred) method of specifying the key attributes allows more fine-grained control over which elements are folded and on which attributes. For example, the option KeyAttr => { package => 'id' } will cause any package elements to be folded on the 'id' attribute. No other elements which have an 'id' attribute will be folded at all.

The attribute name may be prefixed with '+', as in KeyAttr => { user => '+login' }. The '+' indicates that the value of the key attribute should be copied rather than moved to the folded hash key.
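The sketch below folds the same made-up <user> list twice, once with a plain attribute name and once with the '+' prefix, to show moved versus copied keys.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = <<'XML';
<opt>
  <user login="grep" fullname="Gary R Epstein" />
  <user login="stty" fullname="Simon T Tyson" />
</opt>
XML

# Plain name: 'login' is moved out of each record and used as the hash key.
my $moved = XMLin($xml, KeyAttr => { user => 'login' }, ForceArray => [ 'user' ]);

# '+' prefix: 'login' is used as the hash key and also kept in the record.
my $copied = XMLin($xml, KeyAttr => { user => '+login' }, ForceArray => [ 'user' ]);

print "moved:  ", join(', ', sort keys %{ $moved->{user}{grep} }), "\n";
print "copied: ", join(', ', sort keys %{ $copied->{user}{grep} }), "\n";
```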

 

NoAttr => 1 # in+out - handy

When used with XMLout(), the generated XML will contain no attributes. All hash key/values will be represented as nested elements instead.

When used with XMLin(), any attributes in the XML will be ignored.
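For the output side, a sketch comparing the default attribute form with NoAttr. The hash contents are invented.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my %server = (name => 'sahara', osname => 'solaris');

# Default: scalar values are written as attributes of <server>.
my $with_attrs = XMLout(\%server, RootName => 'server', KeyAttr => []);

# NoAttr: the same values are written as nested elements.
my $no_attrs = XMLout(\%server, RootName => 'server', KeyAttr => [], NoAttr => 1);

print $with_attrs;
print $no_attrs;
```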

NoEscape => 1 # out - seldom used

By default, XMLout() will translate the characters '<', '>', '&' and '"' to '&lt;', '&gt;', '&amp;' and '&quot;' respectively. Use this option to suppress escaping (presumably because you've already escaped the data in some more sophisticated manner).

NoIndent => 1 # out - seldom used

Set this option to 1 to disable XMLout()'s default 'pretty printing' mode. With this option enabled, the XML output will all be on one line (unless there are newlines in the data) - this may be easier for downstream processing.

NoSort => 1 # out - seldom used

Newer versions of XML::Simple sort elements and attributes alphabetically (*), by default. Enable this option to suppress the sorting - possibly for backwards compatibility.

* Actually, sorting is alphabetical but 'key' attribute or element names (as in 'KeyAttr') sort first. Also, when a hash of hashes is 'unfolded', the elements are sorted alphabetically by the value of the key field.

NormaliseSpace => 0 | 1 | 2 # in - handy

This option controls how whitespace in text content is handled. Recognised values for the option are:
0 = (default) whitespace is passed through unaltered (except of course for the normalisation of whitespace in attribute values which is mandated by the XML recommendation)

1 = whitespace is normalised in any value used as a hash key (normalising means removing leading and trailing whitespace and collapsing sequences of whitespace characters to a single space)

2 = whitespace is normalised in all text content

Note: you can spell this option with a 'z' if that is more natural for you.

NSExpand => 1 # in+out handy - SAX only

This option controls namespace expansion - the translation of element and attribute names of the form 'prefix:name' to '{uri}name'. For example the element name 'xsl:template' might be expanded to: '{http://www.w3.org/1999/XSL/Transform}template'.

By default, XMLin() will return element names and attribute names exactly as they appear in the XML. Setting this option to 1 will cause all element and attribute names to be expanded to the '{uri}name' form, with the namespace URI in place of the prefix.

Note: You must be using a SAX parser for this option to work (ie: it does not work with XML::Parser).

This option also controls whether XMLout() performs the reverse translation from '{uri}name' back to 'prefix:name'. The default is no translation. If your data contains expanded names, you should set this option to 1 otherwise XMLout will emit XML which is not well formed.

Note: You must have the XML::NamespaceSupport module installed if you want XMLout() to translate URIs back to prefixes.

NumericEscape => 0 | 1 | 2 # out - handy

Use this option to have 'high' (non-ASCII) characters in your Perl data structure converted to numeric entities (eg: &#8364;) in the XML output. Three levels are possible:

0 - default: no numeric escaping (OK if you're writing out UTF8)

1 - only characters above 0xFF are escaped (ie: characters in the 0x80-FF range are not escaped), possibly useful with ISO8859-1 output

2 - all characters above 0x7F are escaped (good for plain ASCII output)
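A sketch of level 2, assuming a value that contains the euro sign (U+20AC); the key name 'price' is made up.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# "\x{20AC}" is the euro sign, a character above 0x7F.
my %item = (price => "\x{20AC}5");

# Level 2 replaces every non-ASCII character with a numeric entity,
# so the result is safe to print as plain ASCII.
my $escaped = XMLout(\%item, RootName => 'item', KeyAttr => [], NumericEscape => 2);

print $escaped;
```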

OutputFile => <file specifier> # out - handy

The default behaviour of XMLout() is to return the XML as a string. If you wish to write the XML to a file, simply supply the filename using the 'OutputFile' option.
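A sketch of writing to a file and reading it back. The temporary file from File::Temp is a stand-in for a real output path, and the hash contents are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple qw(:strict);

# Reserve a temporary filename to write to (removed again at exit).
my ($fh, $filename) = tempfile(SUFFIX => '.xml', UNLINK => 1);
close $fh or die "Cannot close $filename: $!";

my %config = (logdir => '/var/log/foo/', debugfile => '/tmp/foo.debug');

# With OutputFile the XML goes to the named file.
XMLout(\%config, RootName => 'config', KeyAttr => [], OutputFile => $filename);

# Read the file back to confirm the round trip.
my $reloaded = XMLin($filename, ForceArray => 0, KeyAttr => {});
print "logdir read back: $reloaded->{logdir}\n";
```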

ParserOpts => [ XML::Parser Options ] # in - don't use this

Note: This option is now officially deprecated. If you find it useful, email the author with an example of what you use it for. Do not use this option to set the ProtocolEncoding, that's just plain wrong - fix the XML.

This option allows you to pass parameters to the constructor of the underlying XML::Parser object (which of course assumes you're not using SAX).

 

RootName => 'string' # out - handy

By default, XMLout() names the root element 'opt'. Use this option to specify a different name.

 

SearchPath => [ list ] # in - handy

If you pass XMLin() a filename, but the filename includes no directory component, you can use this option to specify which directories should be searched to locate the file. You might use this option to search first in the user's home directory, then in a global directory such as /etc.

If a filename is provided to XMLin() but SearchPath is not defined, the file is assumed to be in the current directory.

If the first parameter to XMLin() is undefined, the default SearchPath will contain only the directory in which the script itself is located. Otherwise the default SearchPath will be empty.

StrictMode => 1 | 0 # in+out seldom used

This option allows you to turn "STRICT MODE" on or off for a particular call, regardless of whether it was enabled at the time XML::Simple was loaded.

SuppressEmpty => 1 | '' | undef # in+out - handy

This option controls what XMLin() should do with empty elements (no attributes and no content). The default behaviour is to represent them as empty hashes. Setting this option to a true value (eg: 1) will cause empty elements to be skipped altogether. Setting the option to 'undef' or the empty string will cause empty elements to be represented as the undefined value or the empty string respectively. The latter two alternatives are a little easier to test for in your code than a hash with no keys.

The option also controls what XMLout() does with undefined values. Setting the option to undef causes undefined values to be output as empty elements (rather than empty attributes), it also suppresses the generation of warnings about undefined values. Setting the option to a true value (eg: 1) causes undefined values to be skipped altogether on output.
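The input-side behaviours can be compared with a short sketch. The XML and the element names a and b are placeholders.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml  = '<opt><a/><b>1</b></opt>';
my @base = (ForceArray => 0, KeyAttr => {});

my $default   = XMLin($xml, @base);                          # a => {}
my $skipped   = XMLin($xml, @base, SuppressEmpty => 1);      # no 'a' key at all
my $as_string = XMLin($xml, @base, SuppressEmpty => '');     # a => ''
my $as_undef  = XMLin($xml, @base, SuppressEmpty => undef);  # a => undef

print "default:  ", ref($default->{a}), "\n";
print "skipped:  ", (exists $skipped->{a} ? 'present' : 'absent'), "\n";
print "string:   '", $as_string->{a}, "'\n";
print "undef:    ", (defined $as_undef->{a} ? 'defined' : 'undefined'), "\n";
```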

ValueAttr => [ names ] # in - handy

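Use this option to deal with elements which always have a single attribute and no content. For example, with ValueAttr => [ 'value' ], an element such as <colour value="red" /> parses to colour => 'red' instead of colour => { value => 'red' }.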

ValueAttr => { element => attribute, ... } # in+out - handy

This (preferred) form of the ValueAttr option requires you to specify both the element and the attribute names. This is not only safer, it also allows the original XML to be reconstructed by XMLout().

Note: You probably don't want to use this option and the NoAttr option at the same time.
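A sketch of the hash form on input, assuming <colour> and <size> elements that each carry a single 'value' attribute:

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt><colour value="red" /><size value="XXL" /></opt>';

# Default: each element becomes a one-key hash.
my $plain = XMLin($xml, ForceArray => 0, KeyAttr => {});

# ValueAttr (hash form): the named attribute's value replaces the hash.
my $rolled = XMLin($xml,
    ForceArray => 0,
    KeyAttr    => {},
    ValueAttr  => { colour => 'value', size => 'value' },
);

print "plain:  $plain->{colour}{value}\n";
print "rolled: $rolled->{colour}\n";
```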

Variables => { name => value } # in - handy

This option allows variables in the XML to be expanded when the file is read. (there is no facility for putting the variable names back if you regenerate XML using XMLout).

A 'variable' is any text of the form ${name} which occurs in an attribute value or in the text content of an element. If 'name' matches a key in the supplied hashref, ${name} will be replaced with the corresponding value from the hashref. If no matching key is found, the variable will not be replaced. Names must match the regex: [\w.]+ (ie: only 'word' characters and dots are allowed).
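A sketch of the substitution rules, with made-up names 'base' and 'unknown'. The XML is single-quoted so that Perl itself does not try to interpolate ${base}.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = '<opt logdir="${base}/log">'
        . '<pidfile>${base}/run/foo.pid</pidfile>'
        . '<other>${unknown}</other>'
        . '</opt>';

my $config = XMLin($xml,
    Variables  => { base => '/var' },
    ForceArray => 0,
    KeyAttr    => {},
);

print "logdir:  $config->{logdir}\n";    # expanded in an attribute value
print "pidfile: $config->{pidfile}\n";   # expanded in text content
print "other:   $config->{other}\n";     # no matching key: left as-is
```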

VarAttr => 'attr_name' # in - handy

In addition to the variables defined using Variables, this option allows variables to be defined in the XML. A variable definition consists of an element with an attribute called 'attr_name' (the value of the VarAttr option). The value of the attribute will be used as the variable name and the text content of the element will be used as the value. A variable defined in this way will override a variable defined using the Variables option.
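A sketch of variables defined inside the XML, assuming <dir> elements whose 'name' attribute doubles as the variable name. Each definition is available to the elements that follow it.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $xml = <<'XML';
<opt>
  <dir name="prefix">/usr/local/apache</dir>
  <dir name="exec_prefix">${prefix}</dir>
  <dir name="bindir">${exec_prefix}/bin</dir>
</opt>
XML

my $config = XMLin($xml,
    VarAttr    => 'name',             # <dir name="X">text</dir> defines ${X}
    KeyAttr    => { dir => 'name' },  # fold the <dir> list on the same attribute
    ForceArray => [ 'dir' ],
);

# Each folded record keeps its text under the default 'content' key.
print "bindir: $config->{dir}{bindir}{content}\n";
```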

 

XMLDecl => 1 or XMLDecl => 'string' # out - handy

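Makes XMLout() start its output with an XML declaration. Set the option to 1 for the default declaration, or to a string to supply the complete declaration yourself.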
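A sketch of both forms. The custom declaration string is only an example value.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my %config = (logdir => '/var/log/foo/');

# XMLDecl => 1 prepends the module's default declaration.
my $default_decl = XMLout(\%config, KeyAttr => [], XMLDecl => 1);

# A string value is used verbatim as the declaration.
my $custom_decl = XMLout(\%config,
    KeyAttr => [],
    XMLDecl => '<?xml version="1.0" encoding="UTF-8"?>',
);

print $default_decl;
print $custom_decl;
```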

Posted 2013-02-23 16:31 by 新闻官