Processing XML in Ruby with REXML

A brief introduction to REXML
1. Parsing an XML file
require "rexml/document"
file = File.new( "mydoc.xml" )
doc = REXML::Document.new file

2. Parsing an XML string
require "rexml/document"
include REXML # so that we don't have to prefix everything with REXML::...
string = <<EOF
<mydoc>
<someelement attribute="nanoo">Text, text, text</someelement>
</mydoc>
EOF
doc = Document.new string

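Once you have a Document, there are several ways to access its elements:
○ The Element class has the each_element_with_attribute method, a common way of accessing elements.
○ The Element.elements attribute is an instance of the Elements class, whose each and [] methods give access to the elements. Both methods accept an XPath for filtering, which makes them very powerful.
○ Element is a subclass of Parent, so an element's child nodes can also be reached through array-like methods such as Element[], Element.each, Element.find, and Element.delete. This is the fastest way to access children, but because it is a true array, XPath searches are not supported, and the array contains all child nodes, not only the Element children.
★ In REXML, Element children accessed through Element.elements are indexed from 1, not 0. XPath counts from 1, and REXML keeps that convention. The array-like methods inherited from Parent use ordinary 0-based Ruby indexes.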
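
The three access styles listed above can be compared side by side. The following is a minimal sketch; the inventory document and the variable names are made up for illustration and are not part of the original notes.

```ruby
require "rexml/document"

# Hypothetical sample document, used only for illustration.
xml = <<~XML
  <inventory>
    <item upc="111">apple</item>
    <item upc="222">pear</item>
    <note>fresh</note>
  </inventory>
XML
doc = REXML::Document.new(xml)
root = doc.root

# 1) each_element_with_attribute: visit child elements that carry an attribute.
upcs = []
root.each_element_with_attribute("upc") { |e| upcs << e.attributes["upc"] }

# 2) Element#elements: accepts XPath and integer indexes that start at 1.
first_item  = root.elements[1]
second_item = root.elements["item[2]"]
item_texts  = []
root.elements.each("item") { |e| item_texts << e.text }

# 3) Array-like access inherited from Parent: indexes start at 0 and every
#    child node is counted, so the whitespace Text node comes first here.
first_child = root[0]

puts upcs.inspect
puts item_texts.inspect
puts first_child.class
```

Because the sample document is indented, root[0] is a whitespace Text node and root[1] is the first item element, which is why the two indexing schemes should not be mixed.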

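3. Using XPath
(The snippets below assume doc is an inventory document containing item, price, and name elements.)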
# The invisibility cream is the first <item>
invisibility = XPath.first( doc, "//item" )
# Prints out all of the prices
XPath.each( doc, "//price") { |element| puts element.text }
# Gets an array of all of the "name" elements in the document.
names = XPath.match( doc, "//name" )
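
The XPath calls above need a document to run against. Here is a self-contained sketch using a small inventory document; its contents (product names, prices, upc values) are assumptions made up for this example.

```ruby
require "rexml/document"

# Hypothetical inventory document, used only for illustration.
xml = <<~XML
  <inventory>
    <item upc="123"><name>Invisibility Cream</name><price>14.50</price></item>
    <item upc="445"><name>Levitation Salve</name><price>23.99</price></item>
  </inventory>
XML
doc = REXML::Document.new(xml)

# First matching node only.
first_item = REXML::XPath.first(doc, "//item")

# Iterate over every match.
prices = []
REXML::XPath.each(doc, "//price") { |el| prices << el.text }

# Collect all matches into an array.
names = REXML::XPath.match(doc, "//name").map(&:text)

puts first_item.attributes["upc"]
puts prices.inspect
puts names.inspect
```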

4. The Element.elements.to_a() method also returns an array of the matching results.
all_elements = doc.elements.to_a
all_children = doc.to_a
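# elements.to_a returns only Element nodes, so use XPath.match to collect attributes
all_upc_strings = XPath.match( doc, "//item/attribute::upc" ).map { |attr| attr.value }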
all_name_elements = doc.elements.to_a( "//name" )
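
The difference between elements.to_a, to_a, and XPath.match is easier to see on a tiny document. This is a sketch under the assumption of a made-up inventory string; the key point it illustrates is that elements.to_a keeps Element nodes only.

```ruby
require "rexml/document"

# Hypothetical document: one Text child followed by two item elements.
xml = '<inventory>stock<item upc="123"><name>Cream</name></item>' \
      '<item upc="445"><name>Salve</name></item></inventory>'
doc = REXML::Document.new(xml)
root = doc.root

child_elements = root.elements.to_a           # Element children only
child_nodes    = root.to_a                    # every child node, Text included
name_elements  = doc.elements.to_a("//name")  # Elements matched by the XPath

# Attributes are not Elements, so they are collected with XPath.match.
upc_values = REXML::XPath.match(doc, "//item/attribute::upc").map(&:value)

puts child_elements.size
puts child_nodes.size
puts upc_values.inspect
```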

5. Building an XML document by adding elements manually

require "rexml/document"

doc = REXML::Document.new "<root/>"
root_node = doc.root
el = root_node.add_element "myel"
# add_element also accepts a hash of attributes; this sets "id" of el2 to "10"
el2 = el.add_element "another", {"id"=>"10"}
el3 = REXML::Element.new "blah"
el.elements << el3
el3.attributes["myid"] = "sean"
puts doc.to_s

Output:
<root><myel><another id='10'/><blah myid='sean'/></myel></root>

6. Adding text to an Element

el1 = Element.new "myelement"
el1.text = "Hello world!"
# -> <myelement>Hello world!</myelement>
el1.add_text "Hello dolly"
# -> <myelement>Hello world!Hello dolly</myelement>
el1.add Text.new("Goodbye")
# -> <myelement>Hello world!Hello dollyGoodbye</myelement>
el1 << Text.new(" cruel world")
# -> <myelement>Hello world!Hello dollyGoodbye cruel world</myelement>

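Note that Text objects added with add or << are still stored as separate child nodes. According to the REXML tutorial, el1.text returns "Hello world!" and el1[2] returns the Text object containing "Goodbye". Some REXML versions append a String passed to add_text to the preceding Text node instead, in which case el1.text also includes "Hello dolly" and the later indexes shift down by one.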
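
To inspect how the text is actually stored, the Text children can be listed directly. The sketch below only uses Text objects added with <<, which stay separate; the variable names are illustrative.

```ruby
require "rexml/document"

el = REXML::Element.new("myelement")
el.text = "Hello world!"            # first Text child
el << REXML::Text.new("Goodbye")    # second, separate Text child

first_text = el.text                # value of the first Text child only
all_texts  = el.texts.map(&:value)  # every Text child, in document order

puts first_text
puts all_texts.inspect
puts el.to_s
```

Joining all_texts gives the full character content of the element when the first Text node alone is not enough.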

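7. REXML stores all text nodes internally as UTF-8, and calling code must be aware of this: any string passed to REXML must be UTF-8 encoded.

REXML cannot always guess the encoding of your text correctly, so it always assumes UTF-8. It also does not warn you when you add text in another encoding; the caller is responsible for supplying UTF-8. Plain 7-bit ASCII is fine as is. ISO-8859-1 text must be converted to UTF-8 before it is added, for example with text.unpack("C*").pack("U*") (on Ruby 1.9 and later, text.encode("UTF-8", "ISO-8859-1") does the same job). Changing the encoding for output is supported only by Document.write() and Document.to_s(). To write other nodes in a particular encoding, wrap the output object in an Output.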

e = Element.new "a"   # Element.new takes a tag name, not markup
e.text = "f\u00fcr"   # 'für', passed to REXML as UTF-8
o = ''
e.write( Output.new( o, "ISO-8859-1" ) )   # o receives ISO-8859-1 bytes

Any encoding that REXML supports can be passed to Output.
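
The full round trip (ISO-8859-1 input, UTF-8 inside REXML, ISO-8859-1 output) might look like the sketch below. It assumes Ruby 1.9 or later for String#encode, and it writes through REXML::Formatters::Default rather than Element#write; both choices are assumptions of this example, not something the notes above prescribe.

```ruby
require "rexml/document"
require "rexml/output"
require "rexml/formatters/default"

# Bytes as they would arrive from an ISO-8859-1 source ('für').
latin1 = "f\xFCr".b.force_encoding("ISO-8859-1")

# Convert to UTF-8 before handing the text to REXML.
utf8 = latin1.encode("UTF-8")

e = REXML::Element.new("a")
e.text = utf8

# Wrap the target string in an Output so the node is written as ISO-8859-1.
out = String.new
REXML::Formatters::Default.new.write(e, REXML::Output.new(out, "ISO-8859-1"))

puts out.bytes.inspect
```

In the written bytes, the ü appears as the single byte 0xFC rather than the two-byte UTF-8 sequence.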

8. Inserting elements
There are two ways. One is standard Ruby array notation:

doc = Document.new "<a><one/><three/></a>"
doc.root[1,0] = Element.new "two"
# -> <a><one/><two/><three/></a>

Alternatively, call Parent.insert_before or Parent.insert_after:
three = doc.elements["a/three"]
# Parentheses are required around the nested call's argument
doc.root.insert_after three, Element.new("four")
# -> <a><one/><two/><three/><four/></a>
# A convenience method allows you to insert before/after an XPath:
doc.root.insert_after( "//one", Element.new("one-five") )
# -> <a><one/><one-five/><two/><three/><four/></a>
# Another convenience method allows you to insert after/before an element:
four = doc.elements["//four"]
four.previous_sibling = Element.new("three-five")
# -> <a><one/><one-five/><two/><three/><three-five/><four/></a>
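
The examples above mostly insert after a node. The mirror-image calls, insert_before and next_sibling=, work the same way; here is a short sketch on a made-up document.

```ruby
require "rexml/document"

# Hypothetical starting document, used only for illustration.
doc = REXML::Document.new("<a><two/><four/></a>")
root = doc.root

# Insert before a node that is already in hand.
two = root.elements["two"]
root.insert_before(two, REXML::Element.new("one"))

# Insert before the first node matched by an XPath.
root.insert_before("//four", REXML::Element.new("three"))

# Insert after a node through the sibling setter.
four = root.elements["four"]
four.next_sibling = REXML::Element.new("five")

order = root.elements.to_a.map(&:name)
puts order.inspect
puts doc.to_s
```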

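9. Iterating over elements
Besides Element.each, which iterates over all child nodes, there are three other main ways to traverse: Element.elements.each, which iterates over child elements only; Element.next_element and Element.previous_element, which return the next and previous Element siblings; and Element.next_sibling and Element.previous_sibling, which return the next and previous sibling nodes regardless of their type.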
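
A small document with mixed text and element children makes the difference between these traversal methods visible. The document below is made up for illustration.

```ruby
require "rexml/document"

# Hypothetical document: Text, Element, Text, Element.
doc = REXML::Document.new("<a>text<b/>more<c/></a>")
root = doc.root

# Element#each visits every child node, Text nodes included.
node_classes = []
root.each { |node| node_classes << node.class }

# Element#elements.each visits child elements only.
element_names = []
root.elements.each { |el| element_names << el.name }

b = root.elements["b"]
next_el   = b.next_element      # <c/>, skipping the Text node in between
next_node = b.next_sibling      # the Text node "more"
prev_node = b.previous_sibling  # the Text node "text"
prev_el   = b.previous_element  # nil, no Element precedes <b/>

puts node_classes.inspect
puts element_names.inspect
```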

Posted on 2007-07-23 14:00 by 小熊bryan