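Parsing XML files in Python: the DOM approach

Environment

Python: 3.4.4

Preparing the XML file

First, create an XML file named countries.xml. Its content is taken from an example on the official Python website.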

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Preparing the Python file

Create a file named test_DOM.py to parse the XML file.

#!/usr/bin/python
# -*- coding: UTF-8 -*-

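import xml.dom.minidom

# Parse the file into a Document object and get its root element (<data>).
DOMTree = xml.dom.minidom.parse("countries.xml")
collection = DOMTree.documentElement
# Note: "data" is the root's tag name, not an attribute, so this check is
# False for countries.xml and nothing is printed here.
if collection.hasAttribute("data"):
    print ("Root element : %s" % collection.getAttribute("data"))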

countries = collection.getElementsByTagName("country")

for country in countries:
    print ("*****Country*****")
    if country.hasAttribute("name"):
        print ("Name: %s" % country.getAttribute("name"))
    rank = country.getElementsByTagName('rank')[0]
    print ("Rank: %s" % rank.childNodes[0].data)
    year = country.getElementsByTagName('year')[0]
    print ("Year: %s" % year.childNodes[0].data)
    gdppc = country.getElementsByTagName('gdppc')[0]
    print ("Gdppc: %s" % gdppc.childNodes[0].data)
    neighbors = country.getElementsByTagName('neighbor')
    for neighbor in neighbors:
        print ("Neighbor:", neighbor.getAttribute("name"),neighbor.getAttribute("direction"))

Output

>python test_DOM.py
*****Country*****
Name: Liechtenstein
Rank: 1
Year: 2008
Gdppc: 141100
Neighbor: Austria E
Neighbor: Switzerland W
*****Country*****
Name: Singapore
Rank: 4
Year: 2011
Gdppc: 59900
Neighbor: Malaysia N
*****Country*****
Name: Panama
Rank: 68
Year: 2011
Gdppc: 13600
Neighbor: Costa Rica W
Neighbor: Colombia E

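Notes

DOM (Document Object Model)

DOM is a cross-language API from the W3C for reading and modifying XML documents.

When a DOM parser parses an XML document, it reads the entire document in one pass and stores all of its elements in an in-memory tree structure. That tree can then be read or modified, and the modified tree can be written back to an XML file.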

See: https://docs.python.org/2/library/xml.dom.html
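The read, modify, and write-back cycle described above can be sketched roughly as follows. This is only an illustration: the inline SAMPLE_XML string, the new rank value, and the "updated" attribute are assumptions made for the example and are not part of countries.xml.

```python
from xml.dom.minidom import parseString

# A small inline document (hypothetical) so the example needs no file on disk.
SAMPLE_XML = """<data>
    <country name="Singapore">
        <rank>4</rank>
    </country>
</data>"""

# After parsing, the whole document lives in memory as a tree.
doc = parseString(SAMPLE_XML)
country = doc.getElementsByTagName("country")[0]

# Read: the text of <rank> is stored in a child text node of the element.
rank = country.getElementsByTagName("rank")[0]
old_rank = rank.firstChild.data

# Modify: change the text node and add a new (hypothetical) attribute.
rank.firstChild.data = "5"
country.setAttribute("updated", "yes")

# Serialize the modified tree back to XML text.
updated_xml = doc.documentElement.toxml()
print(updated_xml)
```

To save the result to disk instead of printing it, the Document can be written to an open text file with doc.writexml(f).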

 

DOMTree = xml.dom.minidom.parse("countries.xml")

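This opens countries.xml with the xml.dom.minidom parser and returns a Document object, that is, the tree structure. The Document object represents the entire XML document, including its elements, attributes, processing instructions, comments, and so on.

See: https://docs.python.org/2/library/xml.dom.minidom.html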

Return a Document from the given input. filename_or_file may be either a file name, or a file-like object. parser, if given, must be a SAX2 parser object. This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance.
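As the quoted documentation says, parse() is not limited to file names. A minimal sketch, assuming an in-memory byte string (XML_BYTES is made up for this example) in place of countries.xml:

```python
import io
from xml.dom.minidom import parse, parseString

# Hypothetical sample data standing in for a file on disk.
XML_BYTES = b'<?xml version="1.0"?><data><country name="Panama"/></data>'

# parse() accepts a file-like object as well as a file name.
doc_from_file_obj = parse(io.BytesIO(XML_BYTES))

# parseString() builds the same kind of Document directly from a string or bytes.
doc_from_string = parseString(XML_BYTES)

# There is no whitespace in the sample, so <country> is the root's first child.
name_a = doc_from_file_obj.documentElement.firstChild.getAttribute("name")
name_b = doc_from_string.documentElement.firstChild.getAttribute("name")
print(name_a, name_b)
```

Both calls return a Document, so the rest of the script would work the same way regardless of where the XML came from.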

 

collection = DOMTree.documentElement

Returns the root element of DOMTree.

 

Document.documentElement
The one and only root element of the document.
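A short sketch of what documentElement gives back may help here. The inline XML string is an assumption for illustration; the point is that the root's name is exposed as tagName, not as an attribute.

```python
from xml.dom.minidom import parseString

# Hypothetical one-line document with the same root tag as countries.xml.
doc = parseString('<data><country name="Liechtenstein"/></data>')
root = doc.documentElement

root_name = root.tagName                   # the name of the root element
parent_is_doc = root.parentNode is doc     # the Document node is the root's parent
has_data_attr = root.hasAttribute("data")  # the tag name is not an attribute

print(root_name, parent_is_doc, has_data_attr)
```

This is why a check such as hasAttribute("data") on the root of countries.xml is False: to show the root's name, tagName is the property to read.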

 

rank = country.getElementsByTagName('rank')[0]

Searches the descendants of country for all element nodes whose tag name is "rank" and assigns the first match to rank.

Document.getElementsByTagName(tagName)
Search for all descendants (direct children, children’s children, etc.) with a particular element type name.
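Because the search covers all descendants and not just direct children, nested elements with the same tag name are returned too. The sketch below assumes a made-up <region> element and an empty <year> element (neither appears in countries.xml), plus a hypothetical helper element_text, to show this and to avoid an IndexError on empty elements.

```python
from xml.dom.minidom import parseString

# Hypothetical document: <region> and the empty <year> are for illustration only.
doc = parseString("""<data>
    <country name="Panama">
        <rank>68</rank>
        <region><rank>3</rank></region>
        <year></year>
    </country>
</data>""")

def element_text(element):
    """Concatenate the text nodes that are direct children of element."""
    return "".join(
        node.data for node in element.childNodes
        if node.nodeType == node.TEXT_NODE
    )

country = doc.getElementsByTagName("country")[0]

# Both <rank> elements are found, even though one is nested inside <region>.
ranks = [element_text(r) for r in country.getElementsByTagName("rank")]

# An empty element has no child nodes, so childNodes[0] would raise IndexError;
# the helper returns an empty string instead.
year_text = element_text(country.getElementsByTagName("year")[0])
print(ranks, repr(year_text))
```

When only direct children are wanted, one option is to iterate over childNodes and filter by nodeType and tagName instead.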

 

collection.getAttribute("data")

Gets and returns the value of the "data" attribute of collection. If collection has no "data" attribute, an empty string is returned.

Element.getAttribute(name)
Return the value of the attribute named by name as a string. If no such attribute exists, an empty string is returned, as if the attribute had no value.
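One consequence of this behavior is that getAttribute() alone cannot tell a missing attribute from an empty one. A minimal sketch, assuming a made-up <neighbor> element with an empty direction and a hypothetical "population" attribute name:

```python
from xml.dom.minidom import parseString

# Hypothetical element: direction is present but empty, population is absent.
doc = parseString('<data><neighbor name="Austria" direction=""/></data>')
neighbor = doc.getElementsByTagName("neighbor")[0]

present = neighbor.getAttribute("name")        # attribute exists with a value
empty = neighbor.getAttribute("direction")     # attribute exists, value is empty
missing = neighbor.getAttribute("population")  # attribute does not exist

# hasAttribute() is the way to tell the last two cases apart.
has_direction = neighbor.hasAttribute("direction")
has_population = neighbor.hasAttribute("population")
print(repr(present), repr(empty), repr(missing), has_direction, has_population)
```

This is the reason the script checks hasAttribute("name") before printing the country name.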
posted @ 2015-12-31 16:03  微微微笑