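Parsing XML files with Python: the DOM approach
Environment
Python: 3.4.4
Preparing the XML file
First, create an XML file named countries.xml. Its content comes from an example on the official Python website.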
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
Preparing the Python file
Create a file named test_DOM.py to parse the XML file.
#!/usr/bin/python
# -*- coding: UTF-8 -*-

import xml.dom.minidom

# Parse the whole file into an in-memory DOM tree.
DOMTree = xml.dom.minidom.parse("countries.xml")
collection = DOMTree.documentElement

# The root <data> element has no "data" attribute, so this branch is skipped.
if collection.hasAttribute("data"):
    print("Root element : %s" % collection.getAttribute("data"))

# Every <country> element below the root.
countries = collection.getElementsByTagName("country")

for country in countries:
    print("*****Country*****")
    if country.hasAttribute("name"):
        print("Name: %s" % country.getAttribute("name"))

    # The text of an element is held in its first child, a text node.
    rank = country.getElementsByTagName('rank')[0]
    print("Rank: %s" % rank.childNodes[0].data)
    year = country.getElementsByTagName('year')[0]
    print("Year: %s" % year.childNodes[0].data)
    gdppc = country.getElementsByTagName('gdppc')[0]
    print("Gdppc: %s" % gdppc.childNodes[0].data)

    neighbors = country.getElementsByTagName('neighbor')
    for neighbor in neighbors:
        print("Neighbor:", neighbor.getAttribute("name"), neighbor.getAttribute("direction"))
Output
>python test_DOM.py
*****Country*****
Name: Liechtenstein
Rank: 1
Year: 2008
Gdppc: 141100
Neighbor: Austria E
Neighbor: Switzerland W
*****Country*****
Name: Singapore
Rank: 4
Year: 2011
Gdppc: 59900
Neighbor: Malaysia N
*****Country*****
Name: Panama
Rank: 68
Year: 2011
Gdppc: 13600
Neighbor: Costa Rica W
Neighbor: Colombia E
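Notes
DOM (Document Object Model)
DOM is a cross-language API from the W3C for reading and modifying XML documents.
When a DOM parser parses an XML document, it reads the entire document in one pass and stores all of its elements in a tree structure in memory. That tree can then be read or modified, and the modified tree can be written back out to an XML file.
See: https://docs.python.org/2/library/xml.dom.html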
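The read, modify, and write-back cycle described above can be sketched roughly as follows. This is only an illustration: a short inline string stands in for countries.xml, the `updated` attribute is made up for the example, and an in-memory `io.StringIO` buffer is used in place of a real output file.

```python
import io
from xml.dom.minidom import parseString

# A small inline document stands in for countries.xml (assumption for the sketch).
XML_TEXT = '<data><country name="Singapore"><rank>4</rank></country></data>'

dom = parseString(XML_TEXT)   # the whole document is now an in-memory tree
root = dom.documentElement

# Read: the text of <rank> lives in its child text node.
rank = root.getElementsByTagName("rank")[0]
old_rank = rank.firstChild.data

# Modify: change the text and add a (hypothetical) attribute.
rank.firstChild.data = "5"
root.getElementsByTagName("country")[0].setAttribute("updated", "yes")

# Write back: serialize the modified tree to a file-like object.
buffer = io.StringIO()
dom.writexml(buffer)
new_xml = buffer.getvalue()
print(new_xml)
```

To write to disk instead, the same `writexml` call should work with a file opened in text mode for writing.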
DOMTree = xml.dom.minidom.parse("countries.xml")
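Opens countries.xml with the xml.dom.minidom parser and returns a Document object, which is the tree structure. The Document object represents the entire XML document, including its elements, attributes, processing instructions, comments, and so on.
See: https://docs.python.org/2/library/xml.dom.minidom.html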
Return a Document from the given input. filename_or_file may be either a file name, or a file-like object. parser, if given, must be a SAX2 parser object. This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance.
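Since the documentation says the first argument may be a file-like object as well as a file name, here is a small sketch of the second form. The `io.BytesIO` buffer and its content are assumptions made for the example; any object with a `read()` method that returns the XML bytes should behave similarly.

```python
import io
from xml.dom.minidom import parse

# An in-memory bytes buffer plays the role of an opened file (assumption).
xml_bytes = b'<?xml version="1.0"?><data><country name="Panama"/></data>'
file_like = io.BytesIO(xml_bytes)

# parse() accepts the file-like object directly instead of a file name.
dom = parse(file_like)

root_name = dom.documentElement.tagName
country_name = dom.getElementsByTagName("country")[0].getAttribute("name")
print(root_name, country_name)
```

When the XML is already held in a string, `xml.dom.minidom.parseString` is the more direct choice.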
collection = DOMTree.documentElement
Returns the root element of DOMTree.
Document.documentElement The one and only root element of the document.
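As a small illustration of working with the root element, the sketch below reads its tag name and lists its element children. The inline XML is a shortened stand-in for countries.xml. Note that with indented XML, `childNodes` also contains whitespace text nodes, so the example filters by node type.

```python
from xml.dom.minidom import parseString

# Shortened, indented stand-in for countries.xml (assumption for the sketch).
XML_TEXT = """<data>
  <country name="Liechtenstein"/>
  <country name="Singapore"/>
</data>"""

dom = parseString(XML_TEXT)
root = dom.documentElement
print(root.tagName)

# childNodes mixes element nodes with whitespace text nodes,
# so keep only the element nodes.
element_children = [
    node.tagName
    for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE
]
print(element_children)
print(len(root.childNodes) > len(element_children))
```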
rank = country.getElementsByTagName('rank')[0]
Searches below country for all element nodes whose tag name is "rank" and assigns the first node found to rank.
Document.getElementsByTagName(tagName) Search for all descendants (direct children, children’s children, etc.) with a particular element type name.
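Because the search covers all descendants, not only direct children, the result depends on which node the method is called on. The sketch below shows this; the `<region>` wrapper and the `capital` tag are hypothetical and do not appear in countries.xml.

```python
from xml.dom.minidom import parseString

# <region> is a made-up wrapper used to create a deeper nesting level.
XML_TEXT = (
    '<data>'
    '<country name="A">'
    '<neighbor name="X"/>'
    '<region><neighbor name="Y"/></region>'
    '</country>'
    '<country name="B"><neighbor name="Z"/></country>'
    '</data>'
)

dom = parseString(XML_TEXT)

# Called on the document: every <neighbor> in the whole tree.
all_names = [n.getAttribute("name") for n in dom.getElementsByTagName("neighbor")]

# Called on one <country>: only the neighbors below that element, at any depth.
first_country = dom.getElementsByTagName("country")[0]
scoped_names = [n.getAttribute("name")
                for n in first_country.getElementsByTagName("neighbor")]

# Direct children only have to be filtered by hand.
direct_names = [n.getAttribute("name")
                for n in first_country.childNodes
                if n.nodeType == n.ELEMENT_NODE and n.tagName == "neighbor"]

# A tag that does not occur yields an empty list, so indexing [0] would fail.
missing = first_country.getElementsByTagName("capital")

print(all_names, scoped_names, direct_names, len(missing))
```

Checking the length of the result before taking `[0]` avoids an IndexError when an element is absent.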
collection.getAttribute("data")
Gets and returns the value of the "data" attribute of collection. If collection has no "data" attribute, an empty string is returned.
Element.getAttribute(name) Return the value of the attribute named by name as a string. If no such attribute exists, an empty string is returned, as if the attribute had no value.
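Since a missing attribute and an attribute with an empty value both come back as an empty string, `hasAttribute` is the way to tell them apart. A minimal sketch, where the `note` attribute is invented for the example:

```python
from xml.dom.minidom import parseString

# The empty "note" attribute is hypothetical; it is not in countries.xml.
dom = parseString('<data><country name="Panama" note=""/></data>')
root = dom.documentElement
country = dom.getElementsByTagName("country")[0]

name_value = country.getAttribute("name")        # existing attribute
empty_value = country.getAttribute("note")       # present but empty
missing_value = country.getAttribute("missing")  # not present at all

# getAttribute alone cannot distinguish the last two cases.
has_note = country.hasAttribute("note")
has_missing = country.hasAttribute("missing")

# The root element is named "data" but carries no attribute called "data".
root_has_data = root.hasAttribute("data")

print(repr(name_value), repr(empty_value), repr(missing_value))
print(has_note, has_missing, root_has_data)
```

The last check is why the script above never prints a "Root element" line: `data` is the tag name of the root, not one of its attributes.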
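Author: 微微微笑
Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be shown prominently on the page; otherwise the author reserves the right to pursue legal liability.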