Python: Parsing XML Files with xml.dom

What is DOM?

The Document Object Model (DOM) is the standard programming interface recommended by the W3C for working with Extensible Markup Language (XML) documents.

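When a DOM parser parses an XML document, it reads the whole document in one pass and stores every element in a tree structure in memory. You can then use the functions that DOM provides to read or modify the document's content and structure, and you can write the modified content back to an XML file.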
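
As a minimal sketch of that read, modify, and write-back cycle, the Python 3 snippet below parses a small XML string with xml.dom.minidom, changes an attribute and a text node, and serializes the tree again. The sample string and the variable names are assumptions made for illustration. For a file on disk, xml.dom.minidom.parse() and writing the toxml() result back out would play the same roles.

```python
import xml.dom.minidom

# Hypothetical one-movie sample, shortened for illustration.
SAMPLE = ('<collection shelf="New Arrivals">'
          '<movie title="Ishtar"><stars>2</stars></movie>'
          '</collection>')

dom = xml.dom.minidom.parseString(SAMPLE)
root = dom.documentElement

# Read: attribute values and text nodes come back as strings.
shelf = root.getAttribute("shelf")
stars = root.getElementsByTagName("stars")[0]
old_stars = stars.firstChild.data

# Modify: update an attribute and the text inside <stars>.
root.setAttribute("shelf", "Classics")
stars.firstChild.data = "3"

# Write back: serialize the modified tree to a string.
updated_xml = root.toxml()
print(updated_xml)
```

Every value read from the tree is a string, so numeric fields such as stars need an explicit int() conversion if you want to compute with them.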

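Advantages: simple to use and easy to understand.

Disadvantages: because DOM has to map the XML data to a tree in memory, it is relatively slow and relatively memory-hungry, especially for large documents.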
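
One way to soften the memory cost, sketched here under the assumption of Python 3, is to copy out only the values you need and then release the tree. minidom documents can be used in a with statement, which calls unlink() on exit. The helper name movie_titles and the sample string are hypothetical.

```python
import xml.dom.minidom

# Hypothetical two-movie sample, shortened for illustration.
SAMPLE = """<collection shelf="New Arrivals">
<movie title="Enemy Behind"><format>DVD</format></movie>
<movie title="Ishtar"><format>VHS</format></movie>
</collection>"""


def movie_titles(xml_text):
    """Return the movie titles as plain strings and release the DOM tree."""
    # Leaving the with block calls dom.unlink(), which breaks the
    # references between nodes so the tree can be reclaimed sooner.
    with xml.dom.minidom.parseString(xml_text) as dom:
        movies = dom.documentElement.getElementsByTagName("movie")
        return [movie.getAttribute("title") for movie in movies]


titles = movie_titles(SAMPLE)
print(titles)
```

This does not lower the peak memory used while the document is being parsed. It only stops the tree from staying alive after the values have been extracted.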

movies.xml: the XML file to be parsed:

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>
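
In this file each movie carries its title as an attribute and its other fields as child elements. The indentation between those elements also ends up in the DOM tree as text nodes, which is one reason to look elements up by tag name rather than by position. The Python 3 sketch below shows this on a shortened, hypothetical copy of one movie entry.

```python
import xml.dom.minidom

# Shortened copy of one <movie> entry, kept with its original indentation.
SAMPLE = """<collection shelf="New Arrivals">
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
</movie>
</collection>"""

dom = xml.dom.minidom.parseString(SAMPLE)
movie = dom.documentElement.getElementsByTagName("movie")[0]

# childNodes includes the whitespace between tags as "#text" nodes.
all_children = [node.nodeName for node in movie.childNodes]

# Filtering on nodeType keeps only the real child elements.
element_children = [node.tagName for node in movie.childNodes
                    if node.nodeType == node.ELEMENT_NODE]

print(all_children)
print(element_children)
```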

xmltest.py: the Python 2 code that parses movies.xml:

# -*- coding:UTF-8 -*-

'''
Created on 2015-09-10

@author: xiaowenhui
'''

import xml.dom.minidom


# Method 1: DOM parsing

# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement

# Get every movie in the collection
movies = collection.getElementsByTagName("movie")

# Print the details of each movie and collect them in a dict keyed by title
dict_movies = {}

for movie in movies:
    dict_movie = {}
    title = ""
    print "*****Movie*****"
    if movie.hasAttribute("title"):  # the element has a title attribute
        print "Title:%s" % movie.getAttribute("title")  # read the attribute value
        title = movie.getAttribute("title")
           
    try:
        type = movie.getElementsByTagName("type")[0] 
        print "Type :%s" % type.childNodes[0].data
        dict_movie["type"] = type.childNodes[0].data
    
        format = movie.getElementsByTagName("format")[0]  # first <format> element under this movie
        print "format:%s" % format.childNodes[0].data
        dict_movie["format"] = format.childNodes[0].data
    
        try:
            year = movie.getElementsByTagName("year")[0]
            print "year :%s" % year.childNodes[0].data  
            dict_movie["year"] = year.childNodes[0].data 
        except IndexError:  # this movie has no <year> element
            pass
        
        try:
            episodes = movie.getElementsByTagName("episodes")[0]
            print "episodes:%s" % episodes.childNodes[0].data
            dict_movie["episodes"] = episodes.childNodes[0].data
        except IndexError:  # this movie has no <episodes> element
            pass

        rating = movie.getElementsByTagName('rating')[0]
        print "Rating: %s" % rating.childNodes[0].data
        dict_movie["rating"] = rating.childNodes[0].data
    
        stars = movie.getElementsByTagName('stars')[0]
        print "stars: %s" % stars.childNodes[0].data
        dict_movie["stars"] = stars.childNodes[0].data
    
        description = movie.getElementsByTagName('description')[0]
        print "Description: %s" % description.childNodes[0].data
        dict_movie["description"] = description.childNodes[0].data
    except (IndexError, AttributeError):  # a required element is missing or has no text
        print "error:" + title  + "\n"
        continue   
    
    dict_movies[title] = dict_movie

print dict_movies
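
The script above repeats the same lookup for every field. As a rough Python 3 sketch of the same idea, the version below factors that lookup into a helper and loops over the field names. The names child_text, movies_to_dict, and FIELDS are hypothetical, and the XML is a shortened inline copy of movies.xml so the snippet runs on its own. Unlike the original, it treats every field as optional instead of skipping a movie when a field is missing.

```python
import xml.dom.minidom

# Shortened inline copy of movies.xml (two of the four movies).
SAMPLE = """<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
</collection>"""

FIELDS = ("type", "format", "year", "episodes", "rating", "stars", "description")


def child_text(element, tag):
    """Return the text of the first <tag> under element, or None if absent or empty."""
    matches = element.getElementsByTagName(tag)
    if not matches or matches[0].firstChild is None:
        return None
    return matches[0].firstChild.data


def movies_to_dict(xml_text):
    """Map each movie title to a dict of the fields present for that movie."""
    dom = xml.dom.minidom.parseString(xml_text)
    result = {}
    for movie in dom.documentElement.getElementsByTagName("movie"):
        details = {}
        for tag in FIELDS:
            text = child_text(movie, tag)
            if text is not None:
                details[tag] = text
        result[movie.getAttribute("title")] = details
    return result


movies = movies_to_dict(SAMPLE)
for title, details in movies.items():
    print("*****Movie*****")
    print("Title: %s" % title)
    for tag, text in details.items():
        print("%s: %s" % (tag, text))
```

To read the real file instead, xml.dom.minidom.parse("movies.xml") can replace the parseString() call.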

The output after parsing is as follows:

*****Movie*****
Title:Enemy Behind
Type :War, Thriller
format:DVD
year :2003
Rating: PG
stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title:Transformers
Type :Anime, Science Fiction
format:DVD
year :1989
Rating: R
stars: 8
Description: A schientific fiction
*****Movie*****
Title:Trigun
Type :Anime, Action
format:DVD
episodes:4
Rating: PG
stars: 10
Description: Vash the Stampede!
*****Movie*****
Title:Ishtar
Type :Comedy
format:VHS
Rating: PG
stars: 2
Description: Viewable boredom
{u'Transformers': {'rating': u'R', 'description': u'A schientific fiction', 'format': u'DVD', 'stars': u'8', 'year': u'1989', 'type': u'Anime, Science Fiction'}, u'Ishtar': {'rating': u'PG', 'type': u'Comedy', 'description': u'Viewable boredom', 'stars': u'2', 'format': u'VHS'}, u'Enemy Behind': {'rating': u'PG', 'description': u'Talk about a US-Japan war', 'format': u'DVD', 'stars': u'10', 'year': u'2003', 'type': u'War, Thriller'}, u'Trigun': {'rating': u'PG', 'description': u'Vash the Stampede!', 'format': u'DVD', 'episodes': u'4', 'stars': u'10', 'type': u'Anime, Action'}}

Posted on 2015-09-14 17:45 by xiaowenhui
