Python: Parsing XML Files with xml.dom
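What is DOM?
The Document Object Model (DOM) is the standard programming interface recommended by the W3C for processing Extensible Markup Language (XML) documents.
When a DOM parser parses an XML document, it reads the entire document in one pass and stores every element in a tree structure in memory. You can then use the functions DOM provides to read or modify the document's content and structure, and write the modified content back to an XML file.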
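The read, modify, write-back cycle described above can be sketched as follows. This is a minimal illustration assuming Python 3; the inline XML string and the added `year` value are made up for the example, and a real file would be loaded with `xml.dom.minidom.parse(path)` instead of `parseString`.

```python
import xml.dom.minidom

# Hypothetical inline document used only for this sketch.
SOURCE = ('<collection shelf="New Arrivals">'
          '<movie title="Ishtar"><format>VHS</format></movie>'
          '</collection>')

# Parse: the whole document becomes a tree held in memory.
dom = xml.dom.minidom.parseString(SOURCE)
root = dom.documentElement

# Read: reach into the tree through element and attribute accessors.
movie = root.getElementsByTagName("movie")[0]
title = movie.getAttribute("title")
format_node = movie.getElementsByTagName("format")[0]
fmt = format_node.firstChild.data

# Modify: change existing text and append a new element.
format_node.firstChild.data = "DVD"
year = dom.createElement("year")
year.appendChild(dom.createTextNode("1987"))  # illustrative value
movie.appendChild(year)

# Write back: serialize the modified tree to a string.
# (dom.writexml(file_obj) would write it to an open file instead.)
updated = root.toxml()
print(title, fmt)
print(updated)
```

Nothing touches the disk until the tree is serialized, so every change is made on the in-memory copy first.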
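Advantages: simple to use and easy to understand.
Disadvantages: because DOM has to map the whole XML document to a tree in memory, parsing is relatively slow and memory-hungry, especially for large files.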
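When the memory cost matters, one possible workaround that stays inside the `xml.dom` package is `xml.dom.pulldom`, which streams the document and builds a DOM subtree only for the elements you ask for. The sketch below assumes Python 3 and a shortened inline document; it is not part of the original script.

```python
from xml.dom import pulldom

# Shortened, hypothetical stand-in for movies.xml.
SOURCE = """<collection>
  <movie title="Enemy Behind"><stars>10</stars></movie>
  <movie title="Ishtar"><stars>2</stars></movie>
</collection>"""

ratings = []
events = pulldom.parseString(SOURCE)
for event, node in events:
    if event == pulldom.START_ELEMENT and node.tagName == "movie":
        # Build a DOM subtree for this one <movie> element only.
        events.expandNode(node)
        stars = node.getElementsByTagName("stars")[0].firstChild.data
        ratings.append((node.getAttribute("title"), stars))

print(ratings)
```

Each expanded `movie` node supports the same minidom calls used elsewhere in this post, so the per-movie code changes very little.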
movies.xml: the XML file to be parsed:
<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about a US-Japan war</description>
  </movie>
  <movie title="Transformers">
    <type>Anime, Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>8</stars>
    <description>A schientific fiction</description>
  </movie>
  <movie title="Trigun">
    <type>Anime, Action</type>
    <format>DVD</format>
    <episodes>4</episodes>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Vash the Stampede!</description>
  </movie>
  <movie title="Ishtar">
    <type>Comedy</type>
    <format>VHS</format>
    <rating>PG</rating>
    <stars>2</stars>
    <description>Viewable boredom</description>
  </movie>
</collection>
xmltest.py: the Python code that parses movies.xml:
# -*- coding:UTF-8 -*-
'''
Created on 2015-09-10

@author: xiaowenhui
'''
# Note: this script uses Python 2 syntax (print statements).
from xml.dom.minidom import parse
import xml.dom.minidom

# Method 1: DOM parsing
# Open the XML document with the minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement

# Get all movies in the collection
movies = collection.getElementsByTagName("movie")

# Print the details of each movie and collect them in a dict keyed by title
dict_movies = {}
for movie in movies:
    dict_movie = {}
    title = ""
    print "*****Movie*****"
    if movie.hasAttribute("title"):  # the element has this attribute
        print "Title:%s" % movie.getAttribute("title")  # read the attribute value
        title = movie.getAttribute("title")
    try:
        movie_type = movie.getElementsByTagName("type")[0]
        print "Type :%s" % movie_type.childNodes[0].data
        dict_movie["type"] = movie_type.childNodes[0].data

        # [0] takes the first matching element under this movie
        movie_format = movie.getElementsByTagName("format")[0]
        print "format:%s" % movie_format.childNodes[0].data
        dict_movie["format"] = movie_format.childNodes[0].data

        # <year> is optional: some movies do not have it
        try:
            year = movie.getElementsByTagName("year")[0]
            print "year :%s" % year.childNodes[0].data
            dict_movie["year"] = year.childNodes[0].data
        except IndexError:
            pass

        # <episodes> is optional as well
        try:
            episodes = movie.getElementsByTagName("episodes")[0]
            print "episodes:%s" % episodes.childNodes[0].data
            dict_movie["episodes"] = episodes.childNodes[0].data
        except IndexError:
            pass

        rating = movie.getElementsByTagName('rating')[0]
        print "Rating: %s" % rating.childNodes[0].data
        dict_movie["rating"] = rating.childNodes[0].data

        stars = movie.getElementsByTagName('stars')[0]
        print "stars: %s" % stars.childNodes[0].data
        dict_movie["stars"] = stars.childNodes[0].data

        description = movie.getElementsByTagName('description')[0]
        print "Description: %s" % description.childNodes[0].data
        dict_movie["description"] = description.childNodes[0].data
    except IndexError:
        # A required element is missing or empty: report it and skip this movie
        print "error:" + title + "\n"
        continue
    dict_movies[title] = dict_movie

print dict_movies
The output after parsing is:
*****Movie*****
Title:Enemy Behind
Type :War, Thriller
format:DVD
year :2003
Rating: PG
stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title:Transformers
Type :Anime, Science Fiction
format:DVD
year :1989
Rating: R
stars: 8
Description: A schientific fiction
*****Movie*****
Title:Trigun
Type :Anime, Action
format:DVD
episodes:4
Rating: PG
stars: 10
Description: Vash the Stampede!
*****Movie*****
Title:Ishtar
Type :Comedy
format:VHS
Rating: PG
stars: 2
Description: Viewable boredom
{u'Transformers': {'rating': u'R', 'description': u'A schientific fiction', 'format': u'DVD', 'stars': u'8', 'year': u'1989', 'type': u'Anime, Science Fiction'}, u'Ishtar': {'rating': u'PG', 'type': u'Comedy', 'description': u'Viewable boredom', 'stars': u'2', 'format': u'VHS'}, u'Enemy Behind': {'rating': u'PG', 'description': u'Talk about a US-Japan war', 'format': u'DVD', 'stars': u'10', 'year': u'2003', 'type': u'War, Thriller'}, u'Trigun': {'rating': u'PG', 'description': u'Vash the Stampede!', 'format': u'DVD', 'episodes': u'4', 'stars': u'10', 'type': u'Anime, Action'}}
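The script above is written for Python 2 (it uses print statements, and the `u'...'` prefixes in the printed dict come from Python 2 unicode strings). As a rough sketch of how the same extraction might look in Python 3, the version below replaces the nested try blocks with a small helper. The names `child_text` and `movies_to_dict` are hypothetical, the XML is a shortened inline copy of movies.xml, and unlike the original it keeps a movie even when a field is missing.

```python
import xml.dom.minidom

# Shortened copy of movies.xml, inlined so the sketch runs without a file.
SOURCE = """<collection shelf="New Arrivals">
<movie title="Enemy Behind"><type>War, Thriller</type><year>2003</year></movie>
<movie title="Trigun"><type>Anime, Action</type><episodes>4</episodes></movie>
</collection>"""


def child_text(element, tag):
    """Return the text of the first <tag> under element, or None if missing or empty."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


def movies_to_dict(xml_text, tags=("type", "year", "episodes")):
    """Map each movie title to a dict of the requested child element texts."""
    collection = xml.dom.minidom.parseString(xml_text).documentElement
    result = {}
    for movie in collection.getElementsByTagName("movie"):
        fields = {}
        for tag in tags:
            text = child_text(movie, tag)
            if text is not None:  # skip elements this movie does not have
                fields[tag] = text
        result[movie.getAttribute("title")] = fields
    return result


movies = movies_to_dict(SOURCE)
print(movies)
```

Passing the full tag list, for example `tags=("type", "format", "year", "episodes", "rating", "stars", "description")`, would collect the same fields as the original script.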
posted on 2015-09-14 17:45 xiaowenhui