python中的lxml模块

Python中自带了XML的模块，但是性能不太好，相比之下，LXML增加了很多实用的功能。

lxml中主要有两部分，

1) etree，主要可以用来解析XML字符串，

　　内部有两个对象，etree._ElementTree和etree._Element

　etree.Element对象中包含的属性和方法：

属性：1)tag，返回该节点的名称：

　　　　　　print 'root.tag' 输出tag

　　　2)text，设置该节点的文本：

　　　　　　root.text = 'hello world' 输出<root>hello world</root>

　　　3)tail，在标签后边追加文本：

　　　　　　root.tail = 'hym' 输出<root>hello world</root>hym

方法：1)Element(string)，创建一个Element对象：

　　　　　　root = etree.Element('root') 返回一个XML的节点，名称root

　　　　　　root = etree.Element('root', interesting='totally') 返回一个root节点，属性interesting = 'totally'

　　　2) set(name,value)，为已有的节点，添加属性，

　　　　　　root.set('hello', 'huhu') 增加一个属性hello = 'huhu'

　　 3) get(string)，返回属性值，

　　　　　　root.get('intersting') 返回‘totally’

　　 4) keys()，返回所有的属性名，

　　　　　　root.keys() 返回interesting，hello

　　 5) items()，返回字典，其中包含所有的属性，及其 value

　　　　　　for name,value in sorted(root.items()) 返回两对属性

　　 6) 为该节点，添加子节点，

　　　　　　child1 = etree.SubElement(root, 'child')

　　 7) 为该节点，删除子节点，

　　　　　　root.remove(child1)

　　 8) getparent()，拿到父节点

　　　　　　child1.getparent().tag 返回root

etree允许，节点内部的子节点，认为是一个list，

　　print "len(root)" 返回root节点及其子节点的个数；

　　root.index(child2) 返回child2的索引值

　　child = root[0] 返回child1，允许索引访问

　　for child in root: 允许遍历

　　　　...

　　root.insert(0, etree.Element('child0')) 允许插入

　　root.append(etree.Element('child4')) 允许append

etree._Element对象是一颗xml的树，内部包含很多element的对象，

　　1)root.getroottree()，返回一个节点对应的树，root表示当前节点的tag，返回的是Tree类型的对象

　　2)getroot()，返回根节点，返回的是Element类型的对象

　　3)etree.ELementTree(root)，从一个节点构建一颗tree，该节点，也就是根节点，

etree，Element和Tree类型的对象，都支持xpath的方法：

　　foo.xpath('//root')[0].tag

2) html，主要用来解析html，

　　etree.html(HTML)来解析html，并得到Element对象，也可以调用xpath来分析xml

xpath，可以实现节点和属性的快速查找：

　　xpath('//div[@attr = value]/text()') 返回该div节点，满足attr属性要求的，节点的文本

　　xpath('//div[@attr]/@attr')　　返回该div节点，含有attr属性，的值

example：

import xml.etree.ElementTree as ET

　　　　<name>python</name>

　　　　......

tree = ET.ElementTree(file='./../text.xml')

root = tree.getroot();

for child in root:

　　print(child.tag, child.attrib, child.text)

posted @ 2018-01-10 10:21 _9_8 阅读(1227) 评论(0) 编辑收藏举报

刷新页面返回顶部

_9_8

python中的lxml模块

公告