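Parsing XML with ElementTree in Python
In Python, ElementTree (the standard-library module xml.etree.ElementTree) is a commonly used module for parsing XML.
1. Import the ElementTree module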
from xml.etree import ElementTree as ET
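2. Load the XML. There are two common ways: from an XML file or from a string. Note that ET.parse() returns an ElementTree object (call getroot() to get its root Element), while ET.XML() returns the root Element directly; the steps below all operate on that root Element.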
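# Load from an XML file (test.xml in the current working directory); parse() returns an ElementTree, so take its root
myET = ET.parse("test.xml").getroot()
# Load from a string; XML() returns the root Element directly
xml = "<xml><name>张三</name><age>21</age></xml>"
myET = ET.XML(xml)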
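A self-contained sketch of both loading paths follows. To avoid depending on a real test.xml, it assumes an in-memory file object (io.StringIO) in its place, since ET.parse() accepts either a filename or a file object. It shows that parse() gives an ElementTree whose root comes from getroot(), whereas XML() hands back the root Element itself.

```python
import io
from xml.etree import ElementTree as ET

xml_text = "<xml><name>张三</name><age>21</age></xml>"

# From a "file": io.StringIO stands in for test.xml here (an assumption for the demo)
tree = ET.parse(io.StringIO(xml_text))   # returns an ElementTree
root_from_file = tree.getroot()          # its root Element

# From a string: ET.XML() (an alias of ET.fromstring()) returns the root Element directly
root_from_string = ET.XML(xml_text)

print(isinstance(tree, ET.ElementTree))            # True
print(isinstance(root_from_file, ET.Element))      # True
print(root_from_file.tag, root_from_string.tag)    # xml xml
```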
3. Finding elements
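getchildren() returns all child elements of the root as a list of Element objects. It has been deprecated since Python 2.7 and was removed in Python 3.9; in Python 3, use list(elem) or index the element directly.
find(match) returns the first child element whose tag name (or path) matches match, or None if there is none; its content is available through .text.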
print(list(myET)[0].text)  # Python 3 replacement for myET.getchildren()[0].text
print(myET.find("name").text)
Both lines print 张三.
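Because getchildren() no longer exists in Python 3.9 and later, here is a sketch of the same lookups with list(elem) and indexing, plus what find() and findtext() return when nothing matches. The `phone` tag is a made-up name used only to show the no-match case.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<xml><name>张三</name><age>21</age></xml>")

# Instead of getchildren(): turn the element into a list, or index it directly
children = list(root)
print([child.tag for child in children])        # ['name', 'age']
print(root[0].text)                             # 张三

# find() returns the first matching child, or None when nothing matches
print(root.find("name").text)                   # 张三
print(root.find("phone"))                       # None

# findtext() returns the text itself and accepts a fallback default
print(root.findtext("age"))                     # 21
print(root.findtext("phone", default="n/a"))    # n/a
```

Checking find() against None before reading .text avoids an AttributeError when the tag is missing.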
4. Adding child elements
Use append() to add a child element:
sexET = ET.XML("<sex>男</sex>")
myET.append(sexET)
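The sketch below shows append() next to ET.SubElement(), which creates a child and attaches it in one call. The `<city>` element and its text are hypothetical, added only for the comparison.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<xml><name>张三</name><age>21</age></xml>")

# Build a separate element, then append it as the last child
sex = ET.XML("<sex>男</sex>")
root.append(sex)

# SubElement() creates the child and appends it in one step (hypothetical <city>)
city = ET.SubElement(root, "city")
city.text = "Beijing"

print(ET.tostring(root, encoding="unicode"))
# <xml><name>张三</name><age>21</age><sex>男</sex><city>Beijing</city></xml>
```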
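5. Removing child elements
Use remove() to delete a child element. It takes the Element object itself, so look it up with find() first: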
ageET = myET.find("age")
myET.remove(ageET)
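Since remove() matches by object identity, the element to delete has to be fetched from the tree first. This sketch, using a made-up document with two `<age>` children, removes one match, then removes all remaining matches by collecting them with findall() first, and finally shows the ValueError raised for an element that is not actually a child.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<xml><name>张三</name><age>21</age><age>22</age></xml>")

# Remove the first <age>: pass the Element object itself, not its tag name
root.remove(root.find("age"))
print([child.text for child in root])     # ['张三', '22']

# Remove every remaining <age>; findall() returns a list, so the tree is not
# modified while it is being iterated
for age in root.findall("age"):
    root.remove(age)
print([child.tag for child in root])      # ['name']

# A new element with the same tag is a different object, so remove() fails
try:
    root.remove(ET.Element("name"))
except ValueError:
    print("not a child of root")          # not a child of root
```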
6. Modifying content
# Change the text content
myET.find("name").text = "李四"
# Change the tag
myET.find("name").tag = "person"
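A short sketch of both edits on the sample document, plus set() for an attribute (the `id` attribute is hypothetical). It also shows one consequence of renaming: the element can no longer be found under its old tag.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<xml><name>张三</name><age>21</age></xml>")

name = root.find("name")   # keep a reference instead of calling find() twice
name.text = "李四"          # change the text content
name.tag = "person"         # rename the element
name.set("id", "1")         # add an attribute (hypothetical "id")

print(ET.tostring(root, encoding="unicode"))
# <xml><person id="1">李四</person><age>21</age></xml>
print(root.find("name"))    # None: the old tag no longer matches
```

Keeping a reference matters when the tag is renamed before the text is changed: a second find("name") would then return None.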
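7. Converting to a string
tostring() serializes an Element (not an ElementTree) to XML text.
The first argument is the Element; the second is the encoding and may be omitted (it defaults to "us-ascii"). In Python 3, an encoding such as "utf-8" makes tostring() return bytes, while encoding="unicode" returns a str.
ET.tostring(myET, "utf-8")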
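The encoding argument changes what tostring() returns. This sketch, assuming Python 3, serializes the same element three ways; the non-ASCII name 张三 makes the difference visible.

```python
from xml.etree import ElementTree as ET

root = ET.XML("<xml><name>张三</name></xml>")

as_utf8 = ET.tostring(root, "utf-8")             # bytes, UTF-8 encoded
as_text = ET.tostring(root, encoding="unicode")  # str, no encoding step
as_ascii = ET.tostring(root)                     # default "us-ascii": non-ASCII becomes character references

print(type(as_utf8).__name__, type(as_text).__name__)  # bytes str
print(as_utf8.decode("utf-8") == as_text)             # True
print(as_ascii)   # b'<xml><name>&#24352;&#19977;</name></xml>'
print(as_text)    # <xml><name>张三</name></xml>
```

None of these three calls adds an `<?xml ...?>` declaration; since Python 3.8, tostring() also accepts xml_declaration=True when one is wanted.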
Other Element methods and attributes:
- tag
- A string identifying what kind of data this element represents (the element type, in other words).
- text
- The text attribute can be used to hold additional data associated with the element. As the name implies this attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found between the element tags.
- tail
- The tail attribute can be used to hold additional data associated with the element. This attribute is usually a string but may be any application-specific object. If the element is created from an XML file the attribute will contain any text found after the element’s end tag and before the next tag.
- attrib
- A dictionary containing the element’s attributes. Note that while the attrib value is always a real mutable Python dictionary, an ElementTree implementation may choose to use another internal representation, and create the dictionary only if someone asks for it. To take advantage of such implementations, use the dictionary methods below whenever possible.
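As a quick illustration of tag, text, tail, and attrib, here is a small sketch using a made-up `<p>` document. Note how the text after `</b>` lands in the `<b>` element's tail, not in the parent's text.

```python
from xml.etree import ElementTree as ET

root = ET.XML('<p lang="en">Hello <b>bold</b> world</p>')
b = root.find("b")

print(root.tag)          # p
print(root.attrib)       # {'lang': 'en'}
print(repr(root.text))   # 'Hello '
print(repr(b.text))      # 'bold'
print(repr(b.tail))      # ' world'  (text after </b> belongs to <b>, not <p>)
print(root.tail)         # None
```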
The following dictionary-like methods work on the element attributes.
- clear()
- Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.
- get(key, default=None)
- Gets the element attribute named key. Returns the attribute value, or default if the attribute was not found.
- items()
- Returns the element attributes as a sequence of (name, value) pairs. The attributes are returned in an arbitrary order.
- keys()
- Returns the element’s attribute names as a list. The names are returned in an arbitrary order.
- set(key, value)
- Set the attribute key on the element to value.
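The dictionary-like methods above can be sketched on a made-up `<person>` element (the `id`, `sex`, and `age` attributes are hypothetical). The key and item lists are sorted because the documentation does not promise any particular order.

```python
from xml.etree import ElementTree as ET

person = ET.XML('<person id="1" sex="男">张三</person>')

print(person.get("id"))                 # 1
print(person.get("age", "unknown"))     # unknown (attribute missing, default used)

person.set("age", "21")                 # use strings: tostring() cannot serialize ints
keys = sorted(person.keys())            # sorted, since no order is guaranteed
print(keys)                             # ['age', 'id', 'sex']
print(sorted(person.items()))           # [('age', '21'), ('id', '1'), ('sex', '男')]

person.clear()                          # drops attributes, children, text and tail
print(person.tag, person.attrib, person.text, len(person))   # person {} None 0
```

clear() keeps the tag, so the element can be refilled in place.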
The following methods work on the element’s children (subelements).
- append(subelement)
- Adds the element subelement to the end of this element’s internal list of subelements.
- extend(subelements)
- Appends subelements from a sequence object with zero or more elements. Raises AssertionError if a subelement is not a valid object (Python 3 raises TypeError instead). New in version 2.7.
- find(match)
- Finds the first subelement matching match. match may be a tag name or path. Returns an element instance or None.
- findall(match)
- Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order.
- findtext(match, default=None)
- Finds text for the first subelement matching match. match may be a tag name or path. Returns the text content of the first matching element, or default if no element was found. Note that if the matching element has no text content an empty string is returned.
- getchildren()
- Deprecated since version 2.7: Use list(elem) or iteration. Removed in Python 3.9.
- getiterator(tag=None)
- Deprecated since version 2.7: Use method Element.iter() instead. Removed in Python 3.9.
- insert(index, element)
- Inserts a subelement at the given position in this element.
- iter(tag=None)
- Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If tag is not None or '*', only elements whose tag equals tag are returned from the iterator. If the tree structure is modified during iteration, the result is undefined.
- iterfind(match)
- Finds all matching subelements, by tag name or path. Returns an iterable yielding all matching elements in document order. New in version 2.7.
- itertext()
- Creates a text iterator. The iterator loops over this element and all subelements, in document order, and returns all inner text. New in version 2.7.
- makeelement(tag, attrib)
- Creates a new element object of the same type as this element. Do not call this method, use the SubElement() factory function instead.
- remove(subelement)
- Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.
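Several of the child-related methods can be tried together on a small made-up `<people>` document; the ages, the name 王五, and the two empty `<person>` elements are all hypothetical. The sketch covers findall() with a path, iterfind(), iter() with and without a tag filter, itertext(), insert(), and extend().

```python
from xml.etree import ElementTree as ET

root = ET.XML(
    "<people>"
    "<person><name>张三</name><age>21</age></person>"
    "<person><name>李四</name><age>30</age></person>"
    "</people>"
)

# findall() with a path: each <name> inside a direct <person> child
names = [n.text for n in root.findall("person/name")]
print(names)                                   # ['张三', '李四']

# iterfind() yields the same matches lazily
print(next(root.iterfind("person/age")).text)  # 21

# iter() walks the whole subtree depth first; a tag argument filters it
tags = [e.tag for e in root.iter()]
print(tags)   # ['people', 'person', 'name', 'age', 'person', 'name', 'age']
print([e.text for e in root.iter("age")])      # ['21', '30']

# itertext() yields every piece of inner text in document order
text = "|".join(root.itertext())
print(text)                                    # 张三|21|李四|30

# insert() puts a child at a given index; extend() appends several at once
newcomer = ET.Element("person")
ET.SubElement(newcomer, "name").text = "王五"
root.insert(0, newcomer)
root.extend([ET.Element("person"), ET.Element("person")])
print(len(root))                               # 5
print(root[0].findtext("name"))                # 王五
```

The iterators are consumed before the tree is changed, which matches the warning above that modifying the tree during iteration gives undefined results.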
Reference: http://www.cnblogs.com/ifantastic/archive/2013/04/12/3017110.html