Parsing XML files with Python
https://www.cnblogs.com/handsome1013/p/10058838.html
ET.Parser usage
https://www.cnblogs.com/yezuhui/p/6853323.html
https://blog.csdn.net/gz153016/article/details/90216737
An introduction to xml.etree.ElementTree, the XML parsing module in Python 3
Removing duplicate XML nodes
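The notes above mention ET.Parser. Assuming this refers to `xml.etree.ElementTree.XMLParser` (the parser class the module documents), a minimal sketch of feeding it data incrementally and of passing it to `ET.fromstring` might look like this. The XML strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Feed a document to XMLParser in chunks, e.g. as it arrives from a stream.
# The chunks below are hypothetical sample data.
parser = ET.XMLParser()
for chunk in ['<root><item id="1">a</item>', '<item id="2">b</item></root>']:
    parser.feed(chunk)
root = parser.close()          # close() finishes parsing and returns the root Element

print(root.tag)                           # root
print([item.get('id') for item in root])  # ['1', '2']

# A parser instance can also be handed to ET.fromstring (or ET.parse).
root2 = ET.fromstring('<root><item id="3"/></root>', parser=ET.XMLParser())
print(root2[0].attrib)                    # {'id': '3'}
```

Incremental feeding is mainly useful when the whole document is not available at once; for an ordinary file, `ET.parse(path)` is simpler.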
https://blog.csdn.net/u014203484/article/details/74332815
import xml.etree.ElementTree as ET   # import the XML module
tree = ET.parse('GHO.xml')           # parse the given XML file into an ElementTree
root = tree.getroot()                # get the root (top-level) element
data = root.find('Data')             # find the 'Data' child of the root
for obs in data:                     # iterate over every child of 'Data'
    for item in obs:                 # iterate over every child of each 'obs' element
        key = item.attrib            # attrib is a dict of attributes (a property, not a method)
        print(list(key))             # print the attribute names (the dict keys)
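GHO.xml itself is not included in these notes, so the sketch below runs the same loop against a small hypothetical document whose `Data` / `Observation` / `Dim` structure is an assumption for illustration. It also shows `item.get(...)` for reading a single attribute value instead of only the key names.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for GHO.xml: the real file's structure is assumed, not known.
SAMPLE = """
<GHO>
  <Data>
    <Observation FactID="1">
      <Dim Category="YEAR" Code="2010"/>
      <Dim Category="COUNTRY" Code="AFG"/>
    </Observation>
  </Data>
</GHO>
"""

root = ET.fromstring(SAMPLE)       # fromstring returns the root Element directly
data = root.find('Data')           # first direct child named 'Data'
rows = []
for obs in data:                   # each child of <Data>
    for item in obs:               # each child of that observation
        attrs = item.attrib        # a plain dict, e.g. {'Category': ..., 'Code': ...}
        print(list(attrs))         # attribute names
        rows.append((item.get('Category'), item.get('Code')))

print(rows)   # [('YEAR', '2010'), ('COUNTRY', 'AFG')]
```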
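How to read attributes and node content
How can the id and name attributes and their values be extracted from data?
Answer
There are two ways:
1. Get the node first: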
String strID = node.getAttributes().getNamedItem("id").getNodeValue();
String strName = node.getAttributes().getNamedItem("name").getNodeValue();
2. Get the element first:
String strID = element.getAttribute("id");
String strName = element.getAttribute("name");
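The two answers above use the Java DOM API. A rough Python ElementTree counterpart is sketched below; the `<data>`/`<user>` document is hypothetical because the original question does not show its XML. One difference worth knowing: Java's `getAttribute` returns an empty string for a missing attribute, while `Element.get` returns `None` (or a default you pass).

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the original question does not show its document.
xml_text = '<data><user id="42" name="alice"/><user id="7" name="bob"/></data>'
data = ET.fromstring(xml_text)

for user in data.findall('user'):
    str_id = user.get('id')            # like getAttribute("id"), but None if missing
    str_name = user.attrib['name']     # plain dict access; raises KeyError if missing
    print(str_id, str_name)

pairs = [(u.get('id'), u.get('name')) for u in data.findall('user')]
print(pairs)   # [('42', 'alice'), ('7', 'bob')]
```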
A small exercise
#!/usr/bin/env python3
import xml.etree.ElementTree as ET

tree = ET.parse('abcdefg.xml')
root = tree.getroot()
all_elems = root.findall('.//*')    # every descendant element of the root
print(len(all_elems))

context = []                        # text of every <source> child found (kept across iterations)
for element in all_elems:
    if element.text is None:        # skip elements that have no text of their own
        continue
    src_elem = element.find("source")   # first direct child named <source>, or None
    if src_elem is None:
        continue
    context.append(src_elem.text)
    print("attrib: %s" % src_elem.attrib)
    print("tag: %s" % src_elem.tag)
    # To inspect the children of <source> (text is an attribute, not a method):
    # for item in src_elem:
    #     print(item.tag, item.text)
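If the goal is simply to collect every `<source>` element, `Element.iter('source')` walks the whole tree and yields only matching elements, so the `findall('.//*')` plus `find('source')` pairing is not needed. Note that this sketch does not reproduce the exercise's filter on the parent's own text: in the hypothetical sample below the `<unit>` elements have no text of their own, so the exercise's loop would skip them. The file contents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for abcdefg.xml.
SAMPLE = """<doc>
  <unit id="u1"><source lang="en">Hello</source><target>Hallo</target></unit>
  <unit id="u2"><source lang="en">Bye</source></unit>
</doc>"""

root = ET.fromstring(SAMPLE)
# iter('source') yields every <source> element at any depth, in document order.
context = [src.text for src in root.iter('source')]
langs = [src.get('lang') for src in root.iter('source')]
print(context)   # ['Hello', 'Bye']
print(langs)     # ['en', 'en']
```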
Deleting duplicate nodes:
import xml.etree.ElementTree as ET

path = 'in.xml'
tree = ET.parse(path)
root = tree.getroot()

def elements_equal(e1, e2):
    """Recursively compare tag, text, tail, attributes and children."""
    if type(e1) != type(e2):
        return False
    if e1.tag != e2.tag:              # the original compared e1.tag with itself
        return False
    if e1.text != e2.text:
        return False
    if e1.tail != e2.tail:
        return False
    if e1.attrib != e2.attrib:
        return False
    if len(e1) != len(e2):
        return False
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))

for page in root:                     # iterate over pages
    prev = None                       # only compare siblings within the same page
    elems_to_remove = []
    for elem in page:
        if elements_equal(elem, prev):    # identical to the previous kept sibling
            print("found duplicate: %s" % elem.text)
            elems_to_remove.append(elem)
            continue
        prev = elem
    # remove after the loop: removing children while iterating the parent skips elements
    for elem_to_remove in elems_to_remove:
        page.remove(elem_to_remove)

tree.write("out.xml")
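To check the duplicate-removal logic without an in.xml file, the sketch below applies the same idea to an in-memory document. The helper name `remove_adjacent_duplicates` and the sample XML are hypothetical. Like the script above, it only removes a child that is identical to the sibling kept just before it, so non-adjacent repeats survive. Because `tail` is part of the comparison, in an indented file the last sibling usually has a different tail from the others and may not be detected; stripping tails first would be one way around that.

```python
import xml.etree.ElementTree as ET

def elements_equal(e1, e2):
    """Recursive structural equality: tag, text, tail, attributes, children."""
    if e1 is None or e2 is None:
        return e1 is e2
    if (e1.tag, e1.text, e1.tail, e1.attrib) != (e2.tag, e2.text, e2.tail, e2.attrib):
        return False
    if len(e1) != len(e2):
        return False
    return all(elements_equal(c1, c2) for c1, c2 in zip(e1, e2))

def remove_adjacent_duplicates(parent):
    """Drop children identical to the preceding kept sibling; return how many were removed."""
    to_remove, prev = [], None
    for elem in parent:
        if elements_equal(elem, prev):
            to_remove.append(elem)
        else:
            prev = elem
    for elem in to_remove:           # mutate only after iteration is finished
        parent.remove(elem)
    return len(to_remove)

# Hypothetical input: no whitespace between tags, so every tail is None.
root = ET.fromstring('<doc><page><a>x</a><a>x</a><b>y</b><a>x</a></page></doc>')
removed = sum(remove_adjacent_duplicates(page) for page in root)
print(removed)                                  # 1
print(ET.tostring(root, encoding='unicode'))
# <doc><page><a>x</a><b>y</b><a>x</a></page></doc>
```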
Recommended blog posts on using the RapidXml library:
https://blog.csdn.net/wqvbjhc/article/details/7662931
https://www.cnblogs.com/kanego/articles/2247602.html
http://www.oschina.net/question/873634_81784
http://blog.sina.com.cn/s/blog_a459dcf501019393.html