Python basics: reading XML
For a detailed introduction to working with XML files in Python, see: https://www.jb51.net/article/50812.htm
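Structurally, XML looks a lot like HTML, the familiar hypertext markup language. But HTML was designed to display data, and its focus is how the data looks; XML was designed to transport and store data, and its focus is what the data is.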
Features:
1. Tags come in pairs: <TEST></TEST>
2. A tag can have attributes: <TEST Loop="1"></TEST>
3. A tag can contain data: <TEST>CPU</TEST>
4. A tag can contain child tags (forming a hierarchy)
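A minimal sketch showing all four features in one tiny document, parsed with minidom's parseString (which takes an XML string rather than a file path). The tag names TESTS/TEST and the sample string are hypothetical, modeled on the fragments above.

```python
from xml.dom.minidom import parseString

# Hypothetical sample combining the four features listed above:
# paired tags, an attribute (Loop), embedded text (CPU) and nesting (<TEST> inside <TESTS>).
SAMPLE = '<TESTS><TEST Loop="1">CPU</TEST></TESTS>'

dom = parseString(SAMPLE)
outer = dom.documentElement          # <TESTS>, the root element
inner = outer.firstChild             # <TEST>; SAMPLE has no whitespace, so this is the element
print(outer.tagName, inner.tagName)  # TESTS TEST
print(inner.getAttribute("Loop"))    # 1  (attribute values are always strings)
print(inner.firstChild.data)         # CPU (the text embedded in <TEST>)
```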
Reading XML with Python
import xml.dom.minidom
Open an XML file with xml.dom.minidom.parse().
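Every node has the attributes nodeName, nodeValue and nodeType. nodeName is the node's name and nodeValue is its value; nodeValue is only meaningful for text nodes (for an element node it is None). nodeType is an integer code for the kind of node: catalog, the root of the example below, is of type ELEMENT_NODE.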
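To see the difference between an element node and the text node inside it, here is a small sketch that parses a one-element string (a fragment borrowed from the example file below) and prints all three attributes for both nodes:

```python
from xml.dom.minidom import parseString

# parseString() parses XML held in a string; parse() takes a file name or file object.
dom = parseString("<maxid>4</maxid>")
elem = dom.documentElement   # the element node <maxid>
text = elem.firstChild       # the text node inside it, holding "4"

print(elem.nodeName, elem.nodeType, elem.nodeValue)   # maxid 1 None
print(text.nodeName, text.nodeType, text.nodeValue)   # #text 3 4
```

The element's own nodeValue is None; its content is reached through its child text node.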
The possible node types are named by these constants:
'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'
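These names are class attributes of xml.dom.Node (minidom nodes inherit them), and each one is just a small integer. A quick sketch that prints the code behind every constant in the list above:

```python
from xml.dom import Node

# The twelve node-type constants listed above, as defined on xml.dom.Node.
NAMES = [
    "ELEMENT_NODE", "ATTRIBUTE_NODE", "TEXT_NODE", "CDATA_SECTION_NODE",
    "ENTITY_REFERENCE_NODE", "ENTITY_NODE", "PROCESSING_INSTRUCTION_NODE",
    "COMMENT_NODE", "DOCUMENT_NODE", "DOCUMENT_TYPE_NODE",
    "DOCUMENT_FRAGMENT_NODE", "NOTATION_NODE",
]
codes = {name: getattr(Node, name) for name in NAMES}

# Print them ordered by numeric code (ELEMENT_NODE is 1, TEXT_NODE is 3, ...).
for name, code in sorted(codes.items(), key=lambda kv: kv[1]):
    print(code, name)
```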
For example, take the following XML file:
abc.xml
<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>测试</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
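The code that follows assumes abc.xml sits in the current working directory. If you want to follow along, this small sketch writes the sample above to that file using pathlib (saving it from a text editor works just as well):

```python
from pathlib import Path

# The sample document above; the XML declaration must be the very first characters.
ABC_XML = """<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>测试</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
"""

# Save it as UTF-8 to match the encoding named in the declaration.
Path("abc.xml").write_text(ABC_XML, encoding="utf-8")
```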
Reading the root node:
from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)
Output:
catalog
1
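Why is the printed type 1, and what does 1 mean? Look it up in the nodeType list above: 1 is the value of ELEMENT_NODE, so the root node catalog is an element node.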
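Because the file is pretty-printed, a node's childNodes also include whitespace-only text nodes (type 3) between the tags. A minimal sketch, using a trimmed inline copy of abc.xml instead of the file, of comparing nodeType against the named constant rather than the bare number 1 to keep only element children:

```python
from xml.dom.minidom import parseString

# Trimmed, hypothetical copy of abc.xml, kept inline so the snippet runs on its own.
XML = """<catalog>
  <maxid>4</maxid>
  <item id="2"><caption>Zope</caption></item>
</catalog>"""

root = parseString(XML).documentElement
print(root.nodeType == root.ELEMENT_NODE)     # True: 1 is ELEMENT_NODE

# The newlines/indentation between tags become text nodes, so filter them out.
elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
print([n.nodeName for n in elements])         # ['maxid', 'item']
print([n.nodeType for n in root.childNodes])  # [3, 1, 3, 1, 3]
```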
Getting child nodes and their values:
from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


def read_child_label(node, label_name):
    # getElementsByTagName returns every descendant element with this tag name
    child = node.getElementsByTagName(label_name)
    return child


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)
    child_nodes = read_child_label(root_node, "maxid")
    for child_node in child_nodes:
        print(child_node.nodeName)
        print(child_node.nodeType)
        # the text "4" lives in a text node, the first child of <maxid>
        print(child_node.childNodes[0].nodeValue)
Output:
catalog
1
maxid
1
4
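Note that getElementsByTagName searches all descendants, not only direct children. A sketch on an inline, whitespace-free copy of abc.xml (an assumption made so it runs standalone) showing that "caption" matches all three captions, and how to restrict the search to direct children instead:

```python
from xml.dom.minidom import parseString

# Compact copy of abc.xml (no whitespace text nodes), used here as a stand-in for the file.
XML = ('<catalog><maxid>4</maxid>'
       '<login username="pytest" passwd="123456"><caption>Python</caption>'
       '<item id="4"><caption>测试</caption></item></login>'
       '<item id="2"><caption>Zope</caption></item></catalog>')

root = parseString(XML).documentElement

# Recursive search: every <caption> in document order, however deeply nested.
all_captions = [c.firstChild.nodeValue for c in root.getElementsByTagName("caption")]
print(all_captions)   # ['Python', '测试', 'Zope']

# Direct children only: walk childNodes and keep <item> elements.
direct_items = [n.getAttribute("id") for n in root.childNodes
                if n.nodeType == n.ELEMENT_NODE and n.tagName == "item"]
print(direct_items)   # ['2']
```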
Getting tag attributes:
from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


def read_child_label(node, label_name):
    child = node.getElementsByTagName(label_name)
    return child


def read_attribute(node, attr_name):
    attribute = node.getAttribute(attr_name)
    return attribute


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)
    child_nodes_login = read_child_label(root_node, "login")
    for child_node in child_nodes_login:
        attr_username = read_attribute(child_node, "username")
        print(attr_username)
Output:
catalog
1
pytest
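One detail worth knowing: minidom's getAttribute returns an empty string, not None, when the attribute is absent, so hasAttribute is the way to test for presence. A small sketch; the "email" attribute is hypothetical and deliberately missing:

```python
from xml.dom.minidom import parseString

root = parseString('<catalog><login username="pytest" passwd="123456"/></catalog>').documentElement
login = root.getElementsByTagName("login")[0]

print(login.getAttribute("username"))     # pytest
print(repr(login.getAttribute("email")))  # '' : a missing attribute yields an empty string
print(login.hasAttribute("email"))        # False

# All attributes at once: attributes is a NamedNodeMap whose items() gives (name, value) pairs.
attrs = dict(login.attributes.items())
print(attrs)                              # {'username': 'pytest', 'passwd': '123456'}
```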
Another way to read XML uses the xml.etree.ElementTree module, which can iterate over the child tags under a given tag:
from xml.etree import ElementTree as ET

per = ET.parse("abc.xml")

# <item> elements directly under <login>
p = per.findall("./login/item")
for opener in p:
    # iterate an Element directly; Element.getchildren() was removed in Python 3.9
    for child in opener:
        print(child.tag, ":", child.text)

# <item> elements directly under the root <catalog>
p = per.findall("./item")
for oneper in p:
    for child in oneper:
        print(child.tag, ":", child.text)
Output:
caption : 测试
caption : Zope
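ElementTree also covers the minidom tasks shown earlier (reading text, attributes, and every tag of a given name) more compactly. A sketch using ET.fromstring on an inline copy of abc.xml, assumed here so the snippet needs no file:

```python
import xml.etree.ElementTree as ET

# Compact copy of abc.xml used as a stand-in for the file.
XML = ('<catalog><maxid>4</maxid>'
       '<login username="pytest" passwd="123456"><caption>Python</caption>'
       '<item id="4"><caption>测试</caption></item></login>'
       '<item id="2"><caption>Zope</caption></item></catalog>')

root = ET.fromstring(XML)                          # root is the <catalog> Element

maxid = root.findtext("maxid")                     # text of the first direct <maxid> child
username = root.find("login").get("username")      # attribute lookup on an Element
captions = [c.text for c in root.iter("caption")]  # all <caption> descendants
item_ids = [i.get("id") for i in root.findall(".//item")]  # ".//" searches at any depth

print(maxid, username)   # 4 pytest
print(captions)          # ['Python', '测试', 'Zope']
print(item_ids)          # ['4', '2']
```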