Python Basics: Reading XML

For a detailed introduction to working with XML files in Python, see: https://www.jb51.net/article/50812.htm

 

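Structurally, XML closely resembles the familiar HTML (HyperText Markup Language). HTML, however, was designed to display data, so its focus is on how the data looks. XML was designed to transport and store data, so its focus is on what the data is.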

Features:

1. Made up of tag pairs: <TEST></TEST>

2. Tags can have attributes: <TEST Loop="1"></TEST>

3. Tags can contain data: <TEST>CPU</TEST>

4. Tags can contain child tags (forming a hierarchy)
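
The four features above can be checked with a small sketch. The XML string is a made-up sample (the CHILD tag and its text are assumptions for illustration), and it is parsed with xml.dom.minidom.parseString so that no file is needed:

```python
from xml.dom.minidom import parseString

# Hypothetical sample: a tag pair with an attribute, text data and a child tag
SAMPLE = '<TEST Loop="1">CPU<CHILD>core</CHILD></TEST>'

root = parseString(SAMPLE).documentElement

tag_name = root.tagName                    # name of the tag pair
loop_value = root.getAttribute("Loop")     # attribute value, always a string
text_data = root.firstChild.data           # text embedded directly in the tag
child_names = [n.tagName for n in root.childNodes
               if n.nodeType == n.ELEMENT_NODE]  # nested child tags

print(tag_name, loop_value, text_data, child_names)
```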

 

Reading XML with Python

import xml.dom.minidom

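Open an XML file: xml.dom.minidom.parse()

Every node has nodeName, nodeValue, and nodeType. nodeName is the node's name. nodeValue is the node's value; it carries content for text nodes (and similar nodes such as comments and attributes) and is None for element nodes. nodeType is an integer constant identifying the kind of node. For example, the root element catalog in the sample file used later in this post is an ELEMENT_NODE.

The following node types are defined: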

'ATTRIBUTE_NODE'
'CDATA_SECTION_NODE'
'COMMENT_NODE'
'DOCUMENT_FRAGMENT_NODE'
'DOCUMENT_NODE'
'DOCUMENT_TYPE_NODE'
'ELEMENT_NODE'
'ENTITY_NODE'
'ENTITY_REFERENCE_NODE'
'NOTATION_NODE'
'PROCESSING_INSTRUCTION_NODE'
'TEXT_NODE'
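
As a rough illustration of nodeName, nodeValue, and nodeType, the sketch below compares an element node with the text node inside it. The one-line XML string is an assumed stand-in for a real file:

```python
from xml.dom.minidom import parseString

# Assumed one-line document, used instead of a file on disk
doc = parseString("<maxid>4</maxid>")

element = doc.documentElement   # the <maxid> element
text = element.firstChild       # the text node holding "4"

# (nodeName, nodeValue, nodeType) for each kind of node
element_info = (element.nodeName, element.nodeValue, element.nodeType)
text_info = (text.nodeName, text.nodeValue, text.nodeType)

print(element_info)
print(text_info)
print(doc.nodeType == doc.DOCUMENT_NODE)
```

An element's nodeValue is None; the value lives in its text child, which is why the examples in this post read childNodes[0].nodeValue.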

 

For example, take the following XML file:

abc.xml

<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>测试</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

 

Reading the root node:

from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)

 

Output:

catalog
1

 

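Why is the printed type 1, and what does 1 stand for? nodeType is an integer constant, and 1 is the value of ELEMENT_NODE, one of the node types listed earlier.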
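
To turn a numeric nodeType back into its name, one possible sketch builds a lookup table from the constants on xml.dom.Node. The names NODE_TYPE_NAMES and node_type_name are hypothetical helpers, not part of the standard library:

```python
from xml.dom import Node

# Collect every *_NODE constant defined on xml.dom.Node, keyed by its number
NODE_TYPE_NAMES = {
    value: name
    for name, value in vars(Node).items()
    if name.endswith("_NODE")
}


def node_type_name(node_type):
    """Return the constant name for a numeric nodeType (hypothetical helper)."""
    return NODE_TYPE_NAMES.get(node_type, "UNKNOWN")


print(node_type_name(1))
print(node_type_name(3))
```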

 

Getting child nodes and their values:

from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


def read_child_label(node, label_name):
    # getElementsByTagName returns every descendant with this tag name, not only direct children
    child = node.getElementsByTagName(label_name)
    return child


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)
    child_nodes = read_child_label(root_node, "maxid")
    for child_node in child_nodes:
        print(child_node.nodeName)
        print(child_node.nodeType)
        print(child_node.childNodes[0].nodeValue)

 

Output:

catalog
1
maxid
1
4
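
One thing worth keeping in mind is that getElementsByTagName searches the whole subtree, not just the immediate children. The sketch below shows the difference on an inlined, condensed copy of abc.xml; direct_children is a hypothetical helper written for this illustration:

```python
from xml.dom.minidom import parseString

# Condensed copy of abc.xml, inlined so the sketch runs without the file
XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="123456">
        <caption>Python</caption>
        <item id="4"><caption>测试</caption></item>
    </login>
    <item id="2"><caption>Zope</caption></item>
</catalog>"""


def direct_children(node, tag_name):
    """Return only the immediate child elements named tag_name (hypothetical helper)."""
    return [child for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE and child.tagName == tag_name]


root = parseString(XML).documentElement

all_items = root.getElementsByTagName("item")  # every <item> at any depth
top_items = direct_children(root, "item")      # only <item> directly under <catalog>

print(len(all_items), len(top_items))
print([item.getAttribute("id") for item in top_items])
```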

 

 

Getting tag attributes:

from xml.dom.minidom import parse


def read_xml_root_node(xml_path):
    dom = parse(xml_path)
    root = dom.documentElement
    return root


def read_child_label(node, label_name):
    child = node.getElementsByTagName(label_name)
    return child


def read_attribute(node, attr_name):
    attribute = node.getAttribute(attr_name)
    return attribute


if __name__ == "__main__":
    root_node = read_xml_root_node("abc.xml")
    print(root_node.nodeName)
    print(root_node.nodeType)
    child_nodes_login = read_child_label(root_node, "login")
    for child_node in child_nodes_login:
        attr_username = read_attribute(child_node, "username")
        print(attr_username)

 

Output:

catalog
1
pytest
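
A short sketch of two related details: how a missing attribute behaves, and how to list all attributes at once. The login tag is inlined from abc.xml, and the "email" attribute name is an assumption chosen because it does not exist there:

```python
from xml.dom.minidom import parseString

# The <login> tag from abc.xml, inlined so the sketch runs without the file
login = parseString('<login username="pytest" passwd="123456"/>').documentElement

# getAttribute returns an empty string for a missing attribute,
# so hasAttribute is needed to tell "missing" from "empty"
missing_value = login.getAttribute("email")   # "email" is a hypothetical name
has_email = login.hasAttribute("email")

# attributes is a NamedNodeMap; items() yields (name, value) pairs
all_attrs = dict(login.attributes.items())

print(repr(missing_value), has_email)
print(all_attrs)
```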

 

 

Another module for reading XML, xml.etree.ElementTree, can iterate over the child tags under a given tag:

from xml.etree import ElementTree as ET


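# Parse the file into an ElementTree
tree = ET.parse("abc.xml")

# <item> elements nested inside <login>
for item in tree.findall("./login/item"):
    # Iterate over the element directly; getchildren() was removed in Python 3.9
    for child in item:
        print(child.tag, ":", child.text)

# <item> elements directly under the root
for item in tree.findall("./item"):
    for child in item:
        print(child.tag, ":", child.text)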

 

Output:

caption : 测试
caption : Zope
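
For comparison with the minidom examples, here is a minimal sketch of reading text and attributes with ElementTree. It uses an inlined, condensed copy of abc.xml and ET.fromstring instead of a file, which is an assumption made to keep the example self-contained:

```python
from xml.etree import ElementTree as ET

# Condensed copy of abc.xml, inlined so the sketch runs without the file
XML = """<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="123456">
        <caption>Python</caption>
        <item id="4"><caption>测试</caption></item>
    </login>
    <item id="2"><caption>Zope</caption></item>
</catalog>"""

root = ET.fromstring(XML)                 # root is the <catalog> element

max_id = root.findtext("maxid")           # text of the first <maxid> child
login = root.find("login")                # first <login> child
username = login.get("username")          # a single attribute
login_attrs = login.attrib                # all attributes as a dict
item_ids = [item.get("id") for item in root.iter("item")]  # <item> at any depth

print(max_id, username, login_attrs, item_ids)
```

Here find and findtext look only at direct children for a plain tag name, while iter walks the whole subtree in document order.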

 

 

 

 

posted @ 2019-12-04 14:55 by o云淡风轻o