在python开发[第九篇],我们已经在request模块中,讲解了如何根据url去获取网页内容。

如果返回的是内容的格式python的基本数据类型,可以json将返回的字符串转为python的基本数据类型。但是大多数情况下,我们通过http协议请求一个url后,返回的却是?xml格式。基于这种常见的报文格式,python对其进行提供了相应模块,如下:

一、xml模块

XML是实现不同语言或程序之间进行数据交换的协议,XML文件格式如下:

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
first.xml

下面我们先试着请求一个地址,来查看其返回的结果:

#检查QQ是否在线的
import
requests r = requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=1223995142") result = r.text print(result) """ 执行结果: <?xml version="1.0" encoding="utf-8"?> <string xmlns="http://WebXml.com.cn/">Y</string> """

由上面可以看出,请求后返回的是xml格式的。为了能便捷的处理xml内容,python提供了内置模块模块xml模块,在python的Lib目录下。

1.xml报文的解析

第一种方法:
from xml.etree import ElementTree as ET  #在poython的Lib目录中有xml/etree目录。并在该目录下导入ElementTree.py文件,并重命名为ET

# 打开文件,读取XML内容
str_xml = open('first.xml', 'r').read()

# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)  #调用ElementTree文件中的XML()方法
print(root,type(root))  #执行结果:<Element 'data' at 0x00702BA0> <class 'xml.etree.ElementTree.Element'>
#可见,root是Element的类对象
第二种方法:
from xml.etree import ElementTree as ET

# 利用ElementTree.parse将文件直接解析成xml对象 。不需要打开文件
tree = ET.parse("first.xml")
print(tree,type(tree)) #执行结果:<xml.etree.ElementTree.ElementTree object at 0x0067FDB0> <class 'xml.etree.ElementTree.ElementTree'>
# 获取xml文件的根节点
root = tree.getroot()
#此时生成的tree是ElementTree的类对象
print(root,type(root)) #执行结果:<Element 'data' at 0x00502BA0> <class 'xml.etree.ElementTree.Element'>
#注意:1.在使用ElementTree类方法进行解析时,tree是ElementTree的对象。而root(根节点)却是Element的对象
    2.不管采用什么方法解析xml。解析后,所有的节点(不论根节点,还是子节点)都是Element对象
    3.ET.XML()是对字符串解析成xml。所以呢,如果是想解析xml文件,就需要先读取文件,获取到文件内容,然后再解析。
    4.ET。parse()是只能解析文件,不能将字符串解析为XML
class Element:
    """An XML element.

    This class is the reference implementation of the Element interface.

    An element's length is its number of subelements.  That means if you
    want to check if an element is truly empty, you should check BOTH
    its length AND its text attribute.

    The element tag, attribute names, and attribute values can be either
    bytes or strings.

    *tag* is the element name.  *attrib* is an optional dictionary containing
    element attributes. *extra* are additional element attributes given as
    keyword arguments.

    Example form:
        <tag attrib>text<child/>...</tag>tail

    """

    当前节点的标签名
    tag = None
    """The element's name."""

    当前节点的属性

    attrib = None
    """Dictionary of the element's attributes."""

    当前节点的内容
    text = None
    """
    Text before first subelement. This is either a string or the value None.
    Note that if there is no text, this attribute may be either
    None or the empty string, depending on the parser.

    """

    tail = None
    """
    Text after this element's end tag, but before the next sibling element's
    start tag.  This is either a string or the value None.  Note that if there
    was no text, this attribute may be either None or an empty string,
    depending on the parser.

    """

    def __init__(self, tag, attrib={}, **extra):
        if not isinstance(attrib, dict):
            raise TypeError("attrib must be dict, not %s" % (
                attrib.__class__.__name__,))
        attrib = attrib.copy()
        attrib.update(extra)
        self.tag = tag
        self.attrib = attrib
        self._children = []

    def __repr__(self):
        return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

    def makeelement(self, tag, attrib):
        创建一个新节点
        """Create a new element with the same type.

        *tag* is a string containing the element name.
        *attrib* is a dictionary containing the element attributes.

        Do not call this method, use the SubElement factory function instead.

        """
        return self.__class__(tag, attrib)

    def copy(self):
        """Return copy of current element.

        This creates a shallow copy. Subelements will be shared with the
        original tree.

        """
        elem = self.makeelement(self.tag, self.attrib)
        elem.text = self.text
        elem.tail = self.tail
        elem[:] = self
        return elem

    def __len__(self):
        return len(self._children)

    def __bool__(self):
        warnings.warn(
            "The behavior of this method will change in future versions.  "
            "Use specific 'len(elem)' or 'elem is not None' test instead.",
            FutureWarning, stacklevel=2
            )
        return len(self._children) != 0 # emulate old behaviour, for now

    def __getitem__(self, index):
        return self._children[index]

    def __setitem__(self, index, element):
        # if isinstance(index, slice):
        #     for elt in element:
        #         assert iselement(elt)
        # else:
        #     assert iselement(element)
        self._children[index] = element

    def __delitem__(self, index):
        del self._children[index]

    def append(self, subelement):
        为当前节点追加一个子节点
        """Add *subelement* to the end of this element.

        The new element will appear in document order after the last existing
        subelement (or directly after the text, if it's the first subelement),
        but before the end tag for this element.

        """
        self._assert_is_element(subelement)
        self._children.append(subelement)

    def extend(self, elements):
        为当前节点扩展 n 个子节点
        """Append subelements from a sequence.

        *elements* is a sequence with zero or more elements.

        """
        for element in elements:
            self._assert_is_element(element)
        self._children.extend(elements)

    def insert(self, index, subelement):
        在当前节点的子节点中插入某个节点,即:为当前节点创建子节点,然后插入指定位置
        """Insert *subelement* at position *index*."""
        self._assert_is_element(subelement)
        self._children.insert(index, subelement)

    def _assert_is_element(self, e):
        # Need to refer to the actual Python implementation, not the
        # shadowing C implementation.
        if not isinstance(e, _Element_Py):
            raise TypeError('expected an Element, not %s' % type(e).__name__)

    def remove(self, subelement):
        在当前节点在子节点中删除某个节点
        """Remove matching subelement.

        Unlike the find methods, this method compares elements based on
        identity, NOT ON tag value or contents.  To remove subelements by
        other means, the easiest way is to use a list comprehension to
        select what elements to keep, and then use slice assignment to update
        the parent element.

        ValueError is raised if a matching element could not be found.

        """
        # assert iselement(element)
        self._children.remove(subelement)

    def getchildren(self):
        获取所有的子节点(废弃)
        """(Deprecated) Return all subelements.

        Elements are returned in document order.

        """
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'list(elem)' or iteration over elem instead.",
            DeprecationWarning, stacklevel=2
            )
        return self._children

    def find(self, path, namespaces=None):
        获取第一个寻找到的子节点
        """Find first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        return ElementPath.find(self, path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        获取第一个寻找到的子节点的内容
        """Find text for first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *default* is the value to return if the element was not found,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return text content of first matching element, or default value if
        none was found.  Note that if an element is found having no text
        content, the empty string is returned.

        """
        return ElementPath.findtext(self, path, default, namespaces)

    def findall(self, path, namespaces=None):
        获取所有的子节点
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Returns list containing all matching elements in document order.

        """
        return ElementPath.findall(self, path, namespaces)

    def iterfind(self, path, namespaces=None):
        获取当前节点下的指定的子节点(只能是子节点,孙节点都不行),并创建一个迭代器(可以被for循环)
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        return ElementPath.iterfind(self, path, namespaces)

    def clear(self):
        清空节点
        """Reset element.

        This function removes all subelements, clears all attributes, and sets
        the text and tail attributes to None.

        """
        self.attrib.clear()
        self._children = []
        self.text = self.tail = None

    def get(self, key, default=None):
        获取当前节点的属性值
        """Get element attribute.

        Equivalent to attrib.get, but some implementations may handle this a
        bit more efficiently.  *key* is what attribute to look for, and
        *default* is what to return if the attribute was not found.

        Returns a string containing the attribute value, or the default if
        attribute was not found.

        """
        return self.attrib.get(key, default)

    def set(self, key, value):
        为当前节点设置属性值
        """Set element attribute.

        Equivalent to attrib[key] = value, but some implementations may handle
        this a bit more efficiently.  *key* is what attribute to set, and
        *value* is the attribute value to set it to.

        """
        self.attrib[key] = value

    def keys(self):
        获取当前节点的所有属性的 key

        """Get list of attribute names.

        Names are returned in an arbitrary order, just like an ordinary
        Python dict.  Equivalent to attrib.keys()

        """
        return self.attrib.keys()

    def items(self):
        获取当前节点的所有属性值,每个属性都是一个键值对
        """Get element attributes as a sequence.

        The attributes are returned in arbitrary order.  Equivalent to
        attrib.items().

        Return a list of (name, value) tuples.

        """
        return self.attrib.items()

    def iter(self, tag=None):
        在当前节点的子孙中根据节点名称寻找所有指定的节点,并返回一个迭代器(可以被for循环)。
        """Create tree iterator.

        The iterator loops over the element and all subelements in document
        order, returning all elements with a matching tag.

        If the tree structure is modified during iteration, new or removed
        elements may or may not be included.  To get a stable set, use the
        list() function on the iterator, and loop over the resulting list.

        *tag* is what tags to look for (default is to return all elements)

        Return an iterator containing all the matching elements.

        """
        if tag == "*":
            tag = None
        if tag is None or self.tag == tag:
            yield self
        for e in self._children:
            yield from e.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'elem.iter()' or 'list(elem.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    def itertext(self):
        在当前节点的子孙中根据节点名称寻找所有指定的节点的内容,并返回一个迭代器(可以被for循环)。
        """Create text iterator.

        The iterator loops over the element and all subelements in document
        order, returning all inner text.

        """
        tag = self.tag
        if not isinstance(tag, str) and tag is not None:
            return
        if self.text:
            yield self.text
        for e in self:
            yield from e.itertext()
            if e.tail:
                yield e.tail
Element类的所有方法
class ElementTree:
    """An XML element hierarchy.

    This class also provides support for serialization to and from
    standard XML.

    *element* is an optional root element node,
    *file* is an optional file handle or file name of an XML file whose
    contents will be used to initialize the tree with.

    """
    def __init__(self, element=None, file=None):
        # assert element is None or iselement(element)
        self._root = element # first node
        if file:
            self.parse(file)

    def getroot(self):
        """Return root element of this tree."""
        return self._root

    def _setroot(self, element):
        """Replace root element of this tree.

        This will discard the current contents of the tree and replace it
        with the given element.  Use with care!

        """
        # assert iselement(element)
        self._root = element

    def parse(self, source, parser=None):
        """Load external XML document into element tree.

        *source* is a file name or file object, *parser* is an optional parser
        instance that defaults to XMLParser.

        ParseError is raised if the parser fails to parse the document.

        Returns the root element of the given source document.

        """
        close_source = False
        if not hasattr(source, "read"):
            source = open(source, "rb")
            close_source = True
        try:
            if parser is None:
                # If no parser was specified, create a default XMLParser
                parser = XMLParser()
                if hasattr(parser, '_parse_whole'):
                    # The default XMLParser, when it comes from an accelerator,
                    # can define an internal _parse_whole API for efficiency.
                    # It can be used to parse the whole source without feeding
                    # it with chunks.
                    self._root = parser._parse_whole(source)
                    return self._root
            while True:
                data = source.read(65536)
                if not data:
                    break
                parser.feed(data)
            self._root = parser.close()
            return self._root
        finally:
            if close_source:
                source.close()

    def iter(self, tag=None):
        """Create and return tree iterator for the root element.

        The iterator loops over all elements in this tree, in document order.

        *tag* is a string with the tag name to iterate over
        (default is to return all elements).

        """
        # assert self._root is not None
        return self._root.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'tree.iter()' or 'list(tree.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    def find(self, path, namespaces=None):
        """Find first matching element by tag name or path.

        Same as getroot().find(path), which is Element.find()

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        # assert self._root is not None
        if path[:1] == "/":
            path = "." + path
            warnings.warn(
                "This search is broken in 1.3 and earlier, and will be "
                "fixed in a future version.  If you rely on the current "
                "behaviour, change it to %r" % path,
                FutureWarning, stacklevel=2
                )
        return self._root.find(path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        """Find first matching element by tag name or path.

        Same as getroot().findtext(path),  which is Element.findtext()

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        # assert self._root is not None
        if path[:1] == "/":
            path = "." + path
            warnings.warn(
                "This search is broken in 1.3 and earlier, and will be "
                "fixed in a future version.  If you rely on the current "
                "behaviour, change it to %r" % path,
                FutureWarning, stacklevel=2
                )
        return self._root.findtext(path, default, namespaces)

    def findall(self, path, namespaces=None):
        """Find all matching subelements by tag name or path.

        Same as getroot().findall(path), which is Element.findall().

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return list containing all matching elements in document order.

        """
        # assert self._root is not None
        if path[:1] == "/":
            path = "." + path
            warnings.warn(
                "This search is broken in 1.3 and earlier, and will be "
                "fixed in a future version.  If you rely on the current "
                "behaviour, change it to %r" % path,
                FutureWarning, stacklevel=2
                )
        return self._root.findall(path, namespaces)

    def iterfind(self, path, namespaces=None):
        """Find all matching subelements by tag name or path.

        Same as getroot().iterfind(path), which is element.iterfind()

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        # assert self._root is not None
        if path[:1] == "/":
            path = "." + path
            warnings.warn(
                "This search is broken in 1.3 and earlier, and will be "
                "fixed in a future version.  If you rely on the current "
                "behaviour, change it to %r" % path,
                FutureWarning, stacklevel=2
                )
        return self._root.iterfind(path, namespaces)

    def write(self, file_or_filename,
              encoding=None,
              xml_declaration=None,
              default_namespace=None,
              method=None, *,
              short_empty_elements=True):
        """Write element tree to a file as XML.

        Arguments:
          *file_or_filename* -- file name or a file object opened for writing

          *encoding* -- the output encoding (default: US-ASCII)

          *xml_declaration* -- bool indicating if an XML declaration should be
                               added to the output. If None, an XML declaration
                               is added if encoding IS NOT either of:
                               US-ASCII, UTF-8, or Unicode

          *default_namespace* -- sets the default XML namespace (for "xmlns")

          *method* -- either "xml" (default), "html, "text", or "c14n"

          *short_empty_elements* -- controls the formatting of elements
                                    that contain no content. If True (default)
                                    they are emitted as a single self-closed
                                    tag, otherwise they are emitted as a pair
                                    of start/end tags

        """
        if not method:
            method = "xml"
        elif method not in _serialize:
            raise ValueError("unknown method %r" % method)
        if not encoding:
            if method == "c14n":
                encoding = "utf-8"
            else:
                encoding = "us-ascii"
        enc_lower = encoding.lower()
        with _get_writer(file_or_filename, enc_lower) as write:
            if method == "xml" and (xml_declaration or
                    (xml_declaration is None and
                     enc_lower not in ("utf-8", "us-ascii", "unicode"))):
                declared_encoding = encoding
                if enc_lower == "unicode":
                    # Retrieve the default encoding for the xml declaration
                    import locale
                    declared_encoding = locale.getpreferredencoding()
                write("<?xml version='1.0' encoding='%s'?>\n" % (
                    declared_encoding,))
            if method == "text":
                _serialize_text(write, self._root)
            else:
                qnames, namespaces = _namespaces(self._root, default_namespace)
                serialize = _serialize[method]
                serialize(write, self._root, qnames, namespaces,
                          short_empty_elements=short_empty_elements)

    def write_c14n(self, file):
        # lxml.etree compatibility.  use output method instead
        return self.write(file, method="c14n")
ElementTree类的所有方法

2、遍历同级别的所有节点

由于 每个节点 都具有和根节点同样的方法,并且在上一步骤中解析时均得到了root(xml文件的根节点),so 可以利用以上方法进行操作xml文件。

from xml.etree import ElementTree as ET

############ 解析方式一 ############
"""
# 打开文件,读取XML内容
str_xml = open('first.xml', 'r').read()

# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)
"""
############ 解析方式二 ############

# 直接解析xml文件
tree = ET.parse("first.xml")

# 获取xml文件的根节点
root = tree.getroot()

### 操作

# 顶层标签
print(root.tag)

# 遍历XML文档的第二层
for child in root:
    # 第二层节点的标签名称和标签属性
    print(child.tag, child.attrib)
    # 遍历XML文档的第三层
    for i in child:
        # 第二层节点的标签名称和内容
        print(i.tag,i.text)

3、仅遍历XML中指定的节点

from xml.etree import ElementTree as ET

############ 解析方式一 ############
"""
# 打开文件,读取XML内容
str_xml = open('first.xml', 'r').read()

# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)
"""
############ 解析方式二 ############

# 直接解析xml文件
tree = ET.parse("first.xml")

# 获取xml文件的根节点
root = tree.getroot()

### 操作

# 顶层标签
print(root.tag)

#指定了遍历的节点:year 。那么就只会遍历标签为year的节点
for node in root.iter('year'):  
    # 节点的标签名称和内容
    print(node.tag, node.text)

4、修改节点内容:

  修改后的xml一定要保存,因为修改的动作只是发生在了内存中,并没有将xml进行重写

使用解析方式一:

from xml.etree import ElementTree as ET

############ 解析方式一 ############

# 打开文件,读取XML内容
str_xml = open('first.xml', 'r').read()

# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)

############ 操作 ############

# 顶层标签
print(root.tag)

# 循环所有的year节点
for node in root.iter('year'):
    # 将year节点中的内容自增一
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # 设置属性
    node.set('name', 'alex')
    node.set('age', '18')
    # 删除属性
    del node.attrib['name']

# 遍历data下的所有country节点
for country in root.findall('country'):
    # 获取每一个country节点下rank节点的内容
    rank = int(country.find('rank').text)

    if rank > 50:
        # 删除指定country节点
        root.remove(country)

############ 保存文件 ############
tree = ET.ElementTree(root)   #因为Element类方法中,没有保存文件这个功能,所以只能将类转为ElementTree来进行保存
tree.write("newnew.xml", encoding='utf-8')

使用解析方式二:

from xml.etree import ElementTree as ET

############ 解析方式二 ############

# 直接解析xml文件
tree = ET.parse("first.xml")

# 获取xml文件的根节点
root = tree.getroot()

############ 操作 ############

# 顶层标签
print(root.tag)

# 循环所有的year节点
for node in root.iter('year'):
    # 将year节点中的内容自增一
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # 设置属性
    node.set('name', 'alex')
    node.set('age', '18')
    # 删除属性
    del node.attrib['name']

# 遍历data下的所有country节点
for country in root.findall('country'):
    # 获取每一个country节点下rank节点的内容
    rank = int(country.find('rank').text)

    if rank > 50:
        # 删除指定country节点
        root.remove(country)

############ 保存文件 ############
tree.write("newnew.xml", encoding='utf-8')

注意:这两者的区别就在于保存文件时不同 。

5.节点的创建

创建方式一,(推荐方式)

在python中一切皆对象,如"i=3"其本质就是i=int(3),所以i是int类的一个实例对象。同样在xml模块下,我们在本章的<1.xml报文的解析>中已经指出过,所有的节点都是Element类的对象。那么我们就可以直接son1=ET.Element('son', {'name': '儿1'})来创建节点。如下:

from xml.etree import ElementTree as ET


# 创建根节点
root = ET.Element("famliy")  


# 创建节点大儿子
son1 = ET.Element('son', {'name': '儿1'}) #节点标签是:son ,标签属性是:'name': '儿1'
# 创建小儿子
son2 = ET.Element('son', {"name": '儿2'})

# 在大儿子中创建两个孙子
grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson2 = ET.Element('grandson', {'name': '儿12'})

#给son1添加子节点grandson1 son1.append(grandson1) #节点创建好,只是代表将节点在内存中生成了,并不代表就已经将所有节点连接在了一起,所以需要将子节点加入父节点中。 son1.append(grandson2)
# 把儿子添加到根节点中 root.append(son1) root.append(son1) tree = ET.ElementTree(root)
############ 写文件方式1(不推荐)##############
tree.write(
'oooo.xml') #没有任何要求的写入,这种写入方式,默认是:1、不识别中文,若写入中文会出现乱码 2、没有内容的节点会采取如:<neighbor direction="E"/>(非常简洁的闭合)
"""执行结果:
    <famliy><son name="&#20799;1"><age name="&#20799;11">&#23385;&#23376;</age></son><son name="&#20799;2" /></famliy>
"""
############ 写文件方式2(推荐)##############

tree.write('oooo.xml',encoding='utf-8') #能够识别中文,2,没有内容的节点会采取简单闭合
"""执行结果:
    <famliy><son name="儿1"><age name="儿11">孙子</age></son><son name="儿2" /></famliy>
"""

############ 写文件方式3(不推荐)##############
tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False) #能识别中文,2,没有内容的节点采取的闭合方式是:<rank updated="yes"></rank>
"""执行结果:
    <famliy><son name="儿1"><age name="儿11">孙子</age></son><son name="儿2"></son></famliy>
"""
#注:采取这种闭合方式,没有任何的优点,反而增加代码量
############ 写文件方式4(推荐)##############
et.write("test.xml", encoding="utf-8", xml_declaration=True) #能识别中文,同时在xml根节点前加入标识:<?xml version='1.0' encoding='utf-8'?>
"""执行结果:
<?xml version='1.0' encoding='utf-8'?>
<famliy><son name="儿1"><age name="儿11">孙子</age></son><son name="儿2" /></famliy>
"""
#注:根节点前的标识,只是一个注释的作用,代表使用的xml格式是什么版本,编码使用的是什么

创建方式二(强烈推荐):

直接调用SubElement类来创建对象,如:son1 = ET.SubElement(root, "son", attrib={'name': '儿1'}) 。在创建过程中就直接指明了,在根节点下,创建一个子节点。

from xml.etree import ElementTree as ET


# 创建根节点
root = ET.Element("famliy")


# 创建节点大儿子。 在root根节点下,创建一个子节点,节点标签为:son,属性为:'name': '儿1'
son1 = ET.SubElement(root, "son", attrib={'name': '儿1'})
# 创建小儿子
son2 = ET.SubElement(root, "son", attrib={"name": "儿2"})

# 在大儿子中创建一个孙子
grandson1 = ET.SubElement(son1, "age", attrib={'name': '儿11'})
grandson1.text = '孙子'


et = ET.ElementTree(root)  #生成文档对象
et.write("test.xml", encoding="utf-8", xml_declaration=True, short_empty_elements=False)
创建方式二

创建方式三:

调用Element类方法中的makeelement方法来节点:

from xml.etree import ElementTree as ET

# 创建根节点
root = ET.Element("famliy")


# 创建大儿子
# son1 = ET.Element('son', {'name': '儿1'})
son1 = root.makeelement('son', {'name': '儿1'})  #这个是调用Element类方法makeelement()来创建节点
# 创建小儿子
# son2 = ET.Element('son', {"name": '儿2'})
son2 = root.makeelement('son', {"name": '儿2'})

# 在大儿子中创建两个孙子
# grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson1 = son1.makeelement('grandson', {'name': '儿11'})
# grandson2 = ET.Element('grandson', {'name': '儿12'})
grandson2 = son1.makeelement('grandson', {'name': '儿12'})

son1.append(grandson1)
son1.append(grandson2)


# 把儿子添加到根节点中
root.append(son1)
root.append(son1)

tree = ET.ElementTree(root)
tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False)
创建方式三

6.xml格式化调整

注:格式化只是为了美观,没有实质意义,一般不建议格式化

由于原生保存的XML时默认无缩进,如果想要设置缩进的话, 需要修改保存方式:

from xml.etree import ElementTree as ET
from xml.dom import minidom  #用于来格式化xml


def prettify(elem,path):
    """将节点转换成字符串,并添加缩进。
    """
    rough_string = ET.tostring(elem, 'utf-8')  #将创建的xml转成字符串
    reparsed = minidom.parseString(rough_string)  #将字符串进行解析为xml
    raw_str= reparsed.toprettyxml(indent="\t")  #给xml进行格式化

    f = open(path, 'w', encoding='utf-8')
    f.write(raw_str)
    f.close()

# 创建根节点
root = ET.Element("famliy")

# 创建大儿子
son1 = root.makeelement('son', {'name': '儿1'})
# 创建小儿子
son2 = root.makeelement('son', {"name": '儿2'})

# 在大儿子中创建两个孙子
grandson1 = son1.makeelement('grandson', {'name': '儿11'})
grandson2 = son1.makeelement('grandson', {'name': '儿12'})

son1.append(grandson1)
son1.append(grandson2)

# 把儿子添加到根节点中
root.append(son1)
root.append(son1)

prettify(root,"test.xml")   #通过传入根节点,来指明刚创建的xml

7、命名空间

命名空间只是为了避免两个xml文件出现相同的节点名。

命名冲突
在 XML 中,元素名称是由开发者定义的,当两个不同的文档使用相同的元素名时,就会发生命名冲突。
有一个 XML 文档信息:
<table>
   <tr>
   <td>Apples</td>
   <td>Bananas</td>
   </tr>
</table>
然而还有另一个XML文档信息:
<table>
   <name>African Coffee Table</name>
   <width>80</width>
   <length>120</length>
</table>
假如这两个 XML 文档被一起使用,由于两个文档都定义的 <table> 元素,就会发生命名冲突。
XML 解析器无法确定如何处理这类冲突。
使用前缀来避免命名冲突
对第一个xml文档进行更改为:
<h:table>
   <h:tr>
   <h:td>Apples</h:td>
   <h:td>Bananas</h:td>
   </h:tr>
</h:table>
对另一个XML 文档进行更改为:
<f:table>
   <f:name>African Coffee Table</f:name>
   <f:width>80</f:width>
   <f:length>120</f:length>
</f:table>
现在,命名冲突不存在了,这是由于两个文档都使用了不同的名称来命名它们的 <table> 元素 (<h:table> 和 <f:table>)。

那么在程序中怎么实现的呢?

from xml.etree import ElementTree as ET

ET.register_namespace('com',"http://www.company.com") #定义以别名,"com"就代指"http://www.company.com"

# build a tree structure
root = ET.Element("{http://www.company.com}STUFF") 

#在创建节点的时候,里面{http://www.company.com}就代表要使用别名,在输出到文件中时{http://www.company.com}就会被自动解析为com body
= ET.SubElement(root, "{http://www.company.com}MORE_STUFF", attrib={"{http://www.company.com}hhh": "123"}) body.text = "STUFF EVERYWHERE!" # wrap it in an ElementTree instance, and save as XML tree = ET.ElementTree(root) tree.write("page.xml", xml_declaration=True, encoding='utf-8', method="xml")

二、requests和xml的结合应用:

检查QQ是否在线:

import requests
r = requests.get("http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=1223995142")
result = r.text
# XML 模块
from xml.etree import ElementTree as ET
#解析xml格式内容
#xml接受一个参数:字符串,格式化为特殊的对象
node = ET.XML(result)  
#注意:此处只能使用ET.XML() ,不可使用ET.parse(),原因:result是通过requests()方法获取到的字符串,而要将字符串解析成xml只能是ET.XML()ET.parse()是直接解析文件
#获取内容 if node.text == "Y": # text 是xml的值 print("在线") elif node.text == "N": print("离线") elif node.text == "V": print("隐身")
posted on 2017-06-05 17:13  进_进  阅读(2547)  评论(0编辑  收藏  举报