Python Study: Day 13-1

- day13-1
  - Review of the previous lesson
  - Working with XML in practice

Review of the previous lesson

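1. __name__: used to check whether a file is being run as the main program
2. __file__: the path of the current file
3. Adding a given directory to sys.path so that its modules can be imported
4. __doc__: the module's docstring (its documentation comment)
5. __cached__: the path of the cached bytecode (.pyc) file
6. __builtins__: the module that holds the built-in functions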
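The review items above are easiest to see in a small script. The sketch below is a hypothetical module: its docstring, the helper names module_info and add_project_root, and the "two directories up" layout are assumptions for illustration. The sys.path line follows the common pattern of putting the project directory above the current file on the import path.

```python
"""Demo module: shows a few special module attributes."""
import builtins
import os
import sys


def module_info():
    # __doc__: this module's docstring (the string literal at the top of the file)
    doc = __doc__
    # __file__: path of this source file; it can be missing, e.g. in an interactive shell
    path = globals().get("__file__")
    # __cached__: path of the compiled .pyc file; None or missing for a script run directly
    cached = globals().get("__cached__")
    return doc, path, cached


def add_project_root(path):
    # Hypothetical helper: put the directory two levels above `path` on sys.path
    # so that packages living there become importable
    root = os.path.dirname(os.path.dirname(os.path.abspath(path)))
    if root not in sys.path:
        sys.path.append(root)
    return root


if __name__ == "__main__":
    # __name__ equals "__main__" only when this file is run as the main program,
    # not when it is imported by another module
    doc, path, cached = module_info()
    print(doc)
    print(path, cached)
    if path:
        print(add_project_root(path))
    # Built-in functions such as len live in the builtins module,
    # which every module can reach through __builtins__
    print(builtins.len is len)  # True
```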

Working with XML in practice

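ElementTree.XML(): takes a string of XML and converts it into an XML object (an Element);

Note
The object returned is the root node of the XML document; you can think of it like the <html> tag of an HTML page.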
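A minimal sketch of ET.XML() on an inline string (the XML text is a trimmed, made-up sample in the style of test.xml):

```python
from xml.etree import ElementTree as ET

xml_text = """<data>
    <country name="Singapore">
        <rank updated="yes">5</rank>
    </country>
</data>"""

# ET.XML() parses the string and returns the root Element (<data>),
# not a wrapper around the whole document
root = ET.XML(xml_text)
print(root)                 # <Element 'data' at 0x...>
print(root.tag)             # data
print(root[0].get("name"))  # Singapore
```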

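Methods in detail

The root node is an Element object.
XML is made of nodes nested inside nodes, and every node offers the methods below for working with it. The listing is an excerpt of the pure-Python Element class in CPython's Lib/xml/etree/ElementTree.py, annotated with a short note on each method.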

class Element:
    """An XML element.

    This class is the reference implementation of the Element interface.

    An element's length is its number of subelements.  That means if you
    want to check if an element is truly empty, you should check BOTH
    its length AND its text attribute.

    The element tag, attribute names, and attribute values can be either
    bytes or strings.

    *tag* is the element name.  *attrib* is an optional dictionary containing
    element attributes. *extra* are additional element attributes given as
    keyword arguments.

    Example form:
        <tag attrib>text<child/>...</tag>tail

    """

    # Tag name of the current node
    tag = None
    """The element's name."""

    # Attributes of the current node (a dict)
    attrib = None
    """Dictionary of the element's attributes."""

    # Text content of the current node
    text = None
    """
    Text before first subelement. This is either a string or the value None.
    Note that if there is no text, this attribute may be either
    None or the empty string, depending on the parser.

    """

    # Text that follows the current node's end tag
    tail = None
    """
    Text after this element's end tag, but before the next sibling element's
    start tag.  This is either a string or the value None.  Note that if there
    was no text, this attribute may be either None or an empty string,
    depending on the parser.

    """

    def __init__(self, tag, attrib={}, **extra):
        if not isinstance(attrib, dict):
            raise TypeError("attrib must be dict, not %s" % (
                attrib.__class__.__name__,))
        attrib = attrib.copy()
        attrib.update(extra)
        self.tag = tag
        self.attrib = attrib
        self._children = []

    def __repr__(self):
        return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

    # Create a new node of the same type; it is only created, not attached to any parent
    def makeelement(self, tag, attrib):
        """Create a new element with the same type.

        *tag* is a string containing the element name.
        *attrib* is a dictionary containing the element attributes.

        Do not call this method, use the SubElement factory function instead.

        """
        return self.__class__(tag, attrib)

    def copy(self):
        """Return copy of current element.

        This creates a shallow copy. Subelements will be shared with the
        original tree.

        """
        elem = self.makeelement(self.tag, self.attrib)
        elem.text = self.text
        elem.tail = self.tail
        elem[:] = self
        return elem

    def __len__(self):
        return len(self._children)

    def __bool__(self):
        warnings.warn(
            "The behavior of this method will change in future versions.  "
            "Use specific 'len(elem)' or 'elem is not None' test instead.",
            FutureWarning, stacklevel=2
            )
        return len(self._children) != 0 # emulate old behaviour, for now

    def __getitem__(self, index):
        return self._children[index]

    def __setitem__(self, index, element):
        # if isinstance(index, slice):
        #     for elt in element:
        #         assert iselement(elt)
        # else:
        #     assert iselement(element)
        self._children[index] = element

    def __delitem__(self, index):
        del self._children[index]

    # Append one child node to the current node
    def append(self, subelement):
        """Add *subelement* to the end of this element.

        The new element will appear in document order after the last existing
        subelement (or directly after the text, if it's the first subelement),
        but before the end tag for this element.

        """
        self._assert_is_element(subelement)
        self._children.append(subelement)

    # Extend the current node with n child nodes
    def extend(self, elements):
        """Append subelements from a sequence.

        *elements* is a sequence with zero or more elements.

        """
        for element in elements:
            self._assert_is_element(element)
        self._children.extend(elements)

    # Insert an existing node into the current node's children at position index
    def insert(self, index, subelement):
        """Insert *subelement* at position *index*."""
        self._assert_is_element(subelement)
        self._children.insert(index, subelement)

    def _assert_is_element(self, e):
        # Need to refer to the actual Python implementation, not the
        # shadowing C implementation.
        if not isinstance(e, _Element_Py):
            raise TypeError('expected an Element, not %s' % type(e).__name__)

    # Remove a node from the current node's children
    def remove(self, subelement):
        """Remove matching subelement.

        Unlike the find methods, this method compares elements based on
        identity, NOT ON tag value or contents.  To remove subelements by
        other means, the easiest way is to use a list comprehension to
        select what elements to keep, and then use slice assignment to update
        the parent element.

        ValueError is raised if a matching element could not be found.

        """
        # assert iselement(element)
        self._children.remove(subelement)

    # Get all child nodes (deprecated; removed in Python 3.9, use list(elem) instead)
    def getchildren(self):
        """(Deprecated) Return all subelements.

        Elements are returned in document order.

        """
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'list(elem)' or iteration over elem instead.",
            DeprecationWarning, stacklevel=2
            )
        return self._children

    # Get the first matching child node
    def find(self, path, namespaces=None):
        """Find first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        return ElementPath.find(self, path, namespaces)

    # Get the text of the first matching child node
    def findtext(self, path, default=None, namespaces=None):
        """Find text for first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *default* is the value to return if the element was not found,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return text content of first matching element, or default value if
        none was found.  Note that if an element is found having no text
        content, the empty string is returned.

        """
        return ElementPath.findtext(self, path, default, namespaces)

    # Get all matching child nodes, as a list
    def findall(self, path, namespaces=None):
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Returns list containing all matching elements in document order.

        """
        return ElementPath.findall(self, path, namespaces)

    # Get all matching nodes as an iterator (usable in a for loop)
    def iterfind(self, path, namespaces=None):
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        return ElementPath.iterfind(self, path, namespaces)

    # Clear the node: remove its children and attributes, reset text and tail
    def clear(self):
        """Reset element.

        This function removes all subelements, clears all attributes, and sets
        the text and tail attributes to None.

        """
        self.attrib.clear()
        self._children = []
        self.text = self.tail = None

    # Get an attribute value of the current node
    def get(self, key, default=None):
        """Get element attribute.

        Equivalent to attrib.get, but some implementations may handle this a
        bit more efficiently.  *key* is what attribute to look for, and
        *default* is what to return if the attribute was not found.

        Returns a string containing the attribute value, or the default if
        attribute was not found.

        """
        return self.attrib.get(key, default)

    # Set an attribute value on the current node
    def set(self, key, value):
        """Set element attribute.

        Equivalent to attrib[key] = value, but some implementations may handle
        this a bit more efficiently.  *key* is what attribute to set, and
        *value* is the attribute value to set it to.

        """
        self.attrib[key] = value

    # Get the names (keys) of all attributes of the current node
    def keys(self):
        """Get list of attribute names.

        Names are returned in an arbitrary order, just like an ordinary
        Python dict.  Equivalent to attrib.keys()

        """
        return self.attrib.keys()

    # Get all attributes of the current node as (key, value) pairs
    def items(self):
        """Get element attributes as a sequence.

        The attributes are returned in arbitrary order.  Equivalent to
        attrib.items().

        Return a list of (name, value) tuples.

        """
        return self.attrib.items()

    # Find all nodes with the given tag among the current node and its descendants;
    # returns an iterator (usable in a for loop)
    def iter(self, tag=None):
        """Create tree iterator.

        The iterator loops over the element and all subelements in document
        order, returning all elements with a matching tag.

        If the tree structure is modified during iteration, new or removed
        elements may or may not be included.  To get a stable set, use the
        list() function on the iterator, and loop over the resulting list.

        *tag* is what tags to look for (default is to return all elements)

        Return an iterator containing all the matching elements.

        """
        if tag == "*":
            tag = None
        if tag is None or self.tag == tag:
            yield self
        for e in self._children:
            yield from e.iter(tag)

    # compatibility (removed in Python 3.9; use elem.iter() instead)
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'elem.iter()' or 'list(elem.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    # Iterate over all text (text and tail) in the current node and its descendants;
    # returns an iterator (usable in a for loop)
    def itertext(self):
        """Create text iterator.

        The iterator loops over the element and all subelements in document
        order, returning all inner text.

        """
        tag = self.tag
        if not isinstance(tag, str) and tag is not None:
            return
        if self.text:
            yield self.text
        for e in self:
            yield from e.itertext()
            if e.tail:
                yield e.tail
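To see several of these methods in action, here is a small sketch on an inline document (a made-up two-country sample, not the full test.xml). It only uses methods from the listing above: find, findtext, findall, itertext, set, items, remove and clear.

```python
from xml.etree import ElementTree as ET

root = ET.XML(
    "<data>"
    '<country name="Panama"><rank updated="yes">69</rank><year>2026</year></country>'
    '<country name="Singapore"><rank updated="yes">5</rank></country>'
    "</data>"
)

first = root.find("country")             # first matching child, or None
print(first.get("name"))                 # Panama
print(root.findtext("country/rank"))     # 69 (text of the first match)
print(root.findtext("missing", "n/a"))   # n/a (default when nothing matches)
print([c.get("name") for c in root.findall("country")])  # ['Panama', 'Singapore']
print(list(root.itertext()))             # ['69', '2026', '5']

first.set("name", "Republic of Panama")  # add or overwrite an attribute
print(list(first.items()))               # [('name', 'Republic of Panama')]

year = first.find("year")
first.remove(year)                       # removal compares identity, not the tag
print(len(first))                        # 1 (only <rank> is left)

first.clear()                            # drops children, attributes, text and tail
print(len(first), first.attrib)          # 0 {}

# Test "not found" with `is None`: an existing Element without children
# also tests as false (recent Python versions warn about that)
print(root.find("missing") is None)      # True
```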

Contents of test.xml

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
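Because iterating over an Element yields its direct children, the structure of test.xml can be walked level by level. This sketch embeds the first two countries of test.xml as a string so that it runs without the file:

```python
from xml.etree import ElementTree as ET

# A shortened copy of test.xml (first two countries), so no file is needed
TEST_XML = """<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
</data>"""

root = ET.XML(TEST_XML)

# Iterating over an Element yields its direct children in document order
for country in root:
    # .tag is the node name, .attrib the attribute dict, .text the content
    print(country.tag, country.attrib["name"])
    for child in country:
        print("   ", child.tag, child.attrib, (child.text or "").strip())
```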

Trying the methods out

1. Import the module

from xml.etree import ElementTree

2. .parse(file): parse a file

from xml.etree import ElementTree as ET

f = ET.parse("test.xml")
print(f) # returns an ElementTree object
# <xml.etree.ElementTree.ElementTree object at 0x013B5AB0>

3. Working with the document
.getroot(): get the root node of the whole document (an Element object)

from xml.etree import ElementTree as ET

f = ET.parse("test.xml")
root = f.getroot()
print(root) # <Element 'data' at 0x02091F60>

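In fact, the three steps above are equivalent to the following code:

from xml.etree import ElementTree as ET

# The approach we used before: read the file ourselves, then parse the string
with open("test.xml", "r", encoding="utf-8") as fh:
    f = fh.read()
xml_f = ET.XML(f)
print(xml_f) # <Element 'data' at 0x02061810>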

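Explanation:
.parse() opens the file and reads its contents by itself, then parses them into an XML object (an ElementTree object), so we generally don't call open() ourselves and use the method the xml package provides instead. The one difference is the return type: .parse() returns an ElementTree, and its .getroot() gives the same kind of root Element that ET.XML() returns.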
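A small sketch of the two entry points side by side. It uses an in-memory file object (io.StringIO) instead of test.xml so it is self-contained, which works because parse() accepts a file object as well as a file name; the sample XML is made up:

```python
import io
from xml.etree import ElementTree as ET

xml_text = '<data><country name="Panama" /></data>'

# parse() takes a file name or an open file object and returns an ElementTree
tree = ET.parse(io.StringIO(xml_text))
root_a = tree.getroot()   # the root Element, <data>

# XML() (also available as ET.fromstring()) parses a string you read yourself
# and returns the root Element directly
root_b = ET.XML(xml_text)

print(type(tree).__name__, type(root_a).__name__)       # ElementTree Element
print(root_a.tag == root_b.tag, root_a[0].get("name"))  # True Panama
```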

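We know that parse() creates an ElementTree object. Just as a str object can also be created by calling str() directly, an ElementTree object can also be created by calling the ElementTree class:

from xml.etree import ElementTree as ET

f = ET.parse("test.xml")
print(type(f))

#### Creating it directly with the ElementTree class
from xml.etree.ElementTree import ElementTree

# Pass the file as file=...; the first positional argument of ElementTree() is the root element
s = ElementTree(file="test.xml")
print(type(s))

Result: both lines print <class 'xml.etree.ElementTree.ElementTree'>.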

makeelement(tag, attrib): create a node (an Element object)

tag: the name of the new node
attrib: the node's attributes, as a dict

from xml.etree import ElementTree as ET

f = ET.parse("test.xml")
root = f.getroot()
son = root.makeelement("tt", {"name":"haha"})

.append(): append a child node to the current node

# continuing from the code above
root.append(son)
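A minimal sketch, on a made-up one-country document, showing that makeelement() by itself leaves the tree unchanged and that append() is the step that attaches the node:

```python
from xml.etree import ElementTree as ET

root = ET.XML("<data><country name='Panama' /></data>")

# makeelement() only creates the node; root still has a single child
son = root.makeelement("tt", {"name": "haha"})
print(len(root))                 # 1

# append() attaches it as the last child
root.append(son)
print(len(root), root[-1].tag)   # 2 tt
print(ET.tostring(root, encoding="unicode"))
# <data><country name="Panama" /><tt name="haha" /></data>
```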

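Note:
All of the operations above take place in memory; they have no effect on the original file.

Question: then how do we change the file itself?
Since the changes only exist in memory, they have to be written back to disk.
.write(file, encoding="...", xml_declaration=None, short_empty_elements=True): writes the document. It is a method of the ElementTree class (not of Element), and its main parameters are:

file: the file to write to (a file name or a file object opened for writing)
encoding: the encoding used when writing (the default is "us-ascii")
xml_declaration: whether to add an XML declaration at the very top of the document. The default, None, adds one only when the encoding is not US-ASCII, UTF-8 or Unicode; True always adds it
short_empty_elements: whether elements without content are written as a single self-closing tag; the default is True, so they are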

f.write("out.xml")

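Note:
This writes to a new file, out.xml; passing "test.xml" instead would overwrite the original.

Looking at the result and comparing it with the original file, the new node has been added as the last child of <data>.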
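The following sketch checks both points at once: the in-memory change does not touch the source file, and write() produces a new file. It assumes a temporary directory and a one-line stand-in for test.xml rather than the real file:

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical setup: a tiny stand-in for test.xml in a temporary directory
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "test.xml")
dst = os.path.join(tmp_dir, "out.xml")
with open(src, "w", encoding="utf-8") as fh:
    fh.write('<data><country name="Panama" /></data>')

tree = ET.parse(src)
root = tree.getroot()
root.append(root.makeelement("tt", {"name": "haha"}))  # changes the tree in memory only

tree.write(dst)  # the change reaches the disk only through write()

with open(src, encoding="utf-8") as fh:
    original = fh.read()
with open(dst, encoding="utf-8") as fh:
    written = fh.read()

print(original)  # <data><country name="Panama" /></data>
print(written)   # <data><country name="Panama" /><tt name="haha" /></data>
```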

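Note:
If you look closely, you will notice that the added node is written as a self-closing tag.

Question: why is it self-closing? When is a tag self-closing, and when does it get a start and an end tag?
Answer: an element with content gets a start and an end tag; an element without content is written as a self-closing tag.
Question: is there a way to get a start and an end tag even without content?
Yes: pass short_empty_elements=False when writing, although this is not recommended.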

f.write("out.xml", short_empty_elements=False)

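Result: the new node is now written as <tt name="haha"></tt>, and so is every other element without content, such as the existing <neighbor> elements.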
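ET.tostring() accepts the same short_empty_elements switch as write(), which makes the difference easy to see without creating files. A sketch on a made-up snippet:

```python
from xml.etree import ElementTree as ET

root = ET.XML('<data><tt name="haha" /><year>2023</year></data>')

# Default: elements without content become self-closing tags
short = ET.tostring(root, encoding="unicode")
print(short)  # <data><tt name="haha" /><year>2023</year></data>

# short_empty_elements=False: every empty element gets a start and an end tag
long_form = ET.tostring(root, encoding="unicode", short_empty_elements=False)
print(long_form)  # <data><tt name="haha"></tt><year>2023</year></data>
```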

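Going further:
We know there are two ways to create a string (str), one of which is calling the class name directly, e.g. s = str("abc"). Can Element objects be created that way too?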

from xml.etree import ElementTree as ET

f = ET.parse("test.xml")
root = f.getroot()
son = ET.Element("aa", {"age":"18"})
root.append(son)
f.write("out.xml", short_empty_elements=False)

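Result: <aa age="18"></aa> is added as the last child of <data>.

Analysis:
Internally, .makeelement() also creates the node by calling the element's own class (Element), as its source shows: return self.__class__(tag, attrib).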
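A short sketch of equivalent ways to create the same node. The keyword-argument form relies on the **extra parameter visible in Element.__init__ in the listing above:

```python
from xml.etree import ElementTree as ET

# Calling the class directly, like str("abc"), creates a detached Element
a = ET.Element("aa", {"age": "18"})
# Extra keyword arguments are merged into the attributes (the **extra parameter)
b = ET.Element("aa", age="18")
# makeelement() builds a node of the same class as the element it is called on
c = ET.Element("data").makeelement("aa", {"age": "18"})

print(a.attrib == b.attrib == c.attrib, type(c) is type(a))  # True True
```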

.iter(tag): searches the current node and all of its descendants for nodes with the given tag and returns an iterator (usable in a for loop)
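A sketch contrasting iter() with findall() on a made-up document shaped like test.xml: iter() searches all descendants (and the starting node itself), while findall() with a plain tag name only looks at direct children:

```python
from xml.etree import ElementTree as ET

root = ET.XML(
    "<data>"
    "<country name='Liechtenstein'>"
    "<neighbor name='Austria' /><neighbor name='Switzerland' />"
    "</country>"
    "<country name='Singapore'><neighbor name='Malaysia' /></country>"
    "</data>"
)

# iter(tag) walks the whole subtree, not just the direct children
names = [n.get("name") for n in root.iter("neighbor")]
print(names)  # ['Austria', 'Switzerland', 'Malaysia']

# findall("neighbor") only looks at root's direct children, so it finds nothing
print(root.findall("neighbor"))  # []

# iter() without a tag also yields the starting node itself
print([e.tag for e in root.iter()][:2])  # ['data', 'country']
```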

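A small conclusion at this point:
Every class can create objects of its own type: str creates str objects (strings), Element creates Element objects, and so on.

In Python, everything is an object.

Element objects

.Element(): creates a node, i.e. an Element object

Note:
An Element object cannot save itself; everything you do with it stays in memory.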

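ElementTree objects

.ElementTree(): takes the root node as its argument and returns an ElementTree object

Note:
An ElementTree object can save: its .write() method takes a file name and saves the in-memory content to disk.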
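A minimal sketch of this division of labour: the Element has no write() method, while wrapping it in an ElementTree provides one. io.BytesIO stands in for a file on disk here (write() accepts a file object as well as a file name):

```python
import io
from xml.etree import ElementTree as ET

root = ET.Element("data")
print(hasattr(root, "write"))   # False: an Element cannot save itself

tree = ET.ElementTree(root)     # wrap the root node
print(tree.getroot() is root)   # True

buf = io.BytesIO()              # in-memory stand-in for a file
tree.write(buf)
print(buf.getvalue())           # b'<data />'
```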

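Another way to create child nodes:
.SubElement(root, tag, {"name": "haha"}): creates a child node directly under root and returns it

root: the node that gets the new child
tag: the name of the child node
{ }: a dict with the new child's attributes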

from xml.etree import ElementTree as ET
# Create a new node, data
root = ET.Element("data")

ET.SubElement(root, "nihao", {"heiehi":"56"})
# Everything above works on Element objects, which cannot save themselves;
# to save, we need an ElementTree object
tree = ET.ElementTree(root)
# Use ElementTree's write() to save the file
tree.write("new_file")

Result: new_file contains <data><nihao heiehi="56" /></data>

Note:
SubElement() creates a child under the given node and also returns that child.
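SubElement() amounts to makeelement() plus append() in one call. The sketch below, reusing the node names from the example above, builds the same node both ways:

```python
from xml.etree import ElementTree as ET

root = ET.Element("data")

# SubElement(): create and attach in one call, returning the new child
son1 = ET.SubElement(root, "nihao", {"heiehi": "56"})

# The same result in two steps, with makeelement() and append()
son2 = root.makeelement("nihao", {"heiehi": "56"})
root.append(son2)

print(root[0] is son1, root[1] is son2)  # True True
print(ET.tostring(root, encoding="unicode"))
# <data><nihao heiehi="56" /><nihao heiehi="56" /></data>
```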

Using the method above to create a grandchild node:

from xml.etree import ElementTree as ET
# Create a new node, data
root = ET.Element("data")

# This creates a first <nihao> child; its return value is not kept
ET.SubElement(root, "nihao", {"heiehi":"56"})
# Create a second <nihao> child and keep a reference to it
son = ET.SubElement(root, "nihao", {"heiehi":"56"})
ET.SubElement(son, "buhao", {"kaka":"eueu"})
# Everything above works on Element objects, which cannot save themselves;
# to save, we need an ElementTree object
tree = ET.ElementTree(root)
# Use ElementTree's write() to save the file
tree.write("new_file")

Result: new_file contains <data><nihao heiehi="56" /><nihao heiehi="56"><buhao kaka="eueu" /></nihao></data>

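Analysis:
To create a child node, you need its parent node. In the example above we want a child of root's child, so we first need a reference to root's child, which is son.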
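The same idea scales to deeper trees: keep the node that SubElement() returns and use it as the parent for the next level. A sketch that builds one <country> in the style of test.xml from a Python dict (the dict is hypothetical input data):

```python
from xml.etree import ElementTree as ET

# Hypothetical input data, shaped like one <country> of test.xml
countries = {"Singapore": {"rank": "5", "year": "2026"}}

root = ET.Element("data")
for name, fields in countries.items():
    country = ET.SubElement(root, "country", {"name": name})  # child of root
    for tag, text in fields.items():
        child = ET.SubElement(country, tag)                   # grandchild of root
        child.text = text

result = ET.tostring(root, encoding="unicode")
print(result)
# <data><country name="Singapore"><rank>5</rank><year>2026</year></country></data>
```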

Problem: when an attribute contains Chinese characters, they are turned into an unreadable code in the saved document

# continuing from the example above
ET.SubElement(son, "buhao", {"kaka":"哈哈"})

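Result: the attribute is written as kaka="&#21704;&#21704;", i.e. as numeric character references, because .write() uses the us-ascii encoding by default.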

How do we change this?
The second parameter of .write() is the character encoding:

tree.write("new_file", encoding="utf-8")

Now the Chinese characters are written as they are.
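ET.tostring() follows the same encoding rules as write(), so a sketch can show the difference without writing any files:

```python
from xml.etree import ElementTree as ET

root = ET.Element("data")
ET.SubElement(root, "buhao", {"kaka": "哈哈"})

# Default encoding (us-ascii): non-ASCII characters become numeric character references
ascii_bytes = ET.tostring(root)
print(ascii_bytes)  # b'<data><buhao kaka="&#21704;&#21704;" /></data>'

# encoding="utf-8": the characters are kept and encoded as UTF-8 bytes
utf8_text = ET.tostring(root, encoding="utf-8").decode("utf-8")
print(utf8_text)    # <data><buhao kaka="哈哈" /></data>
```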

XML declaration

When writing the file with .write(), pass a third parameter, xml_declaration=True:

tree.write("new_file", encoding="utf-8", xml_declaration=True)

An XML declaration, such as <?xml version='1.0' encoding='utf-8'?>, now appears automatically at the very top of the document.

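Note:
Although the XML declaration sits at the top of the document, it is not a node; it only describes the document and does not affect normal operations on it (the root is still <data>).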
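A sketch that writes the declaration into an in-memory buffer (io.BytesIO instead of a real file) and then parses the result back, to check that the root is still <data>:

```python
import io
from xml.etree import ElementTree as ET

root = ET.Element("data")
ET.SubElement(root, "nihao", {"heiehi": "56"})
tree = ET.ElementTree(root)

buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True)
text = buf.getvalue().decode("utf-8")
print(text)
# <?xml version='1.0' encoding='utf-8'?>
# <data><nihao heiehi="56" /></data>

# Parsing it back: the declaration is not a node, so the root is still <data>
reparsed_tag = ET.parse(io.BytesIO(buf.getvalue())).getroot().tag
print(reparsed_tag)  # data
```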

Adding indentation to an XML file

Problem: as seen above, the written data has no indentation. How can we fix that?
One way is to use the minidom module from the xml.dom package:

from xml.dom import minidom

After importing the module, we write a function that does the conversion:

def prettify(elem):
    """Convert a node to a string and add indentation.
    """
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

Note:
This function returns a string.

A complete example:

from xml.etree import ElementTree as ET
from xml.dom import minidom

def prettify(elem):
    """Convert a node to a string and add indentation.
    """
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

root = ET.Element("data")

ET.SubElement(root, "nihao", {"heiehi":"56"})

son = ET.SubElement(root, "nihao", {"heiehi":"56"})
ET.SubElement(son, "buhao", {"kaka":"哈哈"})

new_root = prettify(root) # pass root to the function for conversion
# Note: the function returns a string, so we write it to the file ourselves
with open("nn.xml", "w", encoding="utf-8") as f:
    f.write(new_root)

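Result: nn.xml contains each node on its own line, indented with tabs, preceded by the <?xml version="1.0" ?> declaration that minidom adds.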

We can go one step further and move the file-writing part into the function as well.
The wrapped function both converts and writes:

def prettify(elem, path_file):
    """Convert a node to a string, add indentation,
    and write the result to the given file.
    """
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    new_root = reparsed.toprettyxml(indent="\t")
    with open(path_file, "w", encoding="utf-8") as f:
        f.write(new_root)
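On Python 3.9 and later, the standard library also offers xml.etree.ElementTree.indent(), which adds the indentation to the tree in place, so no minidom round trip or manual file handling is needed. This is a different technique from the minidom approach above; a sketch with the same nodes (writing to nn.xml in the system temp directory is an assumption for illustration):

```python
import os
import tempfile
from xml.etree import ElementTree as ET  # ET.indent() needs Python 3.9+

root = ET.Element("data")
son = ET.SubElement(root, "nihao", {"heiehi": "56"})
ET.SubElement(son, "buhao", {"kaka": "哈哈"})

# indent() sets whitespace text/tail on the nodes, modifying the tree in place
ET.indent(root, space="\t")
text = ET.tostring(root, encoding="unicode")
print(text)
# <data>
# 	<nihao heiehi="56">
# 		<buhao kaka="哈哈" />
# 	</nihao>
# </data>

# The indented tree can be saved with write() as usual
out_path = os.path.join(tempfile.gettempdir(), "nn.xml")
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
```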

posted on 2016-10-27 01:52 jayafs
