二、xml
import xml.etree.ElementTree as ET
tree=ET.parse('a.xml')
root=tree.getroot()
三种查找节点的方式
res=root.iter('rank')#会在整个树中进行查找，而且查找到所有
item.tag 标签名
item.attrib 属性
item.text 文本内容

root.find 只能在当前元素的下一级开始查找，并且找到一个就结束
root.findall 只能在当前元素的下一级开始查找，找到全部


改========
import xml.etree.ElementTree as ET
tree=ET.parse('a.xml')
root=tree.getroot()
for year in root.iter('year')：
　　year.text=str(int(year.text)+10)
　　year.attrib={'updated':'yes'}
tree.write('b.xml')
tree.write('a.xml')
增========
for country in root.iter('country'):
　　year=rountry.find('year')
　　if int(year.text)>2020:
　　　　ele=et.element('egon')
　　　　ele.attrib={'nb':'yes'}
　　　　ele.text='aaa'
　　　　country.append(ele)
　　删=========
　　　　country.remove(year)
tree.write('b.xml')

三 re
正则就是用一些具有特殊含义的符号组合到一起（称为正则表达式）来描述字符或者字符串的方法。或者说：正则就是用来描述一类事物的规则。

a='ab 12 \+-*&_\n\r'

print(re.findall('\w',a))#字母、数字、下划线

print(re.findall('\W',a))

print(re.findall('\s',a))#空白字符\r\n

print(re.findall('\S',a))#除了空白字符以外

print(re.findall('\d',a))#匹配数字（0，9）

print(re.findall('\D',a))#除了数字

b='abcalex is salexb'

print(re.findall('\Aalex',b))#匹配开头

基本上用print(re.findall(''))

print(re.findall('sb\Z',b))#匹配结尾

基本上用print(re.findall('sb$'))

重复匹配：

.   ?   *   +  {m,n}  .*  .*?
c='abc a1c aAc aaaaaca\nc'

1.代表除了换行符外的任意一个字符
print(re.findall('a.c',c)
要能认到\n print(re.findall('a.c',c,re.dotall))
?:代表左边那一个字符重复0或1次
print(re.findall('ab?','a ab abb abbb abbbb abbbb'))===['a', 'ab', 'ab', 'ab', 'ab', 'ab']
*:代表左边那一个字符出现0次或无穷次

print(re.findall('ab*','a ab abb abbb abbbb abbbb'))===['a', 'ab', 'abb', 'abbb', 'abbbb', 'abbbb', 'a']
+：代表左边那一个字符出现1次或无穷次

print(re.findall('ab+','a ab abb abbb abbbb abbbb a1bbbbbbb'))====['ab', 'abb', 'abbb', 'abbbb', 'abbbb']
{m,n}:代表左边那一个字符出现m次或n次

print(re.findall('ab{0,1}','a ab abb abbb abbbb abbbb'))===['a', 'ab', 'ab', 'ab', 'ab', 'ab']

.*：匹配任意长度，任意的字符=====》贪婪匹配 匹配最后的一个

print(re.findall('a.*c','ac a123c aaaac a *123)()c asdfasfdsadf'))==['ac a123c aaaac a *123)()c']

.*？：非贪婪匹配 匹配最近的一个

print(re.findall('a.*?c','a123c456c'))==['a123c']

():分组 取的是（）里的值所有的

print(re.findall('(alex)_sb','alex_sb asdfsafdafdaalex_sb'))===['alex', 'alex']

 []:匹配一个指定范围内的字符（这一个字符来自于括号内定义的）

print(re.findall('a[0-9][0-9]c','a1c a+c a2c a9c a11c a-c acc aAc'))===['a11c']

当-需要被当中普通符号匹配时，只能放到[]的最左边或最右边

print(re.findall('a[-+*]c','a1c a+c a2c a9c a*c a11c a-c acc aAc'))==['a+c', 'a*c', 'a-c']

[]内的^代表取反的意思

print(re.findall('a[^0-9]c','a c a1c a+c a2c a9c a*c a11c a-c acc aAc'))==['a c', 'a+c', 'a*c', 'a-c', 'acc', 'aAc']

| :或者

print(re.findall('compan(ies|y)','Too many companies have gone bankrupt, and the next one is my company'))==['ies', 'y']

(?:):代表取匹配成功的所有内容，而不仅仅只是括号内的内容

print(re.findall('compan(?:ies|y)','Too many companies have gone bankrupt, and the next one is my company'))===['companies', 'company']

re模块的其他方法：

只到找到第一个匹配然后返回一个包含匹配信息的对象,该对象可以通过调用group()方法得到匹配的字符串,如果字符串没有匹配，则返回None。

print(re.match('alex','alex sb sadfsadfasdfegon alex sb egon').group())===alex
print(re.match('alex','123213 alex sb sadfsadfasdfegon alex sb egon'))====none

同search,不过在字符串开始处进行匹配,完全可以用search+^代替

print(re.match('alex','alex sb sadfsadfasdfegon alex sb egon').group())


split按[]内:\/切分，\转译

info=r'get :a.txt\3333/rwx'
print(re.split('[ :\\\/]',info))

sub 按位置修改值

print(re.sub('(.*?)(egon)(.*?)(egon)(.*?)',r'\1\2\3EGON\5','123 egon is beutifull egon 123'))==123 egon is beutifull EGON 123

compile 全部找到匹配的字段

posted on 2018-04-09 15:11 lg04551 阅读(177) 评论(0) 编辑收藏举报

会员力量，点亮园子希望

刷新页面返回顶部

导航