(Dong Fuguo) Python Study Notes --- Python Strings and Regular Expressions (1)

Chapter 4: Strings and Regular Expressions

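The earliest string encoding was ASCII, the American Standard Code for Information Interchange. As information technology developed and the need to exchange information grew, every country's writing system needed an encoding, and different application areas place slightly different demands on string encodings. Common ones include UTF-8, UTF-16, UTF-32, GB2312 (China), GBK, CP936, and CP437.
Encodings differ considerably from one another. A different encoding means a different representation and storage form, so the bytes written to a file for the same character may differ, and to interpret the content you must know the encoding rules and decode it correctly. If the wrong decoding is used, the information cannot be recovered; from this angle, string encoding also has something of an encryption effect.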
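Python 3.x fully supports Chinese characters and uses UTF-8 as its default encoding. Whether it is a digit, an English letter, or a Chinese character, each is treated and processed as a single character.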
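
As a small illustration of the points above, the sketch below encodes one short string with two of the encodings mentioned and compares the results. The sample string is made up for this example.

```python
# A sample string: one letter, one digit, one Chinese character.
text = 'a1中'

utf8_bytes = text.encode('utf-8')   # bytes as they would be stored using UTF-8
gbk_bytes = text.encode('gbk')      # bytes as they would be stored using GBK

print(len(text))        # number of characters
print(len(utf8_bytes))  # number of bytes in UTF-8
print(len(gbk_bytes))   # number of bytes in GBK

# Decoding with the same codec that was used for encoding restores the text.
print(utf8_bytes.decode('utf-8') == text)
print(gbk_bytes.decode('gbk') == text)
```

The string has three characters either way, but it occupies five bytes in UTF-8 and four in GBK, so the stored bytes differ even though the text is the same.
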
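In Python, strings are an immutable sequence type. Besides the general sequence operations (including slicing and negative indexing), they also support string-specific methods.
Python's string interning mechanism: when a short string is assigned to several variables, only one copy is kept in memory, and all of those variables share it. Long strings do not follow the interning mechanism: when a longer string is assigned to several variables, it may be stored at several different addresses in memory. The exact rules are an implementation detail of CPython and differ between versions.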
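
Because automatic interning depends on the interpreter, a version-independent way to see the idea is to request it explicitly with sys.intern(). The following is a minimal sketch; the sample strings are made up for this example.

```python
import sys

# Build two equal strings at run time. Whether they start out as separate
# objects is an implementation detail, so the sketch does not rely on it.
a = ''.join(['long string ', 'with spaces'])
b = ''.join(['long string ', 'with spaces'])

print(a == b)    # the contents are equal

# sys.intern() returns the single shared copy kept in the interning table,
# so equal strings passed through it end up as the same object.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)
```
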

4.1.1 String Formatting
Common format characters

>>> x = 1235
>>> so = "%o" % x
>>> so
'2323'
>>> sh = "%x" % x
>>> sh
'4d3'
>>> se = "%e" % x
>>> se
'1.235000e+03'
>>> chr(ord("3")+1)
'4'
>>> "%s" % 65
'65'
>>> "%s" % 65333
'65333'
>>> "%d" % "555"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: %d format: a number is required, not str
>>> int('555')
555
>>> '%s' %[1,2,3]
'[1, 2, 3]'
>>> str((1,2,3))
'(1, 2, 3)'
>>> str([1,2,3])
'[1, 2, 3]'
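
A few more of the common format characters, including field width and precision, are sketched below. The variable names are only illustrative.

```python
as_decimal = '%d' % 65          # decimal integer
as_char = '%c' % 65             # single character from a code point
right_aligned = '%5d|' % 42     # right-aligned in a field of width 5
left_aligned = '%-5d|' % 42     # left-aligned in a field of width 5
two_places = '%.2f' % 3.14159   # fixed-point with 2 digits after the point
as_repr = '%r' % 'abc'          # repr() of the value
percent = '%d%%' % 50           # %% produces a literal percent sign

for result in (as_decimal, as_char, right_aligned, left_aligned,
               two_places, as_repr, percent):
    print(result)
```
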

Formatting with the format() method

>>> print("The number {0:,} in hex is : {0:#x}, the number {1} in oct is {1:#o}".format(5555,55))
The number 5,555 in hex is : 0x15b3, the number 55 in oct is 0o67
>>> print("The number {1:,} in hex is : {1:#x}, the number {0} in oct is {0:#o}".format(5555,55))
The number 55 in hex is : 0x37, the number 5555 in oct is 0o12663
>>> print("my name is {name},my age is {age}, and my QQ is {qq}".format(name = "He Zhibin",age = 22,qq = "30647355"))
my name is He Zhibin,my age is 22, and my QQ is 30647355
>>> position = (5,8,13)
>>> print("X:{0[0]};Y:{0[1]};Z:{0[2]}".format(position))
X:5;Y:8;Z:13

>>> weather = [("Monday","rain"),("Tuesday","sunny"),("Wednesday","sunny"),("Thursday","rain"),("Friday","Cloudy")]
>>> formatter = "Weather of '{0[0]}' is '{0[1]}'".format
>>> for item in map(formatter,weather):
...     print(item)
...
Weather of 'Monday' is 'rain'
Weather of 'Tuesday' is 'sunny'
Weather of 'Wednesday' is 'sunny'
Weather of 'Thursday' is 'rain'
Weather of 'Friday' is 'Cloudy'
>>> for item in weather:
...     print(formatter(item))
...
Weather of 'Monday' is 'rain'
Weather of 'Tuesday' is 'sunny'
Weather of 'Wednesday' is 'sunny'
Weather of 'Thursday' is 'rain'
Weather of 'Friday' is 'Cloudy'

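Starting with Python 3.6, a new way of formatting strings is supported, officially called Formatted String Literals (f-strings). It works much like the format() method of string objects, but the syntax is more concise.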

>>> name = 'He'
>>> age = 22
>>> f'My name is {name},and I am {age} years old.'
'My name is He,and I am 22 years old.'
>>> width = 10
>>> precision = 4
>>> value = 11/3
>>> f'result:{value:{width}.{precision}}'
'result:     3.667'
>>> # the value is 11/3, the width is 10, and the precision is 4 significant digits
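
In the example above the format spec has no type character, so the precision counts significant digits. The sketch below shows a few other specs that f-strings accept, including the f type, where the precision counts digits after the decimal point instead. The variable names are only illustrative.

```python
number = 5555
ratio = 11 / 3

with_commas = f'{number:,}'     # thousands separator
as_hex = f'{number:#x}'         # hexadecimal with the 0x prefix
two_places = f'{ratio:.2f}'     # fixed-point, 2 digits after the decimal point
padded = f'{ratio:>10.2f}'      # same, right-aligned in a field of width 10

print(with_commas, as_hex, two_places, padded, sep='|')
```
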

4.1.2 Common String Methods

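find(), rfind(), index(), rindex(), count()
(1) find() and rfind() return the position of the first and last occurrence, respectively, of one string within a specified range of another string (the whole string by default), or -1 if it does not occur.
(2) index() and rindex() return the position of the first and last occurrence, respectively, of one string within a specified range of another string, and raise an exception (ValueError) if it does not occur.
(3) count() returns the number of non-overlapping occurrences of one string in another.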

>>> s = "apple,peach,banana,peach,pear"
>>> s.find("peach")
6
>>> s.find("peach",7)
19
>>> s.find("peach",7,20)
-1
>>> s.rfind('p')
25
>>> s.index('p')
1
>>> s.index('pe')
6
>>> s.index('pear')
25
>>> s.index('ppp')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> s.count('p')
5
>>> s.count('pp')
1
>>> s.count('ppp')
0
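
Since find() accepts a start position, it can be called repeatedly to collect every occurrence. The helper below is a hypothetical sketch (find_all is not a built-in string method).

```python
def find_all(s, sub):
    """Return the start index of every non-overlapping occurrence of sub in s."""
    if not sub:
        raise ValueError('sub must not be empty')
    positions = []
    start = s.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching just past the occurrence that was found.
        start = s.find(sub, start + len(sub))
    return positions

s = "apple,peach,banana,peach,pear"
print(find_all(s, 'peach'))
print(find_all(s, 'ppp'))
```

The number of positions returned matches what count() reports for the same substring.
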

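split(), rsplit(), partition(), rpartition()
(1) split() and rsplit() split a string into several strings using the specified separator, starting from the left end and the right end respectively, and return a list of the pieces.
(2) partition() and rpartition() split the original string into 3 parts using the specified string as the separator: the part before the separator, the separator itself, and the part after the separator. If the separator is not in the original string, partition() returns the original string followed by two empty strings, and rpartition() returns two empty strings followed by the original string.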

>>> s = "apple,peach,banana,pear"
>>> li = s.split(",")
>>> li
['apple', 'peach', 'banana', 'pear']
>>> s.partition(',')
('apple', ',', 'peach,banana,pear')
>>> s.rpartition(',')
('apple,peach,banana', ',', 'pear')
>>> s.rpartition('banana')
('apple,peach,', 'banana', ',pear')
>>> s = "2014-10-31"
>>> t = s.split("-")
>>> print('-')
-
>>> print(t)
['2014', '10', '31']
>>> print(list(map(int,t)))
[2014, 10, 31]

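For split() and rsplit(), if no separator is specified, any whitespace in the string (spaces, newlines, tabs, and so on) is treated as a separator, and a list of the resulting pieces is returned.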

>>> s = "hello world \n\n My name is He   "
>>> s.split()
['hello', 'world', 'My', 'name', 'is', 'He']
>>> s = '\n\nhello world \n\n My name is He'
>>> s.split()
['hello', 'world', 'My', 'name', 'is', 'He']
>>> s = '\n\nhello \t\t world \n\n\n My name is He.'
>>> s.split()
['hello', 'world', 'My', 'name', 'is', 'He.']

split() and rsplit() also allow a maximum number of splits to be specified.

>>> s = '\n\nhello\t\tworld \n\n\n My name is He'
>>> s.split(None,1)
['hello', 'world \n\n\n My name is He']
>>> s.rsplit(None,1)
['\n\nhello\t\tworld \n\n\n My name is', 'He']
>>> s.split(None,2)
['hello', 'world', 'My name is He']
>>> s.rsplit(None,2)
['\n\nhello\t\tworld \n\n\n My name', 'is', 'He']
>>> s.split(maxsplit=6)
['hello', 'world', 'My', 'name', 'is', 'He']
>>> s.split(maxsplit=100)
['hello', 'world', 'My', 'name', 'is', 'He']

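When split() is called without any arguments, any whitespace character acts as a separator and a run of consecutive whitespace characters is treated as one. When the separator is passed explicitly, the behavior is slightly different: consecutive separators produce empty strings in the result.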

>>> 'a,,,bb,,ccc'.split(',')
['a', '', '', 'bb', '', 'ccc']
>>> 'a\t\t\tbb\t\tccc'.split('\t')
['a', '', '', 'bb', '', 'ccc']
>>> 'a\t\t\tbb\t\tccc'.split()
['a', 'bb', 'ccc']

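partition() and rpartition() split the original string into 3 parts using the specified string as the separator: the part before the separator, the separator itself, and the part after the separator.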

>>> s = "apple,peach,banana,pear"
>>> s.partition(',')
('apple', ',', 'peach,banana,pear')
>>> s.rpartition(',')
('apple,peach,banana', ',', 'pear')
>>> s.partition('banana')
('apple,peach,', 'banana', ',pear')
>>> 'abababab'.partition('a')
('', 'a', 'bababab')
>>> 'abababab'.rpartition('a')
('ababab', 'a', 'b')
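
Because partition() always returns exactly three parts, it is convenient for splitting a key from a value at the first separator. The following is a minimal sketch; parse_setting and the sample inputs are hypothetical.

```python
def parse_setting(line):
    """Split 'key=value' at the first '='; the value is None when '=' is absent."""
    key, sep, value = line.partition('=')
    if not sep:
        # partition() returned (line, '', ''), so there was no separator.
        return key.strip(), None
    return key.strip(), value.strip()

print(parse_setting('name = He'))
print(parse_setting('expr=a=b'))
print(parse_setting('verbose'))
```
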

String concatenation with join()

>>> li = ["apple","peach","banana","pear"]
>>> sep = ","
>>> s = sep.join(li)
>>> s
'apple,peach,banana,pear'

Using the "+" operator to concatenate many strings is not recommended; prefer the join() method.
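
The usual reason given for this advice is that strings are immutable, so repeated "+" may build a new string object on every step, while join() assembles the result in one call. The sketch below only shows that the two approaches produce the same string; the function names are hypothetical and no timing claims are made.

```python
def build_with_plus(parts):
    result = ''
    for part in parts:
        result = result + part    # may create a new string object on each pass
    return result

def build_with_join(parts):
    return ''.join(parts)         # one call that assembles the whole result

parts = [str(i) for i in range(1000)]
print(build_with_plus(parts) == build_with_join(parts))
```
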
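lower(), upper(), capitalize(), title(), swapcase(): each of these returns a new string and leaves the original string unchanged.

>>> s = "What is your Name?"
>>> s.lower()                   # return a lowercase string
'what is your name?'
>>> s.upper()                   # return an uppercase string
'WHAT IS YOUR NAME?'
>>> s.capitalize()              # capitalize the first character, lowercase the rest
'What is your name?'
>>> s.title()                   # capitalize the first letter of each word
'What Is Your Name?'
>>> s.swapcase()                # swap upper and lower case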
'wHAT IS YOUR nAME?'

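replace() performs find and replace, similar to a "find and replace" feature; it replaces every occurrence and returns a new string.

>>> s = "中国，中国"                          # "China, China"
>>> s
'中国，中国'
>>> s2 = s.replace("中国","中华人民共和国")   # "China" -> "People's Republic of China"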
>>> s2
'中华人民共和国，中华人民共和国'

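Check whether user input contains sensitive words, and if so replace each sensitive word with three asterisks (***).

>>> words = ('测试','非法','暴力','话')     # 'test', 'illegal', 'violence', 'speech'
>>> text = '这句话里含有非法内容'           # 'This sentence contains illegal content'
>>> for word in words:
...     if word in text:
...         text = text.replace(word,'***')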
...
>>> text
'这句***里含有***内容'

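The maketrans() method of string objects builds a character mapping table, and the translate() method converts a string according to the mapping defined in that table, replacing the characters in it. Used together, the two methods can process several different characters in one pass, which a single call to replace() cannot do. This approach can also be used as a simple encryption algorithm.

>>> table = ''.maketrans('abcdef123','uvwxyz@#$')       # build a mapping table that maps each character of 'abcdef123' to the corresponding character of 'uvwxyz@#$'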
>>> s = "Python is a great programming language.I like it!"
>>> s.translate(table)
'Python is u gryut progrumming lunguugy.I liky it!'

The mapping table can be seen as the key, s as the plaintext, and the final result as the ciphertext.

- Caesar cipher

>>> import string
>>> def kaisa(s,k):             # takes the string and the shift amount k
...     lower = string.ascii_lowercase          # lowercase letters
...     upper = string.ascii_uppercase          # uppercase letters
...     before = string.ascii_letters
...     after = lower[k:] + lower[:k] + upper[k:] + upper[:k]
...     table = ''.maketrans(before,after)      # build the mapping table
...     return s.translate(table)
...
>>> s = "Python is a greate programming language.I like it!"
>>> kaisa(s,3)
'Sbwkrq lv d juhdwh surjudpplqj odqjxdjh.L olnh lw!'
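
Decryption follows from the same idea: shifting by the remaining 26 - k positions brings every letter back to where it started. The following is a self-contained sketch; caesar and caesar_decrypt are illustrative names, not part of the original notes.

```python
import string

def caesar(s, k):
    """Shift ASCII letters by k positions; other characters are unchanged."""
    k %= 26
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(lower + upper,
                          lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return s.translate(table)

def caesar_decrypt(s, k):
    """Undo caesar(s, k) by shifting the remaining 26 - k positions."""
    return caesar(s, 26 - k % 26)

plain = "Python is a great programming language.I like it!"
secret = caesar(plain, 3)
print(secret)
print(caesar_decrypt(secret, 3) == plain)
```
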

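strip(), rstrip(), lstrip()

>>> s = "  abc   "
>>> s.strip()                           # remove whitespace from both ends
'abc'
>>> s
'  abc   '
>>> "\n\nhello world   \n\n".strip()    # newlines count as whitespace too
'hello world'
>>> "aaaassddf".strip("a")              # remove the specified character
'ssddf'
>>> "aaaassddf".strip("af")             # remove the specified characters
'ssdd'
>>> "aaaassddf".rstrip("a")             # remove the specified character from the right end
'aaaassddf'
>>> "aaaassddfaaa".lstrip("a")          # remove the specified character from the left end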
'ssddfaaa'

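The string passed to these three methods is not treated as a whole. Instead, every character contained in the argument is removed from both ends, the right end, or the left end of the original string, peeling inward layer by layer from the outside.

>>> 'aabbccddeeeffg'.strip('af')        # the letter f is not at either end of the string, so it is not removed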
'bbccddeeeffg'
>>> 'aabbccddeeeffg'.strip('gaf')
'bbccddeee'
>>> 'aabbccddeeeffg'.strip('gaef')
'bbccdd'
>>> 'aabbccddeeeffg'.strip('gbaef')
'ccdd'
>>> 'aabbccddeeeffg'.strip('gbaefcd')
''
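
A consequence of this behavior is that strip() is the wrong tool for removing a whole prefix. The sketch below contrasts it with a slicing approach; remove_prefix is a hypothetical helper (Python 3.9 and later also provide str.removeprefix() and str.removesuffix()).

```python
def remove_prefix(s, prefix):
    """Remove prefix once, and only if s actually starts with it."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s

name = 'testcase_test'
stripped = name.strip('test')             # removes the characters t, e, s from both ends
without_prefix = remove_prefix(name, 'test')

print(stripped)
print(without_prefix)
```
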

The built-in function eval()

>>> eval("3+4")
7
>>> a = 3
>>> b = 5
>>> eval('a+b')
8
>>>
>>> import math
>>> eval('help(math.sqrt)')
Help on built-in function sqrt in module math:

sqrt(x, /)
    Return the square root of x.

>>> eval('math.sqrt(3)')
1.7320508075688772
>>> eval('aa')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
NameError: name 'aa' is not defined

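Note: eval() is very dangerous. It evaluates whatever expression it is given, so a user who carefully constructs malicious input may compromise security.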
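
When the input is only expected to be a Python literal (a number, string, list, dict, and so on), the standard library's ast.literal_eval() is a commonly used safer alternative, because it refuses function calls and other expressions. The following is a minimal sketch; parse_literal is a hypothetical wrapper.

```python
import ast

def parse_literal(text):
    """Parse a Python literal; return None if text is anything else.

    Note that the literal 'None' also yields None, so this wrapper is only
    meant as an illustration.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

print(parse_literal('[1, 2, 3]'))
print(parse_literal("{'a': 1}"))
print(parse_literal("__import__('os').getcwd()"))   # rejected: not a literal
```
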

- Membership testing with the keyword in

>>> "a" in "abcd"       # test whether one string occurs in another
True
>>> 'ab' in 'abcd'
True
>>> 'ac' in 'abcd'      # the string to the left of in is treated as a whole
False
>>> "j" in "abcd"
False

Python strings support multiplication by an integer, which means sequence repetition, that is, repeating the contents of the string.

>>> 'abc'*3
'abcabcabc'

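s.startswith(t), s.endswith(t): test whether a string starts or ends with the specified string.

>>> s = "Beautiful is better than ugly"
>>> s.startswith('Be')          # check the whole string
True
>>> s.startswith('Be',5)        # specify the start position of the range to check
False
>>> s.startswith('Be',0,5)      # specify the start and end positions of the range to check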
True
>>> s.endswith('h')
False
>>> s.endswith('y')
True

>>> import os
>>> [filename for filename in os.listdir(r'c:\\') if filename.endswith(('.bmp','.jpg','.gif'))]

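center(), ljust(), rjust(): return a new string of the specified width in which the original string is centered, left-aligned, or right-aligned. If the specified width is greater than the length of the string, the specified character (a space by default) is used for padding.

>>> 'Hello world!'.center(20)           # centered, padded with spaces
'    Hello world!    '
>>> 'Hello world!'.center(20,'=')               # centered, padded with '='
'====Hello world!===='
>>> 'Hello world!'.ljust(20,'=')                # left-aligned, padded with '='
'Hello world!========'
>>> 'Hello world!'.rjust(20,'=')                # right-aligned, padded with '='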
'========Hello world!'
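
These methods are handy for lining up columns in plain-text output. The following is a small sketch with made-up data.

```python
rows = [('apple', 3), ('banana', 12), ('pear', 105)]

lines = []
for name, count in rows:
    # Name column: width 10, left-aligned, padded with dots.
    # Count column: width 5, right-aligned, padded with spaces.
    lines.append(name.ljust(10, '.') + str(count).rjust(5))

print('\n'.join(lines))
```
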

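zfill() returns a string of the specified width, padded on the left with the character 0.

>>> 'abc'.zfill(5)              # pad with the digit character 0 on the left
'00abc'
>>> 'abc'.zfill(2)              # when the specified width is less than the string length, the string itself is returned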
'abc'
>>> 'uio'.zfill(20)
'00000000000000000uio'

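isalnum(), isalpha(), isdigit(), isdecimal(), isnumeric(), isspace(), isupper(), islower(): these test, respectively, whether a string consists only of letters and digits, only of letters, only of digit characters (isdigit(), isdecimal(), and isnumeric() differ in how broadly they define a digit; isnumeric() also accepts Chinese numerals and Roman numeral characters), only of whitespace, and whether its letters are all uppercase or all lowercase.

>>> '1234abcd'.isalnum()
True
>>> '1234abcd'.isalpha()                # returns True when every character is a letter
False
>>> '1234abcd'.isdigit()                # returns True when every character is a digit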
False
>>> 'abcd'.isalpha()
True
>>> '1234.0'.isdigit()
False
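
The difference between the three digit tests only shows up with characters outside 0-9. The sketch below checks a few sample characters (chosen for this example) with all three methods.

```python
samples = [
    '123',      # ordinary decimal digits
    '\u00b2',   # superscript two
    '一二三',    # Chinese numerals one, two, three
    '\u2167',   # Roman numeral eight as a single character
]

checks = {}
for text in samples:
    checks[text] = (text.isdecimal(), text.isdigit(), text.isnumeric())
    print(repr(text), *checks[text])
```

Each row prints the results of isdecimal(), isdigit(), and isnumeric() in that order, from the narrowest test to the broadest.
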

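Besides the methods provided by string objects, many Python built-in functions also operate on strings, for example len() to get the length of a string, as well as max(), min(), and zip().
Slicing works on strings just as it does on lists and tuples, but on a string it can only read elements; modifying a string through a slice is not supported. Dictionaries do not support slicing, because their items are looked up by key rather than by position.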

>>> 'Explicit is better than implicit.'[:8]
'Explicit'
>>> 'Explicit is better than implicit.'[9:23]
'is better than'

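compress() and decompress() from the zlib module of the Python standard library

>>> import zlib
>>> x = 'Python程序设计系列书，董付国编著，清华大学出版社'.encode()
>>> len(x)
69
>>> y = zlib.compress(x)                # for data with little repetition, compression gains little (here the result is even longer)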
>>> len(y)
80
>>> x = ('Python系列图书'*3).encode()
>>> len(x)
54
>>> y = zlib.compress(x)
>>> len(y)
30
>>> y = zlib.compress(x)
>>> len(y)
30
>>> z = zlib.decompress(y)
>>> len(z)
54
>>> z.decode()
'Python系列图书Python系列图书Python系列图书'

>>> x = ['何帅帅']*8
>>> y = str(x).encode()
>>> len(y)
104
>>> z = zlib.compress(y)        # only bytes can be compressed
>>> len(z)
26
>>> zlib.decompress(z).decode()
"['何帅帅', '何帅帅', '何帅帅', '何帅帅', '何帅帅', '何帅帅', '何帅帅', '何帅帅']"

posted @ 2019-08-14 09:49 旅人_Eric
