Common Python Modules

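1. The re module

The re module provides regular expression operations in Python.

1.1 What is a regular expression

A regular expression is a pattern, built from symbols with special meanings, that describes a set of characters or strings. Put another way, a regular expression is a rule that describes a class of things. In Python, regular expressions are built into the language through the re module: a pattern is compiled into a series of bytecodes, which are then executed by a matching engine written in C.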

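1.2 Common matching patterns (metacharacters)

'.'     Matches any single character except \n by default; with the DOTALL flag it matches any character, including a newline
'^'     Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a", "\nabc\neee", flags=re.MULTILINE)
'$'     Matches the end of the string (or just before a trailing newline); with MULTILINE it also matches at the end of each line, e.g. re.search("foo$", "bfoo\nsdfsf", flags=re.MULTILINE).group()
'*'     Matches the preceding character 0 or more times; re.findall("ab*", "cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     Matches the preceding character 1 or more times; re.findall("ab+", "ab+cd+abb+bba") gives ['ab', 'abb']
'?'     Matches the preceding character 0 or 1 times
'{m}'   Matches the preceding character exactly m times
'{n,m}' Matches the preceding character n to m times; re.findall("ab{1,3}", "abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     Matches the pattern on the left or on the right of |; re.search("abc|ABC", "ABCBabcCD").group() gives 'ABC'
'(...)' Grouping; re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'
'[a-z]' Matches any single character from a to z
'[^()]' Matches any single character other than ( and )

r' '    Raw string: backslashes inside the quotes are not treated as Python escapes; see section 1.9
'\A'    Matches only at the start of the string; re.search(r"\Aabc", "alexabc") finds no match
'\Z'    Matches only at the end of the string; unlike $, it does not match before a trailing newline
'\d'    Matches a decimal digit (exactly 0-9 when the re.ASCII flag is set)
'\D'    Matches any non-digit character
'\w'    Matches a word character: [A-Za-z0-9_], plus Unicode letters and digits for str patterns unless re.ASCII is set
'\W'    Matches any character that is not a word character
'\s'    Matches whitespace such as space, \t, \n, \r; re.search(r"\s+", "ab\tc1\n3").group() gives '\t'

'(?P<name>...)' Named group; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})", "371481199306143242").groupdict()
gives {'province': '3714', 'city': '81', 'birthday': '1993'}
re.IGNORECASE  Ignores case, e.g. re.search(r'(\A|\s)red(\s+|$)', i, re.IGNORECASE), where i is the string being searched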

Greedy and lazy modes:

import re
print(re.findall('a.*c','a123c456c'))   # .* is greedy: it consumes as much as possible
Output:
['a123c456c']
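
The listing above shows only the greedy case. As a minimal sketch of the lazy counterpart, appending ? to a quantifier makes it match as little as possible. The variable names here are illustrative and not from the original post:

```python
import re

text = 'a123c456c'

# Greedy: .* grabs as much as it can, so the match runs to the last 'c'.
greedy = re.findall(r'a.*c', text)

# Lazy: .*? grabs as little as it can, so the match stops at the first 'c'.
lazy = re.findall(r'a.*?c', text)

print(greedy)  # ['a123c456c']
print(lazy)    # ['a123c']
```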

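1.3 match:

Matches the pattern only at the beginning of the string:

import re

obj = re.match(r'\d+', '123uua123sf')     # match one or more digits, starting at the first character
print(obj)
# <_sre.SRE_Match object; span=(0, 3), match='123'>   (Python 3.7+ prints <re.Match object; ...>)

if obj:                                   # runs only if there was a match; re.match returns None otherwise
    print(obj.group())                    # print the matched text
# 123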

Matching an IP address:

import re
 
ip = '255.255.255.253'
result=re.match(r'^([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.'
                r'([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$',ip)
print(result)
# <_sre.SRE_Match object; span=(0, 15), match='255.255.255.253'>

1.4 search:

Scans the string for the pattern (not necessarily at the start) and returns the first match:

# search
import  re
obj = re.search(r'\d+', 'a123uu234asf')    # find the first run of one or more digits
print(obj)
#<_sre.SRE_Match object; span=(1, 4), match='123'>

if obj:                                   # runs only if there was a match
    print(obj.group())                    # print the matched text
#123


import  re
obj = re.search(r'\([^()]+\)', 'sdds(a1fwewe2(3uusfdsf2)34as)f')     # match the innermost pair of parentheses and its content
print(obj)
#<_sre.SRE_Match object; span=(13, 24), match='(3uusfdsf2)'>

if obj:                                   # runs only if there was a match
    print(obj.group())                    # print the matched text
#(3uusfdsf2)

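1.5 The difference between group and groups:

# group() returns the whole match (or one numbered subgroup); groups() returns a tuple of all subgroups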
import  re
a = "123abc456"
b = re.search("([0-9]*)([a-z]*)([0-9]*)", a)
print(b)
#<_sre.SRE_Match object; span=(0, 9), match='123abc456'>
print(b.group())
#123abc456
print(b.group(0))
#123abc456
print(b.group(1))
#123
print(b.group(2))
#abc
print(b.group(3))
#456
print(b.groups())
#('123', 'abc', '456')

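1.6 findall:

The two functions above return a single match only. To get every non-overlapping match in the string, use findall. It returns a list of strings rather than a Match object, so there is no group() to call.

# findall
import  re
obj = re.findall(r'\d+', 'a123uu234asf')    # find all matches

if obj:                                   # runs only if the list is non-empty
    print(obj)                             # the result is a list
#['123', '234']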
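
Two related behaviours are worth a short sketch. When the pattern contains groups, findall returns tuples of the groups, and finditer yields Match objects when group() or span() is needed. The pattern with two groups is an illustrative assumption, not from the original post:

```python
import re

text = 'a123uu234asf'

# With two groups in the pattern, findall returns one tuple per match.
pairs = re.findall(r'(\d+)([a-z]+)', text)
print(pairs)  # [('123', 'uu'), ('234', 'asf')]

# finditer yields Match objects, so group() and span() are available.
spans = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]
print(spans)  # [('123', (1, 4)), ('234', (6, 9))]
```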

1.7 sub:

Replaces the matched substrings: re.sub(pattern, repl, string, count=0, flags=0)

# sub
import  re

content = "123abc456"
new_content = re.sub(r'\d+', 'ABC', content)
print(new_content)
#ABCabcABC
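
The count and repl parameters in the signature can be sketched as follows. The helper name double is hypothetical and chosen only for this illustration:

```python
import re

content = "123abc456"

# count=1 limits the replacement to the first match only.
first_only = re.sub(r'\d+', 'ABC', content, count=1)
print(first_only)  # ABCabc456

# repl may also be a function that receives each Match and returns the new text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r'\d+', double, content)
print(doubled)  # 246abc912
```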

1.8 split:

Splits the string at each match of the pattern: re.split(pattern, string, maxsplit=0, flags=0)

# split
import  re

content = "1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )"
new_content = re.split(r'\*', content)      # split on *, producing a list
print(new_content)
#['1 - 2 ', ' ((60-30+1', '(9-2', '5/3+7/3', '99/4', '2998+10', '568/14))-(-4', '3)/(16-3', '2) )']
 
content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'[\+\-\*\/]+', content)   # split on any run of + - * /
# new_content = re.split('\*', content, 1)
print(new_content)
#["'1 ", ' 2 ', ' ((60', '30', '1', '(9', '2', '5', '3', '7', '3', '99', '4', '2998', '10', '568', '14))',
#  '(', '4', '3)', '(16', '3', "2) )'"]
 
inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'
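inpp = re.sub(r'\s+', '', inpp)             # strip all whitespace
print(inpp)
new_content = re.split(r'\(([\+\-\*\/]?\d+[\+\-\*\/]?\d+){1}\)', inpp, 1)   # split once at the first parenthesized "number operator number"; the captured group is kept in the result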
print(new_content)
#['1-2*((60-30+', '-40-5', '*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2))']

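1.9 Supplement: r' ' raw strings and escaping:

# File njx.txt
fdfdsfds\fds
sfdsfds& @$

First, be clear that when the program reads a \ character from the file, the string holds a single backslash; the repr of the list merely displays it escaped as \\:

import re,sys
ning = []
with open('njx.txt','r',encoding="utf-8") as file:
    for line in file:
        ning.append(line)
print(ning)                   # note: the list repr shows the single backslash from the file as \\
# ['fdfdsfds\\fds\n', 'sfdsfds& @$']
print(ning[0])                # print shows the actual string: a single backslash
# fdfdsfds\fds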

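The r prefix creates a raw string, in which \ is an ordinary character rather than the start of a Python escape sequence:

import re,sys
ning = []
with open('njx.txt','r',encoding="utf-8") as file:
    for line in file:
        print(re.findall(r's\\f', line))  # first way: raw string; the regex \\ matches one literal backslash
        # print(re.findall('\\\\', line))  # second way: no r prefix, so each backslash is doubled again
        ning.append(line)
print(ning)                   # the list repr again shows the single backslash as \\
# ['s\\f']
# []
# ['fdfdsfds\\fds\n', 'sfdsfds& @$']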

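Supplement: the following code may look even more confusing at first

import re
line = 'fdfdsfds\\fds'   # sample line containing one backslash
re.findall(r'\\', line)  # the regex must be written this way; r'\' is a syntax error
print(r'\\')            # a raw string cannot end in an odd number of backslashes, so r'\' is invalid
# \\        output
# to print a single \, write:
print('\\')             # in a normal string, \\ is one escaped backslash
# \         output

Summary: a single backslash \ in a file is still a single backslash once it is read into the program; repr (for example when a list is printed) displays it as \\, while print shows \. To match a single backslash with a regular expression, use r'\\', or without the r prefix use four backslashes '\\\\'.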
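
The points in this section can be checked directly. This is a minimal sketch that uses a hard-coded string in place of the file njx.txt; the variable names are illustrative:

```python
import re

# What reading the first line of njx.txt would yield (newline omitted).
line = 'fdfdsfds\\fds'

print(len('\\'))             # 1: the literal '\\' is a single backslash character
print(repr(line))            # 'fdfdsfds\\fds'  (repr escapes the backslash)
print(line)                  # fdfdsfds\fds     (print shows the real content)

# Both spellings produce the same two-character regex, which matches one backslash.
same_pattern = (r'\\' == '\\\\')
raw_hits = re.findall(r'\\', line)
plain_hits = re.findall('\\\\', line)
print(same_pattern, raw_hits == plain_hits)   # True True
```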

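1.10 The compile function:

Description:

Python supports regular expressions through the re module. The usual workflow is to call re.compile() to compile the pattern string into a Pattern object, use that Pattern to process text and obtain a result (a Match object), and finally read information from the Match object for further processing.

A simple example, finding every English letter in a string: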

import re
pattern = re.compile('[a-zA-Z]')
result = pattern.findall('as3SiOPdj#@23awe')
print(result)
# ['a', 's', 'S', 'i', 'O', 'P', 'd', 'j', 'a', 'w', 'e']
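
The three-step workflow described above (compile, match, read the Match) can be sketched like this. The key=value pattern and the sample lines are assumptions made for illustration only:

```python
import re

# Step 1: compile the pattern string into a Pattern object once.
pattern = re.compile(r'(?P<key>[a-z]+)=(?P<value>\d+)')

# Step 2: reuse the Pattern on several inputs; search returns a Match or None.
lines = ['width=50', 'no setting here', 'total=1024']
parsed = {}
for line in lines:
    match = pattern.search(line)
    if match:
        # Step 3: read information out of the Match object.
        parsed[match.group('key')] = int(match.group('value'))

print(parsed)  # {'width': 50, 'total': 1024}
```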

Matching an IP address (255.255.255.255):

import re
 
pattern = re.compile(r'^(([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d\d|2[0-4]\d|25[0-5])$')
result = pattern.match('255.255.255.255')
print(result)
# <_sre.SRE_Match object; span=(0, 15), match='255.255.255.255'>

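2. The time module

Python usually represents time in one of these ways:

Timestamp: generally, the offset in seconds since 1970-01-01 00:00:00 (UTC). Running type(time.time()) returns float.
Format string: a formatted time string
struct_time: a struct_time tuple has nine elements: (year, month, day, hour, minute, second, day of the week, day of the year, DST flag)

import time
#-------------------------- start from the current time for a quick look at the three forms
print(time.time()) # timestamp: 1487130156.419527
print(time.strftime("%Y-%m-%d %X")) # formatted time string: '2017-02-15 11:40:53'

print(time.localtime()) # struct_time in the local time zone
print(time.gmtime())    # struct_time in UTC

The computer only understands the timestamp form, while the forms a programmer can work with or a person can read are the formatted time string and the struct_time, which gives the following conversions: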

#-------------------------- conversions in figure 1: timestamp <-> struct_time <-> formatted string
# localtime([secs])
# Converts a timestamp into a struct_time in the local time zone. If secs is not given, the current time is used.
time.localtime()
time.localtime(1473525444.037215)

# gmtime([secs]) is similar to localtime(), but converts the timestamp into a struct_time in UTC (time zone 0).

# mktime(t): converts a struct_time into a timestamp.
print(time.mktime(time.localtime()))#1473525749.0


# strftime(format[, t]): converts a time tuple or struct_time (such as one returned by time.localtime() or
# time.gmtime()) into a formatted time string. If t is not given, time.localtime() is used. If any element
# of the tuple is out of range, a ValueError is raised.
print(time.strftime("%Y-%m-%d %X", time.localtime()))#2016-09-11 00:49:56

# time.strptime(string[, format])
# Parses a formatted time string into a struct_time. It is effectively the inverse of strftime().
print(time.strptime('2011-05-05 16:37:06', '%Y-%m-%d %X'))
#time.struct_time(tm_year=2011, tm_mon=5, tm_mday=5, tm_hour=16, tm_min=37, tm_sec=6,
#  tm_wday=3, tm_yday=125, tm_isdst=-1)
# For this function, format defaults to "%a %b %d %H:%M:%S %Y".

#-------------------------- conversions in figure 2: struct_time or timestamp -> '%a %b %d %H:%M:%S %Y' string
# asctime([t]): renders a time tuple or struct_time in the form 'Sun Jun 20 23:21:05 1993'.
# If no argument is given, time.localtime() is used.
print(time.asctime())#Sun Sep 11 00:43:43 2016

# ctime([secs]): converts a timestamp (a float in seconds) into the time.asctime() form. If the argument is
# omitted or None, time.time() is used. It is equivalent to time.asctime(time.localtime(secs)).
print(time.ctime())  # Sun Sep 11 00:46:38 2016
print(time.ctime(time.time()))  # Sun Sep 11 00:46:38 2016

#-------------------------- other usage
# sleep(secs)
# Suspends the calling thread for the given number of seconds.
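
The conversions above can be chained into a round trip. This sketch works in UTC so the printed string does not depend on the local time zone; calendar.timegm is not used in the original post and is assumed here as the UTC counterpart of time.mktime:

```python
import calendar
import time

ts = 1473525444                                   # a timestamp (seconds since the epoch)

st = time.gmtime(ts)                              # timestamp -> struct_time (UTC)
text = time.strftime('%Y-%m-%d %H:%M:%S', st)     # struct_time -> formatted string
print(text)                                       # 2016-09-10 16:37:24

parsed = time.strptime(text, '%Y-%m-%d %H:%M:%S') # formatted string -> struct_time
back = calendar.timegm(parsed)                    # UTC struct_time -> timestamp
print(back == ts)                                 # True

# mktime is the inverse of localtime, so the same round trip holds in the local zone.
local_back = int(time.mktime(time.localtime(ts)))
print(local_back == ts)                           # True
```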

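3. The random module

import random
# print(random.sample([1,'23',[4,5]],2))   # pick 2 distinct elements from the list
#
# print(random.uniform(1,3))               # a random float between 1 and 3

#
# item=[1,3,5,7,9]
# random.shuffle(item)                     # shuffle the list in place
# print(item)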


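def make_code(n):
    """Build an n-character verification code of random digits and uppercase letters."""
    res=''
    for i in range(n):
        s1=str(random.randint(0,9))          # a random digit 0-9
        s2=chr(random.randint(65,90))        # a random uppercase letter A-Z (ASCII 65-90)
        res+=random.choice([s1,s2])          # pick the digit or the letter with equal probability
    return res
print(make_code(10))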
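
A possible variant of make_code is sketched below. It draws each character uniformly from one 36-character alphabet, which differs from the 50/50 digit-or-letter choice above. The names ALPHABET, make_code_uniform and the rng parameter are hypothetical:

```python
import random
import string

ALPHABET = string.digits + string.ascii_uppercase   # '0'-'9' followed by 'A'-'Z'

def make_code_uniform(n, rng=random):
    """Return n characters, each drawn uniformly from ALPHABET."""
    return ''.join(rng.choice(ALPHABET) for _ in range(n))

code = make_code_uniform(10)
print(code)

# A seeded generator makes the result reproducible, which helps in tests.
seeded_a = make_code_uniform(10, random.Random(42))
seeded_b = make_code_uniform(10, random.Random(42))
print(seeded_a == seeded_b)  # True
```

The random module is not designed for security purposes; for codes that must be unpredictable, the standard library's secrets module is the usual choice.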

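4. The os module

The os module is an interface for interacting with the operating system.

# os module
import os

os.getcwd() # get the current working directory, i.e. the directory the Python script is running in
os.chdir("dirname")  # change the current working directory; like cd in a shell
os.curdir  # the string for the current directory: '.'
os.pardir  # the string for the parent directory: '..'
os.makedirs('dirname1/dirname2')    # create nested directories recursively
os.removedirs('dirname1')   # remove the directory if it is empty, then move up to the parent and remove it too if empty, and so on
os.mkdir('dirname')   # create a single directory; like mkdir dirname in a shell
os.rmdir('dirname')    # remove a single empty directory; raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')    # return a list of all files and subdirectories in the directory, including hidden files
os.remove('path/filename') # delete a file
os.rename("oldname","newname") # rename a file or directory
os.stat('path/filename') # get information about a file or directory
os.sep    # the OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep    # the line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep    # the separator between entries in search paths such as PATH: ";" on Windows, ":" on Linux
os.name    # a string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")  # run a shell command; its output goes straight to the terminal
os.environ  # the system environment variables
os.path.abspath(path)  # return the normalized absolute version of path
os.path.split(path)  # split path into a (directory, filename) tuple
os.path.dirname(path) # return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path) # return the final filename component of path, i.e. the second element of os.path.split(path); empty if path ends with / or \
os.path.exists(path)  # True if path exists, False otherwise
os.path.isabs(path)  # True if path is an absolute path
os.path.isfile(path)  # True if path is an existing file, False otherwise
os.path.isdir(path)  # True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]]) # join the components into one path; components before the last absolute one are discarded
os.path.getatime(path)  # last access time of the file or directory that path points to
os.path.getmtime(path)  # last modification time of the file or directory that path points to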
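
Several of the calls listed above can be tried safely inside a scratch directory. This is a minimal sketch; the use of tempfile and the name demo.txt are assumptions for illustration and do not appear in the original post:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:           # scratch directory, removed on exit
    nested = os.path.join(root, 'dirname1', 'dirname2')
    os.makedirs(nested)                                # creates both levels at once

    file_path = os.path.join(nested, 'demo.txt')
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write('hello')

    directory, filename = os.path.split(file_path)     # (directory, filename) pair
    listing = os.listdir(nested)
    checks = (os.path.isfile(file_path),
              os.path.isdir(nested),
              os.path.isabs(file_path))

    os.remove(file_path)                               # delete the file
    os.rmdir(nested)                                   # works because the directory is now empty
    still_there = os.path.exists(nested)

print(filename, listing, checks, still_there)
# demo.txt ['demo.txt'] (True, True, True) False
```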

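5. The sys module

Provides access to interpreter-related functionality.

# sys module
import sys

sys.argv           # list of command-line arguments; the first element is the path of the program itself
sys.exit(n)        # exit the program; use exit(0) for a normal exit
sys.version       # version information of the Python interpreter
sys.maxsize        # the largest value a Py_ssize_t can hold (sys.maxint existed only in Python 2)
sys.path           # the module search path, initialized from the PYTHONPATH environment variable
sys.platform      # the name of the operating system platform
sys.stdout.write('please:')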

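import sys
import time

def progress(percent, width=50):
    """Redraw a one-line progress bar on stdout."""
    if percent >= 100:
        percent = 100
    # Build a template such as '[%-50s]', then fill it with one '#' per completed step.
    show_str = ('[%%-%ds]' % width) % (int(width * percent / 100) * '#')
    # '\r' returns to the start of the line so each call overwrites the previous bar.
    print('\r%s %d%%' % (show_str, percent), file=sys.stdout, flush=True, end='')

total_size = 1025121
recv_size = 0

while recv_size < total_size:
    time.sleep(0.01)  # simulate the network latency of a download
    recv_size += 1024
    recv_per = int(100 * recv_size / total_size)
    progress(recv_per, width=10)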
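
The string-building part of the progress bar can be separated from the printing, which makes it easy to check. This is a sketch with a hypothetical helper named render_bar; it uses the %-*s form, where the field width is taken from the argument tuple:

```python
def render_bar(percent, width=50):
    """Return the bar text for a percentage, without printing it."""
    percent = max(0, min(100, percent))           # clamp to the range 0..100
    filled = int(width * percent / 100) * '#'     # one '#' per completed step
    # %-*s left-justifies `filled` in a field whose width comes from the tuple.
    return '[%-*s] %d%%' % (width, filled, percent)

half = render_bar(50, width=10)
full = render_bar(250, width=10)
print(half)   # [#####     ] 50%
print(full)   # [##########] 100%
```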

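6. The json and pickle modules

A file can only store bytes or strings, not other data types, so two serialization modules are used:

json converts between strings and Python data types, turning data into a string form that all languages understand (dicts, lists and other basic values)

pickle converts between Python objects and bytes, turning data into a byte stream that only Python understands; it also handles Python-specific objects such as functions and classes, which it stores by reference to their name

The json module

# json serialization and deserialization
import json

info = {                        # a dict
    "name":"jun",
    "age":"18"
}

with open("test","w") as f:
    f.write(json.dumps(info))   # write info to the file test as JSON

with open("test","r") as f:
    info = json.loads(f.read())
    print(info["name"])

#jun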
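
json.dump and json.load write to and read from a file object directly. The sketch below also shows that a JSON round trip does not preserve every Python type; the temporary file and the sample data are assumptions for illustration:

```python
import json
import os
import tempfile

data = {"name": "jun", "scores": (90, 85), 1: "one"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)            # serialize straight to the file
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)         # parse straight from the file

print(loaded)
# {'name': 'jun', 'scores': [90, 85], '1': 'one'}
print(loaded == data)
# False: the tuple became a list and the int key became a string
```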

The pickle module

# pickle serialization and deserialization
import pickle               # pickle supports most Python-specific types

def func():                 # a function
    info ={
        "name":"jun",
        "age":"18"
    }
    print(info,type(info))

func()
#{'name': 'jun', 'age': '18'} <class 'dict'>

with open("test","wb") as f:
    f.write(pickle.dumps(func))   # write func to the file test with pickle; json would raise an error here

with open("test","rb") as f:
    func_new = pickle.loads(f.read())   # works only where func can still be found by name: pickle stores a reference, not the code
    func_new()
#{'name': 'jun', 'age': '18'} <class 'dict'>
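
The difference between the two modules can be sketched with data that json rejects, together with one case that pickle rejects as well. The sample record is an assumption for illustration:

```python
import json
import pickle

record = {"tags": {"a", "b"}, "point": (1, 2)}

# pickle round-trips Python-only types such as set and tuple unchanged.
restored = pickle.loads(pickle.dumps(record))
print(restored == record)  # True

# json has no set type, so the same data is rejected with a TypeError.
try:
    json.dumps(record)
    json_ok = True
except TypeError:
    json_ok = False
print(json_ok)  # False

# pickle stores functions by name, so an anonymous lambda cannot be pickled.
try:
    pickle.dumps(lambda: None)
    lambda_ok = True
except (pickle.PicklingError, AttributeError):
    lambda_ok = False
print(lambda_ok)  # False
```

Because loading a pickle can execute arbitrary code, pickle.loads should only be used on data from a trusted source.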

7. The shelve module

The shelve module wraps pickle internally. It is a simple key-value module that persists in-memory data to a file, and it can persist any Python data that pickle supports.

import shelve

# store data as key-value pairs
s = shelve.open("shelve_test")  # open a shelf file
numbers = (1, 2, 3, 4)
letters = ['a', 'b', 'c', 'd']
info = {"name": "jun", "age": 18}
s["tuple"] = numbers  # persist the tuple
s["list"] = letters
s["info"] = info
s.close()


# get a value by key
d = shelve.open("shelve_test")  # open a shelf file
print(d["tuple"])  # read
print(d.get("list"))
print(d.get("info"))

# (1, 2, 3, 4)
# ['a', 'b', 'c', 'd']
# {'name': 'jun', 'age': 18}
d.close()


# loop over and print the keys
s = shelve.open("shelve_test")  # open a shelf file
for k in s.keys():              # iterate over the keys
    print(k)

# list
# tuple
# info
# (the key order depends on the underlying dbm backend)
s.close()


# update the value of a key
s = shelve.open("shelve_test")  # open a shelf file
s.update({"list":[22,33]})      # reassign; s["list"] = [22,33] does the same
print(s["list"])

#[22, 33]
s.close()
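
One behaviour worth knowing: the update above works because the key is reassigned. Mutating a stored value in place is not written back unless the shelf is opened with writeback=True. This sketch uses a temporary directory and a hypothetical file name shelve_demo:

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shelve_demo")

    with shelve.open(path) as db:
        db["list"] = ['a', 'b']

    # Default mode: db["list"] returns a fresh copy, so append() is lost.
    with shelve.open(path) as db:
        db["list"].append('c')
        unchanged = db["list"]

    # writeback=True caches accessed entries and writes them back on close.
    with shelve.open(path, writeback=True) as db:
        db["list"].append('c')

    with shelve.open(path) as db:
        persisted = db["list"]

print(unchanged)   # ['a', 'b']
print(persisted)   # ['a', 'b', 'c']
```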

posted @ 2017-08-09 15:07 junxun
