Python 3 Basics: Built-in Modules
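Modules and Packages
1. Definitions:
Module: a unit for logically organizing Python code (variables, functions, classes, and logic that together implement a feature).
In essence, a module is a Python file ending in .py.
Package: a unit for logically organizing modules. In essence, it is a directory (a regular package contains an __init__.py file; since Python 3.3 a directory without one can still be imported as a namespace package).
2. Ways to import:
import module_name, module_name2, ...
from module_name import *
from module_name import m1, m2, m3
from module_name import logger as logger1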
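3. What import really does (path lookup and the search path)
Importing a module means the interpreter executes that .py file from top to bottom. The file runs once, on the first import; later imports reuse the module object cached in sys.modules.
(import test: the name test is bound to a module object holding everything defined in test.py)
(from test import name: only name is bound in the current namespace)
import module_name -----> module_name.py -----> the path of module_name.py, which is looked up in the directories listed in sys.path
Importing a package means the interpreter executes the __init__.py file in that package's directory.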
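The claim that importing executes the file can be checked with a small experiment. The following is a minimal sketch, not part of the original notes: the module name demo_mod and its contents are made up for illustration, and the file is written to a temporary directory that is then added to sys.path.

```python
import sys
import tempfile
from pathlib import Path

# Create a throwaway module on disk (demo_mod is a hypothetical name).
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_mod.py").write_text(
    "print('demo_mod is being executed')\n"
    "name = 'defined at import time'\n"
)

# The interpreter searches sys.path, so the directory has to be on it.
sys.path.insert(0, tmp_dir)

import demo_mod   # runs the file: the print() inside it fires once
import demo_mod   # second import: served from sys.modules, no output

print(demo_mod.name)
print("demo_mod" in sys.modules)
print(demo_mod.__file__)
```

The message from inside demo_mod.py appears only once even though the import statement runs twice, which is the caching behavior described above.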
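__name__
When the file is run as a script:
__name__ equals '__main__'
When the file is imported as a module:
__name__ equals the module name
This lets a single .py file behave differently depending on how it is used: code guarded by an if __name__ == '__main__': check runs only when the file is executed directly, not when it is imported.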
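A minimal sketch of this pattern follows. The function names add and main are placeholders chosen for illustration; imagine the code saved in a file of your own.

```python
def add(a, b):
    """Function that other modules can import and reuse."""
    return a + b


def main():
    # Quick self-check meant to run only when the file is executed directly.
    print("running as a script, 1 + 2 =", add(1, 2))


print("__name__ is", __name__)

if __name__ == "__main__":
    main()
```

Running the file directly prints the self-check line; importing it from another module only makes add available, and __name__ then holds the module's name instead of '__main__'.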
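4. Import optimization
Every call written as module_test.test() makes the interpreter look up the name test on the module again. When a function is called many times, for example inside a loop, importing the name directly avoids that repeated attribute lookup:
from module_test import test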
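5. Kinds of modules
Modules loaded by import fall into four general categories:
1 Code written in Python (.py files)
2 C or C++ extensions compiled into shared libraries or DLLs
3 Packages that contain a set of modules
4 Built-in modules written in C and linked into the Python interpreter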
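Commonly Used Built-in Modules
(1) The time module
Python usually represents time in one of these ways:
- Timestamp: the number of seconds since January 1, 1970, e.g. time.time()
- Formatted string: 2014-11-11 11:11, e.g. time.strftime('%Y-%m-%d')
- Structured time: a tuple of nine fields (year, month, day, hour, minute, second, weekday, day of the year, DST flag), i.e. a time.struct_time, e.g. time.localtime()
Because the time module is implemented mostly on top of the platform's C library, behavior can differ between platforms.
UTC (Coordinated Universal Time) is the world time standard; it was formerly known as Greenwich Mean Time (GMT).
China uses UTC+8. DST stands for Daylight Saving Time.
Timestamp: generally, a timestamp is the offset in seconds from 00:00:00 UTC on January 1, 1970.
Running "type(time.time())" returns float. time() is the main function that returns a timestamp; the older clock() was removed in Python 3.8.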
Index | Attribute | Values |
---|---|---|
0 | tm_year (year) | e.g. 2011 |
1 | tm_mon (month) | 1 - 12 |
2 | tm_mday (day of the month) | 1 - 31 |
3 | tm_hour (hour) | 0 - 23 |
4 | tm_min (minute) | 0 - 59 |
5 | tm_sec (second) | 0 - 61 |
6 | tm_wday (weekday) | 0 - 6 (0 is Monday) |
7 | tm_yday (day of the year) | 1 - 366 |
8 | tm_isdst (DST flag) | 0, 1 or -1 (-1 means unknown) |
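The fields in the table can be read either by attribute name or by index. A small sketch, using time.gmtime(0) (the epoch itself) so the result does not depend on the local time zone:

```python
import time

# gmtime(0) is 1970-01-01 00:00:00 UTC on every machine.
epoch = time.gmtime(0)

print(epoch.tm_year, epoch[0])        # same field, by attribute and by index
print(epoch.tm_wday)                  # 3: January 1, 1970 was a Thursday (Monday is 0)
print(epoch.tm_yday, epoch.tm_isdst)  # 1 0
print(len(epoch))                     # 9 fields in the tuple
```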
Common functions in the time module:
1) time.localtime([secs]): converts a timestamp to a struct_time in the local time zone. If secs is not given, the current time is used.
>>> import time
>>> time.localtime()
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=22, tm_min=57, tm_sec=42, tm_wday=3, tm_yday=298, tm_isdst=0)
2) time.time(): returns the current time as a timestamp.
>>> import time
>>> time.time()
1540479500.1852782
3) time.gmtime([secs]): like localtime(), but converts the timestamp to a struct_time in UTC (time zone 0).
>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=14, tm_min=59, tm_sec=23, tm_wday=3, tm_yday=298, tm_isdst=0)
4) time.mktime(t): converts a struct_time expressed in local time (UTC+8 in this example) to a timestamp.
>>> import time
>>> x = time.localtime()
>>> time.mktime(x)
1540479626.0
5) time.sleep(secs): suspends the calling thread for the given number of seconds.
import time
# Run the program: it sleeps for 2 seconds and then prints "Hello Python!"
time.sleep(2)
print("Hello Python!")
6) time.asctime([t]): converts a tuple or struct_time representing a time into a string of the form 'Sun Jun 20 23:21:05 1993'. Without an argument, time.localtime() is used.
>>> import time
>>> x = time.localtime()
>>> time.asctime(x)
'Thu Oct 25 23:00:26 2018'
>>>
7) time.ctime([secs]): converts a timestamp (a float in seconds) into the same form as time.asctime(). If secs is not given or is None, time.time() is used. It is equivalent to time.asctime(time.localtime(secs)).
>>> import time
>>> time.time()
1540459453.0845733
>>> time.ctime(time.time())
'Thu Oct 25 17:24:36 2018'
>>>
8) time.strftime(format[, t]): converts a tuple or struct_time representing a time (as returned by time.localtime() or time.gmtime()) into a formatted time string.
If t is not given, time.localtime() is used. A ValueError is raised if any field of the tuple is out of range.
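Format | Meaning | Note |
---|---|---|
%a | Locale's abbreviated weekday name | |
%A | Locale's full weekday name | |
%b | Locale's abbreviated month name | |
%B | Locale's full month name | |
%c | Locale's appropriate date and time representation | |
%d | Day of the month (01 - 31) | |
%H | Hour (24-hour clock, 00 - 23) | |
%I | Hour (12-hour clock, 01 - 12) | |
%j | Day of the year (001 - 366) | |
%m | Month (01 - 12) | |
%M | Minute (00 - 59) | |
%p | Locale's equivalent of AM or PM | (1) |
%S | Second (00 - 61) | (2) |
%U | Week number of the year (00 - 53), with Sunday as the first day of the week. All days before the first Sunday are in week 0. | (3) |
%w | Weekday (0 - 6, 0 is Sunday) | |
%W | Week number of the year (00 - 53), with Monday as the first day of the week. All days before the first Monday are in week 0. | (3) |
%x | Locale's appropriate date representation | |
%X | Locale's appropriate time representation | |
%y | Year without century (00 - 99) | |
%Y | Year with century | |
%Z | Time zone name (empty string if no time zone exists) | |
%% | A literal '%' character | |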
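Notes:
- (1) When used with strptime(), "%p" only affects the parsed hour if "%I" is used to parse the hour.
- (2) The documentation stresses that the range really is 0 - 61, not 59: the value 60 is valid for timestamps that represent leap seconds, and 61 is supported for historical reasons.
- (3) When used with strptime(), %U and %W are only used in calculations when the day of the week and the year are specified.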
>>> import time
>>> time.strftime("%Y-%m-%d %A %H:%M:%S ")
'2018-10-25 Thursday 17:33:29 '
>>> time.strftime(" %A %H:%M:%S %Y-%m-%d ")
' Thursday 17:35:09 2018-10-25 '
>>>
9) time.strptime(string[, format]): parses a formatted time string into a struct_time. It is the inverse of strftime().
>>> import time
>>> time.strptime(' Thursday 17:35:09 2018-10-25',' %A %H:%M:%S %Y-%m-%d')
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=17, tm_min=35, tm_sec=9, tm_wday=3, tm_yday=298, tm_isdst=-1)
>>>
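10) time.clock(): note that its meaning differs between systems, and that it was deprecated in Python 3.3 and removed in Python 3.8, so the example below only runs on older versions. On UNIX it returns the "process time" (CPU time) as a floating-point number of seconds.
On Windows it returns the wall-clock time, in seconds, elapsed since the first call to this function.
(It is based on the Win32 function QueryPerformanceCounter(), whose resolution is finer than a millisecond.)
>>> import time
>>> if __name__ == '__main__':
...     time.sleep(1)
...     print("clock1:%s" % time.clock())
...     time.sleep(1)
...     print("clock2:%s" % time.clock())
...     time.sleep(1)
...     print("clock3:%s" % time.clock())
...
clock1:2.5e-06
clock2:1.0002382
clock3:2.0004314
>>>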
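On current Python versions the two meanings that time.clock() mixed together are available as separate functions. A minimal sketch, assuming Python 3.3 or later: time.perf_counter() measures elapsed wall-clock time and time.process_time() measures CPU time used by the process.

```python
import time

start_wall = time.perf_counter()   # high-resolution wall-clock timer
start_cpu = time.process_time()    # CPU time consumed by this process

time.sleep(0.2)                    # sleeping takes wall time but almost no CPU time

wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
```

Both functions return values whose absolute meaning is undefined; only the difference between two calls is useful.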
Converting between the time representations
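The three representations (timestamp, struct_time, formatted string) can be converted into one another with the functions listed above. The round trip below is a sketch; it also uses calendar.timegm(), which is not mentioned in the notes above but is the standard-library UTC counterpart of time.mktime().

```python
import calendar
import time

ts = 1540479626                        # a timestamp (seconds since the epoch)

# timestamp -> struct_time
local_st = time.localtime(ts)          # fields in the local time zone
utc_st = time.gmtime(ts)               # fields in UTC

# struct_time -> timestamp
ts_from_local = time.mktime(local_st)  # mktime() expects local time
ts_from_utc = calendar.timegm(utc_st)  # timegm() expects UTC

# struct_time -> formatted string -> struct_time
text = time.strftime("%Y-%m-%d %H:%M:%S", utc_st)
parsed = time.strptime(text, "%Y-%m-%d %H:%M:%S")

print(ts_from_local == ts, ts_from_utc == ts)
print(text)                            # 2018-10-25 15:00:26
```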
The datetime module
>>> import datetime
# Current time
>>> datetime.datetime.now()
datetime.datetime(2018, 10, 25, 14, 58, 9, 526923)
# 3 days from now
>>> print(datetime.datetime.now() + datetime.timedelta(3))
2018-10-28 14:59:58.085724
# 3 days ago
>>> print(datetime.datetime.now() + datetime.timedelta(-3))
2018-10-22 15:01:00.604181
>>>
# 3 hours from now
>>> print(datetime.datetime.now() + datetime.timedelta(hours=3))
2018-10-25 18:02:36.695773
# 30 minutes from now
>>> print(datetime.datetime.now() + datetime.timedelta(minutes=30))
2018-10-25 15:33:21.053755
>>>
# Replacing individual fields
>>> c_time = datetime.datetime.now()
>>> print(c_time.replace(minute=3, hour=2))
2018-10-25 02:03:39.820451
>>>
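datetime.date.today()  returns the local date as a date object (str() gives its text form, e.g. 2014-03-24)
datetime.date.isoformat(obj)  returns the date as a "year-month-day" string (2014-03-24)
datetime.date.fromtimestamp(timestamp)  takes a timestamp and returns the matching date object (year-month-day)
datetime.date.weekday(obj)  returns the day of the week of a date object, Monday is 0
datetime.date.isoweekday(obj)  returns the day of the week of a date object, Monday is 1
datetime.date.isocalendar(obj)  returns the (ISO year, ISO week number, ISO weekday) tuple of a date object
datetime objects:
datetime.datetime.today()  returns a datetime object with the local time (including microseconds), e.g. 2014-03-24 23:31:50.419000
datetime.datetime.now([tz])  returns the current time as a datetime object in the given time zone (local time if tz is omitted), e.g. 2014-03-24 23:31:50.419000
datetime.datetime.utcnow()  returns a naive datetime object holding the current UTC time (deprecated since Python 3.12 in favor of datetime.datetime.now(datetime.timezone.utc))
datetime.datetime.fromtimestamp(timestamp[, tz])  returns the datetime object for a timestamp, optionally in a given time zone; the result can be passed to strftime to get a date string
datetime.datetime.utcfromtimestamp(timestamp)  returns the naive UTC datetime object for a timestamp (also deprecated since Python 3.12)
datetime.datetime.strptime('2014-03-16 12:21:21', "%Y-%m-%d %H:%M:%S")  parses a string into a datetime object
datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d %H%M%S')  formats a datetime object as a str
datetime.date.today().timetuple()  converts to a time.struct_time, which can then be turned into a timestamp
datetime.datetime.now().timetuple()
time.mktime(timetupleobj)  converts a struct_time (in local time) to a timestamp
time.time()  current timestamp
time.localtime  converts a timestamp to a struct_time in local time
time.gmtime  converts a timestamp to a struct_time in UTC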
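The conversions in this list can be chained. The sketch below uses a fixed, timezone-aware moment (an arbitrary value chosen for illustration) so that the printed results are the same on every machine.

```python
from datetime import datetime, timedelta, timezone

# A fixed moment in UTC so the output is reproducible.
moment = datetime(2018, 10, 25, 15, 0, 26, tzinfo=timezone.utc)

ts = moment.timestamp()                                 # datetime -> timestamp
back = datetime.fromtimestamp(ts, tz=timezone.utc)      # timestamp -> datetime

text = moment.strftime("%Y-%m-%d %H:%M:%S")             # datetime -> str
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")   # str -> naive datetime

later = moment + timedelta(days=3, hours=3)             # date arithmetic

print(ts)                   # 1540479626.0
print(back == moment)       # True
print(later.isoformat())    # 2018-10-28T18:00:26+00:00
print(moment.weekday(), moment.isoweekday())            # 3 4 (a Thursday)
```

Note that strptime() returns a naive datetime (no time zone attached), so parsed carries the same wall-clock fields as moment but is not directly comparable to it.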
(2) The random module
random.random()  # returns a random floating-point number n with 0 <= n < 1.0
>>> import random
>>> random.random()
0.8048731160537441
>>> random.random()
0.540423134210193
>>> random.random()
0.5877892352747521
>>>
random.randint(a, b)  # returns a random integer n with a <= n <= b (both ends included)
>>> import random
>>> random.randint(1,9)
3
>>> random.randint(1,9)
7
>>> random.randint(1,9)
5
>>> random.randint(1,9)
9
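random.randrange([start], stop[, step])  # returns a random number from the values that start at start, increase by step, and stay below stop.
For example, random.randrange(10, 100, 2) is equivalent to picking a random element from the sequence [10, 12, 14, 16, ... 96, 98].
>>> import random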
>>> random.randrange(1,10,2)
1
>>> random.randrange(1,10,2)
7
>>> random.randrange(1,10,2)
9
>>> random.randrange(1,10,2)
5
>>> random.randrange(1,10,2)
1
>>> random.randrange(1,10,2)
9
>>> random.randrange(1,10,2)
3
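random.choice(sequence)  # returns a random element from a sequence; the sequence argument must be an ordered type.
In Python, "sequence" is not one specific type but a family of types: list, tuple, and str are all sequences.
>>> import random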
>>> random.choice("I Love You")
'o'
>>> random.choice("I Love You")
'Y'
>>> random.choice("I Love You")
'v'
>>> random.choice("I Love You")
' '
>>> random.choice("I Love You")
' '
>>> random.choice("I Love You")
'L'
>>> random.choice("I Love You")
' '
>>>
Practical examples:
import random
import string
# Random integer:
print(random.randint(0, 99))  # 70
# Random even number between 0 and 100:
print(random.randrange(0, 101, 2))  # 4
# Random floats:
print(random.random())  # 0.2746445568079129
print(random.uniform(1, 10))  # 9.887001463194844
# Random character:
print(random.choice('abcdefg&#%^*f'))  # f
# Pick a given number of distinct characters from a string:
print(random.sample('abcdefghij', 3))  # ['f', 'h', 'd']
# Random string from a list:
print(random.choice(['apple', 'pear', 'peach', 'orange', 'lemon']))  # apple
# Shuffle
items = [1, 2, 3, 4, 5, 6, 7]
print(items)  # [1, 2, 3, 4, 5, 6, 7]
random.shuffle(items)
print(items)  # [1, 4, 7, 2, 5, 3, 6]
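The sample outputs in the comments above differ on every run. When repeatable results are needed (for tests, say), the generator can be seeded. This is a small sketch; the seed value 2018 is arbitrary.

```python
import random

random.seed(2018)                           # fix the starting state
first = [random.randint(0, 99) for _ in range(5)]

random.seed(2018)                           # same seed, same sequence
second = [random.randint(0, 99) for _ in range(5)]

print(first == second)                      # True

# A separate Random instance keeps its own state and leaves the
# module-level generator untouched.
rng = random.Random(2018)
items = list(range(1, 8))
rng.shuffle(items)                          # shuffles in place, returns None
print(sorted(items) == list(range(1, 8)))   # True: shuffle only reorders
```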
Generating a random verification code
import random
checkcode = ''
for i in range(4):
    current = random.randrange(0, 4)
    if i == current:
        # Uppercase letter: code points 65 - 90 are 'A' - 'Z'
        tmp = chr(random.randint(65, 90))
    else:
        # Digit
        tmp = random.randint(0, 9)
    checkcode += str(tmp)
print(checkcode)
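(3) The os module
Provides an interface for calling into the operating system
os.getcwd()  returns the current working directory, i.e. the directory the Python script is running in
os.chdir("dirname")  changes the current working directory; like cd in a shell
os.curdir  the string for the current directory: ('.')
os.pardir  the string for the parent directory: ('..')
os.makedirs('dirname1/dirname2')  creates nested directories recursively
os.removedirs('dirname1')  removes the directory if it is empty, then moves up to the parent and removes it too if it is empty, and so on
os.mkdir('dirname')  creates a single directory; like mkdir dirname in a shell
os.rmdir('dirname')  removes a single empty directory and raises an error if the directory is not empty; like rmdir dirname in a shell
os.listdir('dirname')  returns a list of all files and subdirectories in the given directory, including hidden files
os.remove()  deletes a file
os.rename("oldname","newname")  renames a file or directory
os.stat('path/filename')  returns information about a file or directory
os.sep  the path separator of the operating system: "\\" on Windows, "/" on Linux
os.linesep  the line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep  the string that separates entries in a search path such as PATH: ";" on Windows, ":" on Linux
os.name  a string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")  runs a shell command; its output goes straight to the terminal
os.environ  the system environment variables
os.path.abspath(path)  returns the normalized absolute version of path
os.path.split(path)  splits path into a (directory, file name) pair
os.path.dirname(path)  returns the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  returns the final file name in path, i.e. the second element of os.path.split(path); if path ends with / or \, the result is an empty string
os.path.exists(path)  returns True if path exists, False otherwise
os.path.isabs(path)  returns True if path is an absolute path
os.path.isfile(path)  returns True if path is an existing file, False otherwise
os.path.isdir(path)  returns True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  joins several path components; any components before the last absolute one are discarded
os.path.getatime(path)  returns the last access time of the file or directory at path
os.path.getmtime(path)  returns the last modification time of the file or directory at path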
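Several of these calls can be tried safely inside a temporary directory. The sketch below makes up the file names keep.txt and notes.txt for illustration; keep.txt exists only so that os.removedirs() stops at the temporary directory instead of removing it as well.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # A file in the base directory keeps it non-empty, which is where
    # os.removedirs() stops climbing.
    with open(os.path.join(base, "keep.txt"), "w", encoding="utf-8") as f:
        f.write("sentinel")

    nested = os.path.join(base, "dirname1", "dirname2")
    os.makedirs(nested)                            # creates both levels at once

    file_path = os.path.join(nested, "notes.txt")
    with open(file_path, "w", encoding="utf-8") as f:
        f.write("hello")

    head, tail = os.path.split(file_path)          # (directory, file name)
    tail_of_dir = os.path.basename(nested + os.sep)  # '' because of the trailing separator
    listing = os.listdir(nested)                   # ['notes.txt']
    is_file, is_dir = os.path.isfile(file_path), os.path.isdir(nested)

    os.remove(file_path)
    os.removedirs(nested)                          # removes dirname2, then dirname1
    dirname1_left = os.path.exists(os.path.join(base, "dirname1"))

    print(tail, repr(tail_of_dir), listing, is_file, is_dir, dirname1_left)
```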
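(4) The sys module
sys.argv  list of command-line arguments; the first element is the path of the program itself
sys.exit(n)  exits the program; use exit(0) for a normal exit
sys.version  version information of the Python interpreter
sys.maxsize  the largest value a variable of the C type Py_ssize_t can hold (sys.maxint from Python 2 no longer exists in Python 3, where int has no upper limit)
sys.path  the module search path; initialized from the PYTHONPATH environment variable plus installation-dependent defaults
sys.platform  name of the operating system platform
sys.stdout.write('please:')  writes to standard output without adding a newline
val = sys.stdin.readline()[:-1]  reads one line from standard input and drops the trailing newline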
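The last two lines form a simple prompt-and-read pattern. In the sketch below, io.StringIO stands in for the keyboard so the code runs unattended, and read_answer is a hypothetical helper name; rstrip("\n") is used instead of [:-1] so that a final line without a newline is not truncated.

```python
import io
import sys


def read_answer(stream=sys.stdin):
    """Read one line and drop the trailing newline, like readline()[:-1]."""
    return stream.readline().rstrip("\n")


# Simulated user input; pass no argument to read from the real sys.stdin.
fake_stdin = io.StringIO("yes\n")

sys.stdout.write("please: ")
answer = read_answer(fake_stdin)
sys.stdout.write(answer + "\n")

print(sys.argv[:1])       # the script path, when run as a script
print(sys.platform)
print(sys.maxsize)
```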
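(5) The shutil module
A high-level module for working with files, folders, and archives
shutil.copyfileobj(fsrc, fdst[, length])
Copies the contents of one file object into another, length bytes at a time. Copying starts at the current position of fsrc, so it can also copy only part of a file.

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

import shutil
# Open both files in with blocks so they are closed after the copy
with open("程序员必逛的网站.txt", encoding='gbk') as f1, \
        open("笔记本2", 'w', encoding='utf-8') as f2:
    shutil.copyfileobj(f1, f2)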
shutil.copyfile(src, dst)
Copies a file

def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)

import shutil
shutil.copyfile('笔记本2', '笔记本3')
shutil.copymode(src, dst)
Copies only the permission bits. The contents, group, and owner stay unchanged

def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)
shutil.copystat(src, dst)
Copies the status information, including: mode bits, atime, mtime, flags

def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError as why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise
shutil.copy(src, dst)
Copies a file together with its permission bits

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

shutil.copy2(src, dst)
Copies a file together with its status information

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)
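shutil.ignore_patterns(*patterns)
Builds a function, for use as the ignore argument of copytree, that skips files and directories matching the given glob-style patterns
shutil.copytree(src, dst, symlinks=False, ignore=None)
Recursively copies a directory tree

import shutil
shutil.copytree('a', 'new_a')
shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively deletes a directory tree

import shutil
shutil.rmtree('new_a')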
shutil.move(src, dst)
Recursively moves a file or directory

def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.

    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error("Destination path '%s' already exists" % real_dst)
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error("Cannot move a directory '%s' into itself '%s'." % (src, dst))
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)
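The copy, move, and delete functions described in this section can be exercised together in a temporary directory. This is a sketch under made-up names (a, new_a, moved_a, keep.txt, skip.pyc); nothing outside the temporary directory is touched.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a small source tree: a/keep.txt, a/skip.pyc, a/sub/deep.txt
    src_dir = os.path.join(base, "a")
    os.makedirs(os.path.join(src_dir, "sub"))
    for name in ("keep.txt", "skip.pyc", os.path.join("sub", "deep.txt")):
        with open(os.path.join(src_dir, name), "w", encoding="utf-8") as f:
            f.write(name)

    # copyfile: contents only; copy2: contents plus metadata such as mtime.
    plain = shutil.copyfile(os.path.join(src_dir, "keep.txt"),
                            os.path.join(base, "plain.txt"))
    with_meta = shutil.copy2(os.path.join(src_dir, "keep.txt"),
                             os.path.join(base, "meta.txt"))

    # copytree: recursive copy, leaving out anything that matches *.pyc.
    new_a = os.path.join(base, "new_a")
    shutil.copytree(src_dir, new_a, ignore=shutil.ignore_patterns("*.pyc"))
    copied = sorted(os.listdir(new_a))           # ['keep.txt', 'sub']

    # move: a rename when possible, otherwise copy followed by delete.
    moved = shutil.move(new_a, os.path.join(base, "moved_a"))

    # rmtree: delete a whole directory tree.
    shutil.rmtree(moved)
    still_there = os.path.exists(moved)          # False
    print(copied, still_there)
```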
shutil.make_archive(base_name, format,...)

import shutil
# Raw string, so the backslashes in the Windows path are not read as escapes
shutil.make_archive('shutil_make_archive', 'zip', r'H:\Python3_study\jichu\day1')

def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                 dry_run=0, owner=None, group=None, logger=None):
    """Create an archive file (eg. zip or tar).

    'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar"
    or "gztar".

    'root_dir' is a directory that will be the root directory of the
    archive; ie. we typically chdir into 'root_dir' before creating the
    archive.  'base_dir' is the directory where we start archiving from;
    ie. 'base_dir' will be the common prefix of all files and
    directories in the archive.  'root_dir' and 'base_dir' both default
    to the current directory.  Returns the name of the archive file.

    'owner' and 'group' are used when creating a tar archive. By default,
    uses the current owner and group.
    """
    save_cwd = os.getcwd()
    if root_dir is not None:
        if logger is not None:
            logger.debug("changing into '%s'", root_dir)
        base_name = os.path.abspath(base_name)
        if not dry_run:
            os.chdir(root_dir)

    if base_dir is None:
        base_dir = os.curdir

    kwargs = {'dry_run': dry_run, 'logger': logger}

    try:
        format_info = _ARCHIVE_FORMATS[format]
    except KeyError:
        raise ValueError("unknown archive format '%s'" % format)

    func = format_info[0]
    for arg, val in format_info[1]:
        kwargs[arg] = val

    if format != 'zip':
        kwargs['owner'] = owner
        kwargs['group'] = group

    try:
        filename = func(base_name, base_dir, **kwargs)
    finally:
        if root_dir is not None:
            if logger is not None:
                logger.debug("changing back to '%s'", save_cwd)
            os.chdir(save_cwd)

    return filename
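Creates an archive (e.g. zip or tar) and returns the path of the archive file
base_name: the file name of the archive, without extension; it may also include a path. With a bare file name the archive is saved in the current directory, otherwise in the given location,
e.g. www => saved in the current directory
e.g. /Users/wupeiqi/www => saved in /Users/wupeiqi/
format: the archive type: "zip", "tar", "bztar", "gztar"
root_dir: the directory to archive (defaults to the current directory)
owner: the owner, defaults to the current user
group: the group, defaults to the current group
logger: used for logging, usually a logging.Logger object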
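A runnable sketch of these parameters follows. The names data, backup, restored, and a.log are made up, and everything happens inside a temporary directory; shutil.unpack_archive(), used here as the inverse operation, is not covered in the notes above.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Directory whose contents will be archived.
    data_dir = os.path.join(base, "data")
    os.makedirs(data_dir)
    with open(os.path.join(data_dir, "a.log"), "w", encoding="utf-8") as f:
        f.write("log line\n")

    # base_name carries no extension; make_archive appends ".zip" itself
    # and returns the full path of the file it created.
    archive = shutil.make_archive(os.path.join(base, "backup"), "zip",
                                  root_dir=data_dir)
    archive_name = os.path.basename(archive)       # backup.zip

    # unpack_archive picks the format from the file extension.
    out_dir = os.path.join(base, "restored")
    shutil.unpack_archive(archive, out_dir)
    restored = os.listdir(out_dir)                 # ['a.log']
    print(archive_name, restored)
```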
shutil handles archives by calling the zipfile and tarfile modules. In detail:

import zipfile
# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()
# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

import tarfile
# Compress
tar = tarfile.open('your.tar', 'w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()
# Extract
tar = tarfile.open('your.tar', 'r')
tar.extractall()  # a target directory can be given
tar.close()
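Both ZipFile and TarFile objects work as context managers, which closes the archive even if an error occurs partway through. The following sketch is one possible variant of the examples above, run inside a temporary directory; it also turns on real compression, since ZipFile defaults to ZIP_STORED (no compression) and plain "w" mode for tarfile writes an uncompressed tar.

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as base:
    log_path = os.path.join(base, "a.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("log line\n")

    # zip: ZIP_DEFLATED compresses the members as they are written.
    zip_path = os.path.join(base, "laxi.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.write(log_path, arcname="a.log")
    with zipfile.ZipFile(zip_path) as z:
        zip_names = z.namelist()                    # ['a.log']
        z.extractall(os.path.join(base, "from_zip"))
    extracted = os.listdir(os.path.join(base, "from_zip"))

    # tar: mode "w:gz" adds gzip compression on top of the tar container.
    tar_path = os.path.join(base, "your.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(log_path, arcname="a.log")
    with tarfile.open(tar_path, "r:gz") as tar:
        tar_names = tar.getnames()                  # ['a.log']

    print(zip_names, extracted, tar_names)
```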

class ZipFile(object): """ Class with methods to open, read, write, close, list zip files. z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False) file: Either the path to the file, or a file-like object. If it is a path, the file will be opened and closed by ZipFile. mode: The mode can be either read "r", write "w" or append "a". compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib). allowZip64: if True ZipFile will create files with ZIP64 extensions when needed, otherwise it will raise an exception when this would be necessary. """ fp = None # Set here since __del__ checks it def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False): """Open the ZIP file with mode read "r", write "w" or append "a".""" if mode not in ("r", "w", "a"): raise RuntimeError('ZipFile() requires mode "r", "w", or "a"') if compression == ZIP_STORED: pass elif compression == ZIP_DEFLATED: if not zlib: raise RuntimeError,\ "Compression requires the (missing) zlib module" else: raise RuntimeError, "That compression method is not supported" self._allowZip64 = allowZip64 self._didModify = False self.debug = 0 # Level of printing: 0 through 3 self.NameToInfo = {} # Find file info given name self.filelist = [] # List of ZipInfo instances for archive self.compression = compression # Method of compression self.mode = key = mode.replace('b', '')[0] self.pwd = None self._comment = '' # Check if we were passed a file-like object if isinstance(file, basestring): self._filePassed = 0 self.filename = file modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'} try: self.fp = open(file, modeDict[mode]) except IOError: if mode == 'a': mode = key = 'w' self.fp = open(file, modeDict[mode]) else: raise else: self._filePassed = 1 self.fp = file self.filename = getattr(file, 'name', None) try: if key == 'r': self._RealGetContents() elif key == 'w': # set the modified flag so central directory gets written # even if no files are added to the archive self._didModify = 
True elif key == 'a': try: # See if file is a zip file self._RealGetContents() # seek to start of directory and overwrite self.fp.seek(self.start_dir, 0) except BadZipfile: # file is not a zip file, just append self.fp.seek(0, 2) # set the modified flag so central directory gets written # even if no files are added to the archive self._didModify = True else: raise RuntimeError('Mode must be "r", "w" or "a"') except: fp = self.fp self.fp = None if not self._filePassed: fp.close() raise def __enter__(self): return self def __exit__(self, type, value, traceback): self.close() def _RealGetContents(self): """Read in the table of contents for the ZIP file.""" fp = self.fp try: endrec = _EndRecData(fp) except IOError: raise BadZipfile("File is not a zip file") if not endrec: raise BadZipfile, "File is not a zip file" if self.debug > 1: print endrec size_cd = endrec[_ECD_SIZE] # bytes in central directory offset_cd = endrec[_ECD_OFFSET] # offset of central directory self._comment = endrec[_ECD_COMMENT] # archive comment # "concat" is zero, unless zip was concatenated to another file concat = endrec[_ECD_LOCATION] - size_cd - offset_cd if endrec[_ECD_SIGNATURE] == stringEndArchive64: # If Zip64 extension structures are present, account for them concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator) if self.debug > 2: inferred = concat + offset_cd print "given, inferred, offset", offset_cd, inferred, concat # self.start_dir: Position of start of central directory self.start_dir = offset_cd + concat fp.seek(self.start_dir, 0) data = fp.read(size_cd) fp = cStringIO.StringIO(data) total = 0 while total < size_cd: centdir = fp.read(sizeCentralDir) if len(centdir) != sizeCentralDir: raise BadZipfile("Truncated central directory") centdir = struct.unpack(structCentralDir, centdir) if centdir[_CD_SIGNATURE] != stringCentralDir: raise BadZipfile("Bad magic number for central directory") if self.debug > 2: print centdir filename = fp.read(centdir[_CD_FILENAME_LENGTH]) # Create 
ZipInfo instance to store file information x = ZipInfo(filename) x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH]) x.comment = fp.read(centdir[_CD_COMMENT_LENGTH]) x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET] (x.create_version, x.create_system, x.extract_version, x.reserved, x.flag_bits, x.compress_type, t, d, x.CRC, x.compress_size, x.file_size) = centdir[1:12] x.volume, x.internal_attr, x.external_attr = centdir[15:18] # Convert date/time code to (year, month, day, hour, min, sec) x._raw_time = t x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F, t>>11, (t>>5)&0x3F, (t&0x1F) * 2 ) x._decodeExtra() x.header_offset = x.header_offset + concat x.filename = x._decodeFilename() self.filelist.append(x) self.NameToInfo[x.filename] = x # update total bytes read from central directory total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH] + centdir[_CD_EXTRA_FIELD_LENGTH] + centdir[_CD_COMMENT_LENGTH]) if self.debug > 2: print "total", total def namelist(self): """Return a list of file names in the archive.""" l = [] for data in self.filelist: l.append(data.filename) return l def infolist(self): """Return a list of class ZipInfo instances for files in the archive.""" return self.filelist def printdir(self): """Print a table of contents for the zip file.""" print "%-46s %19s %12s" % ("File Name", "Modified ", "Size") for zinfo in self.filelist: date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6] print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size) def testzip(self): """Read all the files and check the CRC.""" chunk_size = 2 ** 20 for zinfo in self.filelist: try: # Read by chunks, to avoid an OverflowError or a # MemoryError with very large embedded files. 
with self.open(zinfo.filename, "r") as f: while f.read(chunk_size): # Check CRC-32 pass except BadZipfile: return zinfo.filename def getinfo(self, name): """Return the instance of ZipInfo given 'name'.""" info = self.NameToInfo.get(name) if info is None: raise KeyError( 'There is no item named %r in the archive' % name) return info def setpassword(self, pwd): """Set default password for encrypted files.""" self.pwd = pwd @property def comment(self): """The comment text associated with the ZIP file.""" return self._comment @comment.setter def comment(self, comment): # check for valid comment length if len(comment) > ZIP_MAX_COMMENT: import warnings warnings.warn('Archive comment is too long; truncating to %d bytes' % ZIP_MAX_COMMENT, stacklevel=2) comment = comment[:ZIP_MAX_COMMENT] self._comment = comment self._didModify = True def read(self, name, pwd=None): """Return file bytes (as a string) for name.""" return self.open(name, "r", pwd).read() def open(self, name, mode="r", pwd=None): """Return file-like object for 'name'.""" if mode not in ("r", "U", "rU"): raise RuntimeError, 'open() requires mode "r", "U", or "rU"' if not self.fp: raise RuntimeError, \ "Attempt to read ZIP archive that was already closed" # Only open a new file for instances where we were not # given a file object in the constructor if self._filePassed: zef_file = self.fp should_close = False else: zef_file = open(self.filename, 'rb') should_close = True try: # Make sure we have an info object if isinstance(name, ZipInfo): # 'name' is already an info object zinfo = name else: # Get info object for name zinfo = self.getinfo(name) zef_file.seek(zinfo.header_offset, 0) # Skip the file header: fheader = zef_file.read(sizeFileHeader) if len(fheader) != sizeFileHeader: raise BadZipfile("Truncated file header") fheader = struct.unpack(structFileHeader, fheader) if fheader[_FH_SIGNATURE] != stringFileHeader: raise BadZipfile("Bad magic number for file header") fname = 
zef_file.read(fheader[_FH_FILENAME_LENGTH]) if fheader[_FH_EXTRA_FIELD_LENGTH]: zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH]) if fname != zinfo.orig_filename: raise BadZipfile, \ 'File name in directory "%s" and header "%s" differ.' % ( zinfo.orig_filename, fname) # check for encrypted flag & handle password is_encrypted = zinfo.flag_bits & 0x1 zd = None if is_encrypted: if not pwd: pwd = self.pwd if not pwd: raise RuntimeError, "File %s is encrypted, " \ "password required for extraction" % name zd = _ZipDecrypter(pwd) # The first 12 bytes in the cypher stream is an encryption header # used to strengthen the algorithm. The first 11 bytes are # completely random, while the 12th contains the MSB of the CRC, # or the MSB of the file time depending on the header type # and is used to check the correctness of the password. bytes = zef_file.read(12) h = map(zd, bytes[0:12]) if zinfo.flag_bits & 0x8: # compare against the file type from extended local headers check_byte = (zinfo._raw_time >> 8) & 0xff else: # compare against the CRC otherwise check_byte = (zinfo.CRC >> 24) & 0xff if ord(h[11]) != check_byte: raise RuntimeError("Bad password for file", name) return ZipExtFile(zef_file, mode, zinfo, zd, close_fileobj=should_close) except: if should_close: zef_file.close() raise def extract(self, member, path=None, pwd=None): """Extract a member from the archive to the current working directory, using its full name. Its file information is extracted as accurately as possible. `member' may be a filename or a ZipInfo object. You can specify a different directory using `path'. """ if not isinstance(member, ZipInfo): member = self.getinfo(member) if path is None: path = os.getcwd() return self._extract_member(member, path, pwd) def extractall(self, path=None, members=None, pwd=None): """Extract all members from the archive to the current working directory. `path' specifies a different directory to extract to. 
`members' is optional and must be a subset of the list returned by namelist(). """ if members is None: members = self.namelist() for zipinfo in members: self.extract(zipinfo, path, pwd) def _extract_member(self, member, targetpath, pwd): """Extract the ZipInfo object 'member' to a physical file on the path targetpath. """ # build the destination pathname, replacing # forward slashes to platform specific separators. arcname = member.filename.replace('/', os.path.sep) if os.path.altsep: arcname = arcname.replace(os.path.altsep, os.path.sep) # interpret absolute pathname as relative, remove drive letter or # UNC path, redundant separators, "." and ".." components. arcname = os.path.splitdrive(arcname)[1] arcname = os.path.sep.join(x for x in arcname.split(os.path.sep) if x not in ('', os.path.curdir, os.path.pardir)) if os.path.sep == '\\': # filter illegal characters on Windows illegal = ':<>|"?*' if isinstance(arcname, unicode): table = {ord(c): ord('_') for c in illegal} else: table = string.maketrans(illegal, '_' * len(illegal)) arcname = arcname.translate(table) # remove trailing dots arcname = (x.rstrip('.') for x in arcname.split(os.path.sep)) arcname = os.path.sep.join(x for x in arcname if x) targetpath = os.path.join(targetpath, arcname) targetpath = os.path.normpath(targetpath) # Create all upper directories if necessary. 
upperdirs = os.path.dirname(targetpath) if upperdirs and not os.path.exists(upperdirs): os.makedirs(upperdirs) if member.filename[-1] == '/': if not os.path.isdir(targetpath): os.mkdir(targetpath) return targetpath with self.open(member, pwd=pwd) as source, \ file(targetpath, "wb") as target: shutil.copyfileobj(source, target) return targetpath def _writecheck(self, zinfo): """Check for errors before writing a file to the archive.""" if zinfo.filename in self.NameToInfo: import warnings warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3) if self.mode not in ("w", "a"): raise RuntimeError, 'write() requires mode "w" or "a"' if not self.fp: raise RuntimeError, \ "Attempt to write ZIP archive that was already closed" if zinfo.compress_type == ZIP_DEFLATED and not zlib: raise RuntimeError, \ "Compression requires the (missing) zlib module" if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED): raise RuntimeError, \ "That compression method is not supported" if not self._allowZip64: requires_zip64 = None if len(self.filelist) >= ZIP_FILECOUNT_LIMIT: requires_zip64 = "Files count" elif zinfo.file_size > ZIP64_LIMIT: requires_zip64 = "Filesize" elif zinfo.header_offset > ZIP64_LIMIT: requires_zip64 = "Zipfile size" if requires_zip64: raise LargeZipFile(requires_zip64 + " would require ZIP64 extensions") def write(self, filename, arcname=None, compress_type=None): """Put the bytes from filename into the archive under the name arcname.""" if not self.fp: raise RuntimeError( "Attempt to write to ZIP archive that was already closed") st = os.stat(filename) isdir = stat.S_ISDIR(st.st_mode) mtime = time.localtime(st.st_mtime) date_time = mtime[0:6] # Create ZipInfo instance to store file information if arcname is None: arcname = filename arcname = os.path.normpath(os.path.splitdrive(arcname)[1]) while arcname[0] in (os.sep, os.altsep): arcname = arcname[1:] if isdir: arcname += '/' zinfo = ZipInfo(arcname, date_time) zinfo.external_attr = (st[0] & 0xFFFF) << 
16L # Unix attributes if compress_type is None: zinfo.compress_type = self.compression else: zinfo.compress_type = compress_type zinfo.file_size = st.st_size zinfo.flag_bits = 0x00 zinfo.header_offset = self.fp.tell() # Start of header bytes self._writecheck(zinfo) self._didModify = True if isdir: zinfo.file_size = 0 zinfo.compress_size = 0 zinfo.CRC = 0 zinfo.external_attr |= 0x10 # MS-DOS directory flag self.filelist.append(zinfo) self.NameToInfo[zinfo.filename] = zinfo self.fp.write(zinfo.FileHeader(False)) return with open(filename, "rb") as fp: # Must overwrite CRC and sizes with correct data later zinfo.CRC = CRC = 0 zinfo.compress_size = compress_size = 0 # Compressed size can be larger than uncompressed size zip64 = self._allowZip64 and \ zinfo.file_size * 1.05 > ZIP64_LIMIT self.fp.write(zinfo.FileHeader(zip64)) if zinfo.compress_type == ZIP_DEFLATED: cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15) else: cmpr = None file_size = 0 while 1: buf = fp.read(1024 * 8) if not buf: break file_size = file_size + len(buf) CRC = crc32(buf, CRC) & 0xffffffff if cmpr: buf = cmpr.compress(buf) compress_size = compress_size + len(buf) self.fp.write(buf) if cmpr: buf = cmpr.flush() compress_size = compress_size + len(buf) self.fp.write(buf) zinfo.compress_size = compress_size else: zinfo.compress_size = file_size zinfo.CRC = CRC zinfo.file_size = file_size if not zip64 and self._allowZip64: if file_size > ZIP64_LIMIT: raise RuntimeError('File size has increased during compressing') if compress_size > ZIP64_LIMIT: raise RuntimeError('Compressed size larger than uncompressed size') # Seek backwards and write file header (which will now include # correct CRC and file sizes) position = self.fp.tell() # Preserve current position in file self.fp.seek(zinfo.header_offset, 0) self.fp.write(zinfo.FileHeader(zip64)) self.fp.seek(position, 0) self.filelist.append(zinfo) self.NameToInfo[zinfo.filename] = zinfo def writestr(self, zinfo_or_arcname, bytes, 
compress_type=None): """Write a file into the archive. The contents is the string 'bytes'. 'zinfo_or_arcname' is either a ZipInfo instance or the name of the file in the archive.""" if not isinstance(zinfo_or_arcname, ZipInfo): zinfo = ZipInfo(filename=zinfo_or_arcname, date_time=time.localtime(time.time())[:6]) zinfo.compress_type = self.compression if zinfo.filename[-1] == '/': zinfo.external_attr = 0o40775 << 16 # drwxrwxr-x zinfo.external_attr |= 0x10 # MS-DOS directory flag else: zinfo.external_attr = 0o600 << 16 # ?rw------- else: zinfo = zinfo_or_arcname if not self.fp: raise RuntimeError( "Attempt to write to ZIP archive that was already closed") if compress_type is not None: zinfo.compress_type = compress_type zinfo.file_size = len(bytes) # Uncompressed size zinfo.header_offset = self.fp.tell() # Start of header bytes self._writecheck(zinfo) self._didModify = True zinfo.CRC = crc32(bytes) & 0xffffffff # CRC-32 checksum if zinfo.compress_type == ZIP_DEFLATED: co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15) bytes = co.compress(bytes) + co.flush() zinfo.compress_size = len(bytes) # Compressed size else: zinfo.compress_size = zinfo.file_size zip64 = zinfo.file_size > ZIP64_LIMIT or \ zinfo.compress_size > ZIP64_LIMIT if zip64 and not self._allowZip64: raise LargeZipFile("Filesize would require ZIP64 extensions") self.fp.write(zinfo.FileHeader(zip64)) self.fp.write(bytes) if zinfo.flag_bits & 0x08: # Write CRC and file sizes after the file data fmt = '<LQQ' if zip64 else '<LLL' self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size, zinfo.file_size)) self.fp.flush() self.filelist.append(zinfo) self.NameToInfo[zinfo.filename] = zinfo def __del__(self): """Call the "close()" method in case the user forgot.""" self.close() def close(self): """Close the file, and for mode "w" and "a" write the ending records.""" if self.fp is None: return try: if self.mode in ("w", "a") and self._didModify: # write ending records pos1 = self.fp.tell() 
for zinfo in self.filelist: # write central directory dt = zinfo.date_time dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2] dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2) extra = [] if zinfo.file_size > ZIP64_LIMIT \ or zinfo.compress_size > ZIP64_LIMIT: extra.append(zinfo.file_size) extra.append(zinfo.compress_size) file_size = 0xffffffff compress_size = 0xffffffff else: file_size = zinfo.file_size compress_size = zinfo.compress_size if zinfo.header_offset > ZIP64_LIMIT: extra.append(zinfo.header_offset) header_offset = 0xffffffffL else: header_offset = zinfo.header_offset extra_data = zinfo.extra if extra: # Append a ZIP64 field to the extra's extra_data = struct.pack( '<HH' + 'Q'*len(extra), 1, 8*len(extra), *extra) + extra_data extract_version = max(45, zinfo.extract_version) create_version = max(45, zinfo.create_version) else: extract_version = zinfo.extract_version create_version = zinfo.create_version try: filename, flag_bits = zinfo._encodeFilenameFlags() centdir = struct.pack(structCentralDir, stringCentralDir, create_version, zinfo.create_system, extract_version, zinfo.reserved, flag_bits, zinfo.compress_type, dostime, dosdate, zinfo.CRC, compress_size, file_size, len(filename), len(extra_data), len(zinfo.comment), 0, zinfo.internal_attr, zinfo.external_attr, header_offset) except DeprecationWarning: print >>sys.stderr, (structCentralDir, stringCentralDir, create_version, zinfo.create_system, extract_version, zinfo.reserved, zinfo.flag_bits, zinfo.compress_type, dostime, dosdate, zinfo.CRC, compress_size, file_size, len(zinfo.filename), len(extra_data), len(zinfo.comment), 0, zinfo.internal_attr, zinfo.external_attr, header_offset) raise self.fp.write(centdir) self.fp.write(filename) self.fp.write(extra_data) self.fp.write(zinfo.comment) pos2 = self.fp.tell() # Write end-of-zip-archive record centDirCount = len(self.filelist) centDirSize = pos2 - pos1 centDirOffset = pos1 requires_zip64 = None if centDirCount > ZIP_FILECOUNT_LIMIT: requires_zip64 = 
"Files count" elif centDirOffset > ZIP64_LIMIT: requires_zip64 = "Central directory offset" elif centDirSize > ZIP64_LIMIT: requires_zip64 = "Central directory size" if requires_zip64: # Need to write the ZIP64 end-of-archive records if not self._allowZip64: raise LargeZipFile(requires_zip64 + " would require ZIP64 extensions") zip64endrec = struct.pack( structEndArchive64, stringEndArchive64, 44, 45, 45, 0, 0, centDirCount, centDirCount, centDirSize, centDirOffset) self.fp.write(zip64endrec) zip64locrec = struct.pack( structEndArchive64Locator, stringEndArchive64Locator, 0, pos2, 1) self.fp.write(zip64locrec) centDirCount = min(centDirCount, 0xFFFF) centDirSize = min(centDirSize, 0xFFFFFFFF) centDirOffset = min(centDirOffset, 0xFFFFFFFF) endrec = struct.pack(structEndArchive, stringEndArchive, 0, 0, centDirCount, centDirCount, centDirSize, centDirOffset, len(self._comment)) self.fp.write(endrec) self.fp.write(self._comment) self.fp.flush() finally: fp = self.fp self.fp = None if not self._filePassed: fp.close() ZipFile

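The ZipFile source listed above is a Python 2 era implementation, so a short Python 3 usage sketch may be easier to follow than the internals. It writes one entry with writestr() and reads it back; the archive path and the entry name notes/readme.txt are made up for illustration.

```python
import os
import tempfile
import zipfile

# Hypothetical archive location inside a fresh temporary directory
archive = os.path.join(tempfile.mkdtemp(), 'demo.zip')

# The with-statement calls close(), which writes the ending records
with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('notes/readme.txt', 'hello zip')

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()                      # entry names in the archive
    info = zf.getinfo('notes/readme.txt')      # ZipInfo with sizes and CRC
    text = zf.read('notes/readme.txt').decode()

print(names, info.file_size, text)
```

Using the context manager avoids relying on __del__ to close the archive, which the listing above only treats as a fallback.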
1 class TarFile(object): 2 """The TarFile Class provides an interface to tar archives. 3 """ 4 5 debug = 0 # May be set from 0 (no msgs) to 3 (all msgs) 6 7 dereference = False # If true, add content of linked file to the 8 # tar file, else the link. 9 10 ignore_zeros = False # If true, skips empty or invalid blocks and 11 # continues processing. 12 13 errorlevel = 1 # If 0, fatal errors only appear in debug 14 # messages (if debug >= 0). If > 0, errors 15 # are passed to the caller as exceptions. 16 17 format = DEFAULT_FORMAT # The format to use when creating an archive. 18 19 encoding = ENCODING # Encoding for 8-bit character strings. 20 21 errors = None # Error handler for unicode conversion. 22 23 tarinfo = TarInfo # The default TarInfo class to use. 24 25 fileobject = ExFileObject # The default ExFileObject class to use. 26 27 def __init__(self, name=None, mode="r", fileobj=None, format=None, 28 tarinfo=None, dereference=None, ignore_zeros=None, encoding=None, 29 errors=None, pax_headers=None, debug=None, errorlevel=None): 30 """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to 31 read from an existing archive, 'a' to append data to an existing 32 file or 'w' to create a new file overwriting an existing one. `mode' 33 defaults to 'r'. 34 If `fileobj' is given, it is used for reading or writing data. If it 35 can be determined, `mode' is overridden by `fileobj's mode. 36 `fileobj' is not closed, when TarFile is closed. 37 """ 38 modes = {"r": "rb", "a": "r+b", "w": "wb"} 39 if mode not in modes: 40 raise ValueError("mode must be 'r', 'a' or 'w'") 41 self.mode = mode 42 self._mode = modes[mode] 43 44 if not fileobj: 45 if self.mode == "a" and not os.path.exists(name): 46 # Create nonexistent files in append mode. 
47 self.mode = "w" 48 self._mode = "wb" 49 fileobj = bltn_open(name, self._mode) 50 self._extfileobj = False 51 else: 52 if name is None and hasattr(fileobj, "name"): 53 name = fileobj.name 54 if hasattr(fileobj, "mode"): 55 self._mode = fileobj.mode 56 self._extfileobj = True 57 self.name = os.path.abspath(name) if name else None 58 self.fileobj = fileobj 59 60 # Init attributes. 61 if format is not None: 62 self.format = format 63 if tarinfo is not None: 64 self.tarinfo = tarinfo 65 if dereference is not None: 66 self.dereference = dereference 67 if ignore_zeros is not None: 68 self.ignore_zeros = ignore_zeros 69 if encoding is not None: 70 self.encoding = encoding 71 72 if errors is not None: 73 self.errors = errors 74 elif mode == "r": 75 self.errors = "utf-8" 76 else: 77 self.errors = "strict" 78 79 if pax_headers is not None and self.format == PAX_FORMAT: 80 self.pax_headers = pax_headers 81 else: 82 self.pax_headers = {} 83 84 if debug is not None: 85 self.debug = debug 86 if errorlevel is not None: 87 self.errorlevel = errorlevel 88 89 # Init datastructures. 90 self.closed = False 91 self.members = [] # list of members as TarInfo objects 92 self._loaded = False # flag if all members have been read 93 self.offset = self.fileobj.tell() 94 # current position in the archive file 95 self.inodes = {} # dictionary caching the inodes of 96 # archive members already added 97 98 try: 99 if self.mode == "r": 100 self.firstmember = None 101 self.firstmember = self.next() 102 103 if self.mode == "a": 104 # Move to the end of the archive, 105 # before the first empty block. 
106 while True: 107 self.fileobj.seek(self.offset) 108 try: 109 tarinfo = self.tarinfo.fromtarfile(self) 110 self.members.append(tarinfo) 111 except EOFHeaderError: 112 self.fileobj.seek(self.offset) 113 break 114 except HeaderError, e: 115 raise ReadError(str(e)) 116 117 if self.mode in "aw": 118 self._loaded = True 119 120 if self.pax_headers: 121 buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy()) 122 self.fileobj.write(buf) 123 self.offset += len(buf) 124 except: 125 if not self._extfileobj: 126 self.fileobj.close() 127 self.closed = True 128 raise 129 130 def _getposix(self): 131 return self.format == USTAR_FORMAT 132 def _setposix(self, value): 133 import warnings 134 warnings.warn("use the format attribute instead", DeprecationWarning, 135 2) 136 if value: 137 self.format = USTAR_FORMAT 138 else: 139 self.format = GNU_FORMAT 140 posix = property(_getposix, _setposix) 141 142 #-------------------------------------------------------------------------- 143 # Below are the classmethods which act as alternate constructors to the 144 # TarFile class. The open() method is the only one that is needed for 145 # public use; it is the "super"-constructor and is able to select an 146 # adequate "sub"-constructor for a particular compression using the mapping 147 # from OPEN_METH. 148 # 149 # This concept allows one to subclass TarFile without losing the comfort of 150 # the super-constructor. A sub-constructor is registered and made available 151 # by adding it to the mapping in OPEN_METH. 152 153 @classmethod 154 def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs): 155 """Open a tar archive for reading, writing or appending. Return 156 an appropriate TarFile class. 
157 158 mode: 159 'r' or 'r:*' open for reading with transparent compression 160 'r:' open for reading exclusively uncompressed 161 'r:gz' open for reading with gzip compression 162 'r:bz2' open for reading with bzip2 compression 163 'a' or 'a:' open for appending, creating the file if necessary 164 'w' or 'w:' open for writing without compression 165 'w:gz' open for writing with gzip compression 166 'w:bz2' open for writing with bzip2 compression 167 168 'r|*' open a stream of tar blocks with transparent compression 169 'r|' open an uncompressed stream of tar blocks for reading 170 'r|gz' open a gzip compressed stream of tar blocks 171 'r|bz2' open a bzip2 compressed stream of tar blocks 172 'w|' open an uncompressed stream for writing 173 'w|gz' open a gzip compressed stream for writing 174 'w|bz2' open a bzip2 compressed stream for writing 175 """ 176 177 if not name and not fileobj: 178 raise ValueError("nothing to open") 179 180 if mode in ("r", "r:*"): 181 # Find out which *open() is appropriate for opening the file. 182 for comptype in cls.OPEN_METH: 183 func = getattr(cls, cls.OPEN_METH[comptype]) 184 if fileobj is not None: 185 saved_pos = fileobj.tell() 186 try: 187 return func(name, "r", fileobj, **kwargs) 188 except (ReadError, CompressionError), e: 189 if fileobj is not None: 190 fileobj.seek(saved_pos) 191 continue 192 raise ReadError("file could not be opened successfully") 193 194 elif ":" in mode: 195 filemode, comptype = mode.split(":", 1) 196 filemode = filemode or "r" 197 comptype = comptype or "tar" 198 199 # Select the *open() function according to 200 # given compression. 
201 if comptype in cls.OPEN_METH: 202 func = getattr(cls, cls.OPEN_METH[comptype]) 203 else: 204 raise CompressionError("unknown compression type %r" % comptype) 205 return func(name, filemode, fileobj, **kwargs) 206 207 elif "|" in mode: 208 filemode, comptype = mode.split("|", 1) 209 filemode = filemode or "r" 210 comptype = comptype or "tar" 211 212 if filemode not in ("r", "w"): 213 raise ValueError("mode must be 'r' or 'w'") 214 215 stream = _Stream(name, filemode, comptype, fileobj, bufsize) 216 try: 217 t = cls(name, filemode, stream, **kwargs) 218 except: 219 stream.close() 220 raise 221 t._extfileobj = False 222 return t 223 224 elif mode in ("a", "w"): 225 return cls.taropen(name, mode, fileobj, **kwargs) 226 227 raise ValueError("undiscernible mode") 228 229 @classmethod 230 def taropen(cls, name, mode="r", fileobj=None, **kwargs): 231 """Open uncompressed tar archive name for reading or writing. 232 """ 233 if mode not in ("r", "a", "w"): 234 raise ValueError("mode must be 'r', 'a' or 'w'") 235 return cls(name, mode, fileobj, **kwargs) 236 237 @classmethod 238 def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs): 239 """Open gzip compressed tar archive name for reading or writing. 240 Appending is not allowed. 
241 """ 242 if mode not in ("r", "w"): 243 raise ValueError("mode must be 'r' or 'w'") 244 245 try: 246 import gzip 247 gzip.GzipFile 248 except (ImportError, AttributeError): 249 raise CompressionError("gzip module is not available") 250 251 try: 252 fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj) 253 except OSError: 254 if fileobj is not None and mode == 'r': 255 raise ReadError("not a gzip file") 256 raise 257 258 try: 259 t = cls.taropen(name, mode, fileobj, **kwargs) 260 except IOError: 261 fileobj.close() 262 if mode == 'r': 263 raise ReadError("not a gzip file") 264 raise 265 except: 266 fileobj.close() 267 raise 268 t._extfileobj = False 269 return t 270 271 @classmethod 272 def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs): 273 """Open bzip2 compressed tar archive name for reading or writing. 274 Appending is not allowed. 275 """ 276 if mode not in ("r", "w"): 277 raise ValueError("mode must be 'r' or 'w'.") 278 279 try: 280 import bz2 281 except ImportError: 282 raise CompressionError("bz2 module is not available") 283 284 if fileobj is not None: 285 fileobj = _BZ2Proxy(fileobj, mode) 286 else: 287 fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel) 288 289 try: 290 t = cls.taropen(name, mode, fileobj, **kwargs) 291 except (IOError, EOFError): 292 fileobj.close() 293 if mode == 'r': 294 raise ReadError("not a bzip2 file") 295 raise 296 except: 297 fileobj.close() 298 raise 299 t._extfileobj = False 300 return t 301 302 # All *open() methods are registered here. 303 OPEN_METH = { 304 "tar": "taropen", # uncompressed tar 305 "gz": "gzopen", # gzip compressed tar 306 "bz2": "bz2open" # bzip2 compressed tar 307 } 308 309 #-------------------------------------------------------------------------- 310 # The public methods which TarFile provides: 311 312 def close(self): 313 """Close the TarFile. In write-mode, two finishing zero blocks are 314 appended to the archive. 
315 """ 316 if self.closed: 317 return 318 319 if self.mode in "aw": 320 self.fileobj.write(NUL * (BLOCKSIZE * 2)) 321 self.offset += (BLOCKSIZE * 2) 322 # fill up the end with zero-blocks 323 # (like option -b20 for tar does) 324 blocks, remainder = divmod(self.offset, RECORDSIZE) 325 if remainder > 0: 326 self.fileobj.write(NUL * (RECORDSIZE - remainder)) 327 328 if not self._extfileobj: 329 self.fileobj.close() 330 self.closed = True 331 332 def getmember(self, name): 333 """Return a TarInfo object for member `name'. If `name' can not be 334 found in the archive, KeyError is raised. If a member occurs more 335 than once in the archive, its last occurrence is assumed to be the 336 most up-to-date version. 337 """ 338 tarinfo = self._getmember(name) 339 if tarinfo is None: 340 raise KeyError("filename %r not found" % name) 341 return tarinfo 342 343 def getmembers(self): 344 """Return the members of the archive as a list of TarInfo objects. The 345 list has the same order as the members in the archive. 346 """ 347 self._check() 348 if not self._loaded: # if we want to obtain a list of 349 self._load() # all members, we first have to 350 # scan the whole archive. 351 return self.members 352 353 def getnames(self): 354 """Return the members of the archive as a list of their names. It has 355 the same order as the list returned by getmembers(). 356 """ 357 return [tarinfo.name for tarinfo in self.getmembers()] 358 359 def gettarinfo(self, name=None, arcname=None, fileobj=None): 360 """Create a TarInfo object for either the file `name' or the file 361 object `fileobj' (using os.fstat on its file descriptor). You can 362 modify some of the TarInfo's attributes before you add it using 363 addfile(). If given, `arcname' specifies an alternative name for the 364 file in the archive. 365 """ 366 self._check("aw") 367 368 # When fileobj is given, replace name by 369 # fileobj's real name. 
370 if fileobj is not None: 371 name = fileobj.name 372 373 # Building the name of the member in the archive. 374 # Backward slashes are converted to forward slashes, 375 # Absolute paths are turned to relative paths. 376 if arcname is None: 377 arcname = name 378 drv, arcname = os.path.splitdrive(arcname) 379 arcname = arcname.replace(os.sep, "/") 380 arcname = arcname.lstrip("/") 381 382 # Now, fill the TarInfo object with 383 # information specific for the file. 384 tarinfo = self.tarinfo() 385 tarinfo.tarfile = self 386 387 # Use os.stat or os.lstat, depending on platform 388 # and if symlinks shall be resolved. 389 if fileobj is None: 390 if hasattr(os, "lstat") and not self.dereference: 391 statres = os.lstat(name) 392 else: 393 statres = os.stat(name) 394 else: 395 statres = os.fstat(fileobj.fileno()) 396 linkname = "" 397 398 stmd = statres.st_mode 399 if stat.S_ISREG(stmd): 400 inode = (statres.st_ino, statres.st_dev) 401 if not self.dereference and statres.st_nlink > 1 and \ 402 inode in self.inodes and arcname != self.inodes[inode]: 403 # Is it a hardlink to an already 404 # archived file? 405 type = LNKTYPE 406 linkname = self.inodes[inode] 407 else: 408 # The inode is added only if its valid. 409 # For win32 it is always 0. 410 type = REGTYPE 411 if inode[0]: 412 self.inodes[inode] = arcname 413 elif stat.S_ISDIR(stmd): 414 type = DIRTYPE 415 elif stat.S_ISFIFO(stmd): 416 type = FIFOTYPE 417 elif stat.S_ISLNK(stmd): 418 type = SYMTYPE 419 linkname = os.readlink(name) 420 elif stat.S_ISCHR(stmd): 421 type = CHRTYPE 422 elif stat.S_ISBLK(stmd): 423 type = BLKTYPE 424 else: 425 return None 426 427 # Fill the TarInfo object with all 428 # information we can get. 
429 tarinfo.name = arcname 430 tarinfo.mode = stmd 431 tarinfo.uid = statres.st_uid 432 tarinfo.gid = statres.st_gid 433 if type == REGTYPE: 434 tarinfo.size = statres.st_size 435 else: 436 tarinfo.size = 0L 437 tarinfo.mtime = statres.st_mtime 438 tarinfo.type = type 439 tarinfo.linkname = linkname 440 if pwd: 441 try: 442 tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0] 443 except KeyError: 444 pass 445 if grp: 446 try: 447 tarinfo.gname = grp.getgrgid(tarinfo.gid)[0] 448 except KeyError: 449 pass 450 451 if type in (CHRTYPE, BLKTYPE): 452 if hasattr(os, "major") and hasattr(os, "minor"): 453 tarinfo.devmajor = os.major(statres.st_rdev) 454 tarinfo.devminor = os.minor(statres.st_rdev) 455 return tarinfo 456 457 def list(self, verbose=True): 458 """Print a table of contents to sys.stdout. If `verbose' is False, only 459 the names of the members are printed. If it is True, an `ls -l'-like 460 output is produced. 461 """ 462 self._check() 463 464 for tarinfo in self: 465 if verbose: 466 print filemode(tarinfo.mode), 467 print "%s/%s" % (tarinfo.uname or tarinfo.uid, 468 tarinfo.gname or tarinfo.gid), 469 if tarinfo.ischr() or tarinfo.isblk(): 470 print "%10s" % ("%d,%d" \ 471 % (tarinfo.devmajor, tarinfo.devminor)), 472 else: 473 print "%10d" % tarinfo.size, 474 print "%d-%02d-%02d %02d:%02d:%02d" \ 475 % time.localtime(tarinfo.mtime)[:6], 476 477 print tarinfo.name + ("/" if tarinfo.isdir() else ""), 478 479 if verbose: 480 if tarinfo.issym(): 481 print "->", tarinfo.linkname, 482 if tarinfo.islnk(): 483 print "link to", tarinfo.linkname, 484 print 485 486 def add(self, name, arcname=None, recursive=True, exclude=None, filter=None): 487 """Add the file `name' to the archive. `name' may be any type of file 488 (directory, fifo, symbolic link, etc.). If given, `arcname' 489 specifies an alternative name for the file in the archive. 490 Directories are added recursively by default. This can be avoided by 491 setting `recursive' to False. 
`exclude' is a function that should 492 return True for each filename to be excluded. `filter' is a function 493 that expects a TarInfo object argument and returns the changed 494 TarInfo object, if it returns None the TarInfo object will be 495 excluded from the archive. 496 """ 497 self._check("aw") 498 499 if arcname is None: 500 arcname = name 501 502 # Exclude pathnames. 503 if exclude is not None: 504 import warnings 505 warnings.warn("use the filter argument instead", 506 DeprecationWarning, 2) 507 if exclude(name): 508 self._dbg(2, "tarfile: Excluded %r" % name) 509 return 510 511 # Skip if somebody tries to archive the archive... 512 if self.name is not None and os.path.abspath(name) == self.name: 513 self._dbg(2, "tarfile: Skipped %r" % name) 514 return 515 516 self._dbg(1, name) 517 518 # Create a TarInfo object from the file. 519 tarinfo = self.gettarinfo(name, arcname) 520 521 if tarinfo is None: 522 self._dbg(1, "tarfile: Unsupported type %r" % name) 523 return 524 525 # Change or exclude the TarInfo object. 526 if filter is not None: 527 tarinfo = filter(tarinfo) 528 if tarinfo is None: 529 self._dbg(2, "tarfile: Excluded %r" % name) 530 return 531 532 # Append the tar header and data to the archive. 533 if tarinfo.isreg(): 534 with bltn_open(name, "rb") as f: 535 self.addfile(tarinfo, f) 536 537 elif tarinfo.isdir(): 538 self.addfile(tarinfo) 539 if recursive: 540 for f in os.listdir(name): 541 self.add(os.path.join(name, f), os.path.join(arcname, f), 542 recursive, exclude, filter) 543 544 else: 545 self.addfile(tarinfo) 546 547 def addfile(self, tarinfo, fileobj=None): 548 """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is 549 given, tarinfo.size bytes are read from it and added to the archive. 550 You can create TarInfo objects using gettarinfo(). 551 On Windows platforms, `fileobj' should always be opened with mode 552 'rb' to avoid irritation about the file size. 
553 """ 554 self._check("aw") 555 556 tarinfo = copy.copy(tarinfo) 557 558 buf = tarinfo.tobuf(self.format, self.encoding, self.errors) 559 self.fileobj.write(buf) 560 self.offset += len(buf) 561 562 # If there's data to follow, append it. 563 if fileobj is not None: 564 copyfileobj(fileobj, self.fileobj, tarinfo.size) 565 blocks, remainder = divmod(tarinfo.size, BLOCKSIZE) 566 if remainder > 0: 567 self.fileobj.write(NUL * (BLOCKSIZE - remainder)) 568 blocks += 1 569 self.offset += blocks * BLOCKSIZE 570 571 self.members.append(tarinfo) 572 573 def extractall(self, path=".", members=None): 574 """Extract all members from the archive to the current working 575 directory and set owner, modification time and permissions on 576 directories afterwards. `path' specifies a different directory 577 to extract to. `members' is optional and must be a subset of the 578 list returned by getmembers(). 579 """ 580 directories = [] 581 582 if members is None: 583 members = self 584 585 for tarinfo in members: 586 if tarinfo.isdir(): 587 # Extract directories with a safe mode. 588 directories.append(tarinfo) 589 tarinfo = copy.copy(tarinfo) 590 tarinfo.mode = 0700 591 self.extract(tarinfo, path) 592 593 # Reverse sort directories. 594 directories.sort(key=operator.attrgetter('name')) 595 directories.reverse() 596 597 # Set correct owner, mtime and filemode on directories. 598 for tarinfo in directories: 599 dirpath = os.path.join(path, tarinfo.name) 600 try: 601 self.chown(tarinfo, dirpath) 602 self.utime(tarinfo, dirpath) 603 self.chmod(tarinfo, dirpath) 604 except ExtractError, e: 605 if self.errorlevel > 1: 606 raise 607 else: 608 self._dbg(1, "tarfile: %s" % e) 609 610 def extract(self, member, path=""): 611 """Extract a member from the archive to the current working directory, 612 using its full name. Its file information is extracted as accurately 613 as possible. `member' may be a filename or a TarInfo object. You can 614 specify a different directory using `path'. 
615 """ 616 self._check("r") 617 618 if isinstance(member, basestring): 619 tarinfo = self.getmember(member) 620 else: 621 tarinfo = member 622 623 # Prepare the link target for makelink(). 624 if tarinfo.islnk(): 625 tarinfo._link_target = os.path.join(path, tarinfo.linkname) 626 627 try: 628 self._extract_member(tarinfo, os.path.join(path, tarinfo.name)) 629 except EnvironmentError, e: 630 if self.errorlevel > 0: 631 raise 632 else: 633 if e.filename is None: 634 self._dbg(1, "tarfile: %s" % e.strerror) 635 else: 636 self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename)) 637 except ExtractError, e: 638 if self.errorlevel > 1: 639 raise 640 else: 641 self._dbg(1, "tarfile: %s" % e) 642 643 def extractfile(self, member): 644 """Extract a member from the archive as a file object. `member' may be 645 a filename or a TarInfo object. If `member' is a regular file, a 646 file-like object is returned. If `member' is a link, a file-like 647 object is constructed from the link's target. If `member' is none of 648 the above, None is returned. 649 The file-like object is read-only and provides the following 650 methods: read(), readline(), readlines(), seek() and tell() 651 """ 652 self._check("r") 653 654 if isinstance(member, basestring): 655 tarinfo = self.getmember(member) 656 else: 657 tarinfo = member 658 659 if tarinfo.isreg(): 660 return self.fileobject(self, tarinfo) 661 662 elif tarinfo.type not in SUPPORTED_TYPES: 663 # If a member's type is unknown, it is treated as a 664 # regular file. 665 return self.fileobject(self, tarinfo) 666 667 elif tarinfo.islnk() or tarinfo.issym(): 668 if isinstance(self.fileobj, _Stream): 669 # A small but ugly workaround for the case that someone tries 670 # to extract a (sym)link as a file-object from a non-seekable 671 # stream of tar blocks. 672 raise StreamError("cannot extract (sym)link as file object") 673 else: 674 # A (sym)link's file object is its target's file object. 
675 return self.extractfile(self._find_link_target(tarinfo)) 676 else: 677 # If there's no data associated with the member (directory, chrdev, 678 # blkdev, etc.), return None instead of a file object. 679 return None 680 681 def _extract_member(self, tarinfo, targetpath): 682 """Extract the TarInfo object tarinfo to a physical 683 file called targetpath. 684 """ 685 # Fetch the TarInfo object for the given name 686 # and build the destination pathname, replacing 687 # forward slashes to platform specific separators. 688 targetpath = targetpath.rstrip("/") 689 targetpath = targetpath.replace("/", os.sep) 690 691 # Create all upper directories. 692 upperdirs = os.path.dirname(targetpath) 693 if upperdirs and not os.path.exists(upperdirs): 694 # Create directories that are not part of the archive with 695 # default permissions. 696 os.makedirs(upperdirs) 697 698 if tarinfo.islnk() or tarinfo.issym(): 699 self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname)) 700 else: 701 self._dbg(1, tarinfo.name) 702 703 if tarinfo.isreg(): 704 self.makefile(tarinfo, targetpath) 705 elif tarinfo.isdir(): 706 self.makedir(tarinfo, targetpath) 707 elif tarinfo.isfifo(): 708 self.makefifo(tarinfo, targetpath) 709 elif tarinfo.ischr() or tarinfo.isblk(): 710 self.makedev(tarinfo, targetpath) 711 elif tarinfo.islnk() or tarinfo.issym(): 712 self.makelink(tarinfo, targetpath) 713 elif tarinfo.type not in SUPPORTED_TYPES: 714 self.makeunknown(tarinfo, targetpath) 715 else: 716 self.makefile(tarinfo, targetpath) 717 718 self.chown(tarinfo, targetpath) 719 if not tarinfo.issym(): 720 self.chmod(tarinfo, targetpath) 721 self.utime(tarinfo, targetpath) 722 723 #-------------------------------------------------------------------------- 724 # Below are the different file methods. They are called via 725 # _extract_member() when extract() is called. They can be replaced in a 726 # subclass to implement other functionality. 
727 728 def makedir(self, tarinfo, targetpath): 729 """Make a directory called targetpath. 730 """ 731 try: 732 # Use a safe mode for the directory, the real mode is set 733 # later in _extract_member(). 734 os.mkdir(targetpath, 0700) 735 except EnvironmentError, e: 736 if e.errno != errno.EEXIST: 737 raise 738 739 def makefile(self, tarinfo, targetpath): 740 """Make a file called targetpath. 741 """ 742 source = self.extractfile(tarinfo) 743 try: 744 with bltn_open(targetpath, "wb") as target: 745 copyfileobj(source, target) 746 finally: 747 source.close() 748 749 def makeunknown(self, tarinfo, targetpath): 750 """Make a file from a TarInfo object with an unknown type 751 at targetpath. 752 """ 753 self.makefile(tarinfo, targetpath) 754 self._dbg(1, "tarfile: Unknown file type %r, " \ 755 "extracted as regular file." % tarinfo.type) 756 757 def makefifo(self, tarinfo, targetpath): 758 """Make a fifo called targetpath. 759 """ 760 if hasattr(os, "mkfifo"): 761 os.mkfifo(targetpath) 762 else: 763 raise ExtractError("fifo not supported by system") 764 765 def makedev(self, tarinfo, targetpath): 766 """Make a character or block device called targetpath. 767 """ 768 if not hasattr(os, "mknod") or not hasattr(os, "makedev"): 769 raise ExtractError("special devices not supported by system") 770 771 mode = tarinfo.mode 772 if tarinfo.isblk(): 773 mode |= stat.S_IFBLK 774 else: 775 mode |= stat.S_IFCHR 776 777 os.mknod(targetpath, mode, 778 os.makedev(tarinfo.devmajor, tarinfo.devminor)) 779 780 def makelink(self, tarinfo, targetpath): 781 """Make a (symbolic) link called targetpath. If it cannot be created 782 (platform limitation), we try to make a copy of the referenced file 783 instead of a link. 784 """ 785 if hasattr(os, "symlink") and hasattr(os, "link"): 786 # For systems that support symbolic and hard links. 
787 if tarinfo.issym(): 788 if os.path.lexists(targetpath): 789 os.unlink(targetpath) 790 os.symlink(tarinfo.linkname, targetpath) 791 else: 792 # See extract(). 793 if os.path.exists(tarinfo._link_target): 794 if os.path.lexists(targetpath): 795 os.unlink(targetpath) 796 os.link(tarinfo._link_target, targetpath) 797 else: 798 self._extract_member(self._find_link_target(tarinfo), targetpath) 799 else: 800 try: 801 self._extract_member(self._find_link_target(tarinfo), targetpath) 802 except KeyError: 803 raise ExtractError("unable to resolve link inside archive") 804 805 def chown(self, tarinfo, targetpath): 806 """Set owner of targetpath according to tarinfo. 807 """ 808 if pwd and hasattr(os, "geteuid") and os.geteuid() == 0: 809 # We have to be root to do so. 810 try: 811 g = grp.getgrnam(tarinfo.gname)[2] 812 except KeyError: 813 g = tarinfo.gid 814 try: 815 u = pwd.getpwnam(tarinfo.uname)[2] 816 except KeyError: 817 u = tarinfo.uid 818 try: 819 if tarinfo.issym() and hasattr(os, "lchown"): 820 os.lchown(targetpath, u, g) 821 else: 822 if sys.platform != "os2emx": 823 os.chown(targetpath, u, g) 824 except EnvironmentError, e: 825 raise ExtractError("could not change owner") 826 827 def chmod(self, tarinfo, targetpath): 828 """Set file permissions of targetpath according to tarinfo. 829 """ 830 if hasattr(os, 'chmod'): 831 try: 832 os.chmod(targetpath, tarinfo.mode) 833 except EnvironmentError, e: 834 raise ExtractError("could not change mode") 835 836 def utime(self, tarinfo, targetpath): 837 """Set modification time of targetpath according to tarinfo. 
        """
        if not hasattr(os, 'utime'):
            return
        try:
            os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime))
        except EnvironmentError, e:
            raise ExtractError("could not change modification time")

    #--------------------------------------------------------------------------
    def next(self):
        """Return the next member of the archive as a TarInfo object, when
           TarFile is opened for reading. Return None if there is no more
           available.
        """
        self._check("ra")
        if self.firstmember is not None:
            m = self.firstmember
            self.firstmember = None
            return m

        # Read the next block.
        self.fileobj.seek(self.offset)
        tarinfo = None
        while True:
            try:
                tarinfo = self.tarinfo.fromtarfile(self)
            except EOFHeaderError, e:
                if self.ignore_zeros:
                    self._dbg(2, "0x%X: %s" % (self.offset, e))
                    self.offset += BLOCKSIZE
                    continue
            except InvalidHeaderError, e:
                if self.ignore_zeros:
                    self._dbg(2, "0x%X: %s" % (self.offset, e))
                    self.offset += BLOCKSIZE
                    continue
                elif self.offset == 0:
                    raise ReadError(str(e))
            except EmptyHeaderError:
                if self.offset == 0:
                    raise ReadError("empty file")
            except TruncatedHeaderError, e:
                if self.offset == 0:
                    raise ReadError(str(e))
            except SubsequentHeaderError, e:
                raise ReadError(str(e))
            break

        if tarinfo is not None:
            self.members.append(tarinfo)
        else:
            self._loaded = True

        return tarinfo

    #--------------------------------------------------------------------------
    # Little helper methods:

    def _getmember(self, name, tarinfo=None, normalize=False):
        """Find an archive member by name from bottom to top.
           If tarinfo is given, it is used as the starting point.
        """
        # Ensure that all members have been loaded.
        members = self.getmembers()

        # Limit the member search list up to tarinfo.
        if tarinfo is not None:
            members = members[:members.index(tarinfo)]

        if normalize:
            name = os.path.normpath(name)

        for member in reversed(members):
            if normalize:
                member_name = os.path.normpath(member.name)
            else:
                member_name = member.name

            if name == member_name:
                return member

    def _load(self):
        """Read through the entire archive file and look for readable
           members.
        """
        while True:
            tarinfo = self.next()
            if tarinfo is None:
                break
        self._loaded = True

    def _check(self, mode=None):
        """Check if TarFile is still open, and if the operation's mode
           corresponds to TarFile's mode.
        """
        if self.closed:
            raise IOError("%s is closed" % self.__class__.__name__)
        if mode is not None and self.mode not in mode:
            raise IOError("bad operation for mode %r" % self.mode)

    def _find_link_target(self, tarinfo):
        """Find the target member of a symlink or hardlink member in the
           archive.
        """
        if tarinfo.issym():
            # Always search the entire archive.
            linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname)))
            limit = None
        else:
            # Search the archive before the link, because a hard link is
            # just a reference to an already archived file.
            linkname = tarinfo.linkname
            limit = tarinfo

        member = self._getmember(linkname, tarinfo=limit, normalize=True)
        if member is None:
            raise KeyError("linkname %r not found" % linkname)
        return member

    def __iter__(self):
        """Provide an iterator object.
        """
        if self._loaded:
            return iter(self.members)
        else:
            return TarIter(self)

    def _dbg(self, level, msg):
        """Write debugging output to sys.stderr.
        """
        if level <= self.debug:
            print >> sys.stderr, msg

    def __enter__(self):
        self._check()
        return self

    def __exit__(self, type, value, traceback):
        if type is None:
            self.close()
        else:
            # An exception occurred. We must not call close() because
            # it would try to write end-of-archive blocks and padding.
            if not self._extfileobj:
                self.fileobj.close()
            self.closed = True
# class TarFile
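(6) The json and pickle modules
Two modules used for serialization:
- json: converts between strings and Python data types
'''
Serialization
'''
import json
info = {
    'name': '鲁班',
    'age': 22
}
f = open('test.txt', 'w')
f.write(json.dumps(info))  # write the Python data to the file as a string
f.close()
'''
Deserialization
'''
import json
# json is the format to use when exchanging data between different languages
f = open('test.txt', 'r')
data = json.loads(f.read())  # load the Python data back from the file
f.close()
print(data['age'])
- pickle: converts between Python-specific objects and a byte stream
'''
Serialization
'''
import pickle
def sayhi(name):
    print("hello python", name)
info = {
    'name': '鲁班',
    'age': 22,
    'func': 'sayhi'
}
f = open("pickle_test.txt", 'wb')  # pickle writes bytes, so the file must be opened with 'wb'
pickle.dump(info, f)  # same as f.write(pickle.dumps(info))
f.close()
'''
Deserialization
'''
import pickle
f = open("pickle_test.txt", 'rb')
data = pickle.load(f)
f.close()
print(data["age"])
The json module provides four functions: dumps, dump, loads, load
The pickle module provides four functions: dumps, dump, loads, load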
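The difference between the two modules is easier to see without files. The sketch below is only an illustration (the sample dictionary is made up): dumps/loads work in memory, json produces a str while pickle produces bytes, and json keeps only JSON-compatible types.

```python
import json
import pickle

info = {'name': 'Luban', 'age': 22}  # sample data, made up for this example

# json: Python object -> str -> Python object
json_text = json.dumps(info)
json_back = json.loads(json_text)

# pickle: Python object -> bytes -> Python object
pickle_bytes = pickle.dumps(info)
pickle_back = pickle.loads(pickle_bytes)

# json has no tuple type, so a tuple comes back as a list;
# pickle preserves the original Python type
tuple_via_json = json.loads(json.dumps((1, 2)))
tuple_via_pickle = pickle.loads(pickle.dumps((1, 2)))
```

Only unpickle data from a source you trust, because loading a pickle can execute arbitrary code.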
(7) The shelve module
shelve is a simple key/value store that persists in-memory data to a file. It can persist any Python object that pickle supports.
'''
Write Python data to a file with shelve
'''
import shelve
d = shelve.open('shelve_test')  # open a shelf file
t = '123'
t2 = '123334'
name = ["鲁班", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t  # persist a string
d["t2"] = t2
d.close()
'''
Read the Python data back from the file with shelve
'''
import shelve
d = shelve.open('shelve_test')  # open the shelf file
print(d.get("test"))
print(d.get("t1"))
print(d.get("t2"))
d.close()
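The same write-then-read cycle can be sketched with context managers, so the shelf is closed even if an error occurs. The temporary directory and the key names are assumptions made for this example only.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'shelve_demo')  # hypothetical file name

    # Write: a shelf behaves like a dict whose values are pickled to disk
    with shelve.open(path) as db:
        db['names'] = ['rain', 'test']
        db['t1'] = '123'

    # Read: reopen the same file and look the keys up again
    with shelve.open(path) as db:
        names = db['names']
        missing = db.get('no_such_key', 'default')
```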
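(7) XML processing modules
XML is a format for exchanging data between different languages or programs, much like JSON, although JSON is simpler to use. Before JSON existed, XML was the only choice, and many systems at traditional companies, such as those in the financial industry, still expose mainly XML interfaces.
An XML document looks like the following; the data structure is expressed through <> nodes: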

<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
XML is supported in every major language. In Python it can be handled with the following module.

import xml.etree.ElementTree as ET
tree = ET.parse("xml_hehe.xml")
root = tree.getroot()
print(root)
print(root.tag)
# walk the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text, i.attrib)
# visit only the year nodes
for node in root.iter('year'):
    print(node.tag, node.text)
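The traversal above needs an xml_hehe.xml file on disk. As a self-contained sketch, the same calls can be tried on a string with ET.fromstring; the shortened document below is an assumption based on the sample shown earlier.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample document, parsed from a string
xml_text = """
<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>
"""
root = ET.fromstring(xml_text)

# iter() walks the whole tree, findall() only looks at direct children
years = [node.text for node in root.iter('year')]
names = [country.get('name') for country in root.findall('country')]
ranks = {c.get('name'): int(c.find('rank').text) for c in root.findall('country')}
```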
Modifying and deleting XML content

import xml.etree.ElementTree as ET
tree = ET.parse("xml_hehe.xml")
root = tree.getroot()
# modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated_by", "Yun")
tree.write("xmltest.xml")
# delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
tree.write('output.xml')
Creating an XML document yourself

import xml.etree.ElementTree as ET
new_xml = ET.Element("namelist")
Personal = ET.SubElement(new_xml, "Personal", attrib={"enrolled": "yes"})
name = ET.SubElement(Personal, "name")
name.text = "鲁班大师"
age = ET.SubElement(Personal, "age", attrib={"checked": "no"})
sex = ET.SubElement(Personal, "sex")
age.text = '33'
sex.text = 'man'
Personal = ET.SubElement(new_xml, "Personal2", attrib={"enrolled": "no"})
name = ET.SubElement(Personal, "name")
name.text = "安琪拉"
sex = ET.SubElement(Personal, "sex")
sex.text = 'men'
age = ET.SubElement(Personal, "age")
age.text = '19'
et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)
ET.dump(new_xml)  # print the generated markup

Generated test.xml (indented here for readability):
<?xml version='1.0' encoding='utf-8'?>
<namelist>
    <Personal enrolled="yes">
        <name>鲁班大师</name>
        <age checked="no">33</age>
        <sex>man</sex>
    </Personal>
    <Personal2 enrolled="no">
        <name>安琪拉</name>
        <sex>men</sex>
        <age>19</age>
    </Personal2>
</namelist>
(8) The PyYAML module
Python can also handle the YAML document format easily, but this requires installing a third-party module. Documentation: http://pyyaml.org/wiki/PyYAMLDocumentation
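(9) The ConfigParser module
Used to generate and modify common configuration files. In Python 3.x the module was renamed to configparser.
Many programs use a configuration file format like this: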
[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes
[bitbucket.org]
User = hg
[topsecret.server.com]
Port = 50022
ForwardX11 = no
How would you generate a file like this with Python?

import configparser
config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'  # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)
Reading the config file

import configparser
conf = configparser.ConfigParser()
conf.read("example.ini")
print(conf.defaults())
print(conf.sections())
print(conf['bitbucket.org']['user'])
Syntax for reading, adding, changing and deleting entries with configparser

Contents of i.cfg:
[section1]
k1 = v1
k2:v2
[section2]
k1 = v1

import configparser
config = configparser.ConfigParser()
config.read('i.cfg')
# ########## Read ##########
# secs = config.sections()
# print(secs)
# options = config.options('section2')
# print(options)
# item_list = config.items('section2')
# print(item_list)
# val = config.get('section1', 'k1')
# val = config.getint('section1', 'k1')  # only works when the value is an integer
# ########## Modify ##########
# sec = config.remove_section('section1')
# config.write(open('i.cfg', "w"))
# sec = config.has_section('wupeiqi')
# sec = config.add_section('wupeiqi')
# config.write(open('i.cfg', "w"))
# config.set('section2', 'k1', '11111')  # values must be strings in Python 3
# config.write(open('i.cfg', "w"))
# config.remove_option('section2', 'k1')
# config.write(open('i.cfg', "w"))
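The read and modify calls can be tried without touching the disk. This is a minimal sketch that assumes a trimmed version of the sample file, loaded with read_string and written back to an in-memory buffer.

```python
import configparser
import io

config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
Compression = yes
[bitbucket.org]
User = hg
[topsecret.server.com]
Port = 50022
""")

# Read: typed getters convert the stored strings
port = config.getint('topsecret.server.com', 'Port')
# Options from [DEFAULT] are visible in every section
inherited = config['bitbucket.org'].getboolean('Compression')

# Modify: change an option, drop a section, then serialize
config.set('bitbucket.org', 'User', 'git')
config.remove_section('topsecret.server.com')
buffer = io.StringIO()
config.write(buffer)
sections = config.sections()
```

Option names are case-insensitive and are written back in lower case, which is why conf['bitbucket.org']['user'] works in the reading example.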
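(10) The hashlib module
Used for hashing (message digests). In 3.x it replaces the md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.

import hashlib
m = hashlib.md5()
m.update(b'hello')
print(m.hexdigest())
m.update(b'world!')  # update() is cumulative: the digest now covers b'helloworld!'
print(m.hexdigest())
m2 = hashlib.md5()
m2.update(b'helloworld!')
print(m2.hexdigest())  # same value as the previous line
# sha256()
hash = hashlib.sha256()
hash.update('微微一笑很倾城'.encode(encoding='utf-8'))
print(hash.hexdigest())
# sha384()
hash1 = hashlib.sha384()
hash1.update('微微一笑很倾城'.encode(encoding='utf-8'))
print(hash1.hexdigest())
# sha512()
hash2 = hashlib.sha512()
hash2.update('微微一笑很倾城'.encode(encoding='utf-8'))
print(hash2.hexdigest())
'''
Python also has an hmac module, which combines a key with the content before hashing.
HMAC (hash-based message authentication code) is an authentication mechanism built on a
message authentication code (MAC). Both parties agree on a secret key K in advance, like a
shared password. The sender computes a digest of the message with the key and sends it along
with the message. The receiver recomputes the digest from the key and the received message
and compares it with the sender's value. If the two are equal, the message is authentic and
the sender is legitimate. HMAC authenticates a message; it does not encrypt it.
'''
import hmac
# digestmod is required since Python 3.8
h = hmac.new('鲁班大师'.encode(encoding='utf-8'), 'hello'.encode(encoding='utf-8'), digestmod='md5')
print(h.hexdigest())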
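The verification step described in the HMAC note can be sketched as follows. The key and messages are made-up values, and SHA-256 is chosen here as an assumption; the example also shows that repeated update() calls are equivalent to hashing the concatenated data.

```python
import hashlib
import hmac

# update() accumulates: two calls equal one call on the joined bytes
part = hashlib.md5()
part.update(b'hello')
part.update(b'world!')
whole = hashlib.md5(b'helloworld!')

key = b'shared-secret'        # hypothetical key agreed on in advance
message = b'transfer 100'     # hypothetical message

# Sender computes the tag; receiver recomputes it from key + message
sender_tag = hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()
receiver_tag = hmac.new(key, message, digestmod=hashlib.sha256).hexdigest()
# compare_digest avoids leaking information through comparison timing
is_authentic = hmac.compare_digest(sender_tag, receiver_tag)

# A modified message produces a different tag
tampered_tag = hmac.new(key, b'transfer 900', digestmod=hashlib.sha256).hexdigest()
```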
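(11) The re module
Common regular expression symbols
'.' Matches any single character except \n by default; with the DOTALL flag it matches any character, including a newline
'^' Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a","\nabc\neee",flags=re.MULTILINE)
'$' Matches the end of the string; with MULTILINE it also matches at the end of each line, e.g. re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group()
'*' Matches the preceding character 0 or more times, re.findall("ab*","cabb3abcbbac") gives ['abb', 'ab', 'a']
'+' Matches the preceding character 1 or more times, re.findall("ab+","ab+cd+abb+bba") gives ['ab', 'abb']
'?' Matches the preceding character 0 or 1 times
'{m}' Matches the preceding character exactly m times
'{n,m}' Matches the preceding character n to m times, re.findall("ab{1,3}","abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|' Matches the expression on the left or on the right of |, re.search("abc|ABC","ABCBabcCD").group() gives 'ABC'
'(...)' Grouping, re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives abcabca456c
'\A' Matches only at the start of the string, re.search(r"\Aabc","alexabc") finds nothing
'\Z' Matches only at the end of the string (similar to $, but $ also matches just before a trailing newline)
'\d' Matches a digit 0-9
'\D' Matches a non-digit
'\w' Matches [A-Za-z0-9_]
'\W' Matches anything not in [A-Za-z0-9_]
'\s' Matches whitespace such as space, \t, \n, \r, re.search(r"\s+","ab\tc1\n3").group() gives '\t'
'(?P<name>...)' Named group, re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict("city")
gives {'province': '3714', 'city': '81', 'birthday': '1993'}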
Demo
>>> import re
>>> re.match('.', 'dsskdslds211')
<_sre.SRE_Match object; span=(0, 1), match='d'>
>>> re.match('^ds', 'dsdsdsdsadj1212')
<_sre.SRE_Match object; span=(0, 2), match='ds'>
>>> re.match(r'^ds\d', 'ds12123dsdsdsadj1212')
<_sre.SRE_Match object; span=(0, 3), match='ds1'>
>>> re.match(r'^ds\d+', 'ds12123dsdsdsadj1212')
<_sre.SRE_Match object; span=(0, 7), match='ds12123'>
>>> re.search('k[a-z]+a', 'sahsaj1212kaHEHEsha12sakasha')
<_sre.SRE_Match object; span=(23, 28), match='kasha'>
>>> re.search('k[a-zA-Z]+a', 'sahsaj1212kaHEHEsha12sakasha')
<_sre.SRE_Match object; span=(10, 19), match='kaHEHEsha'>
>>> re.search('#.+#', 'as#hello#ha')
<_sre.SRE_Match object; span=(2, 9), match='#hello#'>
>>> print(re.search('a?', 'asnksaaaha'))
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> print(re.search('aa?', 'asnksaaaha'))
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> print(re.search('aaa?', 'asnksaaaha'))
<_sre.SRE_Match object; span=(5, 8), match='aaa'>
import re
print(re.search('[0-9]{3}','asn1k2sa1213aaha'))
print(re.search('[0-9]{1,3}','asn1k2sa1213aaha'))
<_sre.SRE_Match object; span=(8, 11), match='121'>
<_sre.SRE_Match object; span=(3, 4), match='1'>
import re
print(re.findall('[0-9]{3}','asn1k2sa1213aaha'))
print(re.findall('[0-9]{1,3}','asn1k2sa1213aaha'))
['121']
['1', '2', '121', '3']
import re
print(re.findall('abc|ABC','asabcn1k2sABCa1213aaha'))
print(re.search('abc|ABC','asabcn1k2sABCa1213aaha').group())
['abc', 'ABC']
abc
import re
print(re.search('(abc){2}','asabcn1abcabcka'))
print(re.search('(abc){2}\|','asabcn1abcabc|ka'))
print(re.search('(abc){2}\|{2}','asabcn1abcabc||ka'))
print(re.search('(abc){2}\|\|=','asabcn1abcabc||=ka'))
print(re.search('(abc){2}(\|\|=){2}','asabcn1abcabc||=||=ka'))
<_sre.SRE_Match object; span=(7, 13), match='abcabc'>
<_sre.SRE_Match object; span=(7, 14), match='abcabc|'>
<_sre.SRE_Match object; span=(7, 15), match='abcabc||'>
<_sre.SRE_Match object; span=(7, 16), match='abcabc||='>
<_sre.SRE_Match object; span=(7, 19), match='abcabc||=||='>
import re
print(re.search('\A[0-9]+[a-z]\Z','1213a'))
<_sre.SRE_Match object; span=(0, 5), match='1213a'>
import re
print(re.search('\D+','1213asa |?$#@'))
print(re.search('\W+','1213asa |?$#@'))
print(re.search('\s+','1213asa \r\n\t'))
<_sre.SRE_Match object; span=(4, 13), match='asa |?$#@'>
<_sre.SRE_Match object; span=(7, 13), match=' |?$#@'>
<_sre.SRE_Match object; span=(7, 11), match=' \r\n\t'>
import re
re.search("(?P<province>[0-9]{2})(?P<city>[0-9]{2})"
          "(?P<local>[0-9]{2})(?P<birthday>[0-9]{8})",
          "371481199306143242").groupdict("city")
{'province': '37', 'city': '14', 'local': '81', 'birthday': '19930614'}
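Regular expressions
Online testing tool: http://tool.chinaz.com/regex/
Character group: [character group]. The set of characters that may appear at a single position forms a character group, written with [] in a regular expression. Characters fall into many classes, such as digits, letters and punctuation. If a position must hold exactly one digit, the character at that position can only be one of the ten digits 0, 1, 2 ... 9.
Characters:
Metacharacter | Matches |
---|---|
. | any character except a newline |
\w | a letter, digit or underscore |
\s | any whitespace character |
\d | a digit |
\n | a newline |
\t | a tab |
\b | a word boundary (the start or end of a word) |
^ | the start of the string |
$ | the end of the string |
\W | anything that is not a letter, digit or underscore |
\D | a non-digit |
\S | a non-whitespace character |
a|b | the character a or the character b |
() | the expression inside the parentheses; also defines a group |
[...] | any character in the character group |
[^...] | any character not in the character group |
Quantifiers:
Quantifier | Meaning |
---|---|
* | repeat zero or more times |
+ | repeat one or more times |
? | repeat zero or one time |
{n} | repeat exactly n times |
{n,} | repeat n or more times |
{n,m} | repeat n to m times |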
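. ^ $
Regex | Text to match | Result | Notes |
---|---|---|---|
海. | 海燕海娇海东 | 海燕, 海娇, 海东 | matches every occurrence of "海." |
^海. | 海燕海娇海东 | 海燕 | matches "海." only at the start |
海.$ | 海燕海娇海东 | 海东 | matches "海." only at the end |
* + ? { }
Regex | Text to match | Result | Notes |
---|---|---|---|
李.? | 李杰和李莲英和李二棍子 | 李杰, 李莲, 李二 | ? repeats zero or one time, so it matches "李" plus at most one following character |
李.* | 李杰和李莲英和李二棍子 | 李杰和李莲英和李二棍子 | * repeats zero or more times, so it matches "李" plus any number of following characters |
李.+ | 李杰和李莲英和李二棍子 | 李杰和李莲英和李二棍子 | + repeats one or more times, so it matches "李" plus one or more following characters |
李.{1,2} | 李杰和李莲英和李二棍子 | 李杰和, 李莲英, 李二棍 | {1,2} matches one or two arbitrary characters |
Note: the quantifiers *, + and ? above are greedy, meaning they match as much as possible. Appending ? to a quantifier makes it lazy.
Regex | Text to match | Result | Notes |
---|---|---|---|
李.*? | 李杰和李莲英和李二棍子 | 李, 李, 李 | lazy matching |
Character sets [] and [^]
Regex | Text to match | Result | Notes |
---|---|---|---|
李[杰莲英二棍子]* | 李杰和李莲英和李二棍子 | 李杰, 李莲英, 李二棍子 | matches "李" followed by any number of characters from [杰莲英二棍子] |
李[^和]* | 李杰和李莲英和李二棍子 | 李杰, 李莲英, 李二棍子 | matches "李" followed by any number of characters other than "和" |
[\d] | 456bdha3 | 4, 5, 6, 3 | matches a single digit; four results are found |
[\d]+ | 456bdha3 | 456, 3 | matches a run of digits; two results are found |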
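Grouping () and alternation |
An ID card number is a string of 15 or 18 characters. A 15-character number consists entirely of digits and cannot start with 0. In an 18-character number the first 17 characters are digits and the last one may be a digit or x. Let us try to express this with a regular expression:
Regex | Text to match | Result | Notes |
---|---|---|---|
^[1-9]\d{13,16}[0-9x]$ | 110101198001017032 | 110101198001017032 | matches a valid ID card number |
^[1-9]\d{13,16}[0-9x]$ | 1101011980010170 | 1101011980010170 | also matches this string, although a 16-digit number is not a valid ID card number |
^[1-9]\d{14}(\d{2}[0-9x])?$ | 1101011980010170 | False | the invalid number is no longer matched |
^([1-9]\d{16}[0-9x]|[1-9]\d{14})$ | 110105199812067023 | 110105199812067023 | tries [1-9]\d{16}[0-9x] first and falls back to [1-9]\d{14} if that fails |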
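The last pattern in the table can be checked directly in Python. This is a small sketch; the helper name looks_like_id is made up for the example, and it checks only the shape of the string, not whether the number is actually valid.

```python
import re

# Last pattern from the table: 18 characters (17 digits + digit or x), or 15 digits
ID_PATTERN = re.compile(r'^([1-9]\d{16}[0-9x]|[1-9]\d{14})$')

def looks_like_id(text):
    """Return True if text has the shape of a 15- or 18-character ID number."""
    return ID_PATTERN.match(text) is not None

ok_18 = looks_like_id('110105199812067023')
ok_18_x = looks_like_id('11010119800101703x')
ok_15 = looks_like_id('110101800101703')
bad_16 = looks_like_id('1101011980010170')
```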
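The escape character \
Many sequences have a special meaning in regular expressions, such as \n and \s. To match the literal two-character text "\n" rather than a newline, the "\" itself must be escaped, giving '\\'.
In Python both the regular expression and the text to match are written as strings, and inside a string literal \ is also special and needs escaping. To match the literal text "\n", the target string must be written as '\\n' and the regular expression as "\\\\n", which is tedious. Raw strings solve this: with the r prefix the target is r'\n' and the regular expression is simply r'\\n'.
Regex | Text to match | Match | Notes |
---|---|---|---|
\n | \n | False | \ is special in regular expressions, so the expression \n cannot match the literal text \n |
\\n | \n | True | escaping \ as \\ makes it match |
"\\\\n" | '\\n' | True | in a Python string literal each '\' must be escaped once more |
r'\\n' | r'\n' | True | the r prefix turns off string-level escaping |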
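The rows of the table can be reproduced in a few lines. A minimal sketch, where text holds a literal backslash followed by n and contains no newline:

```python
import re

text = r'a\nb'  # four characters: a, backslash, n, b

# Raw-string pattern: \\ matches one literal backslash
literal_hit = re.search(r'\\n', text)
# The same pattern written without the r prefix needs four backslashes
same_hit = re.search('\\\\n', text)
# r'\n' means "a newline" to the regex engine, and text contains none
newline_hit = re.search(r'\n', text)
```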
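Greedy matching
Greedy matching: among all possible matches, take the longest one. Matching is greedy by default.
Regex | Text to match | Result | Notes |
---|---|---|---|
<.*> | <script>...<script> | <script>...<script> | greedy by default, so it matches the longest possible string |
<.*?> | <script>...<script> | <script> | the added ? switches to non-greedy mode, which matches the shortest possible string |
Common non-greedy patterns
*? repeat any number of times, but as few as possible
+? repeat 1 or more times, but as few as possible
?? repeat 0 or 1 time, but as few as possible
{n,m}? repeat n to m times, but as few as possible
{n,}? repeat n or more times, but as few as possible
Using .*?
. is any character
* repeats it from 0 to unlimited times
? makes the match non-greedy.
Together they mean "as few arbitrary characters as possible". The pattern is rarely written on its own; it is mostly used as:
.*?x
which takes the characters in front, of any length, up to the first x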
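The greedy and non-greedy rows above can be verified with a short sketch (the 'abxcdx' sample string is made up for the .*?x case):

```python
import re

html = '<script>...<script>'

greedy = re.search(r'<.*>', html).group()   # runs to the last '>'
lazy = re.search(r'<.*?>', html).group()    # stops at the first '>'

# .*?x consumes as little as possible before the first x
up_to_x = re.search(r'.*?x', 'abxcdx').group()
```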
Common functions in the re module
import re
ret = re.findall('a', 'eva egon yuan')  # returns all matches in a list
print(ret)  # result: ['a', 'a']
ret = re.search('a', 'eva egon yuan').group()
print(ret)  # result: 'a'
# search() scans the string and stops at the first match, returning a match object.
# Call group() on that object to get the matched text. If nothing matches, search() returns None.
ret = re.match('a', 'abc').group()  # like search(), but only matches at the start of the string
print(ret)
# result: 'a'
ret = re.split('[ab]', 'abcd')  # split on 'a' to get '' and 'bcd', then split those on 'b'
print(ret)  # ['', '', 'cd']
ret = re.sub(r'\d', 'H', 'eva3egon4yuan4', 1)  # replace digits with 'H'; the 1 limits it to one replacement
print(ret)  # evaHegon4yuan4
ret = re.subn(r'\d', 'H', 'eva3egon4yuan4')  # replace digits with 'H'; returns (new string, number of replacements)
print(ret)  # ('evaHegonHyuanH', 3)
obj = re.compile(r'\d{3}')  # compile the pattern (three digits) into a regular expression object
ret = obj.search('abc123eeee')  # call search() on the compiled object with the text to match
print(ret.group())  # result: 123
import re
ret = re.finditer(r'\d', 'ds3sy4784a')  # finditer() returns an iterator over the match objects
print(ret)  # <callable_iterator object at 0x10195f940>
print(next(ret).group())  # first result
print(next(ret).group())  # second result
print([i.group() for i in ret])  # all remaining results
Notes:
1 Group priority in findall:
import re
ret = re.findall(r'www\.(baidu|oldboy)\.com', 'www.oldboy.com')
print(ret)  # ['oldboy'] because findall returns the group content when the pattern has a group; use a non-capturing group (?:...) to get the whole match
ret = re.findall(r'www\.(?:baidu|oldboy)\.com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com']
2 Group priority in split
ret = re.split(r"\d+", "eva3egon4yuan")
print(ret)  # result: ['eva', 'egon', 'yuan']
ret = re.split(r"(\d+)", "eva3egon4yuan")
print(ret)  # result: ['eva', '3', 'egon', '4', 'yuan']
# Wrapping the pattern in () changes the result of the split:
# without () the matched separators are dropped, with () they are kept in the list.
# This matters whenever the separators themselves are needed afterwards.
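Character group examples:
Regex | Text to match | Match | Notes |
---|---|---|---|
[0123456789] | 8 | True | list every allowed character inside the group; any one of them matches |
[0123456789] | a | False | the group does not contain "a", so there is no match |
[0-9] | 7 | True | a range can be written with -; [0-9] means the same as [0123456789] |
[a-z] | s | True | likewise, [a-z] matches any lowercase letter |
[A-Z] | B | True | [A-Z] matches any uppercase letter |
[0-9a-fA-F] | e | True | matches a digit or a letter a to f in either case; useful for validating hexadecimal characters |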
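(12) String methods
str.capitalize() Uppercases the first character of the string and lowercases the rest
str.center(width) Returns the string centered and padded with spaces to length width
str.ljust(width) Returns the string left-aligned and padded with spaces to the given length
str.rjust(width) Returns the string right-aligned and padded with spaces to the given length
str.zfill(width) Returns the string right-aligned and padded on the left with 0 to the given length
str.count(sub[,beg,end]) Returns how many times the substring occurs; beg and end limit the range
bytes.decode(encoding[,errors]) Decodes bytes into a str (in Python 3 this method belongs to bytes, not str); raises UnicodeDecodeError, a subclass of ValueError, on failure
str.encode(encoding[,errors]) Encodes the string into bytes
str.endswith(substr[,beg,end]) Whether the string ends with substr; beg and end limit the range
str.startswith(substr[,beg,end]) Whether the string starts with substr; beg and end limit the range
str.expandtabs(tabsize=8) Replaces tabs in the string with spaces, 8 columns per tab by default
str.find(sub[,start,end]) Returns the position of the first occurrence of the substring, or -1 if it is not found
str.index(sub[,beg,end]) Like find(), but raises ValueError if the substring is not found
str.isalnum() Returns True if the string is non-empty and contains only letters and digits, otherwise False
str.isalpha() Returns True if the string is non-empty and contains only letters, otherwise False
str.isdecimal() Whether the string contains only decimal digits; returns a bool
str.isdigit() Whether the string contains only digits; returns a bool
str.islower() Whether all cased characters in the string are lowercase; returns a bool
str.isupper() Whether all cased characters in the string are uppercase; returns a bool
str.isnumeric() Whether the string contains only numeric characters; returns a bool
str.isspace() Returns True if the string contains only whitespace, otherwise False
str.title() Returns the titlecased string (each word starts with an uppercase letter, the rest are lowercase)
str.istitle() Returns True if the string is titlecased (see title()), otherwise False
str.join(seq) Joins the elements of a sequence into one string, using str as the separator
str.split(sep=None,maxsplit=-1) Splits the string into a list using sep as the separator; maxsplit is the maximum number of splits
str.splitlines(keepends=False) Splits at line boundaries and returns the lines as a list; keepends keeps the line breaks
str.lower() Converts uppercase to lowercase
str.upper() Converts lowercase to uppercase
str.swapcase() Swaps the case of every character
str.lstrip() Removes whitespace (spaces, tabs, line breaks) from the left end
str.rstrip() Removes whitespace (spaces, tabs, line breaks) from the right end
str.strip() Removes whitespace (spaces, tabs, line breaks) from both ends
str.partition(sep) Splits the string at the first occurrence of sep into a 3-tuple (before, sep, after)
str.replace(str1,str2,num) Replaces str1 with str2; num is the maximum number of replacements
str.rfind(sub[,beg,end]) Like find(), but searches from the right
str.rindex(sub[,beg,end]) Like index(), but searches from the right
str.rpartition(sep) Like partition(), but searches from the right
str.translate(table) Maps characters through the given table; in Python 3 build the table with str.maketrans(), whose third argument lists characters to delete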
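A few of the less obvious methods from the list, shown as a quick sketch on made-up sample strings:

```python
s = 'hello world'

centered = 'py'.center(6)            # padded with spaces on both sides
padded = '42'.zfill(5)               # left-padded with zeros
head, sep, tail = s.partition(' ')   # split at the first space
joined = '-'.join(['a', 'b', 'c'])   # the separator is the string itself
limited = 'a,b,c,d'.split(',', 1)    # at most one split

# maketrans(from, to, delete): map l->0 and o->1, and delete h
table = str.maketrans('lo', '01', 'h')
translated = s.translate(table)

# In Python 3, encode() gives bytes and decode() is a bytes method
round_trip = s.encode('utf-8').decode('utf-8')
```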
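(13) The math module
ceil: returns the smallest integer greater than or equal to x; if x is an integer, returns x
copysign: copysign(x, y) returns x with the sign of y; y may be 0
cos: returns the cosine of x, where x is in radians
degrees: converts x from radians to degrees
e: the mathematical constant e
exp: returns math.e (about 2.71828) raised to the power x
expm1: returns math.e (about 2.71828) raised to the power x, minus 1
fabs: returns the absolute value of x
factorial: returns the factorial of x
floor: returns the largest integer less than or equal to x; if x is an integer, returns x
fmod: returns the remainder of x/y as a float
frexp: returns a tuple (m, e) such that x == m * 2**e, where 0.5 <= abs(m) < 1 for non-zero x
fsum: returns an accurate floating-point sum of the values in an iterable
gcd: returns the greatest common divisor of x and y
hypot: returns the Euclidean norm sqrt(x*x + y*y)
isfinite: returns True if x is neither an infinity nor NaN, otherwise False
isinf: returns True if x is positive or negative infinity, otherwise False
isnan: returns True if x is NaN (not a number), otherwise False
ldexp: returns x*(2**i)
log: returns the natural logarithm of x (base e); when base is given, returns the logarithm of x to that base, computed as log(x)/log(base)
log10: returns the base-10 logarithm of x
log1p: returns the natural logarithm (base e) of 1+x
log2: returns the base-2 logarithm of x
modf: returns a tuple with the fractional and integer parts of x
pi: the constant pi
pow: returns x to the power y, i.e. x**y
radians: converts x from degrees to radians
sin: returns the sine of x, where x is in radians
sqrt: returns the square root of x
tan: returns the tangent of x, where x is in radians
trunc: returns the integer part of x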
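A short sketch of how some of these functions differ in practice; the input values are arbitrary examples.

```python
import math

# ceil, floor and trunc differ on negative numbers
ceil_val = math.ceil(2.1)
floor_val = math.floor(-2.1)
trunc_val = math.trunc(-2.7)

mantissa, exponent = math.frexp(8.0)   # 8.0 == mantissa * 2**exponent
hyp = math.hypot(3, 4)                 # length of the hypotenuse
frac, whole = math.modf(3.25)          # fractional part first, then integer part
signed = math.copysign(3, -1)          # magnitude of 3, sign of -1

# fsum tracks lost precision; the built-in sum does not
exact_sum = math.fsum([0.1] * 10)
naive_sum = sum([0.1] * 10)

checks = (math.isfinite(1.0), math.isinf(float('inf')), math.isnan(float('nan')))
```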
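(14) The urllib module
In Python 3 these functions live in the submodules urllib.parse and urllib.request.
urllib.parse.quote(string[,safe]) Percent-encodes a string. safe lists characters that should not be encoded
urllib.parse.unquote(string) Decodes a percent-encoded string
urllib.parse.quote_plus(string[,safe]) Like quote, but replaces ' ' with '+', whereas quote uses '%20'
urllib.parse.unquote_plus(string) Decodes a string, also turning '+' back into ' '
urllib.parse.urlencode(query[,doseq]) Converts a dict or a list of two-element tuples into URL parameters.
For example, the dict {'name':'wklken','pwd':'123'} becomes "name=wklken&pwd=123"
urllib.request.pathname2url(path) Converts a local path into a URL path
urllib.request.url2pathname(path) Converts a URL path into a local path
urllib.request.urlretrieve(url[,filename[,reporthook[,data]]]) Downloads remote data to a local file
filename: the local path to save to (if omitted, urllib saves the data in a temporary file)
reporthook: a callback that is triggered once the connection is established and again after each block of data is transferred
data: the data to POST to the server
urlrs = urllib.request.urlopen(url[,data[,timeout]]) Fetches a page; data is POSTed to the URL if given. Proxies are configured through urllib.request.ProxyHandler rather than a urlopen argument
urlrs.readline() Works as on a file object
urlrs.readlines() Works as on a file object
urlrs.fileno() Works as on a file object
urlrs.close() Works as on a file object
urlrs.info() Returns an http.client.HTTPMessage object with the headers sent by the remote server
urlrs.getcode() Returns the HTTP status code of the response
urlrs.geturl() Returns the URL that was requested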
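The encoding helpers can be tried without any network access. A minimal sketch using the Python 3 names from urllib.parse; the sample strings are made up, apart from the dictionary taken from the text above.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

# quote keeps '/' by default and encodes the space as %20
quoted = quote('a b/c')
# quote_plus encodes the space as '+' and also encodes '/'
quoted_plus = quote_plus('a b/c')

# Both encodings decode back to the original text
restored = unquote(quoted)
restored_plus = unquote_plus(quoted_plus)

# dict -> query string
query = urlencode({'name': 'wklken', 'pwd': '123'})
```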
(15) The logging module
Simple configuration with module-level functions
import logging
logging.debug('debug message')
logging.info('info message')
logging.warning('warning message')
logging.error('error message')
logging.critical('critical message')
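Output:
WARNING:root:warning message
ERROR:root:error message
CRITICAL:root:critical message
By default the logging module prints to standard error (sys.stderr) and shows only messages at WARNING level or above.
This means the default level is WARNING (levels rank CRITICAL > ERROR > WARNING > INFO > DEBUG).
The default format is level:logger name:message.
Configuring the level, format and destination flexibly:
Configuration parameters:
The default behavior of the logging module can be changed through the arguments of logging.basicConfig(). The available arguments are:
filename: creates a FileHandler with the given file name, so the log is stored in that file.
filemode: the mode used to open the file when filename is given; the default is "a", and "w" is also allowed.
format: the log format used by the handler.
datefmt: the date/time format.
level: the level of the root logger (explained later)
stream: creates a StreamHandler with the given stream. The stream can be sys.stderr, sys.stdout or a file (f=open('test.log','w')); the default is sys.stderr.
filename and stream cannot be combined: on current Python 3 versions, passing both raises ValueError.
Format fields that can be used in the format argument:
%(name)s Name of the logger
%(levelno)s Log level as a number
%(levelname)s Log level as text
%(pathname)s Full path of the module that issued the logging call, if available
%(filename)s File name of the module that issued the logging call
%(module)s Name of the module that issued the logging call
%(funcName)s Name of the function that issued the logging call
%(lineno)d Line number of the statement that issued the logging call
%(created)f Time the record was created, as a UNIX timestamp (float)
%(relativeCreated)d Time the record was created, in milliseconds since the logging module was loaded
%(asctime)s Time the record was created, as a string. The default format is "2003-07-08 16:49:45,896"; the part after the comma is milliseconds
%(thread)d Thread ID, if available
%(threadName)s Thread name, if available
%(process)d Process ID, if available
%(message)s The message passed by the user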
Configuration with logger objects
import logging
logger = logging.getLogger()
# create a handler that writes to a log file
fh = logging.FileHandler('test.log', encoding='utf-8')
# create another handler that writes to the console
ch = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setLevel(logging.DEBUG)
fh.setFormatter(formatter)
ch.setFormatter(formatter)
logger.addHandler(fh)  # a logger can have several handlers, such as fh and ch
logger.addHandler(ch)
logger.debug('logger debug message')
logger.info('logger info message')
logger.warning('logger warning message')
logger.error('logger error message')
logger.critical('logger critical message')
The logging library provides several components: Logger, Handler, Filter and Formatter.
A Logger exposes the interface that application code calls directly, a Handler sends log records to the appropriate destination, a Filter decides which records are let through, and a Formatter defines the output format.
The level can be set on the logger with logger.setLevel(logging.DEBUG), or on a single handler with fh.setLevel(logging.DEBUG). A record is emitted only if it passes both the logger's level and the handler's level.
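The interaction between the logger level and the handler level can be sketched with an in-memory stream, so the result is easy to inspect. The logger name 'demo' and the format string are assumptions chosen for this example.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)   # the handler drops anything below WARNING
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('demo')  # hypothetical logger name
logger.setLevel(logging.DEBUG)      # the logger itself lets everything through
logger.propagate = False            # keep records away from the root logger
logger.addHandler(handler)

logger.info('info message')         # passes the logger, filtered by the handler
logger.warning('warning message')   # passes both levels

output = stream.getvalue()
```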
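The collections module
On top of the built-in data types (dict, list, set, tuple), the collections module provides several additional data types: Counter, deque, defaultdict, namedtuple, OrderedDict and others.
1. namedtuple: creates tuples whose elements can be accessed by name
2. deque: a double-ended queue that supports fast appends and pops at both ends
3. Counter: a counter, mainly used for counting
4. OrderedDict: an ordered dictionary
5. defaultdict: a dictionary with default values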
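namedtuple
A tuple can represent an immutable collection. For example, the two-dimensional coordinates of a point can be written as:
>>> p = (1, 2)
However, looking at (1, 2) alone, it is hard to tell that this tuple represents a coordinate. In other words, a plain tuple is not very expressive in some situations.
This is where namedtuple comes in:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2
Similarly, a circle described by its coordinates and radius can also be defined with namedtuple:
# namedtuple('name', [list of attributes]):
Circle = namedtuple('Circle', ['x', 'y', 'r'])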
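As a brief sketch of how the Circle type behaves once defined (the coordinate values are arbitrary): instances are still tuples, so they unpack normally and are immutable, and _replace returns a modified copy.

```python
from collections import namedtuple

Circle = namedtuple('Circle', ['x', 'y', 'r'])
c = Circle(0, 0, 5)

x, y, r = c                 # unpacks like an ordinary tuple
moved = c._replace(x=3)     # new instance; c itself is unchanged
as_dict = c._asdict()       # field name -> value mapping
```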
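deque
A list gives fast access by index, but inserting or deleting elements anywhere except the end is slow, because a list is stored as a contiguous array and the following elements must be shifted. With a lot of data, such insertions and deletions become inefficient.
deque is a double-ended list designed for efficient insertion and deletion at both ends, which suits queues and stacks:
>>> from collections import deque
>>> q = deque(['a', 'b', 'c'])
>>> q.append('x')
>>> q.appendleft('y')
>>> q
deque(['y', 'a', 'b', 'c', 'x'])
Besides the list methods append() and pop(), deque also supports appendleft() and popleft(), so elements can be added to or removed from the head very efficiently.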
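Continuing the example, the sketch below pops from both ends and also shows the optional maxlen argument, which turns a deque into a fixed-size window (the window size of 3 is an arbitrary choice).

```python
from collections import deque

q = deque(['a', 'b', 'c'])
q.append('x')
q.appendleft('y')

first = q.popleft()   # removes from the head
last = q.pop()        # removes from the tail

# With maxlen, appending to a full deque discards items from the other end
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
```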
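OrderedDict
* Since Python 3.6 the built-in dict remembers the order in which keys were added, and from Python 3.7 this is guaranteed by the language.
To make it explicit that key order matters, or on older versions, use OrderedDict:
>>> from collections import OrderedDict
>>> d = dict([('a', 1), ('b', 2), ('c', 3)])
>>> d # before Python 3.6 the key order of a dict was arbitrary; on 3.7+ it follows insertion order
{'a': 1, 'b': 2, 'c': 3}
>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> od # the keys of an OrderedDict are ordered
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
Note that the keys of an OrderedDict are kept in insertion order, not sorted by the keys themselves:
>>> od = OrderedDict()
>>> od['z'] = 1
>>> od['y'] = 2
>>> od['x'] = 3
>>> list(od.keys()) # returned in the order the keys were inserted
['z', 'y', 'x']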
defaultdict
Given the values [11, 22, 33, 44, 55, 66, 77, 88, 99, 90 ...], store all values greater than 66 under the first key of a dictionary and all other values under the second key.
That is: {'k1': values greater than 66, 'k2': values less than or equal to 66}
Solution with a plain dict:
values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]
my_dict = {}
for value in values:
    if value > 66:
        if 'k1' in my_dict:  # dict.has_key() no longer exists in Python 3
            my_dict['k1'].append(value)
        else:
            my_dict['k1'] = [value]
    else:
        if 'k2' in my_dict:
            my_dict['k2'].append(value)
        else:
            my_dict['k2'] = [value]
Solution with defaultdict:
from collections import defaultdict
values = [11, 22, 33, 44, 55, 66, 77, 88, 99, 90]
my_dict = defaultdict(list)
for value in values:
    if value > 66:
        my_dict['k1'].append(value)
    else:
        my_dict['k2'].append(value)
With a plain dict, looking up a key that does not exist raises KeyError. If a default value should be returned instead, use defaultdict:
>>> from collections import defaultdict
>>> dd = defaultdict(lambda: 'N/A')
>>> dd['key1'] = 'abc'
>>> dd['key1'] # key1 exists
'abc'
>>> dd['key2'] # key2 does not exist, so the default is returned
'N/A'
Counter
The Counter class keeps track of how many times each value occurs.
It is a dict subclass that stores the elements as keys and their counts as values.
Example:
>>> from collections import Counter
>>> c = Counter('abcdeabcdabcaba')
>>> c
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})
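Beyond construction, a Counter has a few convenient operations. The sketch below reuses the string from the example above; the added 'zz' and the looked-up 'q' are arbitrary.

```python
from collections import Counter

c = Counter('abcdeabcdabcaba')

top_two = c.most_common(2)   # the two most frequent elements with their counts
c.update('zz')               # adds to the existing counts
missing = c['q']             # a missing key counts as 0 instead of raising KeyError
```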
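Author: Mr_Yun
Reposting in any form is welcome, but please be sure to credit the source.
My knowledge is limited, so if anything in the article or code is stated poorly, corrections are appreciated.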