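Python 3 Basics: Built-in Modules

Modules and Packages

1. Definitions:

Module: a unit for logically organizing Python code (variables, functions, classes, and logic that together implement some functionality).
In essence, a module is a Python file whose name ends in .py.
Package: a unit for logically organizing modules. In essence, a package is a directory; a regular package contains an __init__.py file (since Python 3.3, namespace packages may omit it).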

2. Import forms:

import module_name,module_name2,...

from module_name import  *

from module_name import m1,m2,m3

from module_name import logger as logger1

 

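3. What import does (locating the file and the search path)

Importing a module essentially runs the Python file once: the interpreter executes the .py file and binds the result to a name.
import test binds the name test to a module object holding everything defined in test.py.
from test import name binds only name, taken from that module, in the current namespace.

import module_name -----> find module_name.py -----> which requires the path to module_name.py; the interpreter looks through the directories listed in sys.path.

Importing a package essentially executes that package's __init__.py file.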
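
The search path and the "run the file once" behavior can be observed directly. The sketch below is only an illustration: the json package is an arbitrary standard-library choice, and the variable names are made up for this example.

```python
import importlib
import sys

# sys.path is the list of directories searched for top-level modules.
search_path = list(sys.path)

# Import a standard-library package by name ("json" is an arbitrary choice).
json_module = importlib.import_module("json")

# After the first import the module object is cached in sys.modules,
# so importing again returns the same object without re-running the file.
json_again = importlib.import_module("json")

print(len(search_path) > 0)
print(json_module.__file__)        # for a package, this is its __init__.py
print(json_module is json_again)
```

Because of the cache in sys.modules, the top-level code of a module normally runs only on its first import within a process.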

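__name__

When the file is run as a script:
  __name__ equals '__main__'

When the file is imported as a module:
  __name__ equals the module name

This lets a .py file run different logic depending on how it is used.

For example: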

def say_hai(name):
    print('Hi, {}'.format(name))


# The code below does not run when this file is imported as a module
if __name__ == "__main__":
    print(__name__)
    input_name = input('your name:').strip()
    say_hai(input_name)

 


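4. Import optimization

If a function is called many times, import it directly so that each call skips the attribute lookup on the module object (module_test.test):

from module_test import test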
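
A rough way to see the effect is to time both call styles. This is a sketch under assumptions: math.sqrt stands in for the function from module_test, the helper names are invented here, and the measured difference is usually small and varies by machine and Python version.

```python
import math
import timeit
from math import sqrt


def with_module_lookup(n=1000):
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)   # looks up "sqrt" on the module on every call
    return total


def with_direct_name(n=1000):
    total = 0.0
    for i in range(n):
        total += sqrt(i)        # name was bound once by "from math import sqrt"
    return total


t_module = timeit.timeit(with_module_lookup, number=200)
t_direct = timeit.timeit(with_direct_name, number=200)
print("math.sqrt:", t_module)
print("sqrt:     ", t_direct)
```

Both functions compute the same value; only the way the name is resolved differs.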

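5. Categories of modules

Modules loaded by import fall into four general categories:

1 Code written in Python (.py files)

2 C or C++ extensions compiled into shared libraries or DLLs

3 Packages containing a set of modules

4 Built-in modules written in C and linked into the Python interpreter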

 

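Commonly used built-in modules

(1) The time module

In Python, time is usually represented in one of these ways:

  • Timestamp            seconds elapsed since January 1, 1970 (the epoch), e.g. time.time()
  • Formatted string     such as 2014-11-11 11:11, e.g. time.strftime('%Y-%m-%d %H:%M')
  • Structured time      a time.struct_time tuple holding year, month, day, weekday, etc., e.g. time.localtime()

Because the time module mostly calls into the platform C library, behavior may differ between platforms.
UTC (Coordinated Universal Time), formerly known as Greenwich Mean Time (GMT), is the world time standard.

China is at UTC+8. DST stands for Daylight Saving Time.

Timestamp: generally, a timestamp is the offset in seconds from 1970-01-01 00:00:00 UTC.

Running type(time.time()) returns float. time.time() is the main function that returns a timestamp; time.clock(), which older tutorials also list here, was removed in Python 3.8.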

Index  Attribute                  Values
0      tm_year (year)             e.g. 2011
1      tm_mon (month)             1 - 12
2      tm_mday (day of month)     1 - 31
3      tm_hour (hour)             0 - 23
4      tm_min (minute)            0 - 59
5      tm_sec (second)            0 - 61
6      tm_wday (weekday)          0 - 6 (0 is Monday)
7      tm_yday (day of year)      1 - 366
8      tm_isdst (DST flag)        0, 1 or -1 (-1 means unknown)
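
The fields in the table can be read either by index or by attribute name. A minimal sketch, using the epoch itself so the values are fixed:

```python
import time

# struct_time for timestamp 0, i.e. 1970-01-01 00:00:00 UTC
epoch = time.gmtime(0)

print(epoch.tm_year, epoch[0])   # 1970 1970 (attribute and index agree)
print(epoch.tm_wday)             # 3, a Thursday (Monday is 0)
print(epoch.tm_yday)             # 1

# struct_time behaves like a tuple, so slicing and unpacking work
year, month, day = epoch[:3]
```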

 

Common functions in the time module

1) time.localtime([secs]): converts a timestamp to a struct_time in the local time zone. If secs is omitted, the current time is used.

>>> import time
>>> time.localtime()
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=22, tm_min=57, tm_sec=42, tm_wday=3, tm_yday=298, tm_isdst=0)

 

2) time.time(): returns the current time as a timestamp.

>>> import time
>>> time.time()
1540479500.1852782

 

3) time.gmtime([secs]): like localtime(), but converts the timestamp to a struct_time in UTC.

>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=14, tm_min=59, tm_sec=23, tm_wday=3, tm_yday=298, tm_isdst=0)

 

4) time.mktime(t): converts a struct_time expressed in local time (UTC+8 in these examples) to a timestamp. It is the inverse of localtime().

>>> import time
>>> x=time.localtime()
>>> time.mktime(x)
1540479626.0
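
The pairing between these functions can be checked with a round trip. This is a sketch: calendar.timegm() from the standard library is used as the UTC counterpart of mktime(), and the timestamp is the one printed in the session above.

```python
import calendar
import time

ts = 1540479626  # timestamp taken from the mktime() session above

local_tuple = time.localtime(ts)   # struct_time in the local time zone
utc_tuple = time.gmtime(ts)        # struct_time in UTC

local_back = time.mktime(local_tuple)    # inverse of localtime()
utc_back = calendar.timegm(utc_tuple)    # inverse of gmtime()

print(local_back, utc_back)
```

Passing a UTC struct_time to mktime() would give a result shifted by the local UTC offset, which is why the two inverses are kept separate.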

 

5) time.sleep(secs): suspends the calling thread for the given number of seconds.

import time
# Run the program: it sleeps for 2 seconds, then prints "Hello Python!"
time.sleep(2)
print("Hello Python!")

 

6) time.asctime([t]): converts a time tuple or struct_time to a string of the form 'Sun Jun 20 23:21:05 1993'. If t is omitted, the value of time.localtime() is used.

>>> import time
>>> x = time.localtime()
>>> time.asctime(x)
'Thu Oct 25 23:00:26 2018'

 

7) time.ctime([secs]): converts a timestamp (a float number of seconds) to the same form as time.asctime(). If secs is omitted or None, the current time from time.time() is used. It is equivalent to time.asctime(time.localtime(secs)).

>>> import time
>>> time.time()
1540459453.0845733
>>> time.ctime(time.time())
'Thu Oct 25 17:24:36 2018'

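8) time.strftime(format[, t]): converts a time tuple or struct_time (such as one returned by time.localtime() or time.gmtime()) to a string according to format.

If t is omitted, the value of time.localtime() is used. If any field in t is outside its allowed range, ValueError is raised.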

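Directive  Meaning
%a  Locale's abbreviated weekday name
%A  Locale's full weekday name
%b  Locale's abbreviated month name
%B  Locale's full month name
%c  Locale's appropriate date and time representation
%d  Day of the month (01 - 31)
%H  Hour, 24-hour clock (00 - 23)
%I  Hour, 12-hour clock (01 - 12)
%j  Day of the year (001 - 366)
%m  Month (01 - 12)
%M  Minute (00 - 59)
%p  Locale's equivalent of AM or PM (see note 1)
%S  Second (00 - 61) (see note 2)
%U  Week number of the year (00 - 53), with Sunday as the first day of the week. All days before the first Sunday are in week 0. (see note 3)
%w  Weekday as a number (0 - 6, 0 is Sunday)
%W  Same as %U, except that Monday is the first day of the week. (see note 3)
%x  Locale's appropriate date representation
%X  Locale's appropriate time representation
%y  Year without century (00 - 99)
%Y  Year with century
%Z  Time zone name (empty string if no time zone exists)
%%  A literal '%' character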

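Notes

  1. When used with strptime(), %p only affects the parsed hour if %I is used to parse the hour.
  2. The documented range really is 0 - 61, not 59: 60 is valid for timestamps that represent leap seconds, and 61 is supported for historical reasons.
  3. When used with strptime(), %U and %W are only used in calculations when the day of the week and the year are specified.

>>> import time
>>> time.strftime("%Y-%m-%d %A  %H:%M:%S ")
'2018-10-25 Thursday  17:33:29 '

>>> time.strftime(" %A  %H:%M:%S %Y-%m-%d ")
' Thursday  17:35:09 2018-10-25 '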

 

9) time.strptime(string[, format]): parses a formatted time string into a struct_time. It is the inverse of strftime().

>>> import time

>>> time.strptime(' Thursday  17:35:09 2018-10-25',' %A  %H:%M:%S %Y-%m-%d')
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=25, tm_hour=17, tm_min=35, tm_sec=9, tm_wday=3, tm_yday=298, tm_isdst=-1)
>>>
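
That strptime() and strftime() are inverses can be shown with a round trip through one format string. A minimal sketch; the name FMT is just a convenience introduced here, and the date matches the session above.

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"
text = "2018-10-25 17:35:09"

parsed = time.strptime(text, FMT)       # str -> struct_time
print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)
print(parsed.tm_wday)                   # 3, a Thursday (Monday is 0)

formatted = time.strftime(FMT, parsed)  # struct_time -> str
print(formatted == text)                # True
```

Using only numeric directives keeps the round trip independent of the locale, unlike %A or %b.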

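10) time.clock(): deprecated since Python 3.3 and removed in Python 3.8; use time.perf_counter() or time.process_time() instead. Its meaning differed between systems. On UNIX it returned the processor time of the current process as a float number of seconds.

On Windows it returned the wall-clock seconds elapsed since the first call to this function, so the first call returned a value close to zero.

(On Windows it was based on the Win32 function QueryPerformanceCounter(), whose resolution is finer than one millisecond.)

>>> import time  # time.clock() requires Python 3.7 or earlier; this session was recorded on Windows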
>>> if __name__ =='__main__':
...     time.sleep(1)
...     print("clock1:%s"%time.clock())
...     time.sleep(1)
...     print("clock2:%s" % time.clock())
...     time.sleep(1)
...     print("clock3:%s" % time.clock())
...
clock1:2.5e-06
clock2:1.0002382
clock3:2.0004314
>>>
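
On Python 3.8 and later, the two meanings of clock() are covered by two separate functions. The sketch below assumes a made-up busy_work() helper just to consume some CPU time; the measured values depend on the machine, so none are shown.

```python
import time


def busy_work(n=200_000):
    # Hypothetical workload: burn a little CPU time.
    return sum(i * i for i in range(n))


wall_start = time.perf_counter()   # high-resolution wall-clock timer
cpu_start = time.process_time()    # CPU time of this process, excludes sleep

result = busy_work()
time.sleep(0.1)

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print("wall:", wall_elapsed)
print("cpu: ", cpu_elapsed)
```

The wall-clock figure includes the time spent in sleep(), while the process-time figure does not.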

 

Converting between time representations

datetime

>>> import datetime
# Current time
>>> datetime.datetime.now()
datetime.datetime(2018, 10, 25, 14, 58, 9, 526923)
# Current time plus 3 days
>>> print(datetime.datetime.now() + datetime.timedelta(3))
2018-10-28 14:59:58.085724
# Current time minus 3 days
>>> print(datetime.datetime.now() + datetime.timedelta(-3))
2018-10-22 15:01:00.604181

# Current time plus 3 hours
>>> print(datetime.datetime.now() + datetime.timedelta(hours=3))
2018-10-25 18:02:36.695773
# Current time plus 30 minutes
>>> print(datetime.datetime.now() + datetime.timedelta(minutes=30))
2018-10-25 15:33:21.053755

# Replacing fields of a datetime
>>> c_time = datetime.datetime.now()
>>> print(c_time.replace(minute=3, hour=2))
2018-10-25 02:03:39.820451

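datetime.date.today()  returns the local date as a date object (str() gives its text form, e.g. 2014-03-24)
datetime.date.isoformat(obj)  returns the date as a 'YYYY-MM-DD' string (2014-03-24)
datetime.date.fromtimestamp(timestamp)  returns the local date object corresponding to a timestamp
datetime.date.weekday(obj)  returns the day of the week of a date object, where Monday is 0
datetime.date.isoweekday(obj)  returns the day of the week of a date object, where Monday is 1
datetime.date.isocalendar(obj)  returns a tuple of (ISO year, ISO week number, ISO weekday) for a date object
datetime objects:
datetime.datetime.today()  returns a datetime object with the local time, including microseconds: 2014-03-24 23:31:50.419000
datetime.datetime.now([tz])  returns the current time as a datetime object in the given time zone (local time if tz is omitted): 2014-03-24 23:31:50.419000
datetime.datetime.utcnow()  returns the current UTC time as a naive datetime object (deprecated since Python 3.12; datetime.datetime.now(datetime.timezone.utc) is preferred)
datetime.datetime.fromtimestamp(timestamp[, tz])  returns the datetime object for a timestamp, optionally in a given time zone; the result can be formatted with strftime
datetime.datetime.utcfromtimestamp(timestamp)  returns the naive UTC datetime object for a timestamp
datetime.datetime.strptime('2014-03-16 12:21:21', '%Y-%m-%d %H:%M:%S')  parses a string into a datetime object
datetime.datetime.strftime(datetime.datetime.now(), '%Y%m%d %H%M%S')  formats a datetime object as a str
datetime.date.today().timetuple()  converts a date object to a time.struct_time, which can then be turned into a timestamp
datetime.datetime.now().timetuple()  the same for a datetime object
time.mktime(timetupleobj)  converts that struct_time to a timestamp
time.time()  current timestamp
time.localtime
time.gmtime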
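
Several of the conversions listed above can be chained in one pass. The sketch below works in UTC with time-zone-aware objects so the result does not depend on the local time zone; the timestamp is the one from the mktime() session earlier, and the variable names are illustrative only.

```python
import datetime

UTC = datetime.timezone.utc
FMT = "%Y-%m-%d %H:%M:%S"

ts = 1540479626

# timestamp -> aware datetime in UTC
moment = datetime.datetime.fromtimestamp(ts, tz=UTC)

# datetime -> str
text = moment.strftime(FMT)
print(text)                        # 2018-10-25 15:00:26

# str -> datetime (strptime returns a naive object, so attach UTC again)
parsed = datetime.datetime.strptime(text, FMT).replace(tzinfo=UTC)

# arithmetic with timedelta, then back to a timestamp
later = parsed + datetime.timedelta(days=3, hours=3)
print(later.date().isoformat())    # 2018-10-28
print(later.weekday())             # 6, a Sunday (Monday is 0)
print(later.timestamp())           # 1540749626.0
```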

 

(2) The random module

random.random()  # returns a random float n in the range 0 <= n < 1.0

>>> import random
>>> random.random()
0.8048731160537441
>>> random.random()
0.540423134210193
>>> random.random()
0.5877892352747521
>>>

random.randint(a, b) returns a random integer in the given range, where a is the lower bound and b is the upper bound; both ends are included: a <= n <= b

>>> import random
>>> random.randint(1,9)
3
>>> random.randint(1,9)
7
>>> random.randint(1,9)
5
>>> random.randint(1,9)
9

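random.randrange([start], stop[, step])  # returns a randomly selected element from range(start, stop, step); stop itself is never returned.

For example, random.randrange(10, 100, 2) is equivalent to choosing a random number from the sequence [10, 12, 14, 16, ... 96, 98].

>>> import random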
>>> random.randrange(1,10,2)
1
>>> random.randrange(1,10,2)
7
>>> random.randrange(1,10,2)
9
>>> random.randrange(1,10,2)
5
>>> random.randrange(1,10,2)
1
>>> random.randrange(1,10,2)
9
>>> random.randrange(1,10,2)
3

random.choice(sequence) returns a random element from sequence, which must be a non-empty sequence.

In Python, "sequence" is not one specific type but a family of types: list, tuple, and str are all sequences.

>>> import random
>>> random.choice("I Love You")
'o'
>>> random.choice("I Love You")
'Y'
>>> random.choice("I Love You")
'v'
>>> random.choice("I Love You")
' '
>>> random.choice("I Love You")
' '
>>> random.choice("I Love You")
'L'
>>> random.choice("I Love You")
' '
>>>

Practical examples (the values in the comments are sample outputs):

import random

# Random integer:
print(random.randint(0, 99))  # e.g. 70

# Random even number between 0 and 100:
print(random.randrange(0, 101, 2))  # e.g. 4

# Random floats:
print(random.random())  # e.g. 0.2746445568079129
print(random.uniform(1, 10))  # e.g. 9.887001463194844

# Random character:
print(random.choice('abcdefg&#%^*f'))  # e.g. f

# Pick a given number of distinct characters from a string:
print(random.sample('abcdefghij', 3))  # e.g. ['f', 'h', 'd']

# Random string from a list:
print(random.choice(['apple', 'pear', 'peach', 'orange', 'lemon']))  # e.g. apple

# Shuffle:
items = [1, 2, 3, 4, 5, 6, 7]
print(items)  # [1, 2, 3, 4, 5, 6, 7]
random.shuffle(items)
print(items)  # e.g. [1, 4, 7, 2, 5, 3, 6]
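
Because the results differ on every run, tests and demos often fix the seed. A minimal sketch, assuming an arbitrary seed value of 2018 and using separate random.Random instances so the module-level generator is left untouched:

```python
import random

# Two generators created with the same seed produce the same sequence.
rng_a = random.Random(2018)
rng_b = random.Random(2018)

seq_a = [rng_a.randint(1, 9) for _ in range(5)]
seq_b = [rng_b.randint(1, 9) for _ in range(5)]
print(seq_a == seq_b)   # True

# shuffle() reorders in place; the elements themselves are unchanged.
deck = [1, 2, 3, 4, 5, 6, 7]
shuffled = deck[:]
rng_a.shuffle(shuffled)
print(sorted(shuffled) == deck)   # True
```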

 

Generating a random verification code

import random

checkcode = ''

for i in range(4):
    current = random.randrange(0, 4)
    if i == current:
        # Letter: code points 65-90 are 'A'-'Z'
        tmp = chr(random.randint(65, 90))
    else:
        # Digit
        tmp = random.randint(0, 9)

    checkcode += str(tmp)

print(checkcode)
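
When the code is used for anything security-related, the standard-library secrets module may be a better fit than random. This is an alternative sketch, not the approach used above; make_code and ALPHABET are names invented for this example.

```python
import secrets
import string

# Letters (both cases) and digits; every position draws from the same pool.
ALPHABET = string.ascii_letters + string.digits


def make_code(length=4):
    """Return a random code of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


code = make_code()
print(code)
```

Unlike the loop above, every position here can be either a letter or a digit.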

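(3) The os module

Provides an interface to operating system services

os.getcwd()  returns the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")  changes the current working directory; like cd in a shell
os.curdir  the string for the current directory: ('.')
os.pardir  the string for the parent directory: ('..')
os.makedirs('dirname1/dirname2')    creates nested directories recursively
os.removedirs('dirname1')    removes the directory if it is empty, then moves up to the parent and removes it too if it is empty, and so on
os.mkdir('dirname')    creates a single directory; like mkdir dirname in a shell
os.rmdir('dirname')    removes a single empty directory and raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')    returns a list of all files and subdirectories in the given directory, including hidden files
os.remove()  deletes a file
os.rename("oldname","newname")  renames a file or directory
os.stat('path/filename')  returns information about a file or directory
os.sep    the path separator of the operating system: "\\" on Windows, "/" on Linux
os.linesep    the line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep    the string that separates entries in search paths such as PATH: ';' on Windows, ':' on Linux
os.name    a string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")  runs a shell command; its output goes straight to the terminal
os.environ  the environment variables
os.path.abspath(path)  returns the normalized absolute version of path
os.path.split(path)  splits path into a (directory, file name) pair
os.path.dirname(path)  returns the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  returns the final file name of path, i.e. the second element of os.path.split(path); if path ends in / (or \ on Windows), it returns an empty string
os.path.exists(path)  returns True if path exists, False otherwise
os.path.isabs(path)  returns True if path is absolute
os.path.isfile(path)  returns True if path is an existing file, False otherwise
os.path.isdir(path)  returns True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  joins the path components; if a component is an absolute path, all components before it are discarded
os.path.getatime(path)  returns the last access time of the file or directory at path
os.path.getmtime(path)  returns the last modification time of the file or directory at path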
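
A few of these calls can be tried safely inside a temporary directory. The sketch below makes some assumptions for illustration: the file names notes.txt and keep.txt are invented, and keep.txt exists only so that os.removedirs() stops before removing the temporary root.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # A marker file keeps the root non-empty.
    open(os.path.join(root, "keep.txt"), "w").close()

    nested = os.path.join(root, "dirname1", "dirname2")
    os.makedirs(nested)                       # creates both levels

    file_path = os.path.join(nested, "notes.txt")
    with open(file_path, "w", encoding="utf-8") as fh:
        fh.write("hello")

    listing = os.listdir(nested)              # ['notes.txt']
    head, tail = os.path.split(file_path)     # (nested, 'notes.txt')
    is_file = os.path.isfile(file_path)
    is_dir = os.path.isdir(nested)
    size = os.stat(file_path).st_size         # 5 bytes

    os.remove(file_path)
    os.removedirs(nested)                     # removes dirname2, then dirname1
    remaining = os.listdir(root)              # ['keep.txt']

print(listing, tail, size, remaining)
```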

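(4) The sys module

sys.argv           list of command-line arguments; the first element is the path of the script itself
sys.exit(n)        exits the program; use exit(0) for a normal exit
sys.version        version information of the Python interpreter
sys.maxsize        the largest value a Py_ssize_t can hold (Python 2's sys.maxint was removed in Python 3, where int is unbounded)
sys.path           the module search path; initialized from the PYTHONPATH environment variable plus installation-dependent defaults
sys.platform       the name of the operating system platform
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]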
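
The attributes above can be collected into a small report. This is only a sketch; describe_runtime is a hypothetical helper name, and the printed values depend on the interpreter and platform.

```python
import sys


def describe_runtime():
    """Gather a few sys attributes into a dictionary."""
    return {
        "script": sys.argv[0] if sys.argv else "",
        "version": sys.version_info[:3],
        "platform": sys.platform,
        "maxsize": sys.maxsize,
        "path_entries": len(sys.path),
    }


info = describe_runtime()
for key, value in info.items():
    sys.stdout.write("{}: {}\n".format(key, value))
```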

 

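(5) The shutil module

High-level operations on files, directories, and archives

shutil.copyfileobj(fsrc, fdst[, length])
Copies the contents of one file object to another; length is the buffer size, so the data is copied in chunks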

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

 

import shutil

# Open both files in one with statement so they are closed after the copy
with open("程序员必逛的网站.txt", encoding='gbk') as f1, \
        open("笔记本2", 'w', encoding='utf-8') as f2:
    shutil.copyfileobj(f1, f2)

 

shutil.copyfile(src, dst)
Copies a file (contents only)

 

def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)

import shutil

shutil.copyfile('笔记本2','笔记本3')

shutil.copymode(src, dst)
Copies only the permission bits. File contents, owner, and group are unchanged

 

def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)

shutil.copystat(src, dst)
Copies the status information: mode bits, atime, mtime, flags

def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError as why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise

shutil.copy(src, dst)
Copies the file contents and permission bits

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

shutil.copy2(src, dst)
Copies the file contents and status information

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)
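
The practical difference between copy() and copy2() shows up in the modification time of the result. A sketch inside a temporary directory; the file names and the old timestamp (1,000,000,000, a moment in 2001) are arbitrary choices for this example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "src.txt")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write("data")

    # Give the source an old access and modification time.
    old = 1_000_000_000
    os.utime(src, (old, old))

    plain = shutil.copy(src, os.path.join(root, "copy.txt"))    # data + mode bits
    full = shutil.copy2(src, os.path.join(root, "copy2.txt"))   # data + stat info

    with open(plain, encoding="utf-8") as fh:
        same_content = fh.read() == "data"

    mtime_plain = os.path.getmtime(plain)   # time of the copy
    mtime_full = os.path.getmtime(full)     # preserved from the source

print(same_content, mtime_plain > mtime_full)
```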

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
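Recursively copies a directory tree; ignore_patterns() builds a callable that can be passed as the ignore argument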

import shutil
shutil.copytree('a','new_a')

shutil.rmtree(path[, ignore_errors[, onerror]])
Recursively deletes a directory tree

import shutil
shutil.rmtree('new_a')
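
copytree(), ignore_patterns(), and rmtree() are often used together. The sketch below builds a small tree in a temporary directory first; the directory names a and new_a follow the examples above, while the file names are invented.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "a")
    os.makedirs(os.path.join(src, "sub"))
    for name in ("keep.txt", "skip.log", os.path.join("sub", "inner.txt")):
        with open(os.path.join(src, name), "w", encoding="utf-8") as fh:
            fh.write(name)

    dst = os.path.join(root, "new_a")
    # Copy the tree, leaving out everything that matches *.log
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("*.log"))

    copied = sorted(os.listdir(dst))          # ['keep.txt', 'sub']
    inner_exists = os.path.isfile(os.path.join(dst, "sub", "inner.txt"))

    shutil.rmtree(dst)                        # remove the copy again
    gone = not os.path.exists(dst)

print(copied, inner_exists, gone)
```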

shutil.move(src, dst)
Recursively moves a file or directory to another location

def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.

    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error("Destination path '%s' already exists" % real_dst)
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error("Cannot move a directory '%s' into itself '%s'." % (src, dst))
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)

shutil.make_archive(base_name, format,...)

import shutil

# Raw string, so the backslashes in the Windows path are not treated as escapes
shutil.make_archive('shutil_make_archive','zip',r'H:\Python3_study\jichu\day1')
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                 dry_run=0, owner=None, group=None, logger=None):
    """Create an archive file (eg. zip or tar).

    'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar"
    or "gztar".

    'root_dir' is a directory that will be the root directory of the
    archive; ie. we typically chdir into 'root_dir' before creating the
    archive.  'base_dir' is the directory where we start archiving from;
    ie. 'base_dir' will be the common prefix of all files and
    directories in the archive.  'root_dir' and 'base_dir' both default
    to the current directory.  Returns the name of the archive file.

    'owner' and 'group' are used when creating a tar archive. By default,
    uses the current owner and group.
    """
    save_cwd = os.getcwd()
    if root_dir is not None:
        if logger is not None:
            logger.debug("changing into '%s'", root_dir)
        base_name = os.path.abspath(base_name)
        if not dry_run:
            os.chdir(root_dir)

    if base_dir is None:
        base_dir = os.curdir

    kwargs = {'dry_run': dry_run, 'logger': logger}

    try:
        format_info = _ARCHIVE_FORMATS[format]
    except KeyError:
        raise ValueError("unknown archive format '%s'" % format)

    func = format_info[0]
    for arg, val in format_info[1]:
        kwargs[arg] = val

    if format != 'zip':
        kwargs['owner'] = owner
        kwargs['group'] = group

    try:
        filename = func(base_name, base_dir, **kwargs)
    finally:
        if root_dir is not None:
            if logger is not None:
                logger.debug("changing back to '%s'", save_cwd)
            os.chdir(save_cwd)

    return filename

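Creates an archive (e.g. zip or tar) and returns the path of the archive file.

base_name: the name of the archive file without the format-specific extension; it may include a path. With a bare name the archive is saved in the current directory, otherwise in the given location,
e.g. www                        => saved in the current directory
e.g. /Users/wupeiqi/www => saved in /Users/wupeiqi/

format: the archive type: "zip", "tar", "bztar", "gztar"

root_dir: the directory to archive (defaults to the current directory)

owner: user, defaults to the current user

group: group, defaults to the current group

logger: used for logging, usually a logging.Logger object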
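
How base_name, format, and root_dir fit together can be seen in a round trip with shutil.unpack_archive(). This is a sketch in a temporary directory; the directory name day1 echoes the example above, and the other names are invented.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    # The directory whose contents will be archived.
    data_dir = os.path.join(root, "day1")
    os.makedirs(data_dir)
    with open(os.path.join(data_dir, "a.log"), "w", encoding="utf-8") as fh:
        fh.write("log line\n")

    # base_name has no extension; make_archive() appends ".zip" itself.
    base_name = os.path.join(root, "shutil_make_archive")
    archive_path = shutil.make_archive(base_name, "zip", root_dir=data_dir)
    archive_name = os.path.basename(archive_path)

    # Unpack into a fresh directory to check the contents.
    out_dir = os.path.join(root, "out")
    shutil.unpack_archive(archive_path, out_dir)
    extracted = os.listdir(out_dir)

print(archive_name, extracted)
```

Keeping the archive outside root_dir avoids the archive trying to include itself.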

shutil handles archives by calling the zipfile and tarfile modules. In detail:

import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()
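
The same steps can be written with context managers, which close the archive even if an error occurs. A sketch in a temporary directory; writestr() is used so that no input files are needed, and the member contents are made up.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    archive = os.path.join(root, "laxi.zip")

    # Compress: ZIP_DEFLATED enables real compression (the default only stores)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr("a.log", "log line\n")
        z.writestr("data.data", "payload")

    # Inspect and extract
    out_dir = os.path.join(root, "out")
    with zipfile.ZipFile(archive, "r") as z:
        names = z.namelist()
        first = z.read("a.log").decode("utf-8")
        z.extractall(out_dir)

    extracted = sorted(os.listdir(out_dir))

print(names, extracted)
```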

import tarfile

# Compress
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# Extract
tar = tarfile.open('your.tar','r')
tar.extractall()  # an extraction directory can be given via the path argument
tar.close()
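
A variant with context managers and gzip compression, as a sketch: the file bbs2.txt and its content are invented for this example, and the member is read back with extractfile() rather than unpacked to disk.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "bbs2.txt")
    with open(src, "w", encoding="utf-8") as fh:
        fh.write("hello tar")

    archive = os.path.join(root, "your.tar.gz")

    # Compress: "w:gz" writes a gzip-compressed tar file
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="bbs2.txt")   # arcname sets the name inside the archive

    # Read back without extracting to disk
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        member = tar.extractfile("bbs2.txt")
        content = member.read().decode("utf-8")

print(names, content)
```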
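
The listing below is the ZipFile class from the zipfile module as written for Python 2 (note the raise X, "msg" form and the print statements). It is quoted for reference and does not run under Python 3:
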
class ZipFile(object):
    """ Class with methods to open, read, write, close, list zip files.

    z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)

    file: Either the path to the file, or a file-like object.
          If it is a path, the file will be opened and closed by ZipFile.
    mode: The mode can be either read "r", write "w" or append "a".
    compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
    allowZip64: if True ZipFile will create files with ZIP64 extensions when
                needed, otherwise it will raise an exception when this would
                be necessary.

    """

    fp = None                   # Set here since __del__ checks it

    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
        """Open the ZIP file with mode read "r", write "w" or append "a"."""
        if mode not in ("r", "w", "a"):
            raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')

        if compression == ZIP_STORED:
            pass
        elif compression == ZIP_DEFLATED:
            if not zlib:
                raise RuntimeError,\
                      "Compression requires the (missing) zlib module"
        else:
            raise RuntimeError, "That compression method is not supported"

        self._allowZip64 = allowZip64
        self._didModify = False
        self.debug = 0  # Level of printing: 0 through 3
        self.NameToInfo = {}    # Find file info given name
        self.filelist = []      # List of ZipInfo instances for archive
        self.compression = compression  # Method of compression
        self.mode = key = mode.replace('b', '')[0]
        self.pwd = None
        self._comment = ''

        # Check if we were passed a file-like object
        if isinstance(file, basestring):
            self._filePassed = 0
            self.filename = file
            modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
            try:
                self.fp = open(file, modeDict[mode])
            except IOError:
                if mode == 'a':
                    mode = key = 'w'
                    self.fp = open(file, modeDict[mode])
                else:
                    raise
        else:
            self._filePassed = 1
            self.fp = file
            self.filename = getattr(file, 'name', None)

        try:
            if key == 'r':
                self._RealGetContents()
            elif key == 'w':
                # set the modified flag so central directory gets written
                # even if no files are added to the archive
                self._didModify = True
            elif key == 'a':
                try:
                    # See if file is a zip file
                    self._RealGetContents()
                    # seek to start of directory and overwrite
                    self.fp.seek(self.start_dir, 0)
                except BadZipfile:
                    # file is not a zip file, just append
                    self.fp.seek(0, 2)

                    # set the modified flag so central directory gets written
                    # even if no files are added to the archive
                    self._didModify = True
            else:
                raise RuntimeError('Mode must be "r", "w" or "a"')
        except:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()
            raise

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    def _RealGetContents(self):
        """Read in the table of contents for the ZIP file."""
        fp = self.fp
        try:
            endrec = _EndRecData(fp)
        except IOError:
            raise BadZipfile("File is not a zip file")
        if not endrec:
            raise BadZipfile, "File is not a zip file"
        if self.debug > 1:
            print endrec
        size_cd = endrec[_ECD_SIZE]             # bytes in central directory
        offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
        self._comment = endrec[_ECD_COMMENT]    # archive comment

        # "concat" is zero, unless zip was concatenated to another file
        concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
        if endrec[_ECD_SIGNATURE] == stringEndArchive64:
            # If Zip64 extension structures are present, account for them
            concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)

        if self.debug > 2:
            inferred = concat + offset_cd
            print "given, inferred, offset", offset_cd, inferred, concat
        # self.start_dir:  Position of start of central directory
        self.start_dir = offset_cd + concat
        fp.seek(self.start_dir, 0)
        data = fp.read(size_cd)
        fp = cStringIO.StringIO(data)
        total = 0
        while total < size_cd:
            centdir = fp.read(sizeCentralDir)
            if len(centdir) != sizeCentralDir:
                raise BadZipfile("Truncated central directory")
            centdir = struct.unpack(structCentralDir, centdir)
            if centdir[_CD_SIGNATURE] != stringCentralDir:
                raise BadZipfile("Bad magic number for central directory")
            if self.debug > 2:
                print centdir
            filename = fp.read(centdir[_CD_FILENAME_LENGTH])
            # Create ZipInfo instance to store file information
            x = ZipInfo(filename)
            x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
            x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
            x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
            (x.create_version, x.create_system, x.extract_version, x.reserved,
                x.flag_bits, x.compress_type, t, d,
                x.CRC, x.compress_size, x.file_size) = centdir[1:12]
            x.volume, x.internal_attr, x.external_attr = centdir[15:18]
            # Convert date/time code to (year, month, day, hour, min, sec)
            x._raw_time = t
            x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
                                     t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )

            x._decodeExtra()
            x.header_offset = x.header_offset + concat
            x.filename = x._decodeFilename()
            self.filelist.append(x)
            self.NameToInfo[x.filename] = x

            # update total bytes read from central directory
            total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
                     + centdir[_CD_EXTRA_FIELD_LENGTH]
                     + centdir[_CD_COMMENT_LENGTH])

            if self.debug > 2:
                print "total", total


    def namelist(self):
        """Return a list of file names in the archive."""
        l = []
        for data in self.filelist:
            l.append(data.filename)
        return l

    def infolist(self):
        """Return a list of class ZipInfo instances for files in the
        archive."""
        return self.filelist

    def printdir(self):
        """Print a table of contents for the zip file."""
        print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")
        for zinfo in self.filelist:
            date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
            print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)

    def testzip(self):
        """Read all the files and check the CRC."""
        chunk_size = 2 ** 20
        for zinfo in self.filelist:
            try:
                # Read by chunks, to avoid an OverflowError or a
                # MemoryError with very large embedded files.
                with self.open(zinfo.filename, "r") as f:
                    while f.read(chunk_size):     # Check CRC-32
                        pass
            except BadZipfile:
                return zinfo.filename

    def getinfo(self, name):
        """Return the instance of ZipInfo given 'name'."""
        info = self.NameToInfo.get(name)
        if info is None:
            raise KeyError(
                'There is no item named %r in the archive' % name)

        return info

    def setpassword(self, pwd):
        """Set default password for encrypted files."""
        self.pwd = pwd

    @property
    def comment(self):
        """The comment text associated with the ZIP file."""
        return self._comment

    @comment.setter
    def comment(self, comment):
        # check for valid comment length
        if len(comment) > ZIP_MAX_COMMENT:
            import warnings
            warnings.warn('Archive comment is too long; truncating to %d bytes'
                          % ZIP_MAX_COMMENT, stacklevel=2)
            comment = comment[:ZIP_MAX_COMMENT]
        self._comment = comment
        self._didModify = True

    def read(self, name, pwd=None):
        """Return file bytes (as a string) for name."""
        return self.open(name, "r", pwd).read()

    def open(self, name, mode="r", pwd=None):
        """Return file-like object for 'name'."""
        if mode not in ("r", "U", "rU"):
            raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
        if not self.fp:
            raise RuntimeError, \
                  "Attempt to read ZIP archive that was already closed"

        # Only open a new file for instances where we were not
        # given a file object in the constructor
        if self._filePassed:
            zef_file = self.fp
            should_close = False
        else:
            zef_file = open(self.filename, 'rb')
            should_close = True

        try:
            # Make sure we have an info object
            if isinstance(name, ZipInfo):
                # 'name' is already an info object
                zinfo = name
            else:
                # Get info object for name
                zinfo = self.getinfo(name)

            zef_file.seek(zinfo.header_offset, 0)

            # Skip the file header:
            fheader = zef_file.read(sizeFileHeader)
            if len(fheader) != sizeFileHeader:
                raise BadZipfile("Truncated file header")
            fheader = struct.unpack(structFileHeader, fheader)
            if fheader[_FH_SIGNATURE] != stringFileHeader:
                raise BadZipfile("Bad magic number for file header")

            fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if fname != zinfo.orig_filename:
                raise BadZipfile, \
                        'File name in directory "%s" and header "%s" differ.' % (
                            zinfo.orig_filename, fname)

            # check for encrypted flag & handle password
            is_encrypted = zinfo.flag_bits & 0x1
            zd = None
            if is_encrypted:
                if not pwd:
                    pwd = self.pwd
                if not pwd:
                    raise RuntimeError, "File %s is encrypted, " \
                        "password required for extraction" % name

                zd = _ZipDecrypter(pwd)
                # The first 12 bytes in the cypher stream is an encryption header
                #  used to strengthen the algorithm. The first 11 bytes are
                #  completely random, while the 12th contains the MSB of the CRC,
                #  or the MSB of the file time depending on the header type
                #  and is used to check the correctness of the password.
                bytes = zef_file.read(12)
                h = map(zd, bytes[0:12])
                if zinfo.flag_bits & 0x8:
                    # compare against the file type from extended local headers
                    check_byte = (zinfo._raw_time >> 8) & 0xff
                else:
                    # compare against the CRC otherwise
                    check_byte = (zinfo.CRC >> 24) & 0xff
                if ord(h[11]) != check_byte:
                    raise RuntimeError("Bad password for file", name)

            return ZipExtFile(zef_file, mode, zinfo, zd,
                    close_fileobj=should_close)
        except:
            if should_close:
                zef_file.close()
            raise

    def extract(self, member, path=None, pwd=None):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a ZipInfo object. You can
           specify a different directory using `path'.
        """
        if not isinstance(member, ZipInfo):
            member = self.getinfo(member)

        if path is None:
            path = os.getcwd()

        return self._extract_member(member, path, pwd)

    def extractall(self, path=None, members=None, pwd=None):
        """Extract all members from the archive to the current working
           directory. `path' specifies a different directory to extract to.
           `members' is optional and must be a subset of the list returned
           by namelist().
        """
        if members is None:
            members = self.namelist()

        for zipinfo in members:
            self.extract(zipinfo, path, pwd)

    def _extract_member(self, member, targetpath, pwd):
        """Extract the ZipInfo object 'member' to a physical
           file on the path targetpath.
        """
        # build the destination pathname, replacing
        # forward slashes to platform specific separators.
        arcname = member.filename.replace('/', os.path.sep)

        if os.path.altsep:
            arcname = arcname.replace(os.path.altsep, os.path.sep)
        # interpret absolute pathname as relative, remove drive letter or
        # UNC path, redundant separators, "." and ".." components.
        arcname = os.path.splitdrive(arcname)[1]
        arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
                    if x not in ('', os.path.curdir, os.path.pardir))
        if os.path.sep == '\\':
            # filter illegal characters on Windows
            illegal = ':<>|"?*'
            if isinstance(arcname, unicode):
                table = {ord(c): ord('_') for c in illegal}
            else:
                table = string.maketrans(illegal, '_' * len(illegal))
            arcname = arcname.translate(table)
            # remove trailing dots
            arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))
            arcname = os.path.sep.join(x for x in arcname if x)

        targetpath = os.path.join(targetpath, arcname)
        targetpath = os.path.normpath(targetpath)

        # Create all upper directories if necessary.
        upperdirs = os.path.dirname(targetpath)
        if upperdirs and not os.path.exists(upperdirs):
            os.makedirs(upperdirs)

        if member.filename[-1] == '/':
            if not os.path.isdir(targetpath):
                os.mkdir(targetpath)
            return targetpath

        with self.open(member, pwd=pwd) as source, \
             open(targetpath, "wb") as target:
            shutil.copyfileobj(source, target)

        return targetpath

    def _writecheck(self, zinfo):
        """Check for errors before writing a file to the archive."""
        if zinfo.filename in self.NameToInfo:
            import warnings
            warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
        if self.mode not in ("w", "a"):
            raise RuntimeError('write() requires mode "w" or "a"')
        if not self.fp:
            raise RuntimeError(
                  "Attempt to write ZIP archive that was already closed")
        if zinfo.compress_type == ZIP_DEFLATED and not zlib:
            raise RuntimeError(
                  "Compression requires the (missing) zlib module")
        if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
            raise RuntimeError(
                  "That compression method is not supported")
        if not self._allowZip64:
            requires_zip64 = None
            if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
                requires_zip64 = "Files count"
            elif zinfo.file_size > ZIP64_LIMIT:
                requires_zip64 = "Filesize"
            elif zinfo.header_offset > ZIP64_LIMIT:
                requires_zip64 = "Zipfile size"
            if requires_zip64:
                raise LargeZipFile(requires_zip64 +
                                   " would require ZIP64 extensions")

    def write(self, filename, arcname=None, compress_type=None):
        """Put the bytes from filename into the archive under the name
        arcname."""
        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        st = os.stat(filename)
        isdir = stat.S_ISDIR(st.st_mode)
        mtime = time.localtime(st.st_mtime)
        date_time = mtime[0:6]
        # Create ZipInfo instance to store file information
        if arcname is None:
            arcname = filename
        arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
        while arcname[0] in (os.sep, os.altsep):
            arcname = arcname[1:]
        if isdir:
            arcname += '/'
        zinfo = ZipInfo(arcname, date_time)
        zinfo.external_attr = (st[0] & 0xFFFF) << 16      # Unix attributes
        if compress_type is None:
            zinfo.compress_type = self.compression
        else:
            zinfo.compress_type = compress_type

        zinfo.file_size = st.st_size
        zinfo.flag_bits = 0x00
        zinfo.header_offset = self.fp.tell()    # Start of header bytes

        self._writecheck(zinfo)
        self._didModify = True

        if isdir:
            zinfo.file_size = 0
            zinfo.compress_size = 0
            zinfo.CRC = 0
            zinfo.external_attr |= 0x10  # MS-DOS directory flag
            self.filelist.append(zinfo)
            self.NameToInfo[zinfo.filename] = zinfo
            self.fp.write(zinfo.FileHeader(False))
            return

        with open(filename, "rb") as fp:
            # Must overwrite CRC and sizes with correct data later
            zinfo.CRC = CRC = 0
            zinfo.compress_size = compress_size = 0
            # Compressed size can be larger than uncompressed size
            zip64 = self._allowZip64 and \
                    zinfo.file_size * 1.05 > ZIP64_LIMIT
            self.fp.write(zinfo.FileHeader(zip64))
            if zinfo.compress_type == ZIP_DEFLATED:
                cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                     zlib.DEFLATED, -15)
            else:
                cmpr = None
            file_size = 0
            while 1:
                buf = fp.read(1024 * 8)
                if not buf:
                    break
                file_size = file_size + len(buf)
                CRC = crc32(buf, CRC) & 0xffffffff
                if cmpr:
                    buf = cmpr.compress(buf)
                    compress_size = compress_size + len(buf)
                self.fp.write(buf)
        if cmpr:
            buf = cmpr.flush()
            compress_size = compress_size + len(buf)
            self.fp.write(buf)
            zinfo.compress_size = compress_size
        else:
            zinfo.compress_size = file_size
        zinfo.CRC = CRC
        zinfo.file_size = file_size
        if not zip64 and self._allowZip64:
            if file_size > ZIP64_LIMIT:
                raise RuntimeError('File size has increased during compressing')
            if compress_size > ZIP64_LIMIT:
                raise RuntimeError('Compressed size larger than uncompressed size')
        # Seek backwards and write file header (which will now include
        # correct CRC and file sizes)
        position = self.fp.tell()       # Preserve current position in file
        self.fp.seek(zinfo.header_offset, 0)
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.seek(position, 0)
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
        """Write a file into the archive.  The contents is the string
        'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or
        the name of the file in the archive."""
        if not isinstance(zinfo_or_arcname, ZipInfo):
            zinfo = ZipInfo(filename=zinfo_or_arcname,
                            date_time=time.localtime(time.time())[:6])

            zinfo.compress_type = self.compression
            if zinfo.filename[-1] == '/':
                zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
                zinfo.external_attr |= 0x10           # MS-DOS directory flag
            else:
                zinfo.external_attr = 0o600 << 16     # ?rw-------
        else:
            zinfo = zinfo_or_arcname

        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        if compress_type is not None:
            zinfo.compress_type = compress_type

        zinfo.file_size = len(bytes)            # Uncompressed size
        zinfo.header_offset = self.fp.tell()    # Start of header bytes
        self._writecheck(zinfo)
        self._didModify = True
        zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
        if zinfo.compress_type == ZIP_DEFLATED:
            co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                 zlib.DEFLATED, -15)
            bytes = co.compress(bytes) + co.flush()
            zinfo.compress_size = len(bytes)    # Compressed size
        else:
            zinfo.compress_size = zinfo.file_size
        zip64 = zinfo.file_size > ZIP64_LIMIT or \
                zinfo.compress_size > ZIP64_LIMIT
        if zip64 and not self._allowZip64:
            raise LargeZipFile("Filesize would require ZIP64 extensions")
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.write(bytes)
        if zinfo.flag_bits & 0x08:
            # Write CRC and file sizes after the file data
            fmt = '<LQQ' if zip64 else '<LLL'
            self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
                  zinfo.file_size))
        self.fp.flush()
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def __del__(self):
        """Call the "close()" method in case the user forgot."""
        self.close()

    def close(self):
        """Close the file, and for mode "w" and "a" write the ending
        records."""
        if self.fp is None:
            return

        try:
            if self.mode in ("w", "a") and self._didModify: # write ending records
                pos1 = self.fp.tell()
                for zinfo in self.filelist:         # write central directory
                    dt = zinfo.date_time
                    dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
                    dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
                    extra = []
                    if zinfo.file_size > ZIP64_LIMIT \
                            or zinfo.compress_size > ZIP64_LIMIT:
                        extra.append(zinfo.file_size)
                        extra.append(zinfo.compress_size)
                        file_size = 0xffffffff
                        compress_size = 0xffffffff
                    else:
                        file_size = zinfo.file_size
                        compress_size = zinfo.compress_size

                    if zinfo.header_offset > ZIP64_LIMIT:
                        extra.append(zinfo.header_offset)
                        header_offset = 0xffffffff
                    else:
                        header_offset = zinfo.header_offset

                    extra_data = zinfo.extra
                    if extra:
                        # Append a ZIP64 field to the extra's
                        extra_data = struct.pack(
                                '<HH' + 'Q'*len(extra),
                                1, 8*len(extra), *extra) + extra_data

                        extract_version = max(45, zinfo.extract_version)
                        create_version = max(45, zinfo.create_version)
                    else:
                        extract_version = zinfo.extract_version
                        create_version = zinfo.create_version

                    try:
                        filename, flag_bits = zinfo._encodeFilenameFlags()
                        centdir = struct.pack(structCentralDir,
                        stringCentralDir, create_version,
                        zinfo.create_system, extract_version, zinfo.reserved,
                        flag_bits, zinfo.compress_type, dostime, dosdate,
                        zinfo.CRC, compress_size, file_size,
                        len(filename), len(extra_data), len(zinfo.comment),
                        0, zinfo.internal_attr, zinfo.external_attr,
                        header_offset)
                    except DeprecationWarning:
                        print((structCentralDir,
                        stringCentralDir, create_version,
                        zinfo.create_system, extract_version, zinfo.reserved,
                        zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
                        zinfo.CRC, compress_size, file_size,
                        len(zinfo.filename), len(extra_data), len(zinfo.comment),
                        0, zinfo.internal_attr, zinfo.external_attr,
                        header_offset), file=sys.stderr)
                        raise
                    self.fp.write(centdir)
                    self.fp.write(filename)
                    self.fp.write(extra_data)
                    self.fp.write(zinfo.comment)

                pos2 = self.fp.tell()
                # Write end-of-zip-archive record
                centDirCount = len(self.filelist)
                centDirSize = pos2 - pos1
                centDirOffset = pos1
                requires_zip64 = None
                if centDirCount > ZIP_FILECOUNT_LIMIT:
                    requires_zip64 = "Files count"
                elif centDirOffset > ZIP64_LIMIT:
                    requires_zip64 = "Central directory offset"
                elif centDirSize > ZIP64_LIMIT:
                    requires_zip64 = "Central directory size"
                if requires_zip64:
                    # Need to write the ZIP64 end-of-archive records
                    if not self._allowZip64:
                        raise LargeZipFile(requires_zip64 +
                                           " would require ZIP64 extensions")
                    zip64endrec = struct.pack(
                            structEndArchive64, stringEndArchive64,
                            44, 45, 45, 0, 0, centDirCount, centDirCount,
                            centDirSize, centDirOffset)
                    self.fp.write(zip64endrec)

                    zip64locrec = struct.pack(
                            structEndArchive64Locator,
                            stringEndArchive64Locator, 0, pos2, 1)
                    self.fp.write(zip64locrec)
                    centDirCount = min(centDirCount, 0xFFFF)
                    centDirSize = min(centDirSize, 0xFFFFFFFF)
                    centDirOffset = min(centDirOffset, 0xFFFFFFFF)

                endrec = struct.pack(structEndArchive, stringEndArchive,
                                    0, 0, centDirCount, centDirCount,
                                    centDirSize, centDirOffset, len(self._comment))
                self.fp.write(endrec)
                self.fp.write(self._comment)
                self.fp.flush()
        finally:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()

ZipFile
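The `write`, `writestr`, `extract` and `extractall` methods in the source listing above can be tried out from Python 3 with a short round trip. This is only a minimal sketch and not part of the original listing: the helper name `zip_round_trip` and the file names (`hello.txt`, `demo.zip`, `docs/`, `notes/`) are made up for illustration, and `ZIP_DEFLATED` assumes the `zlib` module is available.

```python
import os
import tempfile
import zipfile


def zip_round_trip(workdir):
    """Create a small archive inside workdir, then read it back.

    All file and directory names here are hypothetical examples.
    """
    # A file on disk to be added with ZipFile.write()
    src = os.path.join(workdir, "hello.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("hello zip")

    archive = os.path.join(workdir, "demo.zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # write(): copy a file from disk, stored under the name given by arcname
        zf.write(src, arcname="docs/hello.txt")
        # writestr(): create a member directly from in-memory data
        zf.writestr("notes/readme.txt", "written from memory")

    out_dir = os.path.join(workdir, "out")
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        # extract(): a single member; extractall(): every member
        zf.extract("docs/hello.txt", path=out_dir)
        zf.extractall(path=out_dir)

    with open(os.path.join(out_dir, "docs", "hello.txt"), encoding="utf-8") as f:
        content = f.read()
    return names, content


with tempfile.TemporaryDirectory() as tmp:
    names, content = zip_round_trip(tmp)
    print(names)    # ['docs/hello.txt', 'notes/readme.txt']
    print(content)  # hello zip
```

Using `ZipFile` as a context manager calls `close()` on exit, which is the step that writes the central directory and the end-of-archive record in write mode, as the `close()` source above shows.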
class TarFile(object):
    """The TarFile Class provides an interface to tar archives.
    """

    debug = 0                   # May be set from 0 (no msgs) to 3 (all msgs)

    dereference = False         # If true, add content of linked file to the
                                # tar file, else the link.

    ignore_zeros = False        # If true, skips empty or invalid blocks and
                                # continues processing.

    errorlevel = 1              # If 0, fatal errors only appear in debug
                                # messages (if debug >= 0). If > 0, errors
                                # are passed to the caller as exceptions.

    format = DEFAULT_FORMAT     # The format to use when creating an archive.

    encoding = ENCODING         # Encoding for 8-bit character strings.

    errors = None               # Error handler for unicode conversion.

    tarinfo = TarInfo           # The default TarInfo class to use.

    fileobject = ExFileObject   # The default ExFileObject class to use.

    def __init__(self, name=None, mode="r", fileobj=None, format=None,
            tarinfo=None, dereference=None, ignore_zeros=None, encoding=None,
            errors=None, pax_headers=None, debug=None, errorlevel=None):
        """Open an (uncompressed) tar archive `name'. `mode' is either 'r' to
           read from an existing archive, 'a' to append data to an existing
           file or 'w' to create a new file overwriting an existing one. `mode'
           defaults to 'r'.
           If `fileobj' is given, it is used for reading or writing data. If it
           can be determined, `mode' is overridden by `fileobj's mode.
           `fileobj' is not closed, when TarFile is closed.
        """
        modes = {"r": "rb", "a": "r+b", "w": "wb"}
        if mode not in modes:
            raise ValueError("mode must be 'r', 'a' or 'w'")
        self.mode = mode
        self._mode = modes[mode]

        if not fileobj:
            if self.mode == "a" and not os.path.exists(name):
                # Create nonexistent files in append mode.
                self.mode = "w"
                self._mode = "wb"
            fileobj = bltn_open(name, self._mode)
            self._extfileobj = False
        else:
            if name is None and hasattr(fileobj, "name"):
                name = fileobj.name
            if hasattr(fileobj, "mode"):
                self._mode = fileobj.mode
            self._extfileobj = True
        self.name = os.path.abspath(name) if name else None
        self.fileobj = fileobj

        # Init attributes.
        if format is not None:
            self.format = format
        if tarinfo is not None:
            self.tarinfo = tarinfo
        if dereference is not None:
            self.dereference = dereference
        if ignore_zeros is not None:
            self.ignore_zeros = ignore_zeros
        if encoding is not None:
            self.encoding = encoding

        if errors is not None:
            self.errors = errors
        elif mode == "r":
            self.errors = "utf-8"
        else:
            self.errors = "strict"

        if pax_headers is not None and self.format == PAX_FORMAT:
            self.pax_headers = pax_headers
        else:
            self.pax_headers = {}

        if debug is not None:
            self.debug = debug
        if errorlevel is not None:
            self.errorlevel = errorlevel

        # Init datastructures.
        self.closed = False
        self.members = []       # list of members as TarInfo objects
        self._loaded = False    # flag if all members have been read
        self.offset = self.fileobj.tell()
                                # current position in the archive file
        self.inodes = {}        # dictionary caching the inodes of
                                # archive members already added

        try:
            if self.mode == "r":
                self.firstmember = None
                self.firstmember = self.next()

            if self.mode == "a":
                # Move to the end of the archive,
                # before the first empty block.
                while True:
                    self.fileobj.seek(self.offset)
                    try:
                        tarinfo = self.tarinfo.fromtarfile(self)
                        self.members.append(tarinfo)
                    except EOFHeaderError:
                        self.fileobj.seek(self.offset)
                        break
                    except HeaderError as e:
                        raise ReadError(str(e))

            if self.mode in "aw":
                self._loaded = True

                if self.pax_headers:
                    buf = self.tarinfo.create_pax_global_header(self.pax_headers.copy())
                    self.fileobj.write(buf)
                    self.offset += len(buf)
        except:
            if not self._extfileobj:
                self.fileobj.close()
            self.closed = True
            raise

    def _getposix(self):
        return self.format == USTAR_FORMAT
    def _setposix(self, value):
        import warnings
        warnings.warn("use the format attribute instead", DeprecationWarning,
                      2)
        if value:
            self.format = USTAR_FORMAT
        else:
            self.format = GNU_FORMAT
    posix = property(_getposix, _setposix)

    #--------------------------------------------------------------------------
    # Below are the classmethods which act as alternate constructors to the
    # TarFile class. The open() method is the only one that is needed for
    # public use; it is the "super"-constructor and is able to select an
    # adequate "sub"-constructor for a particular compression using the mapping
    # from OPEN_METH.
    #
    # This concept allows one to subclass TarFile without losing the comfort of
    # the super-constructor. A sub-constructor is registered and made available
    # by adding it to the mapping in OPEN_METH.

    @classmethod
    def open(cls, name=None, mode="r", fileobj=None, bufsize=RECORDSIZE, **kwargs):
        """Open a tar archive for reading, writing or appending. Return
           an appropriate TarFile class.

           mode:
           'r' or 'r:*' open for reading with transparent compression
           'r:'         open for reading exclusively uncompressed
           'r:gz'       open for reading with gzip compression
           'r:bz2'      open for reading with bzip2 compression
           'a' or 'a:'  open for appending, creating the file if necessary
           'w' or 'w:'  open for writing without compression
           'w:gz'       open for writing with gzip compression
           'w:bz2'      open for writing with bzip2 compression

           'r|*'        open a stream of tar blocks with transparent compression
           'r|'         open an uncompressed stream of tar blocks for reading
           'r|gz'       open a gzip compressed stream of tar blocks
           'r|bz2'      open a bzip2 compressed stream of tar blocks
           'w|'         open an uncompressed stream for writing
           'w|gz'       open a gzip compressed stream for writing
           'w|bz2'      open a bzip2 compressed stream for writing
        """

        if not name and not fileobj:
            raise ValueError("nothing to open")

        if mode in ("r", "r:*"):
            # Find out which *open() is appropriate for opening the file.
            for comptype in cls.OPEN_METH:
                func = getattr(cls, cls.OPEN_METH[comptype])
                if fileobj is not None:
                    saved_pos = fileobj.tell()
                try:
                    return func(name, "r", fileobj, **kwargs)
                except (ReadError, CompressionError) as e:
                    if fileobj is not None:
                        fileobj.seek(saved_pos)
                    continue
            raise ReadError("file could not be opened successfully")

        elif ":" in mode:
            filemode, comptype = mode.split(":", 1)
            filemode = filemode or "r"
            comptype = comptype or "tar"

            # Select the *open() function according to
            # given compression.
            if comptype in cls.OPEN_METH:
                func = getattr(cls, cls.OPEN_METH[comptype])
            else:
                raise CompressionError("unknown compression type %r" % comptype)
            return func(name, filemode, fileobj, **kwargs)

        elif "|" in mode:
            filemode, comptype = mode.split("|", 1)
            filemode = filemode or "r"
            comptype = comptype or "tar"

            if filemode not in ("r", "w"):
                raise ValueError("mode must be 'r' or 'w'")

            stream = _Stream(name, filemode, comptype, fileobj, bufsize)
            try:
                t = cls(name, filemode, stream, **kwargs)
            except:
                stream.close()
                raise
            t._extfileobj = False
            return t

        elif mode in ("a", "w"):
            return cls.taropen(name, mode, fileobj, **kwargs)

        raise ValueError("undiscernible mode")

    @classmethod
    def taropen(cls, name, mode="r", fileobj=None, **kwargs):
        """Open uncompressed tar archive name for reading or writing.
        """
        if mode not in ("r", "a", "w"):
            raise ValueError("mode must be 'r', 'a' or 'w'")
        return cls(name, mode, fileobj, **kwargs)

    @classmethod
    def gzopen(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
        """Open gzip compressed tar archive name for reading or writing.
           Appending is not allowed.
        """
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")

        try:
            import gzip
            gzip.GzipFile
        except (ImportError, AttributeError):
            raise CompressionError("gzip module is not available")

        try:
            fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
        except OSError:
            if fileobj is not None and mode == 'r':
                raise ReadError("not a gzip file")
            raise

        try:
            t = cls.taropen(name, mode, fileobj, **kwargs)
        except IOError:
            fileobj.close()
            if mode == 'r':
                raise ReadError("not a gzip file")
            raise
        except:
            fileobj.close()
            raise
        t._extfileobj = False
        return t

    @classmethod
    def bz2open(cls, name, mode="r", fileobj=None, compresslevel=9, **kwargs):
        """Open bzip2 compressed tar archive name for reading or writing.
           Appending is not allowed.
        """
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'.")

        try:
            import bz2
        except ImportError:
            raise CompressionError("bz2 module is not available")

        if fileobj is not None:
            fileobj = _BZ2Proxy(fileobj, mode)
        else:
            fileobj = bz2.BZ2File(name, mode, compresslevel=compresslevel)

        try:
            t = cls.taropen(name, mode, fileobj, **kwargs)
        except (IOError, EOFError):
            fileobj.close()
            if mode == 'r':
                raise ReadError("not a bzip2 file")
            raise
        except:
            fileobj.close()
            raise
        t._extfileobj = False
        return t

    # All *open() methods are registered here.
    OPEN_METH = {
        "tar": "taropen",   # uncompressed tar
        "gz":  "gzopen",    # gzip compressed tar
        "bz2": "bz2open"    # bzip2 compressed tar
    }

    #--------------------------------------------------------------------------
    # The public methods which TarFile provides:

    def close(self):
        """Close the TarFile. In write-mode, two finishing zero blocks are
           appended to the archive.
        """
        if self.closed:
            return

        if self.mode in "aw":
            self.fileobj.write(NUL * (BLOCKSIZE * 2))
            self.offset += (BLOCKSIZE * 2)
            # fill up the end with zero-blocks
            # (like option -b20 for tar does)
            blocks, remainder = divmod(self.offset, RECORDSIZE)
            if remainder > 0:
                self.fileobj.write(NUL * (RECORDSIZE - remainder))

        if not self._extfileobj:
            self.fileobj.close()
        self.closed = True

    def getmember(self, name):
        """Return a TarInfo object for member `name'. If `name' can not be
           found in the archive, KeyError is raised. If a member occurs more
           than once in the archive, its last occurrence is assumed to be the
           most up-to-date version.
        """
        tarinfo = self._getmember(name)
        if tarinfo is None:
            raise KeyError("filename %r not found" % name)
        return tarinfo

    def getmembers(self):
        """Return the members of the archive as a list of TarInfo objects. The
           list has the same order as the members in the archive.
        """
        self._check()
        if not self._loaded:    # if we want to obtain a list of
            self._load()        # all members, we first have to
                                # scan the whole archive.
        return self.members

    def getnames(self):
        """Return the members of the archive as a list of their names. It has
           the same order as the list returned by getmembers().
        """
        return [tarinfo.name for tarinfo in self.getmembers()]

    def gettarinfo(self, name=None, arcname=None, fileobj=None):
        """Create a TarInfo object for either the file `name' or the file
           object `fileobj' (using os.fstat on its file descriptor). You can
           modify some of the TarInfo's attributes before you add it using
           addfile(). If given, `arcname' specifies an alternative name for the
           file in the archive.
        """
        self._check("aw")

        # When fileobj is given, replace name by
        # fileobj's real name.
        if fileobj is not None:
            name = fileobj.name

        # Building the name of the member in the archive.
        # Backward slashes are converted to forward slashes,
        # Absolute paths are turned to relative paths.
        if arcname is None:
            arcname = name
        drv, arcname = os.path.splitdrive(arcname)
        arcname = arcname.replace(os.sep, "/")
        arcname = arcname.lstrip("/")

        # Now, fill the TarInfo object with
        # information specific for the file.
        tarinfo = self.tarinfo()
        tarinfo.tarfile = self

        # Use os.stat or os.lstat, depending on platform
        # and if symlinks shall be resolved.
        if fileobj is None:
            if hasattr(os, "lstat") and not self.dereference:
                statres = os.lstat(name)
            else:
                statres = os.stat(name)
        else:
            statres = os.fstat(fileobj.fileno())
        linkname = ""

        stmd = statres.st_mode
        if stat.S_ISREG(stmd):
            inode = (statres.st_ino, statres.st_dev)
            if not self.dereference and statres.st_nlink > 1 and \
                    inode in self.inodes and arcname != self.inodes[inode]:
                # Is it a hardlink to an already
                # archived file?
                type = LNKTYPE
                linkname = self.inodes[inode]
            else:
                # The inode is added only if its valid.
                # For win32 it is always 0.
                type = REGTYPE
                if inode[0]:
                    self.inodes[inode] = arcname
        elif stat.S_ISDIR(stmd):
            type = DIRTYPE
        elif stat.S_ISFIFO(stmd):
            type = FIFOTYPE
        elif stat.S_ISLNK(stmd):
            type = SYMTYPE
            linkname = os.readlink(name)
        elif stat.S_ISCHR(stmd):
            type = CHRTYPE
        elif stat.S_ISBLK(stmd):
            type = BLKTYPE
        else:
            return None

        # Fill the TarInfo object with all
        # information we can get.
        tarinfo.name = arcname
        tarinfo.mode = stmd
        tarinfo.uid = statres.st_uid
        tarinfo.gid = statres.st_gid
        if type == REGTYPE:
            tarinfo.size = statres.st_size
        else:
            tarinfo.size = 0
        tarinfo.mtime = statres.st_mtime
        tarinfo.type = type
        tarinfo.linkname = linkname
        if pwd:
            try:
                tarinfo.uname = pwd.getpwuid(tarinfo.uid)[0]
            except KeyError:
                pass
        if grp:
            try:
                tarinfo.gname = grp.getgrgid(tarinfo.gid)[0]
            except KeyError:
                pass

        if type in (CHRTYPE, BLKTYPE):
            if hasattr(os, "major") and hasattr(os, "minor"):
                tarinfo.devmajor = os.major(statres.st_rdev)
                tarinfo.devminor = os.minor(statres.st_rdev)
        return tarinfo

457     def list(self, verbose=True):
458         """Print a table of contents to sys.stdout. If `verbose' is False, only
459            the names of the members are printed. If it is True, an `ls -l'-like
460            output is produced.
461         """
462         self._check()
463 
464         for tarinfo in self:
465             if verbose:
466                 print filemode(tarinfo.mode),
467                 print "%s/%s" % (tarinfo.uname or tarinfo.uid,
468                                  tarinfo.gname or tarinfo.gid),
469                 if tarinfo.ischr() or tarinfo.isblk():
470                     print "%10s" % ("%d,%d" \
471                                     % (tarinfo.devmajor, tarinfo.devminor)),
472                 else:
473                     print "%10d" % tarinfo.size,
474                 print "%d-%02d-%02d %02d:%02d:%02d" \
475                       % time.localtime(tarinfo.mtime)[:6],
476 
477             print tarinfo.name + ("/" if tarinfo.isdir() else ""),
478 
479             if verbose:
480                 if tarinfo.issym():
481                     print "->", tarinfo.linkname,
482                 if tarinfo.islnk():
483                     print "link to", tarinfo.linkname,
484             print
485 
    def add(self, name, arcname=None, recursive=True, exclude=None, filter=None):
        """Add the file `name' to the archive. `name' may be any type of file
           (directory, fifo, symbolic link, etc.). If given, `arcname'
           specifies an alternative name for the file in the archive.
           Directories are added recursively by default. This can be avoided by
           setting `recursive' to False. `exclude' is a function that should
           return True for each filename to be excluded. `filter' is a function
           that expects a TarInfo object argument and returns the changed
           TarInfo object, if it returns None the TarInfo object will be
           excluded from the archive.
        """
        self._check("aw")

        if arcname is None:
            arcname = name

        # Exclude pathnames.
        if exclude is not None:
            import warnings
            warnings.warn("use the filter argument instead",
                    DeprecationWarning, 2)
            if exclude(name):
                self._dbg(2, "tarfile: Excluded %r" % name)
                return

        # Skip if somebody tries to archive the archive...
        if self.name is not None and os.path.abspath(name) == self.name:
            self._dbg(2, "tarfile: Skipped %r" % name)
            return

        self._dbg(1, name)

        # Create a TarInfo object from the file.
        tarinfo = self.gettarinfo(name, arcname)

        if tarinfo is None:
            self._dbg(1, "tarfile: Unsupported type %r" % name)
            return

        # Change or exclude the TarInfo object.
        if filter is not None:
            tarinfo = filter(tarinfo)
            if tarinfo is None:
                self._dbg(2, "tarfile: Excluded %r" % name)
                return

        # Append the tar header and data to the archive.
        if tarinfo.isreg():
            with bltn_open(name, "rb") as f:
                self.addfile(tarinfo, f)

        elif tarinfo.isdir():
            self.addfile(tarinfo)
            if recursive:
                for f in os.listdir(name):
                    self.add(os.path.join(name, f), os.path.join(arcname, f),
                            recursive, exclude, filter)

        else:
            self.addfile(tarinfo)

    def addfile(self, tarinfo, fileobj=None):
        """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is
           given, tarinfo.size bytes are read from it and added to the archive.
           You can create TarInfo objects using gettarinfo().
           On Windows platforms, `fileobj' should always be opened with mode
           'rb' to avoid irritation about the file size.
        """
        self._check("aw")

        tarinfo = copy.copy(tarinfo)

        buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
        self.fileobj.write(buf)
        self.offset += len(buf)

        # If there's data to follow, append it.
        if fileobj is not None:
            copyfileobj(fileobj, self.fileobj, tarinfo.size)
            blocks, remainder = divmod(tarinfo.size, BLOCKSIZE)
            if remainder > 0:
                self.fileobj.write(NUL * (BLOCKSIZE - remainder))
                blocks += 1
            self.offset += blocks * BLOCKSIZE

        self.members.append(tarinfo)

    def extractall(self, path=".", members=None):
        """Extract all members from the archive to the current working
           directory and set owner, modification time and permissions on
           directories afterwards. `path' specifies a different directory
           to extract to. `members' is optional and must be a subset of the
           list returned by getmembers().
        """
        directories = []

        if members is None:
            members = self

        for tarinfo in members:
            if tarinfo.isdir():
                # Extract directories with a safe mode.
                directories.append(tarinfo)
                tarinfo = copy.copy(tarinfo)
                tarinfo.mode = 0700
            self.extract(tarinfo, path)

        # Reverse sort directories.
        directories.sort(key=operator.attrgetter('name'))
        directories.reverse()

        # Set correct owner, mtime and filemode on directories.
        for tarinfo in directories:
            dirpath = os.path.join(path, tarinfo.name)
            try:
                self.chown(tarinfo, dirpath)
                self.utime(tarinfo, dirpath)
                self.chmod(tarinfo, dirpath)
            except ExtractError, e:
                if self.errorlevel > 1:
                    raise
                else:
                    self._dbg(1, "tarfile: %s" % e)

    def extract(self, member, path=""):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a TarInfo object. You can
           specify a different directory using `path'.
        """
        self._check("r")

        if isinstance(member, basestring):
            tarinfo = self.getmember(member)
        else:
            tarinfo = member

        # Prepare the link target for makelink().
        if tarinfo.islnk():
            tarinfo._link_target = os.path.join(path, tarinfo.linkname)

        try:
            self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
        except EnvironmentError, e:
            if self.errorlevel > 0:
                raise
            else:
                if e.filename is None:
                    self._dbg(1, "tarfile: %s" % e.strerror)
                else:
                    self._dbg(1, "tarfile: %s %r" % (e.strerror, e.filename))
        except ExtractError, e:
            if self.errorlevel > 1:
                raise
            else:
                self._dbg(1, "tarfile: %s" % e)

    def extractfile(self, member):
        """Extract a member from the archive as a file object. `member' may be
           a filename or a TarInfo object. If `member' is a regular file, a
           file-like object is returned. If `member' is a link, a file-like
           object is constructed from the link's target. If `member' is none of
           the above, None is returned.
           The file-like object is read-only and provides the following
           methods: read(), readline(), readlines(), seek() and tell()
        """
        self._check("r")

        if isinstance(member, basestring):
            tarinfo = self.getmember(member)
        else:
            tarinfo = member

        if tarinfo.isreg():
            return self.fileobject(self, tarinfo)

        elif tarinfo.type not in SUPPORTED_TYPES:
            # If a member's type is unknown, it is treated as a
            # regular file.
            return self.fileobject(self, tarinfo)

        elif tarinfo.islnk() or tarinfo.issym():
            if isinstance(self.fileobj, _Stream):
                # A small but ugly workaround for the case that someone tries
                # to extract a (sym)link as a file-object from a non-seekable
                # stream of tar blocks.
                raise StreamError("cannot extract (sym)link as file object")
            else:
                # A (sym)link's file object is its target's file object.
                return self.extractfile(self._find_link_target(tarinfo))
        else:
            # If there's no data associated with the member (directory, chrdev,
            # blkdev, etc.), return None instead of a file object.
            return None

    def _extract_member(self, tarinfo, targetpath):
        """Extract the TarInfo object tarinfo to a physical
           file called targetpath.
        """
        # Fetch the TarInfo object for the given name
        # and build the destination pathname, replacing
        # forward slashes to platform specific separators.
        targetpath = targetpath.rstrip("/")
        targetpath = targetpath.replace("/", os.sep)

        # Create all upper directories.
        upperdirs = os.path.dirname(targetpath)
        if upperdirs and not os.path.exists(upperdirs):
            # Create directories that are not part of the archive with
            # default permissions.
            os.makedirs(upperdirs)

        if tarinfo.islnk() or tarinfo.issym():
            self._dbg(1, "%s -> %s" % (tarinfo.name, tarinfo.linkname))
        else:
            self._dbg(1, tarinfo.name)

        if tarinfo.isreg():
            self.makefile(tarinfo, targetpath)
        elif tarinfo.isdir():
            self.makedir(tarinfo, targetpath)
        elif tarinfo.isfifo():
            self.makefifo(tarinfo, targetpath)
        elif tarinfo.ischr() or tarinfo.isblk():
            self.makedev(tarinfo, targetpath)
        elif tarinfo.islnk() or tarinfo.issym():
            self.makelink(tarinfo, targetpath)
        elif tarinfo.type not in SUPPORTED_TYPES:
            self.makeunknown(tarinfo, targetpath)
        else:
            self.makefile(tarinfo, targetpath)

        self.chown(tarinfo, targetpath)
        if not tarinfo.issym():
            self.chmod(tarinfo, targetpath)
            self.utime(tarinfo, targetpath)

    #--------------------------------------------------------------------------
    # Below are the different file methods. They are called via
    # _extract_member() when extract() is called. They can be replaced in a
    # subclass to implement other functionality.

    def makedir(self, tarinfo, targetpath):
        """Make a directory called targetpath.
        """
        try:
            # Use a safe mode for the directory, the real mode is set
            # later in _extract_member().
            os.mkdir(targetpath, 0700)
        except EnvironmentError, e:
            if e.errno != errno.EEXIST:
                raise

    def makefile(self, tarinfo, targetpath):
        """Make a file called targetpath.
        """
        source = self.extractfile(tarinfo)
        try:
            with bltn_open(targetpath, "wb") as target:
                copyfileobj(source, target)
        finally:
            source.close()

    def makeunknown(self, tarinfo, targetpath):
        """Make a file from a TarInfo object with an unknown type
           at targetpath.
        """
        self.makefile(tarinfo, targetpath)
        self._dbg(1, "tarfile: Unknown file type %r, " \
                     "extracted as regular file." % tarinfo.type)

    def makefifo(self, tarinfo, targetpath):
        """Make a fifo called targetpath.
        """
        if hasattr(os, "mkfifo"):
            os.mkfifo(targetpath)
        else:
            raise ExtractError("fifo not supported by system")

    def makedev(self, tarinfo, targetpath):
        """Make a character or block device called targetpath.
        """
        if not hasattr(os, "mknod") or not hasattr(os, "makedev"):
            raise ExtractError("special devices not supported by system")

        mode = tarinfo.mode
        if tarinfo.isblk():
            mode |= stat.S_IFBLK
        else:
            mode |= stat.S_IFCHR

        os.mknod(targetpath, mode,
                 os.makedev(tarinfo.devmajor, tarinfo.devminor))

    def makelink(self, tarinfo, targetpath):
        """Make a (symbolic) link called targetpath. If it cannot be created
          (platform limitation), we try to make a copy of the referenced file
          instead of a link.
        """
        if hasattr(os, "symlink") and hasattr(os, "link"):
            # For systems that support symbolic and hard links.
            if tarinfo.issym():
                if os.path.lexists(targetpath):
                    os.unlink(targetpath)
                os.symlink(tarinfo.linkname, targetpath)
            else:
                # See extract().
                if os.path.exists(tarinfo._link_target):
                    if os.path.lexists(targetpath):
                        os.unlink(targetpath)
                    os.link(tarinfo._link_target, targetpath)
                else:
                    self._extract_member(self._find_link_target(tarinfo), targetpath)
        else:
            try:
                self._extract_member(self._find_link_target(tarinfo), targetpath)
            except KeyError:
                raise ExtractError("unable to resolve link inside archive")

    def chown(self, tarinfo, targetpath):
        """Set owner of targetpath according to tarinfo.
        """
        if pwd and hasattr(os, "geteuid") and os.geteuid() == 0:
            # We have to be root to do so.
            try:
                g = grp.getgrnam(tarinfo.gname)[2]
            except KeyError:
                g = tarinfo.gid
            try:
                u = pwd.getpwnam(tarinfo.uname)[2]
            except KeyError:
                u = tarinfo.uid
            try:
                if tarinfo.issym() and hasattr(os, "lchown"):
                    os.lchown(targetpath, u, g)
                else:
                    if sys.platform != "os2emx":
                        os.chown(targetpath, u, g)
            except EnvironmentError, e:
                raise ExtractError("could not change owner")

    def chmod(self, tarinfo, targetpath):
        """Set file permissions of targetpath according to tarinfo.
        """
        if hasattr(os, 'chmod'):
            try:
                os.chmod(targetpath, tarinfo.mode)
            except EnvironmentError, e:
                raise ExtractError("could not change mode")

    def utime(self, tarinfo, targetpath):
        """Set modification time of targetpath according to tarinfo.
        """
        if not hasattr(os, 'utime'):
            return
        try:
            os.utime(targetpath, (tarinfo.mtime, tarinfo.mtime))
        except EnvironmentError, e:
            raise ExtractError("could not change modification time")

    #--------------------------------------------------------------------------
    def next(self):
        """Return the next member of the archive as a TarInfo object, when
           TarFile is opened for reading. Return None if there is no more
           available.
        """
        self._check("ra")
        if self.firstmember is not None:
            m = self.firstmember
            self.firstmember = None
            return m

        # Read the next block.
        self.fileobj.seek(self.offset)
        tarinfo = None
        while True:
            try:
                tarinfo = self.tarinfo.fromtarfile(self)
            except EOFHeaderError, e:
                if self.ignore_zeros:
                    self._dbg(2, "0x%X: %s" % (self.offset, e))
                    self.offset += BLOCKSIZE
                    continue
            except InvalidHeaderError, e:
                if self.ignore_zeros:
                    self._dbg(2, "0x%X: %s" % (self.offset, e))
                    self.offset += BLOCKSIZE
                    continue
                elif self.offset == 0:
                    raise ReadError(str(e))
            except EmptyHeaderError:
                if self.offset == 0:
                    raise ReadError("empty file")
            except TruncatedHeaderError, e:
                if self.offset == 0:
                    raise ReadError(str(e))
            except SubsequentHeaderError, e:
                raise ReadError(str(e))
            break

        if tarinfo is not None:
            self.members.append(tarinfo)
        else:
            self._loaded = True

        return tarinfo

    #--------------------------------------------------------------------------
    # Little helper methods:

    def _getmember(self, name, tarinfo=None, normalize=False):
        """Find an archive member by name from bottom to top.
           If tarinfo is given, it is used as the starting point.
        """
        # Ensure that all members have been loaded.
        members = self.getmembers()

        # Limit the member search list up to tarinfo.
        if tarinfo is not None:
            members = members[:members.index(tarinfo)]

        if normalize:
            name = os.path.normpath(name)

        for member in reversed(members):
            if normalize:
                member_name = os.path.normpath(member.name)
            else:
                member_name = member.name

            if name == member_name:
                return member

    def _load(self):
        """Read through the entire archive file and look for readable
           members.
        """
        while True:
            tarinfo = self.next()
            if tarinfo is None:
                break
        self._loaded = True

    def _check(self, mode=None):
        """Check if TarFile is still open, and if the operation's mode
           corresponds to TarFile's mode.
        """
        if self.closed:
            raise IOError("%s is closed" % self.__class__.__name__)
        if mode is not None and self.mode not in mode:
            raise IOError("bad operation for mode %r" % self.mode)

    def _find_link_target(self, tarinfo):
        """Find the target member of a symlink or hardlink member in the
           archive.
        """
        if tarinfo.issym():
            # Always search the entire archive.
            linkname = "/".join(filter(None, (os.path.dirname(tarinfo.name), tarinfo.linkname)))
            limit = None
        else:
            # Search the archive before the link, because a hard link is
            # just a reference to an already archived file.
            linkname = tarinfo.linkname
            limit = tarinfo

        member = self._getmember(linkname, tarinfo=limit, normalize=True)
        if member is None:
            raise KeyError("linkname %r not found" % linkname)
        return member

    def __iter__(self):
        """Provide an iterator object.
        """
        if self._loaded:
            return iter(self.members)
        else:
            return TarIter(self)

    def _dbg(self, level, msg):
        """Write debugging output to sys.stderr.
        """
        if level <= self.debug:
            print >> sys.stderr, msg

    def __enter__(self):
        self._check()
        return self

    def __exit__(self, type, value, traceback):
        if type is None:
            self.close()
        else:
            # An exception occurred. We must not call close() because
            # it would try to write end-of-archive blocks and padding.
            if not self._extfileobj:
                self.fileobj.close()
            self.closed = True
# class TarFile

TarFile

 

(6) The json and pickle modules

Two modules used for serialization:

  • json: converts between strings and Python data types

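    # Serialization
    import json

    info = {
        'name': '鲁班',
        'age': 22
    }

    # json.dumps() turns the Python data into a string, which is then written to the file
    with open('test.txt', 'w') as f:
        f.write(json.dumps(info))

    # Deserialization
    import json

    # JSON can be used to exchange data between different languages
    with open('test.txt', 'r') as f:
        data = json.loads(f.read())  # rebuild the Python data from the file contents

    print(data['age'])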
  • pickle: converts between a Python-specific binary format and Python data types

    # Serialization
    import pickle

    def sayhi(name):
        print("hello  python", name)

    info = {
        'name': '鲁班',
        'age': 22,
        'func': 'sayhi'
    }

    # pickle produces bytes, so the file must be opened in binary write mode ('wb')
    with open("pickle_test.txt", 'wb') as f:
        pickle.dump(info, f)  # equivalent to f.write(pickle.dumps(info))

    # Deserialization
    import pickle

    with open("pickle_test.txt", 'rb') as f:
        data = pickle.load(f)
    print(data["age"])

     

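The json module provides four functions: dumps, dump, loads, and load. dumps and loads work with an in-memory string, while dump and load work with a file object.

The pickle module provides the same four functions: dumps, dump, loads, and load. The only difference is that pickle works with bytes instead of text.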
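
The difference between the four functions can be sketched as follows. This is only an illustration: the record dictionary is made up for the example, and in-memory buffers from the io module stand in for real files so that nothing is written to disk.

```python
import io
import json
import pickle

# Hypothetical sample data, used only for this illustration.
record = {"name": "demo", "age": 22, "tags": ["a", "b"]}

# dumps/loads: convert to and from an in-memory value.
json_text = json.dumps(record)        # a str
pickle_bytes = pickle.dumps(record)   # a bytes object

# dump/load: write to and read from a file-like object.
text_buffer = io.StringIO()           # stands in for a file opened in text mode
json.dump(record, text_buffer)
text_buffer.seek(0)
json_roundtrip = json.load(text_buffer)

byte_buffer = io.BytesIO()            # stands in for a file opened in binary mode
pickle.dump(record, byte_buffer)
byte_buffer.seek(0)
pickle_roundtrip = pickle.load(byte_buffer)

print(type(json_text).__name__, type(pickle_bytes).__name__)
print(json.loads(json_text) == pickle.loads(pickle_bytes) == record)
print(json_roundtrip == pickle_roundtrip == record)
```

Note that pickle data should only be loaded from sources you trust, because unpickling can execute arbitrary code.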

 

(7) The shelve module

shelve is a simple key-value module that persists in-memory data to a file. It can store any Python object that pickle supports.

# Write Python data to a file with shelve
import shelve

d = shelve.open('shelve_test')  # open (or create) the shelf file
t = '123'
t2 = '123334'

name = ["鲁班", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t  # persist a string
d["t2"] = t2

d.close()

 

# Read Python data back from the file with shelve
import shelve

d = shelve.open('shelve_test')  # open the shelf file
print(d.get("test"))
print(d.get("t1"))
print(d.get("t2"))
d.close()
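
The same write-then-read cycle can also be written with shelve.open() as a context manager, which closes the shelf automatically. The sketch below is an illustration only: the file name shelve_demo and the stored keys are hypothetical, and a temporary directory is used so that no files are left behind.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    shelf_path = os.path.join(tmp_dir, "shelve_demo")  # hypothetical file name

    # Write: the shelf is flushed and closed when the with-block ends.
    with shelve.open(shelf_path) as shelf:
        shelf["names"] = ["rain", "test"]
        shelf["count"] = 3

    # Read: reopen the same shelf and look the values up by key.
    with shelve.open(shelf_path) as shelf:
        stored_keys = sorted(shelf.keys())
        names = shelf["names"]
        missing = shelf.get("no_such_key", "default")

print(stored_keys)
print(names)
print(missing)
```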

 

 

 

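(7) XML processing module

XML is a format for exchanging data between different languages or programs. It serves much the same purpose as JSON, although JSON is simpler to use. Before JSON existed, XML was the only practical choice, and many systems in traditional industries such as finance still expose mainly XML-based interfaces today.

The XML format looks like this; the data structure is expressed through <> nodes: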

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
xml_hehe.xml

 

XML is supported in virtually every language. In Python, the following module can be used to work with XML:

import xml.etree.ElementTree as ET
tree = ET.parse("xml_hehe.xml")
root = tree.getroot()
print(root)
print(root.tag)

# Iterate over the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text, i.attrib)

# Iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag, node.text)
xml_handle.py

 

Modifying and deleting XML content

import xml.etree.ElementTree as ET

tree = ET.parse("xml_hehe.xml")
root = tree.getroot()

# Modify: increase every year by one and mark who updated it
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated_by", "Yun")

tree.write("xmltest.xml")

# Delete: remove every country whose rank is above 50
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')
xml_change.py
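
The same modify and delete steps can be tried without any files by parsing an XML string with ET.fromstring(). This is a minimal sketch: the XML text below is a shortened, hypothetical version of xml_hehe.xml.

```python
import xml.etree.ElementTree as ET

# Shortened sample data, assumed for this illustration only.
XML_TEXT = """
<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>
"""

root = ET.fromstring(XML_TEXT.strip())

# Modify: increase every <year> by one.
for node in root.iter("year"):
    node.text = str(int(node.text) + 1)

# Delete: findall() returns a list, so removing elements inside the loop is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [country.get("name") for country in root.findall("country")]
years = [node.text for node in root.iter("year")]
print(remaining)
print(years)
```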

 

Creating an XML document yourself

import xml.etree.ElementTree as ET

new_xml = ET.Element("namelist")
Personal = ET.SubElement(new_xml, "Personal", attrib={"enrolled": "yes"})
name = ET.SubElement(Personal,"name")
name.text="鲁班大师"
age = ET.SubElement(Personal, "age", attrib={"checked": "no"})
sex = ET.SubElement(Personal, "sex")
age.text = '33'
sex.text='man'
Personal = ET.SubElement(new_xml, "Personal2", attrib={"enrolled": "no"})
name = ET.SubElement(Personal, "name")
name.text="安琪拉"
sex = ET.SubElement(Personal, "sex")
sex.text='men'
age = ET.SubElement(Personal, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML
xml_myself.py

 

<?xml version='1.0' encoding='utf-8'?>
<namelist>
    <Personal enrolled="yes">
        <name>鲁班大师</name>
        <age checked="no">33</age>
        <sex>man</sex></Personal>
    <Personal2 enrolled="no">
        <name>安琪拉</name>
        <sex>men</sex>
        <age>19</age></Personal2>
</namelist>
test.xml

 

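(8) The PyYAML module

Python can also handle the YAML document format easily, but a third-party module has to be installed first. Reference documentation: http://pyyaml.org/wiki/PyYAMLDocumentation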

 

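(9) The ConfigParser module

 

Used to generate and modify common configuration files. In Python 3.x the module was renamed to configparser.

 

Many programs use a configuration file format like this: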

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes
 
[bitbucket.org]
User = hg
 
[topsecret.server.com]
Port = 50022
ForwardX11 = no

 

How can a file like this be generated with Python?

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'  # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)
Config_test.py

 

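Reading the contents of the config file

import configparser

conf = configparser.ConfigParser()
conf.read("example.ini")
print(conf.defaults())  # options in the [DEFAULT] section
print(conf.sections())  # section names, not including DEFAULT

# Option names are case-insensitive, so 'user' finds 'User'
print(conf['bitbucket.org']['user'])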
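
Two details of reading are easy to miss: options in [DEFAULT] are visible from every section, and all values are stored as strings unless a typed getter is used. The sketch below illustrates both. It is only an example: the INI text is a shortened, assumed variant of the file above, loaded with read_string() so that no file is needed.

```python
import configparser

# Shortened sample configuration, assumed for this illustration only.
INI_TEXT = """
[DEFAULT]
ServerAliveInterval = 45
Compression = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
Compression = no
"""

conf = configparser.ConfigParser()
conf.read_string(INI_TEXT)

bitbucket = conf["bitbucket.org"]
topsecret = conf["topsecret.server.com"]

interval = bitbucket.getint("ServerAliveInterval")      # inherited from [DEFAULT]
default_compression = bitbucket.getboolean("Compression")
overridden_compression = topsecret.getboolean("Compression")
port = conf.getint("topsecret.server.com", "Port")      # converted to int
missing_port = bitbucket.get("Port", fallback="22")     # option absent: use the fallback

print(conf.sections())
print(interval, default_compression, overridden_compression, port, missing_port)
```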

 

configparser syntax for adding, deleting, modifying, and querying

Contents of i.cfg:

[section1]
k1 = v1
k2:v2

[section2]
k1 = v1

import configparser

config = configparser.ConfigParser()
config.read('i.cfg')

# ########## Read ##########
secs = config.sections()
print(secs)
options = config.options('section2')
print(options)

item_list = config.items('section2')
print(item_list)

val = config.get('section1', 'k1')
# val = config.getint('section1', 'k1')  # only valid when the value is an integer

# ########## Modify ##########
# Remove a whole section
config.remove_section('section1')

# Add a section if it does not exist yet
if not config.has_section('wupeiqi'):
    config.add_section('wupeiqi')

# Set an option (values must be strings in Python 3)
config.set('section2', 'k2', '11111')

# Remove an option
config.remove_option('section2', 'k1')

# Write the changes back to the file
with open('i.cfg', 'w') as f:
    config.write(f)

 

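(10) The hashlib module

Used for hashing (message digest) operations. In Python 3.x it replaces the md5 and sha modules, and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms.

import hashlib

m = hashlib.md5()
m.update(b'hello')
print(m.hexdigest())
m.update(b'world!')  # update() appends, so the digest now covers b'helloworld!'
print(m.hexdigest())

m2 = hashlib.md5()
m2.update(b'helloworld!')
print(m2.hexdigest())  # same value as the second m.hexdigest() above
# sha256()
hash0 = hashlib.sha256()
hash0.update('微微一笑很倾城'.encode(encoding='utf-8'))
print(hash0.hexdigest())
# sha384()
hash1 = hashlib.sha384()
hash1.update('微微一笑很倾城'.encode(encoding='utf-8'))
print(hash1.hexdigest())
# sha512()
hash2 = hashlib.sha512()
hash2.update('微微一笑很倾城'.encode(encoding='utf-8'))
print(hash2.hexdigest())

'''
Python also has an hmac module, which combines a key with the message internally before hashing.
HMAC (hash-based message authentication code) is an authentication mechanism built on a
message authentication code (MAC). The two parties share a secret key K and use it to check
whether a message is genuine.
It is typically used to authenticate messages in network communication. Both sides agree on
the key in advance, like a secret password. The sender computes the HMAC of the message with
the key and sends it along with the message. The receiver computes the HMAC of the received
message with the same key and compares it with the sender's value. If the two match, the
message is authentic and the sender is legitimate.
'''
import hmac
# digestmod is required since Python 3.8; earlier versions defaulted to MD5
h = hmac.new('鲁班大师'.encode(encoding='utf-8'),
             '智障二百五'.encode(encoding='utf-8'),
             digestmod='md5')
print(h.hexdigest())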
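
The sender and receiver steps described above can be sketched as two small functions. This is an illustration under assumptions: the key, the messages, and the helper names sign and verify are hypothetical, and SHA-256 is chosen here as the digest.

```python
import hashlib
import hmac

SECRET_KEY = b"shared-secret"  # hypothetical key agreed on in advance


def sign(message: bytes) -> str:
    """Sender side: compute the HMAC of the message with the shared key."""
    return hmac.new(SECRET_KEY, message, digestmod=hashlib.sha256).hexdigest()


def verify(message: bytes, signature: str) -> bool:
    """Receiver side: recompute the HMAC and compare it with the received one."""
    # compare_digest() compares in constant time to avoid timing attacks.
    return hmac.compare_digest(sign(message), signature)


message = b"transfer 100 to alice"
signature = sign(message)

genuine = verify(message, signature)
tampered = verify(b"transfer 900 to alice", signature)
print(genuine, tampered)
```

Note that HMAC only proves that the message is unchanged and came from someone who holds the key; it does not hide the message content.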

 

(11) The re module

Common regular expression symbols

 

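'.'     Matches any single character except \n by default; with the DOTALL flag it matches any character, including newlines
'^'     Matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a","\nabc\neee",flags=re.MULTILINE)
'$'     Matches the end of the string; with the MULTILINE flag it also matches at the end of each line, e.g. re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group()
'*'     Matches the preceding character 0 or more times; re.findall("ab*","cabb3abcbbac") returns ['abb', 'ab', 'a']
'+'     Matches the preceding character 1 or more times; re.findall("ab+","ab+cd+abb+bba") returns ['ab', 'abb']
'?'     Matches the preceding character 0 or 1 times
'{m}'   Matches the preceding character exactly m times
'{n,m}' Matches the preceding character n to m times; re.findall("ab{1,3}","abb abc abbcbbb") returns ['abb', 'ab', 'abb']
'|'     Matches the expression on the left or on the right of |; re.search("abc|ABC","ABCBabcCD").group() returns 'ABC'
'(...)' Grouping; re.search("(abc){2}a(123|456)c", "abcabca456c").group() returns 'abcabca456c'
 
'\A'    Matches only at the start of the string; re.search(r"\Aabc","alexabc") finds no match
'\Z'    Matches only at the end of the string; similar to $, except that $ also matches just before a trailing newline
'\d'    Matches a digit 0-9
'\D'    Matches a non-digit
'\w'    Matches [A-Za-z0-9_] (for str patterns, also Unicode letters and digits)
'\W'    Matches any character that \w does not match
'\s'    Matches whitespace such as space, \t, \n, \r; re.search(r"\s+","ab\tc1\n3").group() returns '\t'
 
'(?P<name>...)' Named group; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict()
returns {'province': '3714', 'city': '81', 'birthday': '1993'}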
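
A few of the finer points in this table, namely the MULTILINE and DOTALL flags and the difference between $ and \Z, can be checked with a short script. This is a minimal sketch: the sample text and variable names are made up for the illustration.

```python
import re

text = "first line\nsecond line\n"  # hypothetical sample text

# Without MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# $ can match just before a trailing newline; \Z only matches at the very end.
dollar_matches = re.search(r"line$", text) is not None
z_matches = re.search(r"line\Z", text) is not None

# With DOTALL, . also matches the newline character.
dot_default = re.search(r"first.+second", text) is not None
dot_all = re.search(r"first.+second", text, flags=re.DOTALL) is not None

# \w includes the underscore.
word = re.search(r"\w+", "snake_case!").group()

print(starts_default, starts_multiline)
print(dollar_matches, z_matches)
print(dot_default, dot_all)
print(word)
```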

 

 

 

Demonstration

>>> import re
>>> re.match('.','dsskdslds211')
<_sre.SRE_Match object; span=(0, 1), match='d'>

 

 

>>> import re
>>> re.match('^ds','dsdsdsdsadj1212')
<_sre.SRE_Match object; span=(0, 2), match='ds'>
>>>

 

 

>>> import re
>>> re.match(r'^ds\d','ds12123dsdsdsadj1212')
<_sre.SRE_Match object; span=(0, 3), match='ds1'>
>>> re.match(r'^ds\d+','ds12123dsdsdsadj1212')
<_sre.SRE_Match object; span=(0, 7), match='ds12123'>
>>>

 

 

>>> import re
>>> re.search('k[a-z]+a','sahsaj1212kaHEHEsha12sakasha')
<_sre.SRE_Match object; span=(23, 28), match='kasha'>

 

 

>>> import re
>>> re.search('k[a-zA-Z]+a','sahsaj1212kaHEHEsha12sakasha')
<_sre.SRE_Match object; span=(10, 19), match='kaHEHEsha'>

 

 

>>> import re
>>> re.search('#.+#','as#hello#ha')
<_sre.SRE_Match object; span=(2, 9), match='#hello#'>

 

 

>>> import re
>>> print(re.search('a?','asnksaaaha'))
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> print(re.search('aa?','asnksaaaha'))
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> print(re.search('aaa?','asnksaaaha'))
<_sre.SRE_Match object; span=(5, 8), match='aaa'>

 

 

import re
print(re.search('[0-9]{3}','asn1k2sa1213aaha'))
print(re.search('[0-9]{1,3}','asn1k2sa1213aaha'))
<_sre.SRE_Match object; span=(8, 11), match='121'>
<_sre.SRE_Match object; span=(3, 4), match='1'>

 

 

import re
print(re.findall('[0-9]{3}','asn1k2sa1213aaha'))
print(re.findall('[0-9]{1,3}','asn1k2sa1213aaha'))
['121']
['1', '2', '121', '3']

 

 

import re
print(re.findall('abc|ABC','asabcn1k2sABCa1213aaha'))
print(re.search('abc|ABC','asabcn1k2sABCa1213aaha').group())

['abc', 'ABC']
abc

 

import re
print(re.search('(abc){2}','asabcn1abcabcka'))
print(re.search('(abc){2}\|','asabcn1abcabc|ka'))
print(re.search('(abc){2}\|{2}','asabcn1abcabc||ka'))
print(re.search('(abc){2}\|\|=','asabcn1abcabc||=ka'))
print(re.search('(abc){2}(\|\|=){2}','asabcn1abcabc||=||=ka'))
<_sre.SRE_Match object; span=(7, 13), match='abcabc'>
<_sre.SRE_Match object; span=(7, 14), match='abcabc|'>
<_sre.SRE_Match object; span=(7, 15), match='abcabc||'>
<_sre.SRE_Match object; span=(7, 16), match='abcabc||='>
<_sre.SRE_Match object; span=(7, 19), match='abcabc||=||='>

 

 

import re
print(re.search('\A[0-9]+[a-z]\Z','1213a'))
<_sre.SRE_Match object; span=(0, 5), match='1213a'>

 

 

import re
print(re.search('\D+','1213asa |?$#@'))
print(re.search('\W+','1213asa |?$#@'))
print(re.search('\s+','1213asa \r\n\t'))

<_sre.SRE_Match object; span=(4, 13), match='asa |?$#@'>
<_sre.SRE_Match object; span=(7, 13), match=' |?$#@'>
<_sre.SRE_Match object; span=(7, 11), match=' \r\n\t'>

 

 

import re
# Adjacent string literals are joined, so the pattern can span two lines
print(re.search("(?P<province>[0-9]{2})(?P<city>[0-9]{2})"
                "(?P<local>[0-9]{2})(?P<birthday>[0-9]{8})",
                "371481199306143242").groupdict())

{'province': '37', 'city': '14', 'local': '81', 'birthday': '19930614'}
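
Besides groupdict(), each named group can also be read on its own. A minimal sketch that reuses the pattern above; the variable names are illustrative only:

```python
import re

pattern = re.compile(r'(?P<province>[0-9]{2})(?P<city>[0-9]{2})'
                     r'(?P<local>[0-9]{2})(?P<birthday>[0-9]{8})')
m = pattern.search('371481199306143242')

birthday = m.group('birthday')   # look a group up by name
province = m.group(1)            # named groups are still numbered from 1
city_span = m.span('city')       # (start, end) offsets of one group

print(birthday)   # 19930614
print(province)   # 37
print(city_span)  # (2, 4)
```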

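Regular expressions

Online testing tool: http://tool.chinaz.com/regex/

Character group: [characters]
A character group describes the set of characters that may appear at one position. All the characters allowed at that position are listed inside [] in the regular expression.
Characters fall into many classes, such as digits, letters and punctuation.
If a position must hold exactly one digit, the character at that position can only be one of the ten digits 0, 1, 2 ... 9.
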
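Characters:

Metacharacter  What it matches
.        any character except a newline
\w       a letter, digit or underscore
\s       any whitespace character
\d       a digit
\n       a newline
\t       a tab
\b       a word boundary (the start or the end of a word)
^        the start of the string
$        the end of the string
\W       anything that is not a letter, digit or underscore
\D       a non-digit
\S       a non-whitespace character
a|b      the character a or the character b
()       the expression inside the parentheses, which also forms a group
[...]    any character in the character group
[^...]   any character not in the character group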

 


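Quantifiers:

Quantifier  Meaning
*        repeat zero or more times
+        repeat one or more times
?        repeat zero times or once
{n}      repeat n times
{n,}     repeat n or more times
{n,m}    repeat n to m times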

 


. ^ $

Regex  Text          Matches           Description
海.    海燕海娇海东  海燕, 海娇, 海东  Matches every occurrence of "海."
^海.   海燕海娇海东  海燕              Matches "海." only at the start
海.$   海燕海娇海东  海东              Matches "海." only at the end

 


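* + ? { }

Regex     Text                    Matches                 Description
李.?      李杰和李莲英和李二棍子  李杰, 李莲, 李二        ? repeats zero times or once, so it matches "李" plus at most one following character
李.*      李杰和李莲英和李二棍子  李杰和李莲英和李二棍子  * repeats zero or more times, so it matches "李" plus zero or more following characters
李.+      李杰和李莲英和李二棍子  李杰和李莲英和李二棍子  + repeats one or more times, so it matches "李" plus one or more following characters
李.{1,2}  李杰和李莲英和李二棍子  李杰和, 李莲英, 李二棍  {1,2} matches one or two following characters

Note: the quantifiers *, + and ? above are greedy, meaning they match as much as possible. Appending ? turns them into lazy quantifiers.

Regex   Text                    Matches     Description
李.*?   李杰和李莲英和李二棍子  李, 李, 李  Lazy matching: .*? takes as few characters as possible, here none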
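
The quantifier rows above can be checked with re.findall. A minimal sketch on the same sample string; the variable names are illustrative only:

```python
import re

text = '李杰和李莲英和李二棍子'

# Greedy quantifiers take as many characters as they can.
optional = re.findall('李.?', text)      # at most one character after 李
bounded = re.findall('李.{1,2}', text)   # one or two characters after 李
greedy = re.findall('李.*', text)        # everything up to the end of the line

# A trailing ? makes the quantifier lazy: it takes as few characters as it can.
lazy = re.findall('李.*?', text)

print(optional)  # ['李杰', '李莲', '李二']
print(bounded)   # ['李杰和', '李莲英', '李二棍']
print(greedy)    # ['李杰和李莲英和李二棍子']
print(lazy)      # ['李', '李', '李']
```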

 


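Character sets [] [^]

Regex                Text                    Matches                 Description
李[杰莲英二棍子]*    李杰和李莲英和李二棍子  李杰, 李莲英, 李二棍子  Matches "李" followed by any number of characters from [杰莲英二棍子]
李[^和]*             李杰和李莲英和李二棍子  李杰, 李莲英, 李二棍子  Matches "李" followed by any number of characters that are not "和"
[\d]                 456bdha3                4, 5, 6, 3              Matches any single digit; four results
[\d]+                456bdha3                456, 3                  Matches a run of one or more digits; two results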

 


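Grouping () and alternation |

An ID card number is a string of 15 or 18 characters. A 15-character number consists entirely of digits and cannot start with 0. In an 18-character number the first 17 characters are digits and the last one is a digit or x. Let us try to express this as a regular expression:

Regex                               Text                Matches             Description
^[1-9]\d{13,16}[0-9x]$              110101198001017032  110101198001017032  Matches a correct ID number
^[1-9]\d{13,16}[0-9x]$              1101011980010170    1101011980010170    Also matches this string, which is not a valid ID number: it has 16 digits
^[1-9]\d{14}(\d{2}[0-9x])?$         1101011980010170    no match            The wrong ID number is no longer matched. () groups \d{2}[0-9x] so that ? applies to the group as a whole: it appears zero times or once
^([1-9]\d{16}[0-9x]|[1-9]\d{14})$   110105199812067023  110105199812067023  Tries [1-9]\d{16}[0-9x] first and falls back to [1-9]\d{14} if that fails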
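
The last pattern in the table can be wrapped in a small helper. This is only a sketch: looks_like_id is a hypothetical name, and the function checks the shape of the string, not whether the number is a real one.

```python
import re

# 18 characters (17 digits plus a digit or x) or 15 digits; never starts with 0.
ID_PATTERN = re.compile(r'^([1-9]\d{16}[0-9x]|[1-9]\d{14})$')


def looks_like_id(text):
    """Return True if text has the shape of a 15- or 18-character ID number."""
    return ID_PATTERN.match(text) is not None


print(looks_like_id('110105199812067023'))  # True  (18 characters)
print(looks_like_id('110101800101703'))     # True  (15 digits)
print(looks_like_id('1101011980010170'))    # False (16 digits)
```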

 


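The escape character \

Many metacharacters in regular expressions carry a special meaning, such as \n and \s. To match the literal two-character text "\n" rather than a newline, the "\" itself has to be escaped, giving '\\'.

In Python, both the regular expression and the text to match are written as strings, and in a string literal \ is special too and needs escaping. So the literal text "\n" is written '\\n' as a string, and the regular expression that matches it becomes "\\\\n", which is tedious. Raw strings solve this: the text is r'\n' and the regular expression is simply r'\\n'.

Regex     Text    Matches  Description
\n        \n      False    In a regular expression \ is special (\n means a newline), so the expression \n cannot match the literal text \n
\\n       \n      True     Escaping \ as \\ matches it
"\\\\n"   '\\n'   True     In Python source each '\' in a string literal must be escaped as well, so every backslash is doubled once more
r'\\n'    r'\n'   True     Prefixing the string with r turns off string escaping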
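
The table can be reproduced in a few lines. A minimal sketch; the variable names are illustrative only:

```python
import re

text = r'a\nb'   # four characters: a, backslash, n, b (there is no newline)

# Without a raw string, every backslash in the regex has to be doubled again.
plain = re.search('\\\\n', text)
# With a raw string, the pattern is written exactly as the regex engine sees it.
raw = re.search(r'\\n', text)
# The regex \n means "newline", and text does not contain one.
newline = re.search(r'\n', text)

print(plain.group() == raw.group())  # True
print(len(raw.group()))              # 2
print(newline)                       # None
```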

 


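Greedy matching

Greedy matching: among the possible matches, take the longest one. Matching is greedy by default.

Regex   Text                 Matches              Description
<.*>    <script>...<script>  <script>...<script>  Greedy by default, so it matches the longest possible string
<.*?>   <script>...<script>  <script>, <script>   Adding ? switches to non-greedy mode, which matches the shortest possible string

Common non-greedy patterns

*? repeat any number of times, but as few as possible
+? repeat one or more times, but as few as possible
?? repeat zero times or once, but as few as possible
{n,m}? repeat n to m times, but as few as possible
{n,}? repeat n or more times, but as few as possible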

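Using .*?

. is any character
* repeats it from 0 to unlimited times
? makes the match non-greedy
Together they mean "take as few arbitrary characters as possible". The combination is rarely written on its own; it is mostly used as:
.*?x

which consumes characters up to the first x that appears.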
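
Greedy and non-greedy matching can be compared side by side. A minimal sketch; the sample string 'abcxdefx' is made up for illustration:

```python
import re

html = '<script>...<script>'

greedy = re.findall(r'<.*>', html)   # runs to the last > in the string
lazy = re.findall(r'<.*?>', html)    # stops at the first > after each <

# .*?x reads "as few characters as possible, up to and including the next x"
up_to_x = re.search(r'.*?x', 'abcxdefx').group()

print(greedy)   # ['<script>...<script>']
print(lazy)     # ['<script>', '<script>']
print(up_to_x)  # abcx
```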

Common functions in the re module

import re

ret = re.findall('a', 'eva egon yuan')  # returns every match in a list
print(ret)  # result: ['a', 'a']

ret = re.search('a', 'eva egon yuan').group()
print(ret)  # result: 'a'
# search scans the string and returns a match object for the first match only.
# Call group() on that object to get the matched text. If nothing matches,
# search returns None, so calling .group() on the result would fail.

ret = re.match('a', 'abc').group()  # like search, but only matches at the start of the string
print(ret)
# result: 'a'

ret = re.split('[ab]', 'abcd')  # split on 'a' to get '' and 'bcd', then split both on 'b'
print(ret)  # ['', '', 'cd']

ret = re.sub(r'\d', 'H', 'eva3egon4yuan4', count=1)  # replace digits with 'H'; count=1 replaces only the first one
print(ret)  # evaHegon4yuan4

ret = re.subn(r'\d', 'H', 'eva3egon4yuan4')  # replace digits with 'H'; returns (new string, number of replacements)
print(ret)  # ('evaHegonHyuanH', 3)

obj = re.compile(r'\d{3}')  # compile the regex into a pattern object; it matches three digits
ret = obj.search('abc123eeee')  # call search on the pattern object with the text to match
print(ret.group())  # result: 123

ret = re.finditer(r'\d', 'ds3sy4784a')  # finditer returns an iterator over match objects
print(ret)  # <callable_iterator object at 0x10195f940>
print(next(ret).group())  # first result
print(next(ret).group())  # second result
print([i.group() for i in ret])  # all remaining results

Notes:

1 Group priority in findall:

import re

ret = re.findall('www.(baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['oldboy']  findall returns the contents of the group when the pattern has one; to get the whole match, make the group non-capturing with ?:

ret = re.findall('www.(?:baidu|oldboy).com', 'www.oldboy.com')
print(ret)  # ['www.oldboy.com']

 


2 Group priority in split

ret=re.split(r"\d+","eva3egon4yuan")
print(ret)  # result: ['eva', 'egon', 'yuan']

ret=re.split(r"(\d+)","eva3egon4yuan")
print(ret)  # result: ['eva', '3', 'egon', '4', 'yuan']

# Wrapping the pattern in () changes what split returns:
# without () the matched separators are dropped, with () they are kept in the result.
# This matters whenever the separators themselves are needed.

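Regex         Text  Matches  Description
[0123456789]  8     True     A character group lists every allowed character; the text matches if it equals any one of them
[0123456789]  a     False    The group contains no "a", so there is no match
[0-9]         7     True     A range can be written with -; [0-9] means the same as [0123456789]
[a-z]         s     True     Likewise, [a-z] covers all lowercase letters
[A-Z]         B     True     [A-Z] covers all uppercase letters
[0-9a-fA-F]   e     True     Matches digits and the letters a to f in either case; useful for validating hexadecimal characters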
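
The last row can be turned into a small check. A minimal sketch; is_hex is a hypothetical helper name:

```python
import re


def is_hex(text):
    """Return True if text is a non-empty run of hexadecimal digits."""
    return re.fullmatch(r'[0-9a-fA-F]+', text) is not None


digits = re.findall(r'[0-9]', 'a8b7')   # every single digit in the text

print(is_hex('e'))     # True
print(is_hex('1aF9'))  # True
print(is_hex('xyz'))   # False
print(digits)          # ['8', '7']
```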

 

 

 

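(12) String (str) methods

str.capitalize() Capitalizes the first character of the string and lowercases the rest
str.center(width) Returns a new string of length width with the original centered and padded with spaces
str.ljust(width) Returns a new string of the given length with the original left-aligned and padded with spaces
str.rjust(width) Returns a new string of the given length with the original right-aligned and padded with spaces
str.zfill(width) Returns the string right-aligned and padded on the left with 0 up to the given length
str.count(sub[,start,end]) Returns how many times sub occurs in the string; start and end limit the range
bytes.decode(encoding[,errors]) Decodes bytes into a str; a failure raises UnicodeDecodeError, a subclass of ValueError (in Python 3 decode() belongs to bytes, not str)
str.encode(encoding[,errors]) Encodes the string into bytes
str.endswith(substr[,beg,end]) Whether the string ends with substr; beg and end limit the range
str.startswith(substr[,beg,end]) Whether the string starts with substr; beg and end limit the range
str.expandtabs(tabsize = 8) Replaces tabs with spaces, using tab stops every 8 columns by default
str.find(sub[,start,end]) Returns the position of the first occurrence of sub, or -1 if it is not found
str.index(sub[,beg,end]) Like find(), but raises ValueError if sub is not found
str.isalnum() Returns True if the string is non-empty and consists only of letters and digits, otherwise False
str.isalpha() Returns True if the string is non-empty and consists only of letters, otherwise False
str.isdecimal() Whether the string consists only of decimal digits; returns a bool
str.isdigit() Whether the string consists only of digits; returns a bool
str.islower() Whether all cased characters in the string are lowercase; returns a bool
str.isupper() Whether all cased characters in the string are uppercase; returns a bool
str.isnumeric() Whether the string consists only of numeric characters; returns a bool
str.isspace() Returns True if the string consists only of whitespace, otherwise False
str.title() Returns a title-cased string (each word starts with an uppercase letter, the rest is lowercase)
str.istitle() Returns True if the string is title-cased (see title()), otherwise False
str.join(seq) Joins the elements of a sequence into one string, using str as the separator
str.split(sep,num) Splits the string on sep and returns a list; num is the maximum number of splits
str.splitlines(keepends) Splits on line boundaries and returns the lines as a list; line endings are kept only if keepends is true
str.lower() Converts uppercase to lowercase
str.upper() Converts lowercase to uppercase
str.swapcase() Swaps the case of every character
str.lstrip() Removes whitespace (spaces, tabs, newlines) from the left
str.rstrip() Removes whitespace (spaces, tabs, newlines) from the right
str.strip() Removes whitespace (spaces, tabs, newlines) from both ends
str.partition(substr) Splits the string at the first occurrence of substr into a 3-tuple (before, substr, after)
str.replace(str1,str2,num) Replaces str1 with str2; num is the maximum number of replacements
str.rfind(sub[,beg,end]) Like find(), but searches from the right
str.rindex(sub[,beg,end]) Like index(), but searches from the right
str.rpartition(sub) Like partition(), but searches from the right
str.translate(table) Maps characters through table, which is built with str.maketrans(); characters mapped to None are deleted (the Python 2 del argument no longer exists)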
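
A few of these methods in action. This is a minimal sketch; the sample strings are made up for illustration:

```python
title = 'hello world'.title()
padded = '42'.zfill(5)
centered = 'ab'.center(6, '*')              # the optional second argument is the fill character
parts = 'a,b,c'.split(',', 1)               # at most one split
head, sep, tail = 'key=value=x'.partition('=')
joined = '-'.join(['2018', '10', '25'])
encoded = '中'.encode('utf-8')              # str -> bytes
decoded = encoded.decode('utf-8')           # bytes -> str
table = str.maketrans('abc', 'xyz', '!')    # map a->x, b->y, c->z and delete '!'
translated = 'aabbcc!'.translate(table)

print(title)            # Hello World
print(padded)           # 00042
print(centered)         # **ab**
print(parts)            # ['a', 'b,c']
print(head, sep, tail)  # key = value=x
print(joined)           # 2018-10-25
print(encoded)          # b'\xe4\xb8\xad'
print(decoded)          # 中
print(translated)       # xxyyzz
```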

 

 

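(13) math module

ceil: returns the smallest integer greater than or equal to x; if x is an integer, returns x
copysign: returns x with the sign of y; y may be 0
cos: returns the cosine of x, where x is in radians
degrees: converts x from radians to degrees
e: the constant e (2.71828...)
exp: returns math.e (2.71828...) raised to the power x
expm1: returns math.e raised to the power x, minus 1
fabs: returns the absolute value of x
factorial: returns the factorial of x
floor: returns the largest integer less than or equal to x; if x is an integer, returns x
fmod: returns the remainder of x/y as a float
frexp: returns a tuple (m, e) such that x == m * 2**e, with 0.5 <= abs(m) < 1 (m is 0 when x is 0)
fsum: returns an accurate floating-point sum of the elements of an iterable
gcd: returns the greatest common divisor of x and y
hypot: returns the Euclidean norm sqrt(x*x + y*y)
isfinite: returns True if x is neither an infinity nor NaN, otherwise False
isinf: returns True if x is positive or negative infinity, otherwise False
isnan: returns True if x is NaN (not a number), otherwise False
ldexp: returns x*(2**i)
log: returns the natural logarithm of x (base e); when base is given, returns the logarithm of x to that base, computed as log(x)/log(base)
log10: returns the base-10 logarithm of x
log1p: returns the natural logarithm (base e) of x+1
log2: returns the base-2 logarithm of x
modf: returns a tuple of the fractional and integer parts of x
pi: the constant pi
pow: returns x raised to the power y, i.e. x**y
radians: converts the angle x from degrees to radians
sin: returns the sine of x (x in radians)
sqrt: returns the square root of x
tan: returns the tangent of x (x in radians)
trunc: returns the integer part of x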
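
Some of the less obvious functions are easier to follow with concrete values. A minimal sketch; the inputs are chosen only for illustration:

```python
import math

rounded = (math.ceil(2.1), math.floor(2.9), math.trunc(-2.9))
signed = math.copysign(3, -0.0)      # magnitude of 3, sign of -0.0
distance = math.hypot(3, 4)          # sqrt(3*3 + 4*4)
mantissa, exponent = math.frexp(8)   # 8 == 0.5 * 2**4
fraction, whole = math.modf(3.25)
total = math.fsum([0.1] * 10)        # avoids the rounding error of sum([0.1] * 10)
checks = (math.isfinite(float('inf')),
          math.isinf(float('inf')),
          math.isnan(float('nan')))

print(rounded)                         # (3, 2, -2)
print(signed)                          # -3.0
print(distance)                        # 5.0
print(mantissa, exponent)              # 0.5 4
print(math.ldexp(mantissa, exponent))  # 8.0
print(fraction, whole)                 # 0.25 3.0
print(total)                           # 1.0
print(checks)                          # (False, True, True)
```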

 

 

 

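(14) urllib module

In Python 3 these functions live in the urllib.parse and urllib.request submodules.

urllib.parse.quote(string[,safe]) Percent-encodes a string. The safe argument lists characters that are left unencoded
urllib.parse.unquote(string) Decodes a percent-encoded string
urllib.parse.quote_plus(string[,safe]) Like quote, but replaces ' ' with '+', whereas quote uses '%20'
urllib.parse.unquote_plus(string) Decodes a string, also turning '+' back into ' '
urllib.parse.urlencode(query[,doseq]) Converts a dict or a list of two-element tuples into URL parameters.
For example, the dict {'name':'wklken','pwd':'123'} becomes "name=wklken&pwd=123"
urllib.request.pathname2url(path) Converts a local path into a URL path
urllib.request.url2pathname(path) Converts a URL path into a local path
urllib.request.urlretrieve(url[,filename[,reporthook[,data]]]) Downloads remote data to a local file
filename: the local path to save to (if omitted, urllib saves the data in a temporary file)
reporthook: a callback invoked once when the connection is established and once after each block of data is transferred
data: the data to POST to the server
urlrs = urllib.request.urlopen(url[,data[,timeout]]) Fetches a page; data is POSTed to the URL. The Python 2 proxies argument no longer exists; proxies are configured with urllib.request.ProxyHandler
urlrs.readline() Works like the same method on a file object
urlrs.readlines() Works like the same method on a file object
urlrs.fileno() Works like the same method on a file object
urlrs.close() Works like the same method on a file object
urlrs.info() Returns an http.client.HTTPMessage object holding the headers sent by the remote server
urlrs.getcode() Returns the HTTP status code of the response
urlrs.geturl() Returns the URL that was requested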
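
The encoding helpers can be tried without any network access. A minimal sketch using urllib.parse; the sample strings are made up, apart from the dict taken from the urlencode example above:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

quoted = quote('a b/c')             # '/' is in the default safe set of quote
quoted_plus = quote_plus('a b/c')   # quote_plus encodes '/' and turns ' ' into '+'
query = urlencode({'name': 'wklken', 'pwd': '123'})

print(quoted)               # a%20b/c
print(quoted_plus)          # a+b%2Fc
print(unquote('a%20b'))     # a b
print(unquote_plus('a+b'))  # a b
print(query)                # name=wklken&pwd=123
```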

 

(15) logging module

Simple function-style configuration

import logging
logging.debug('debug message')
logging.info('info message')
logging.warning('warning message')
logging.error('error message')
logging.critical('critical message')

Output:

C:\Python3.6\python.exe H:/test/loggin模块/test1.py
WARNING:root:warning message
ERROR:root:error message
CRITICAL:root:critical message

Process finished with exit code 0

By default the logging module prints records to standard error (sys.stderr) and shows only records at WARNING level or above.

This means the default level is WARNING (the levels rank CRITICAL > ERROR > WARNING > INFO > DEBUG).

The default format is level:logger name:message.

 

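Flexible configuration of the log level, log format and output destination:

Configuration parameters:

logging.basicConfig() accepts the following arguments to change the default behaviour of the logging module:

  filename: creates a FileHandler with the given file name, so that records are stored in that file.
  filemode: the mode used to open the file when filename is given; defaults to "a" and can be set to "w".
  format: the record format used by the handler.
  datefmt: the date/time format.
  level: the level of the root logger (explained later).
  stream: creates a StreamHandler with the given stream, for example sys.stderr, sys.stdout or an open file (f=open('test.log','w')); defaults to sys.stderr.
  filename and stream cannot be combined: in Python 3, passing both raises ValueError.

Format fields that can be used in the format argument:

  %(name)s  name of the logger
  %(levelno)s  log level as a number
  %(levelname)s  log level as text
  %(pathname)s  full path of the module that issued the logging call, may be unavailable
  %(filename)s  file name of the module that issued the logging call
  %(module)s  name of the module that issued the logging call
  %(funcName)s  name of the function that issued the logging call
  %(lineno)d  line number of the statement that issued the logging call
  %(created)f  time the record was created, as a UNIX timestamp (float)
  %(relativeCreated)d  milliseconds between the loading of the logging module and the creation of the record
  %(asctime)s  time the record was created, as a string. The default format is "2003-07-08 16:49:45,896"; the part after the comma is milliseconds
  %(thread)d  thread ID, may be unavailable
  %(threadName)s  thread name, may be unavailable
  %(process)d  process ID, may be unavailable
  %(message)s  the message passed by the user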
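
The parameters above can be combined as follows. This is a minimal sketch under a few assumptions: an in-memory io.StringIO stands in for sys.stderr or a log file so the output can be inspected, the format string is just one possible choice, and force=True needs Python 3.8 or later.

```python
import io
import logging

log_stream = io.StringIO()   # assumption: stands in for sys.stderr or a log file

logging.basicConfig(
    level=logging.DEBUG,     # the root logger now lets DEBUG and above through
    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
    stream=log_stream,
    force=True,              # Python 3.8+: replace handlers configured earlier
)

logging.debug('debug message')
logging.warning('warning message')

output = log_stream.getvalue()
print(output)
```

To write to a file, pass filename='test.log' and filemode='w' in place of stream.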

 

 

Configuring a logger object

import logging

logger = logging.getLogger()
# Create a handler that writes to a log file
fh = logging.FileHandler('test.log',encoding='utf-8')

# Create another handler that prints to the console
ch = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

fh.setLevel(logging.DEBUG)

fh.setFormatter(formatter)
ch.setFormatter(formatter)

logger.addHandler(fh)  # a logger object can have several handlers such as fh and ch
logger.addHandler(ch)

# The root logger itself is still at WARNING, so the debug and info
# records below are dropped before they reach either handler.
logger.debug('logger debug message')
logger.info('logger info message')
logger.warning('logger warning message')
logger.error('logger error message')
logger.critical('logger critical message')

 

 

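The logging library provides several components: Logger, Handler, Filter and Formatter.

A Logger object is the interface that application code uses directly, a Handler sends records to the appropriate destination, a Filter offers a way to filter records, and a Formatter defines how records are displayed.

In addition, the level can be set with logger.setLevel(logging.DEBUG), and an individual handler can be given its own level with fh.setLevel(logging.DEBUG).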
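
How the logger level and the handler levels interact can be shown with two handlers. A minimal sketch: the logger name 'demo' is hypothetical, and io.StringIO streams are used instead of a file and the console so the result can be inspected.

```python
import io
import logging

logger = logging.getLogger('demo')   # hypothetical logger name
logger.setLevel(logging.DEBUG)       # the logger itself lets every record through
logger.propagate = False             # keep the root logger out of this demo

everything = io.StringIO()
errors_only = io.StringIO()

all_handler = logging.StreamHandler(everything)
all_handler.setLevel(logging.DEBUG)
error_handler = logging.StreamHandler(errors_only)
error_handler.setLevel(logging.ERROR)   # this handler drops anything below ERROR

formatter = logging.Formatter('%(levelname)s:%(message)s')
for handler in (all_handler, error_handler):
    handler.setFormatter(formatter)
    logger.addHandler(handler)

logger.info('info message')
logger.error('error message')

print(everything.getvalue())    # both records
print(errors_only.getvalue())   # only the ERROR record
```

A record has to pass the logger's level first and then each handler's level, which is why the info record reaches only one of the two streams.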

 

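collections module

On top of the built-in data types (dict, list, set, tuple), the collections module provides several additional ones: Counter, deque, defaultdict, namedtuple, OrderedDict and others.

1.namedtuple: creates tuples whose elements can be accessed by name

2.deque: a double-ended queue that can append and pop quickly at either end

3.Counter: a counter, mainly used for counting

4.OrderedDict: an ordered dictionary

5.defaultdict: a dictionary with default values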

 

namedtuple

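We know that a tuple can represent an immutable collection. For example, the 2D coordinates of a point can be written as:

>>> p = (1, 2)

However, looking at (1, 2) it is hard to tell that this tuple represents a coordinate. In other words, a plain tuple is not very descriptive in some situations.

This is where namedtuple comes in:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(1, 2)
>>> p.x
1
>>> p.y
2

Similarly, to represent a circle by its coordinates and radius, namedtuple can be used as well:

# namedtuple('name', [list of field names]):
Circle = namedtuple('Circle', ['x', 'y', 'r'])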
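
A namedtuple instance still behaves like a regular tuple. A minimal sketch built on the Circle type above; the values are made up for illustration:

```python
from collections import namedtuple

Circle = namedtuple('Circle', ['x', 'y', 'r'])
c = Circle(x=0, y=0, r=5)

x, y, r = c                   # unpacks like an ordinary tuple
bigger = c._replace(r=10)     # returns a new instance; c itself is immutable
as_dict = dict(c._asdict())   # field names mapped to values

print(c.r, c[2])   # 5 5
print(bigger)      # Circle(x=0, y=0, r=10)
print(as_dict)     # {'x': 0, 'y': 0, 'r': 5}
```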

 

deque

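When data is stored in a list, access by index is fast, but inserting and deleting elements is slow because a list is stored linearly; with large amounts of data, insertion and deletion become inefficient.

deque is a double-ended list built for efficient insertion and deletion, and it is well suited to queues and stacks:

>>> from collections import deque
>>> q = deque(['a', 'b', 'c'])
>>> q.append('x')
>>> q.appendleft('y')
>>> q
deque(['y', 'a', 'b', 'c', 'x'])

Besides the append() and pop() of list, deque also supports appendleft() and popleft(), so elements can be added to or removed from the head very efficiently.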
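
Removing from both ends, plus the optional maxlen argument, in a minimal sketch; the values are made up for illustration:

```python
from collections import deque

q = deque(['a', 'b', 'c'])
q.appendleft('y')
first = q.popleft()   # removes from the head
last = q.pop()        # removes from the tail

# With maxlen set, appending to a full deque discards items from the other end.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)

print(first, last)   # y c
print(q)             # deque(['a', 'b'])
print(recent)        # deque([2, 3, 4], maxlen=3)
```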

 

OrderedDict

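* In Python 3.6 (CPython) dict already remembers the order in which keys were added, and since Python 3.7 this is guaranteed by the language.

To keep the key order explicitly, use OrderedDict:

>>> from collections import OrderedDict
>>> d = dict([('a', 1), ('b', 2), ('c', 3)])
>>> d # before Python 3.6 the key order of a dict was arbitrary
{'a': 1, 'b': 2, 'c': 3}
>>> od = OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> od # the keys of an OrderedDict are ordered
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

Note that the keys of an OrderedDict are kept in insertion order, not sorted by the key itself:

>>> od = OrderedDict()
>>> od['z'] = 1
>>> od['y'] = 2
>>> od['x'] = 3
>>> list(od.keys()) # keys are returned in insertion order
['z', 'y', 'x']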
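
OrderedDict still offers a few things a plain dict does not. A minimal sketch; the keys and values are made up for illustration:

```python
from collections import OrderedDict

od = OrderedDict([('z', 1), ('y', 2), ('x', 3)])

od.move_to_end('z')               # move an existing key to the end
keys_after_move = list(od)
oldest = od.popitem(last=False)   # pop from the front instead of the back

# Equality between two OrderedDicts takes order into account; dicts ignore it.
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)

print(keys_after_move)  # ['y', 'x', 'z']
print(oldest)           # ('y', 2)
print(ordered_equal)    # False
print(plain_equal)      # True
```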

 

defaultdict

Given the values [11, 22, 33, 44, 55, 66, 77, 88, 99, 90 ...], store every value greater than 66 under the first key of a dictionary and the remaining values (66 or less) under the second key.

That is: {'k1': values greater than 66, 'k2': values of 66 or less}

Solution with a plain dict:

values = [11, 22, 33,44,55,66,77,88,99,90]

my_dict = {}

for value in  values:
    if value>66:
        if 'k1' in my_dict:  # dict.has_key() was removed in Python 3
            my_dict['k1'].append(value)
        else:
            my_dict['k1'] = [value]
    else:
        if 'k2' in my_dict:
            my_dict['k2'].append(value)
        else:
            my_dict['k2'] = [value]

Solution with defaultdict:

from collections import defaultdict

values = [11, 22, 33,44,55,66,77,88,99,90]

my_dict = defaultdict(list)  # a missing key starts out as an empty list

for value in  values:
    if value>66:
        my_dict['k1'].append(value)
    else:
        my_dict['k2'].append(value)

With a dict, looking up a key that does not exist raises KeyError. To get a default value for missing keys instead, use defaultdict:

>>> from collections import defaultdict
>>> dd = defaultdict(lambda: 'N/A')
>>> dd['key1'] = 'abc'
>>> dd['key1'] # key1 exists
'abc'
>>> dd['key2'] # key2 does not exist, so the default is returned
'N/A'

 

 

Counter

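The Counter class keeps track of how many times each value occurs.

It is an unordered container that stores its data as dictionary key-value pairs, with the element as the key and its count as the value.

Example: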

>>> from collections import Counter
>>> c = Counter('abcdeabcdabcaba')
>>> c
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})
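
Counter also answers the usual follow-up questions directly. A minimal sketch that continues with the same string:

```python
from collections import Counter

c = Counter('abcdeabcdabcaba')

top_two = c.most_common(2)   # the two most frequent elements with their counts
missing = c['z']             # a missing element counts as 0, no KeyError
c.update('zz')               # add more observations
total = sum(c.values())

print(top_two)  # [('a', 5), ('b', 4)]
print(missing)  # 0
print(c['z'])   # 2
print(total)    # 17
```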

 

posted @ 2018-10-25 22:24  Mr_Yun