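Python (file and folder archive modules, shelve persistence module, XML processing module, ConfigParser configuration module, hashlib hashing module, subprocess system-interaction module, logging module)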

Posted on 2016-02-29 11:39 by 善恶美丑

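OS module

Provides an interface for calling into the operating system.

os.getcwd()  Get the current working directory, i.e. the directory the current Python script is running in
os.chdir("dirname")  Change the script's current working directory; equivalent to cd in the shell
os.curdir  The string for the current directory: '.'
os.pardir  The string for the parent directory: '..'
os.makedirs('dirname1/dirname2')    Create nested directories recursively
os.removedirs('dirname1')    Remove the directory if it is empty, then move up to the parent and remove it too if it is empty, and so on
os.mkdir('dirname')    Create a single directory; equivalent to mkdir dirname in the shell
os.rmdir('dirname')    Remove a single empty directory; raises an error if the directory is not empty; equivalent to rmdir dirname in the shell
os.listdir('dirname')    Return a list of all files and subdirectories in the given directory, including hidden files
os.remove('filename')  Delete a file
os.rename("oldname","newname")  Rename a file or directory
os.stat('path/filename')  Get file or directory information
os.sep    The OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep    The line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep    The string used to separate entries in a search path (';' on Windows, ':' on Linux)
os.name    A string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")  Run a shell command; its output goes straight to the terminal
os.environ  The system environment variables
os.path.abspath(path)  Return the normalised absolute version of path
os.path.split(path)  Split path into a (directory, file name) pair
os.path.dirname(path)  Return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  Return the final file name in path, i.e. the second element of os.path.split(path). If path ends with / or \, an empty string is returned
os.path.exists(path)  Return True if path exists, otherwise False
os.path.isabs(path)  Return True if path is an absolute path
os.path.isfile(path)  Return True if path is an existing file, otherwise False
os.path.isdir(path)  Return True if path is an existing directory, otherwise False
os.path.join(path1[, path2[, ...]])  Join several path components; every component before the last absolute path is discarded
os.path.getatime(path)  Return the last access time of the file or directory that path points to
os.path.getmtime(path)  Return the last modification time of the file or directory that path points to
os.popen('dir').read()  Run a command and return its output as a string, which you can assign to a variable and print (here, the files and directories in the current directory); os.system, by contrast, returns only the exit status, not the output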
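A few of the calls above can be tried together in a throwaway directory. This is only a minimal sketch: the file name demo.txt is made up for illustration, and posixpath is used for the join example so the result is the same on every platform.

```python
import os
import posixpath
import tempfile

# Work inside a temporary directory so nothing on the real disk is touched.
with tempfile.TemporaryDirectory() as base:
    nested = os.path.join(base, "dirname1", "dirname2")
    os.makedirs(nested)                           # creates both levels at once
    file_path = os.path.join(nested, "demo.txt")  # hypothetical file name
    with open(file_path, "w") as f:
        f.write("hello")

    head, tail = os.path.split(file_path)
    same_as_dirname = head == os.path.dirname(file_path)
    same_as_basename = tail == os.path.basename(file_path)
    listing = os.listdir(nested)
    is_file = os.path.isfile(file_path)
    is_dir = os.path.isdir(nested)

    os.remove(file_path)
    gone = not os.path.exists(file_path)

# Components before the last absolute path are thrown away.
joined = posixpath.join("a", "/b", "c")
print(listing, joined)
```

The temporary directory is removed automatically when the with block ends, so the sketch leaves nothing behind.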


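SYS module

sys.argv           List of command-line arguments; the first element is the path of the program itself
sys.exit(n)        Exit the program; use exit(0) for a normal exit
sys.version        Version information of the Python interpreter
sys.maxint         The largest int value (Python 2 only; Python 3 provides sys.maxsize instead)
sys.path           The module search path, initialised from the PYTHONPATH environment variable
sys.platform       The name of the operating system platform
sys.stdout.write('please:')  # write to standard output without a trailing newline
## Example - printing a progress bar
import sys
import time
for i in range(10):
    sys.stdout.write('#')
    sys.stdout.flush()
    time.sleep(0.3)
val = sys.stdin.readline()[:-1]  # read one line from standard input and drop the trailing newline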


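Note: the progress bar above just keeps appending characters, so it is not a true progress bar. To redraw the bar in place, start each printed string with \r.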
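The \r trick can be sketched roughly like this. The helper name render_bar and the bar width are my own choices for illustration, not something from the original example.

```python
import sys
import time


def render_bar(done, total, width=20):
    """Return one progress-bar line; the leading carriage return overwrites the previous line."""
    filled = width * done // total
    percent = 100 * done // total
    return "\r[{}{}] {:3d}%".format("#" * filled, " " * (width - filled), percent)


for step in range(1, 11):
    sys.stdout.write(render_bar(step, 10))
    sys.stdout.flush()          # force the partial line onto the terminal
    time.sleep(0.05)
sys.stdout.write("\n")          # finish the line once the bar is complete
```

Because every line has the same length, each redraw fully covers the previous one and no stray characters are left on screen.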

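shutil module

High-level module for handling files, folders and archives.

shutil.copyfileobj(fsrc, fdst[, length]) Copy the contents of one file object to another; length is the buffer size used while copying
shutil.copyfile(src, dst) Copy a file's contents
shutil.copymode(src, dst) Copy the permission bits only; the contents, group and owner are left unchanged
shutil.copystat(src, dst) Copy the status information: mode bits, atime, mtime, flags
shutil.copy(src, dst) Copy the file and its permissions
shutil.copy2(src, dst) Copy the file and its status information
shutil.ignore_patterns(*patterns) Build an ignore function for copytree that skips names matching the given glob patterns
shutil.copytree(src, dst, symlinks=False, ignore=None)  Copy a directory tree recursively, e.g. copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))
shutil.rmtree(path[, ignore_errors[, onerror]]) Delete a directory tree recursively
shutil.move(src, dst) Move a file or directory recursively
shutil.make_archive(base_name, format,...)
Create an archive (e.g. zip or tar) and return its file path.

base_name: file name of the archive without the format extension, or a path to it. A bare file name saves the archive in the current directory; otherwise it is saved at the given path, e.g.
    www                        => saved in the current directory
    /Users/wupeiqi/www => saved in /Users/wupeiqi/
format:    archive type: "zip", "tar", "bztar" or "gztar"
root_dir:    path of the folder to archive (defaults to the current directory)
owner:    owner, defaults to the current user
group:    group, defaults to the current group
logger:    used for logging, usually a logging.Logger object
Examples:
# Archive the files under /Users/wupeiqi/Downloads/test into the program's current directory
 
import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
 
 
# Archive the files under /Users/wupeiqi/Downloads/test into /Users/wupeiqi/
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')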
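If you do not have those directories, the same call can be tried in a temporary folder. A minimal sketch, assuming a made-up source folder named test with one file a.log in it; the archive is then opened with tarfile just to look at what went in.

```python
import os
import shutil
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as work:
    src = os.path.join(work, "test")          # hypothetical source folder
    os.makedirs(src)
    with open(os.path.join(src, "a.log"), "w") as f:
        f.write("line 1\n")

    # base_name carries no extension; make_archive appends the one for the format.
    archive_path = shutil.make_archive(os.path.join(work, "www"), "gztar", root_dir=src)
    archive_name = os.path.basename(archive_path)

    # Inspect the archive without extracting it.
    with tarfile.open(archive_path) as tar:
        members = tar.getnames()

print(archive_name, members)
```

Keeping the archive outside the folder being archived avoids the archive trying to include itself.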


shutil handles archives by calling the zipfile and tarfile modules. In detail:

import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

zipfile: compressing and extracting

import tarfile

# Compress
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# Extract
tar = tarfile.open('your.tar','r')
tar.extractall()  # an extraction directory can be passed here
tar.close()
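Both ZipFile and the object returned by tarfile.open can also be used as context managers, which closes the archive even if an error occurs. A small sketch under that assumption, reusing the file names from the examples above inside a temporary directory:

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as work:
    log_path = os.path.join(work, "a.log")
    with open(log_path, "w") as f:
        f.write("hello\n")

    # zip: write one member, then list what the archive contains
    zip_path = os.path.join(work, "laxi.zip")
    with zipfile.ZipFile(zip_path, "w") as z:
        z.write(log_path, arcname="a.log")
    with zipfile.ZipFile(zip_path) as z:
        zip_members = z.namelist()

    # tar: same idea with tarfile
    tar_path = os.path.join(work, "your.tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(log_path, arcname="a.log")
    with tarfile.open(tar_path) as tar:
        tar_members = tar.getnames()

print(zip_members, tar_members)
```

Passing arcname stores the file under a short name instead of its full temporary path.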

tarfile: compressing and extracting

shelve module

The shelve module is a simple key/value store that persists in-memory data to a file; it can persist any Python object that pickle supports. (Unlike plain pickle, where several dumped objects can only be loaded back one at a time in the order they were dumped, shelve lets you read a specific object by its key.)

import shelve
d = shelve.open('shelve_test')  # open a shelf file
class Test(object):
    def __init__(self,n):
        self.n = n
t = Test(123)
t2 = Test(123334)
name = ["alex","rain","test"]
d["test"] = name  # persist a list
d["t1"] = t       # persist a class instance
d["t2"] = t2
d.close()

a=shelve.open('shelve_test')
print(a.get('test'))
b=a.get('t1')
print(b.n)
b2=a.get('t2')
print(b2.n)
a.close()
'''
['alex', 'rain', 'test']
123
123334
'''
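The key-based access is the point of shelve, so here is a compact sketch of it. The shelf name shelve_demo and the keys are made up for illustration, and the shelf is opened as a context manager so it is closed automatically.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as work:
    db_path = os.path.join(work, "shelve_demo")   # hypothetical shelf name

    # Write two independent values under their own keys.
    with shelve.open(db_path) as db:
        db["names"] = ["alex", "rain", "test"]
        db["count"] = 3

    # Read back only what is needed, in any order.
    with shelve.open(db_path) as db:
        keys = sorted(db.keys())
        names = db["names"]
        missing = db.get("no_such_key", "default")

print(keys, names, missing)
```

A shelf behaves much like a dict, so get() with a default works for keys that were never stored.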

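XML processing module

XML is a format for exchanging data between different languages or programs, much like JSON, though JSON is simpler to use. In the dark ages before JSON existed, XML was the only choice, and even today many systems at traditional companies, for example in finance, still expose mainly XML interfaces.

The XML format looks like this; the data structure is expressed through <> nodes: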

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

XML is supported in every major language; in Python you can work with it using the following module:

import xml.etree.ElementTree as ET
 
tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)
 
# Walk the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag,i.text)
 
# Iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag,node.text)
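The same traversal also works on a string, which is handy for experimenting without an xmltest.xml file. A minimal sketch, using a shortened copy of the sample document; the variable names are my own.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML_TEXT)          # parse from a string instead of a file

# Attribute access with get(), child lookup with find()
ranks = {c.get("name"): int(c.find("rank").text) for c in root.findall("country")}

# iter() walks the whole tree and yields every matching tag
years = [node.text for node in root.iter("year")]

top_ten = sorted(name for name, rank in ranks.items() if rank <= 10)
print(ranks, years, top_ten)
```

Note that node.text is always a string, so numeric values need an explicit int() conversion.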

Modifying and deleting XML content

import xml.etree.ElementTree as ET
 
tree = ET.parse("xmltest.xml")
root = tree.getroot()
 
# Modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated","yes")
 
tree.write("xmltest.xml")
 
 
# Delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)
 
tree.write('output.xml')

Creating your own XML document

import xml.etree.ElementTree as ET
 
 
new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml,"name",attrib={"enrolled":"yes"})
age = ET.SubElement(name,"age",attrib={"checked":"no"})
sex = ET.SubElement(name,"sex")
sex.text = '33'
name2 = ET.SubElement(new_xml,"name",attrib={"enrolled":"no"})
age = ET.SubElement(name2,"age")
age.text = '19'
 
et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8",xml_declaration=True)
 
ET.dump(new_xml)  # print the generated XML

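ConfigParser module

Used to generate and modify common configuration files. The module was renamed to configparser in Python 3.x.

Here is a configuration file format that many programs use: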

[DEFAULT]

ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes
 
[bitbucket.org]
User = hg
 
[topsecret.server.com]
Port = 50022
ForwardX11 = no

How would you generate a file like this with Python?

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                      'Compression': 'yes',
                     'CompressionLevel': '9'}
config["aaa"] = {'ServerAliveInterval': '45',
                      'Compression': 'yes',
                     'CompressionLevel': '9'}
# A second way of writing values
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'

config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Host Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
   config.write(configfile)

Once written, the file can be read back:

import configparser
config=configparser.ConfigParser()
print(config.sections())

config.read('example.ini')

print('bitbucket.org' in config)
print(config['bitbucket.org']['user'])
print(config['DEFAULT']['Compression'])
topsecret = config['topsecret.server.com']
print(topsecret['forwardX11'])
print(topsecret['host port'])
for key in config['bitbucket.org']: print(key)
print(config['bitbucket.org']['ForwardX11'])
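The last line above works because every section inherits the options in [DEFAULT]. A short sketch of that behaviour, reading a trimmed copy of the sample file from a string with read_string so that no example.ini is needed:

```python
import configparser

INI_TEXT = """
[DEFAULT]
ServerAliveInterval = 45
Compression = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
Compression = no
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

sections = config.sections()        # DEFAULT is not listed as a section
# Not set in [bitbucket.org], so the value comes from [DEFAULT]
interval = config.getint("bitbucket.org", "ServerAliveInterval")
# A section can override a default; getboolean understands yes/no
compression = config.getboolean("topsecret.server.com", "Compression")
# Option names are case-insensitive
port = config["topsecret.server.com"].getint("port")
# fallback avoids an exception for options that do not exist
missing = config.get("bitbucket.org", "no_such_option", fallback="n/a")
print(sections, interval, compression, port, missing)
```

All values are stored as strings, which is why the typed getters getint and getboolean exist.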

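configparser syntax for adding, deleting, modifying and querying

import configparser

config = configparser.ConfigParser()
config.read('example.ini')

# ########## Another way to read values ##########
# get() needs two arguments: the section name and the option name
A = config.get('bitbucket.org', 'User')
print(A)
# getint() works like get(), but the value must be convertible to int
A = config.getint('topsecret.server.com', 'Host Port')
print(A)

# ########## Remove a whole section (first-level key) ##########
sec = config.remove_section('aaa')  # True if the section existed
config.write(open('example2.ini', "w"))

# ########## Add ##########
sec = config.has_section('wupeiqi')  # check first, add only if missing
if not sec:
    config.add_section('wupeiqi')
config['wupeiqi']['age'] = '21'
config.write(open('example2.ini', "w"))

# ########## Modify ##########
config.set('wupeiqi', 'age', '22')
config.write(open('example2.ini', "w"))

# ########## Remove a single option (second-level key) ##########
# done last, because the section has to exist before an option can be removed
config.remove_option('wupeiqi', 'age')
config.write(open('example2.ini', "w"))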

hashlib module

Used for hashing operations. In 3.x it replaces the md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.

import hashlib
######## md5 ########

a = hashlib.md5()
a.update(b'Hello')
a.update(b'It,s me')
print(a.digest())  # raw bytes (binary form)
print(a.hexdigest()) # hexadecimal string
a.update(b'aaaaaaaaaaaaaaa')
print(a.hexdigest())

# ######## sha1 ########

hash = hashlib.sha1()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha256 ########

hash = hashlib.sha256()
hash.update(b'admin')
print(hash.hexdigest())


# ######## sha384 ########

hash = hashlib.sha384()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha512 ########

hash = hashlib.sha512()
hash.update(b'admin')
print(hash.hexdigest())
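One detail of the md5 example is easy to miss: calling update() several times hashes the concatenation of all the pieces, which is what makes it possible to hash large data chunk by chunk. A minimal sketch; the helper name sha256_of_chunks is made up for illustration.

```python
import hashlib

# Two updates give the same digest as hashing the joined bytes in one go.
incremental = hashlib.md5()
incremental.update(b"Hello")
incremental.update(b"It,s me")
one_shot = hashlib.md5(b"HelloIt,s me")
same_digest = incremental.hexdigest() == one_shot.hexdigest()


def sha256_of_chunks(chunks):
    """Hash an iterable of bytes objects without joining them in memory."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


digest = sha256_of_chunks([b"ad", b"min"])
print(same_digest, digest)
```

The same loop works for a file read in fixed-size blocks, so a large file never has to be loaded in full.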

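Still not enough? Python also has an hmac module, which combines a key you supply with the message and processes them further before hashing.

import hmac

# Mostly used for message authentication
h = hmac.new(b'wueiqi', digestmod='sha256')  # the key must be bytes; recent Python 3 versions require digestmod
h.update(b'hellowo')
print(h.hexdigest())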
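A typical use of hmac is checking that a message really came from someone who knows the shared key. The sketch below is only an illustration: the key, the function names sign and verify, and the choice of SHA-256 are all my own assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-shared-key"   # made-up key for the example


def sign(message):
    """Return the hex HMAC-SHA256 tag for a bytes message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def verify(message, signature):
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message), signature)


tag = sign(b"hellowo")
ok = verify(b"hellowo", tag)
tampered = verify(b"hellowo!", tag)
print(ok, tampered)
```

hmac.compare_digest is used instead of == so that the comparison time does not leak how many leading characters matched.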


subprocess module

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several older modules and functions:

os.system
os.spawn*
The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

The run() function was added in Python 3.5; if you need to retain compatibility with older versions, see the Older high-level API section.

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, shell=False, timeout=None, check=False)
Run the command described by args. Wait for command to complete, then return a CompletedProcess instance.

The arguments shown above are merely the most common ones, described below in Frequently Used Arguments (hence the use of keyword-only notation in the abbreviated signature). The full function signature is largely the same as that of the Popen constructor - apart from timeout, input and check, all the arguments to this function are passed through to that interface.

This does not capture stdout or stderr by default. To do so, pass PIPE for the stdout and/or stderr arguments.

The timeout argument is passed to Popen.communicate(). If the timeout expires, the child process will be killed and waited for. The TimeoutExpired exception will be re-raised after the child process has terminated.

The input argument is passed to Popen.communicate() and thus to the subprocess’s stdin. If used it must be a byte sequence, or a string if universal_newlines=True. When used, the internal Popen object is automatically created with stdin=PIPE, and the stdin argument may not be used as well.

If check is True, and the process exits with a non-zero exit code, a CalledProcessError exception will be raised. Attributes of that exception hold the arguments, the exit code, and stdout and stderr if they were captured.
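To see capture, check and timeout working together without depending on ls or ipconfig, the child process in this sketch is the Python interpreter itself (sys.executable). The one-line child programs are made up for illustration.

```python
import subprocess
import sys

# Capture stdout as text; check=True raises if the exit code is non-zero.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
    check=True,
    timeout=30,
)
output = result.stdout.strip()

# A failing child: CalledProcessError carries the exit code.
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
    failed_code = None
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode

print(output, result.returncode, failed_code)
```

Passing the command as a list avoids shell=True, so no shell quoting is involved.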

Official introduction

Detailed examples:

>>> subprocess.run(["ls", "-l"])  # doesn't capture output
CompletedProcess(args=['ls', '-l'], returncode=0)
 
>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1
 
>>> subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0,
stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n')

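import subprocess
# a=subprocess.run('ipconfig');print(a) # run a system command and return a CompletedProcess; not available before 3.5
# subprocess.call('ipconfig')  # before 3.5 use call(); a command string with an extra argument, such as 'df -h', raises an error on Unix unless written in one of the forms below
# subprocess.call(['df','-h'])
# or
# subprocess.call('df -h',shell=True)

# a=subprocess.call('ipconfig');print(a) # the way to run a system command before 3.5; it returns the exit code, not the command's output. To print the output:
# a=subprocess.Popen('ipconfig',stdout=subprocess.PIPE) # versions before 3.5
# print(a.stdout.read())   # prints the output of ipconfig rather than the exit code



# a=subprocess.Popen('df -h',shell=True,stdout=subprocess.PIPE);print(a.stdout.read())
'''
Print the command's output rather than its exit code: stdout=subprocess.PIPE sets up a pipe, the output is sent through it,
and a.stdout.read() then reads it so that it can be printed.
'''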
'''
>>>subprocess.call('sdf',shell=True)  # the command fails; call() returns the exit code
/bin/sh: sdf: command not found
127
subprocess.check_call('sdf',shell=True)  # raises an exception
/bin/sh: sdf: command not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/subprocess.py", line 505, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'sdf' returned non-zero exit status 127
'''

Calling subprocess.run(...) is the recommended approach and covers most needs. If you need more complex interaction with the system, you can use subprocess.Popen() directly, like this:

p = subprocess.Popen(r"find / -size +1000000 -exec ls -shl {} \;",shell=True,stdout=subprocess.PIPE)
print(p.stdout.read())

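Available parameters:

args: the shell command, either a string or a sequence (e.g. list or tuple)

bufsize: buffering policy. 0 = unbuffered, 1 = line buffered, any other positive value = buffer size, negative value = system default

stdin, stdout, stderr: the program's standard input, output and error handles

preexec_fn: Unix only; a callable object that will be called in the child process just before it runs

close_fds: on Windows (before Python 3.7), if close_fds is True the child process does not inherit the parent's input, output and error pipes,
so you cannot set close_fds to True and redirect the child's standard input, output and error (stdin, stdout, stderr) at the same time.

shell: as described above for run()

cwd: sets the child's current directory

env: the child's environment variables. If env = None, the child inherits the parent's environment.

universal_newlines: line endings differ between systems; True -> use \n uniformly

startupinfo and creationflags: Windows only;
they are passed to the underlying CreateProcess() function to set properties of the child process, such as the appearance of the main window or the process priority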
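Two of these parameters, cwd and env, are easy to check from the child's side. This is a rough sketch: the variable name DEMO_VAR is invented, and the child is again the Python interpreter so that the example does not depend on any particular system command.

```python
import os
import subprocess
import sys
import tempfile

CHILD_CODE = "import os; print(os.getcwd()); print(os.environ['DEMO_VAR'])"

with tempfile.TemporaryDirectory() as work:
    # Copy the parent's environment and add one hypothetical variable.
    child_env = dict(os.environ, DEMO_VAR="42")
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        cwd=work,                    # the child starts in this directory
        env=child_env,               # the child sees exactly this environment
        stdout=subprocess.PIPE,
        universal_newlines=True,     # text mode, with \n line endings
    )
    out, _ = proc.communicate(timeout=30)
    child_cwd, child_var = out.splitlines()
    same_dir = os.path.samefile(child_cwd, work)

print(same_dir, child_var)
```

Building env from a copy of os.environ keeps PATH and friends intact; passing a nearly empty dict can stop the child from starting on some systems.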

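Commands typed at a terminal fall into two kinds:

Commands that produce output as soon as they are entered, e.g. ifconfig
Commands that enter an environment and then depend on further input, e.g. python

Example of a command that needs interaction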

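import subprocess

# universal_newlines=True opens the pipes in text mode, so str can be written to stdin
obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
obj.stdin.write("print ('hello')\n")
obj.stdin.write("print ('hello1')\n")
obj.stdin.write("print ('hello2')\n")
obj.stdin.write("print ('hello3')\n")

# communicate() closes stdin, waits for the child and returns (stdout, stderr)
out_error_list = obj.communicate(timeout=10)
print (out_error_list)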
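Instead of writing to stdin line by line, the whole input can be handed to communicate() in one call, which also avoids deadlocks when pipes fill up. A minimal sketch, assuming the child is started as sys.executable with "-" so that it reads its program from stdin:

```python
import subprocess
import sys

child = subprocess.Popen(
    [sys.executable, "-"],        # "-" tells the interpreter to read the program from stdin
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,      # text mode: pass and receive str, not bytes
)

# Send everything at once; communicate() closes stdin and waits for the child.
out, err = child.communicate("print('hello')\nprint('hello1')\n", timeout=30)
lines = out.splitlines()
print(lines, child.returncode)
```

Using sys.executable guarantees the child is the same interpreter that runs the script, whatever "python" happens to mean on the machine.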

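logging module

Many programs need to write logs, and those logs contain both normal access records and errors, warnings and other output. Python's logging module provides a standard logging interface through which you can store logs in various formats. Logging has five levels: debug(), info(), warning(), error() and critical(). Let's see how to use it.

Simplest usage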

import logging
 
logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")
 
# Output
WARNING:root:user [alex] attempted wrong password more than 3 times
CRITICAL:root:server is down

Here is what each of these log levels means:

Level	When it’s used
`DEBUG`	Detailed information, typically of interest only when diagnosing problems.
`INFO`	Confirmation that things are working as expected.
`WARNING`	An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.
`ERROR`	Due to a more serious problem, the software has not been able to perform some function.
`CRITICAL`	A serious error, indicating that the program itself may be unable to continue running.

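The log format can be configured through a number of attributes, such as %(asctime)s, %(levelname)s and %(message)s.

Writing the log to a file is also simple: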

import logging
 
logging.basicConfig(filename='example.log',level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')

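In the code above, level=logging.INFO sets the logging level to INFO, which means only messages at INFO level or higher are written to the file. In this example the first message is not recorded; to record debug messages as well, change the level to DEBUG.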
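The level filtering can be observed without touching a file by pointing a handler at an in-memory buffer. This is just a sketch: the logger name level-demo and the message texts are made up.

```python
import io
import logging

stream = io.StringIO()                          # stands in for the log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

demo_logger = logging.getLogger("level-demo")   # hypothetical logger name
demo_logger.setLevel(logging.INFO)              # same threshold as in the example
demo_logger.propagate = False                   # keep the demo out of the root logger
demo_logger.addHandler(handler)

demo_logger.debug("dropped, below INFO")
demo_logger.info("kept")
demo_logger.warning("kept too")

captured = stream.getvalue().splitlines()
print(captured)
```

Changing setLevel(logging.INFO) to logging.DEBUG makes the first message appear as well.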

The log format above is missing the time, and a log without timestamps is not much use, so let's add it!

import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
logging.warning('is when this event was logged.')
 
# Output
2016-02-23 11:46:36 is when this event was logged.

To send the log to both the screen and a log file, you need to know a little more:

The logging library takes a modular approach and offers several categories of components: loggers, handlers, filters, and formatters.

Loggers expose the interface that application code directly uses.
Handlers send the log records (created by loggers) to the appropriate destination.
Filters provide a finer grained facility for determining which log records to output.
Formatters specify the layout of log records in the final output.

Example:

import logging

# create logger
logger = logging.getLogger('TEST-LOG')  # the interface application code uses
logger.setLevel(logging.DEBUG)  # logger-wide threshold: records below this level are dropped before any handler sees them, so a higher level here limits both the screen and the file output


# create console handler and set level to debug
ch = logging.StreamHandler()  # output to the screen
ch.setLevel(logging.DEBUG)

# create file handler and set level to warning (output to a file)
fh = logging.FileHandler("access.log")
fh.setLevel(logging.WARNING)
# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')  # defines the output format

# add formatter to ch and fh
ch.setFormatter(formatter)  # apply the format to the console handler
fh.setFormatter(formatter)  # apply the format to the file handler

# add ch and fh to logger
logger.addHandler(ch)  # attach the handler to the logger
logger.addHandler(fh)  # attach the handler to the logger

# 'application' code: emit log messages
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')  # logger.warn() is a deprecated alias
logger.error('error message')
logger.critical('critical message')

>>>
'''

2016-02-26 20:19:58,917 - TEST-LOG - DEBUG - debug message
2016-02-26 20:19:58,917 - TEST-LOG - INFO - info message
2016-02-26 20:19:58,917 - TEST-LOG - WARNING - warn message
2016-02-26 20:19:58,917 - TEST-LOG - ERROR - error message
2016-02-26 20:19:58,917 - TEST-LOG - CRITICAL - critical message

Contents of the file

2016-02-26 20:20:58,243 - TEST-LOG - WARNING - warn message
2016-02-26 20:20:58,243 - TEST-LOG - ERROR - error message
2016-02-26 20:20:58,244 - TEST-LOG - CRITICAL - critical message

'''
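The difference between the two outputs comes purely from the handler levels. Here is a reduced sketch of the same setup with two in-memory buffers standing in for the screen and for access.log; the logger name handler-demo is made up, and the timestamp is left out of the format so the result is reproducible.

```python
import io
import logging

console_buf = io.StringIO()     # stands in for the screen
file_buf = io.StringIO()        # stands in for access.log

demo = logging.getLogger("handler-demo")   # hypothetical logger name
demo.setLevel(logging.DEBUG)               # let everything through to the handlers
demo.propagate = False

fmt = logging.Formatter("%(name)s - %(levelname)s - %(message)s")
for buf, level in ((console_buf, logging.DEBUG), (file_buf, logging.WARNING)):
    h = logging.StreamHandler(buf)
    h.setLevel(level)                      # each handler applies its own threshold
    h.setFormatter(fmt)
    demo.addHandler(h)

demo.debug("debug message")
demo.info("info message")
demo.warning("warn message")
demo.error("error message")
demo.critical("critical message")

console_lines = console_buf.getvalue().splitlines()
file_lines = file_buf.getvalue().splitlines()
print(len(console_lines), len(file_lines))
```

A record has to pass the logger's level first and then each handler's level, which is why the "file" ends up with only the three most severe messages.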
