Python Modules (Part 1)
What is a Python module?
1.1 Introduction to Python modules
Modules let you organize your Python code logically. Putting related code into a module makes it easier to use and easier to understand. A module is also a Python object with arbitrarily named attributes that you can bind and reference. Simply put, a module is a file that contains Python code. A module can define functions, classes, and variables, and it can also contain executable code.
The Python code for a module named aname normally lives in a file named aname.py. Below is a simple module, support.py.
def print_func(par):
    print("Hello : ", par)
    return
1.2 Importing Python modules
When the interpreter encounters an import statement, the module is imported if it can be found on the current search path. The search path is the list of directories the interpreter looks through. A module is imported only once, no matter how many times you import it; this prevents the imported module from being executed over and over again.
# A single module
import xxx
# Nested under a package (folder)
from xxx import xxx
from xxx import *
from xxx import xxx as xxx
Importing a module simply tells the Python interpreter to go and interpret that .py file:
- Importing a .py file: the interpreter executes that .py file
- Importing a package: the interpreter executes the package's __init__.py file
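For example, assuming a hypothetical package laid out as follows, "import mypkg" executes mypkg/__init__.py, and "from mypkg import tools" additionally executes tools.py:

mypkg/
    __init__.py     # executed on "import mypkg"
    tools.py        # executed on "from mypkg import tools"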
This raises a question: which path is used as the base when a module is imported? When you import a module, the Python interpreter searches for it in this order:
- The current directory
- If it is not in the current directory, Python searches every directory listed in the shell variable PYTHONPATH.
- If it still cannot be found, Python falls back to the default path; on UNIX this is usually /usr/local/lib/python/.
The module search path is stored in the sys module's sys.path variable, which contains the current directory, PYTHONPATH, and the installation-dependent default directories.
Module paths:
# Inspect the search path
>>> import sys
>>> for i in sys.path:
...     print(i)
...
C:\Python35-32\python35.zip
C:\Python35-32\DLLs
C:\Python35-32\lib                 # the standard library
C:\Python35-32
C:\Python35-32\lib\site-packages   # third-party / extension packages

# Append a directory to the search path
>>> import sys
>>> import os
>>> pre_path = os.path.abspath('../')
>>> sys.path.append(pre_path)
>>> for i in sys.path:
...     print(i)
...
C:\Python35-32\python35.zip
C:\Python35-32\DLLs
C:\Python35-32\lib
C:\Python35-32
C:\Python35-32\lib\site-packages
C:\Users
1.3 The dir() function
The dir() function returns a sorted list of strings containing the names defined in a module — all the modules, variables, and functions defined there. A simple example:
>>> for i in dir(os):
...     print(i)
...
..... (output omitted)
sys
system
urandom
utime
waitpid
walk
write
1.4 The reload() function
When a module is imported into a script, the code in its top level runs only once. If you want to re-run that top-level code, use the reload() function, which re-imports a previously imported module. The syntax is:
reload(module_name)
Here module_name is the module object itself, not a string. For example, to reload the hello module:
reload(hello)
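Note that reload() is a Python 2 built-in; in Python 3 the same functionality lives in importlib.reload() (imp.reload() in early 3.x releases). A minimal sketch, assuming a module named hello has already been imported:

import importlib
import hello

importlib.reload(hello)   # re-executes hello's top-level code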
1.5 Module categories
Custom (user-defined) modules
Built-in modules
Open-source (third-party) modules
1.6 Open-source modules
1.6.1 Module installation
1) Using a Python package manager
pip

# Generate a list of dependency packages
pip freeze > requirements.txt
# Install dependencies from such a list
pip install -r requirements.txt

# pip documentation: https://pip.pypa.io/en/stable/quickstart/

easy_install
2) Installing from source
# Installing from source requires gcc and the Python development headers, so install them first:
yum install gcc
yum install python-devel        # or: apt-get install python-dev

# Source installation workflow
#   download the source
#   unpack it
#   cd into the directory
#   build:   python setup.py build
#   install: python setup.py install

# After a successful install, the module is placed in one of the directories on sys.path, e.g.:
#   /usr/lib/python2.7/site-packages/
1.6.2 Module import
Open-source modules are imported the same way as custom modules.
1.6.3 Case study: paramiko
paramiko is a module for remote control: it lets you run commands and operate on files on a remote server over SSH. Notably, the remote management inside fabric and ansible is implemented with paramiko.
Official site: http://www.paramiko.org/
1) Installation
# Install with pip
python -m pip install paramiko     # Python 2.x
python3 -m pip install paramiko    # Python 3.x

# Install from source
# paramiko depends on pycrypto internally, so download and install pycrypto first
wget https://files.cnblogs.com/files/wupeiqi/pycrypto-2.6.1.tar.gz
tar -xvf pycrypto-2.6.1.tar.gz
cd pycrypto-2.6.1
python setup.py build
python setup.py install
# Enter the python shell and "import Crypto" to verify the installation

# Download and install paramiko
wget https://files.cnblogs.com/files/wupeiqi/paramiko-1.10.1.tar.gz
tar -xvf paramiko-1.10.1.tar.gz
cd paramiko-1.10.1
python setup.py build
python setup.py install

# Enter the python shell and import paramiko to verify the installation
python3 -c "import paramiko"
2) Usage
# Connect to a server with a username and password
import paramiko
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('192.168.1.108', 22, 'alex', '123')
stdin, stdout, stderr = ssh.exec_command('df')
print(stdout.read())
ssh.close()
# SSHClient wrapping a Transport
import paramiko
transport = paramiko.Transport(('hostname', 22))
transport.connect(username='wupeiqi', password='123')
ssh = paramiko.SSHClient()
ssh._transport = transport
stdin, stdout, stderr = ssh.exec_command('df')
print(stdout.read())
transport.close()
# Connect to a server with a private key
import paramiko
private_key_path = '/home/auto/.ssh/id_rsa'
key = paramiko.RSAKey.from_private_key_file(private_key_path)
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('hostname', 22, 'username', pkey=key)
stdin, stdout, stderr = ssh.exec_command('df')
print(stdout.read())
ssh.close()
# SSHClient wrapping a Transport (key-based)
import paramiko
private_key = paramiko.RSAKey.from_private_key_file('/home/auto/.ssh/id_rsa')
transport = paramiko.Transport(('hostname', 22))
transport.connect(username='wupeiqi', pkey=private_key)
ssh = paramiko.SSHClient()
ssh._transport = transport
stdin, stdout, stderr = ssh.exec_command('df')
transport.close()
# Upload or download files - with a username and password
import os, sys
import paramiko

t = paramiko.Transport(('192.168.1.1', 22))
t.connect(username='root', password='123456')
sftp = paramiko.SFTPClient.from_transport(t)
sftp.put('/tmp/test.py', '/tmp/test.py')
# sftp.get('/tmp/test.py', '/tmp/test2.py')
t.close()
# Upload or download files - with a private key
import paramiko

private_key_path = '/home/auto/.ssh/id_rsa'
key = paramiko.RSAKey.from_private_key_file(private_key_path)

t = paramiko.Transport(('192.168.1.100', 22))
t.connect(username='root', pkey=key)
sftp = paramiko.SFTPClient.from_transport(t)
sftp.put('/tmp/test3.py', '/tmp/test3.py')
# sftp.get('/tmp/test3.py', '/tmp/test4.py')
t.close()
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import paramiko
import uuid

class SSHConnection(object):

    def __init__(self, host='172.16.103.191', port=22, username='wupeiqi', pwd='123'):
        self.host = host
        self.port = port
        self.username = username
        self.pwd = pwd
        self.__k = None

    def create_file(self):
        file_name = str(uuid.uuid4())
        with open(file_name, 'w') as f:
            f.write('sb')
        return file_name

    def run(self):
        self.connect()
        self.upload('/home/wupeiqi/tttttttttttt.py')
        self.rename('/home/wupeiqi/tttttttttttt.py', '/home/wupeiqi/ooooooooo.py')
        self.close()

    def connect(self):
        transport = paramiko.Transport((self.host, self.port))
        transport.connect(username=self.username, password=self.pwd)
        self.__transport = transport

    def close(self):
        self.__transport.close()

    def upload(self, target_path):
        # Connect and upload
        file_name = self.create_file()
        sftp = paramiko.SFTPClient.from_transport(self.__transport)
        # Upload the local file to the server, e.g. as /tmp/test.py
        sftp.put(file_name, target_path)

    def rename(self, old_path, new_path):
        ssh = paramiko.SSHClient()
        ssh._transport = self.__transport
        # Run the command
        cmd = "mv %s %s" % (old_path, new_path,)
        stdin, stdout, stderr = ssh.exec_command(cmd)
        # Read the command output
        result = stdout.read()

    def cmd(self, command):
        ssh = paramiko.SSHClient()
        ssh._transport = self.__transport
        # Run the command
        stdin, stdout, stderr = ssh.exec_command(command)
        # Read the command output
        result = stdout.read()
        return result

ha = SSHConnection()
ha.run()
import paramiko
import uuid

class SSHConnection(object):

    def __init__(self, host='192.168.11.61', port=22, username='alex', pwd='alex3714'):
        self.host = host
        self.port = port
        self.username = username
        self.pwd = pwd
        self.__k = None

    def run(self):
        self.connect()
        pass
        self.close()

    def connect(self):
        transport = paramiko.Transport((self.host, self.port))
        transport.connect(username=self.username, password=self.pwd)
        self.__transport = transport

    def close(self):
        self.__transport.close()

    def cmd(self, command):
        ssh = paramiko.SSHClient()
        ssh._transport = self.__transport
        # Run the command
        stdin, stdout, stderr = ssh.exec_command(command)
        # Read the command output
        result = stdout.read()
        return result

    def upload(self, local_path, target_path):
        # Connect and upload
        sftp = paramiko.SFTPClient.from_transport(self.__transport)
        # Upload the local file to the server, e.g. as /tmp/test.py
        sftp.put(local_path, target_path)

ssh = SSHConnection()
ssh.connect()
r1 = ssh.cmd('df')
ssh.upload('s2.py', "/home/alex/s7.py")
ssh.close()
1.7 Built-in modules
Python ships with a library of standard modules, documented separately in the Python Library Reference (hereafter "the Library Reference"). Some modules are built directly into the interpreter; these operations are not part of the core language, but they are built in anyway, either for efficiency or to give access to operating-system primitives such as system calls. Which of these modules are available is a configuration option and also depends on the underlying platform; for example, the winreg module is only provided on Windows. One module deserves special attention: sys, which is built into every Python interpreter. The variables sys.ps1 and sys.ps2 define the strings used as the primary and secondary prompts:
>>> import sys
>>> sys.ps1                 # these two variables are only defined in interactive mode
'>>> '
>>> sys.ps2
'... '
>>> sys.ps1 = 'mads>>>'     # reassign ps1 (or ps2)
mads>>>
Special variables in a module
1) __doc__ holds the file's documentation string — only """triple-quoted docstring comments""" are captured
#!/usr/bin/env python
# encoding: utf-8
"""
@version: v1.0
@author: XXX
@license: Apache Licence
@contact: xxx@qq.com
@site: http://madsstudy.blog.51cto.com/
@software: PyCharm
@file: mod_index.py
@time: 2016/6/11 16:22
"""
print(__doc__)

# Output:
# @version: v1.0
# @author: xxx
# @license: Apache Licence
# @contact: xxx@qq.com
# @site: http://madsstudy.blog.51cto.com/
# @software: PyCharm
# @file: mod_index.py
# @time: 2016/6/11 16:22
2) __file__ gives the path of the current file
print(__file__)
# E:/开源中国/192.168.21.140/trunk/Python/scripts/S13/s13day06/模块/mod_index.py

# Typical use
import sys
import os
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
3) __name__ — equals "__main__" only when the current file is executed directly; when the file is imported, it holds the module's name instead (see the sketch after this list)
4) __cached__ — the path of the module's cached bytecode (.pyc) file
5) __package__ — tells which package the module was imported from
from bin import admin

print(__package__)
print(admin.__package__)

# Output:
# None
# bin
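A minimal sketch of the usual __name__ guard (the module name mod_index.py is only an assumption):

# mod_index.py
def main():
    print("running as a script")

if __name__ == "__main__":
    # True only when this file is run directly (python mod_index.py),
    # not when it is imported from another module
    main()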
17.5 The json and pickle modules
Two modules used for serialization.
1)json
Purpose: converts between strings and Python data types
Characteristics: better suited to cross-language data exchange; handles only Python's basic data types
Methods:
loads: convert a string into basic Python data types
dumps: convert basic Python data types into a string
load: read from a file and deserialize
dump: serialize and write to a file
2) pickle
Purpose: converts between Python-specific objects and Python data types (i.e. serializes arbitrary Python objects)
Characteristics: can serialize every Python type, but works only with Python and may run into version-compatibility issues; a typical use case is saving custom player data in a game (see the sketch after the method list below)
方法:
dumps: serialize a Python object to a bytes string
loads: deserialize a Python object from a bytes string
dump: serialize and write to a file; the file must be opened in binary mode (e.g. 'wb')
load: read from a file opened in binary mode (e.g. 'rb') and deserialize
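A minimal sketch of the difference described above; datetime is just one example of a non-basic type that json rejects but pickle handles:

import json
import pickle
from datetime import datetime

data = {"name": "alex", "login": datetime(2016, 6, 11, 16, 22)}

pickled = pickle.dumps(data)        # works: pickle serializes arbitrary Python objects
print(pickle.loads(pickled))

try:
    json.dumps(data)                # fails: datetime is not a basic JSON type
except TypeError as e:
    print("json.dumps failed:", e)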
3) Examples
import json

dic = {"name": "alex", "age": 29}      # dict
print(dic, type(dic))
st = json.dumps(dic)
print(st, type(st))

li = ["alex", 29]                      # list
print(li, type(li))
st = json.dumps(li)
print(st, type(st))

st = '{"k1": "v1" ,"k2": 23}'          # JSON strings must use double quotes, so the Python literal uses single quotes on the outside; otherwise loads raises an error
print(st, type(st))
dic = json.loads(st)
print(dic, type(dic))

li = ["alex", 29]                      # list
print(li, type(li))
st = json.dump(li, open('user_info.txt', 'w'))
st = json.load(open('user_info.txt', 'r'))
print(st, type(st))
import pickle

li = [11, 22, 33, ]
r = pickle.dumps(li)
print(r)
res = pickle.loads(r)
print(res)

li = [11, 22, 33, ]
pickle.dump(li, open('user_info.pickle', 'wb'))
res = pickle.load(open('user_info.pickle', 'rb'))
print(res)
17.6 The logging module
Many programs need to record logs, and the logged information includes normal access logs as well as errors, warnings, and other output. Python's logging module provides a standard logging interface through which you can store logs in various formats. logging defines five levels: debug(), info(), warning(), error(), and critical().
Simply printing logs to the screen:
>>> import logging
>>>
>>> logging.debug('This is debug message')
>>> logging.info('This is info message')
>>> logging.warning('This is warning message')
WARNING:root:This is warning message
>>>
By default, logging prints to the screen and the log level is WARNING;
the levels are ordered CRITICAL > ERROR > WARNING > INFO > DEBUG > NOTSET, and you can also define your own levels (see the sketch below).
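A minimal sketch of registering a custom level with logging.addLevelName; the name TRACE and the value 15 are arbitrary choices:

import logging

TRACE = 15                              # sits between DEBUG (10) and INFO (20)
logging.addLevelName(TRACE, "TRACE")

logging.basicConfig(level=TRACE)
logging.log(TRACE, 'This is a trace message')   # prints: TRACE:root:This is a trace message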
Writing logs to a file:
import logging

logging.basicConfig(level=logging.DEBUG,    # set the log level to DEBUG
                    format='%(asctime)s %(filename)s [line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='my_app.log',
                    filemode='w')

logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')
# Contents of the log file
E:\开源中国\192.168.21.140\trunk\Python\scripts\S13\s13day05>cat my_app.log
Mon, 06 Jun 2016 22:34:31 logging_v1.py [line:24] DEBUG This is debug message
Mon, 06 Jun 2016 22:34:31 logging_v1.py [line:25] INFO This is info message
Mon, 06 Jun 2016 22:34:31 logging_v1.py [line:26] WARNING This is warning message
Parameter reference:
filename: the name of the log file
filemode: same meaning as the mode argument of open(); the mode used to open the log file, 'w' or 'a'
format: the format and content of the output; format can include many useful fields, as shown above:
%(levelno)s: the numeric log level
%(levelname)s: the log level name
%(pathname)s: the path of the running program, i.e. sys.argv[0]
%(filename)s: the file name of the running program
%(funcName)s: the function that issued the log call
%(lineno)d: the line number of the log call
%(asctime)s: the time of the log call
%(thread)d: the thread ID
%(threadName)s: the thread name
%(process)d: the process ID
%(message)s: the log message
datefmt: the time format, same as time.strftime()
level: the log level, default logging.WARNING
stream: the output stream for the log, e.g. sys.stderr, sys.stdout, or a file; default sys.stderr. If both stream and filename are given, stream is ignored.
Sending logs to both the screen and a log file
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='my_app.log',
                    filemode='w')

# Define a StreamHandler that prints INFO (and higher) messages to standard error,
# and attach it to the current root logger
console = logging.StreamHandler()
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)

logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')

# Printed on screen:
# root        : INFO     This is info message
# root        : WARNING  This is warning message
#
# Contents of ./my_app.log:
# Sun, 24 May 2009 21:48:54 demo2.py[line:11] DEBUG This is debug message
# Sun, 24 May 2009 21:48:54 demo2.py[line:12] INFO This is info message
# Sun, 24 May 2009 21:48:54 demo2.py[line:13] WARNING This is warning message
Log rotation
import logging
from logging.handlers import RotatingFileHandler

# Define a RotatingFileHandler: keep at most 5 backup files, each at most 10 MB
Rthandler = RotatingFileHandler('my_app.log', maxBytes=10*1024*1024, backupCount=5)
Rthandler.setLevel(logging.INFO)
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
Rthandler.setFormatter(formatter)
logging.getLogger('').addHandler(Rthandler)
As the examples above show, logging has a main logger object, and additional output destinations are attached to it with addHandler.
The handler classes provided by logging include:
logging.StreamHandler: write logs to a stream such as sys.stderr, sys.stdout, or a file object
logging.FileHandler: write logs to a file
Rotation handlers — in practice use RotatingFileHandler and TimedRotatingFileHandler (see the sketch after this list):
logging.handlers.BaseRotatingHandler
logging.handlers.RotatingFileHandler
logging.handlers.TimedRotatingFileHandler
logging.handlers.SocketHandler: send logs to a remote TCP socket
logging.handlers.DatagramHandler: send logs to a remote UDP socket
logging.handlers.SMTPHandler: send logs to an e-mail address
logging.handlers.SysLogHandler: send logs to syslog
logging.handlers.NTEventLogHandler: send logs to the Windows NT/2000/XP event log
logging.handlers.MemoryHandler: buffer logs in a specified in-memory buffer
logging.handlers.HTTPHandler: send logs to an HTTP server via "GET" or "POST"
Because StreamHandler and FileHandler are the most commonly used handlers, they live directly in the logging module, while the other handlers are in logging.handlers;
see the Python standard library documentation for how to use them.
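A minimal sketch of TimedRotatingFileHandler, rotating at midnight and keeping 7 old files; the file name and logger name are arbitrary:

import logging
from logging.handlers import TimedRotatingFileHandler

handler = TimedRotatingFileHandler('timed_app.log', when='midnight', backupCount=7)
handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))

logger = logging.getLogger('timed_demo')
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info('This message goes to a daily-rotated file')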
Configuring logging through the logging.config module
# logger.conf
###############################################
[loggers]
keys=root,example01,example02

[logger_root]
level=DEBUG
handlers=hand01,hand02

[logger_example01]
handlers=hand01,hand02
qualname=example01
propagate=0

[logger_example02]
handlers=hand01,hand03
qualname=example02
propagate=0

###############################################
[handlers]
keys=hand01,hand02,hand03

[handler_hand01]
class=StreamHandler
level=INFO
formatter=form02
args=(sys.stderr,)

[handler_hand02]
class=FileHandler
level=DEBUG
formatter=form01
args=('myapp.log', 'a')

[handler_hand03]
class=handlers.RotatingFileHandler
level=INFO
formatter=form02
args=('myapp.log', 'a', 10*1024*1024, 5)

###############################################
[formatters]
keys=form01,form02

[formatter_form01]
format=%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s
datefmt=%a, %d %b %Y %H:%M:%S

[formatter_form02]
format=%(name)-12s: %(levelname)-8s %(message)s
datefmt=
Example 3:
import logging
import logging.config

logging.config.fileConfig("logger.conf")
logger = logging.getLogger("example01")

logger.debug('This is debug message')
logger.info('This is info message')
logger.warning('This is warning message')
Example 4:
import logging
import logging.config

logging.config.fileConfig("logger.conf")
logger = logging.getLogger("example02")

logger.debug('This is debug message')
logger.info('This is info message')
logger.warning('This is warning message')
logging is thread-safe.
17.7 hashlib
Used for hashing and other security-related operations; it replaces the old md5 and sha modules and provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms.
# Python 2.x only -- removed in Python 3.x
import md5
hash = md5.new()
hash.update('admin')
print hash.hexdigest()

# Python 2.x only -- removed in Python 3.x
import sha
hash = sha.new()
hash.update('admin')
print hash.hexdigest()
import hashlib

# ######## md5 ########
obj = hashlib.md5()
obj.update(bytes('123', encoding='utf-8'))
print("md5:", obj.hexdigest())

# ######## sha1 ########
obj = hashlib.sha1()
obj.update(bytes('123', encoding='utf-8'))
print("sha1:", obj.hexdigest())

# ######## sha256 ########
obj = hashlib.sha256()
obj.update(bytes('123', encoding='utf-8'))
print("sha256:", obj.hexdigest())

# ######## sha384 ########
obj = hashlib.sha384()
obj.update(bytes('123', encoding='utf-8'))
print("sha384:", obj.hexdigest())

# ######## sha512 ########
obj = hashlib.sha512()
obj.update(bytes('123', encoding='utf-8'))
print("sha512:", obj.hexdigest())

# Output:
# md5: 202cb962ac59075b964b07152d234b70
# sha1: 40bd001563085fc35165329ea1ff5c5ecbdbbeef
# sha256: a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3
# sha384: 9a0a82f0c0cf31470d7affede3406cc9aa8410671520b727044eda15b4c25532a9b5cd8aaf9cec4919d76255b6bfb00f
# sha512: 3c9909afec25354d551dae21590bb26e38d53f2173b8d3dc3eee4c047e7ab1c1eb8b85103e3be7ba613b31bb5c9c36214dc9f14a42fd7a2fdb84856bca5c44c2
import hashlib

print(hashlib.algorithms_available)    # algorithms available on this build
# {'SHA', 'SHA1', 'SHA512', 'md4', 'DSA', 'SHA224', 'sha1', 'sha256', 'md5', 'sha224',
#  'ecdsa-with-SHA1', 'sha', 'RIPEMD160', 'dsaEncryption', 'ripemd160', 'MD4', 'DSA-SHA',
#  'whirlpool', 'MD5', 'SHA256', 'dsaWithSHA', 'sha384', 'SHA384', 'sha512'}

print(hashlib.algorithms_guaranteed)   # algorithms guaranteed on every platform, i.e. the stable set
# {'sha384', 'sha1', 'sha256', 'md5', 'sha224', 'sha512'}
The algorithms above are still very strong, but they remain vulnerable to rainbow-table (precomputed dictionary) attacks, so it is worth adding a custom key (a salt) before hashing.
import hashlib

obj = hashlib.md5(bytes('gyyx', encoding='utf-8'))   # 'gyyx' acts as the custom key (salt)
obj.update(bytes('123', encoding='utf-8'))
result = obj.hexdigest()
print("md5:", result)

# Output:
# md5: 83a2156b15c1376ae1e93980d7d1af56
# Python also has an hmac module, which processes the key and the message internally before hashing
import hmac

obj = hmac.new(b'mykey')
obj.update(b'mymessage')
print(obj.hexdigest())

# Output:
# d811630c4e62c6ef90d1bfe540212aaf
Example:
import hashlib

def md5(arg):
    """
    MD5-hash a string
    :param arg: the plaintext string entered by the user
    :return: the hex digest of the string
    """
    obj = hashlib.md5(bytes('oldboy', encoding='utf-8'))
    obj.update(bytes(arg, encoding='utf-8'))
    return obj.hexdigest()

user_name = 'admin'
pass_word = '1e5413879f3e972653fdcf9101698b29'   # plaintext: 123456

for i in range(3):
    username = input("Username: ")
    password = input("Password: ")
    if username == user_name and md5(password) == pass_word:
        print("Login successful")
        break
    else:
        print("Wrong username or password")
17.8 random
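A minimal sketch of commonly used functions in the random module; the 6-character verification code at the end is just an illustrative use case:

import random

print(random.random())                 # float in [0.0, 1.0)
print(random.randint(1, 10))           # integer N with 1 <= N <= 10
print(random.randrange(1, 10))         # integer from range(1, 10), i.e. 10 excluded
print(random.choice('abcde'))          # one random element of a sequence
print(random.sample(range(100), 5))    # 5 unique random elements

# Illustrative use case: a 6-character verification code
chars = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789'
code = ''.join(random.choice(chars) for _ in range(6))
print(code)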
17.9 Executing system commands
Modules and functions that can run shell commands:
- os.system
- os.spawn*
- os.popen* -- deprecated
- popen2.* -- deprecated
- commands.* -- deprecated, removed in 3.x
import commands

result = commands.getoutput('cmd')
result = commands.getstatus('cmd')
result = commands.getstatusoutput('cmd')
Starting with Python 2.4, Python provides the subprocess module for managing child processes, replacing the methods of the older modules above. Everything those modules and functions can do is available in subprocess, with richer functionality.
subprocess.call: run a command and return the exit status code (shell=True allows the command to be passed as a single shell-style string)
import subprocess

res = subprocess.call('ls -l', shell=True)
print(res)

res = subprocess.call(["ls", "-l"], shell=False)
print(res)
subprocess.check_call: run a command; return 0 if the exit status is 0, otherwise raise an exception
import subprocess

res = subprocess.check_call(["ls", "-l"])
print(res)

res = subprocess.check_call("lss", shell=True)   # an invalid command -- raises an exception
print(res)
subprocess.check_output: run a command; if the exit status is 0, return its output, otherwise raise an exception
import subprocess

res = subprocess.check_output(["echo", "Hello World!"])
print(res)

res = subprocess.check_output("exit 1", shell=True)   # raises an exception
print(res)
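On Python 3.5 and later, subprocess.run() is the recommended single entry point that covers the cases above; a minimal sketch:

import subprocess

# capture_output and text require Python 3.7+; on 3.5/3.6 use stdout=subprocess.PIPE
# and universal_newlines=True instead
proc = subprocess.run(['ls', '-l'], capture_output=True, text=True)
print(proc.returncode)   # exit status
print(proc.stdout)       # captured standard output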
subprocess.Popen(): for running more complex system commands
Parameters:
- args: the command, either a string or a sequence type (e.g. list, tuple)
- bufsize: buffering; 0 unbuffered, 1 line-buffered, any other positive value is the buffer size, a negative value means the system default
- stdin, stdout, stderr: the program's standard input, output, and error handles
- preexec_fn: Unix only; a callable object that is invoked in the child process just before it runs
- close_fds: on Windows, if close_fds is True the newly created child process does not inherit the parent's input, output, and error pipes, so you cannot set close_fds to True and also redirect the child's standard input, output, and error (stdin, stdout, stderr)
- shell: as above
- cwd: the working directory of the child process
- env: the environment variables of the child process; if env = None, they are inherited from the parent
- universal_newlines: line endings differ between systems; True -> normalize them all to \n
- startupinfo and creationflags: Windows only; passed to the underlying CreateProcess() to set attributes of the child process, such as the main window's appearance and the process priority
import subprocess

ret1 = subprocess.Popen(["mkdir", "t1"])
ret2 = subprocess.Popen("mkdir t2", shell=True)
Commands typed at a terminal fall into two categories:
- commands that produce output directly, e.g. ifconfig
- commands that enter an environment in which further commands are typed, e.g. python
# Case 1: the command returns its output directly
import subprocess

obj = subprocess.Popen("mkdir t3", shell=True, cwd='/home/dev',)

# Case 2: enter an environment, feed it commands, then collect the result
# (Python 2 style: str is written to stdin and print is a statement)
import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
obj.stdin.write('print 1 \n ')
obj.stdin.write('print 2 \n ')
obj.stdin.write('print 3 \n ')
obj.stdin.write('print 4 \n ')
obj.stdin.close()

cmd_out = obj.stdout.read()
obj.stdout.close()
cmd_error = obj.stderr.read()
obj.stderr.close()

print cmd_out
print cmd_error

# Case 2, variant: same as above, but stdout and stderr are collected through a single call to communicate()
import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
obj.stdin.write('print 1 \n ')
obj.stdin.write('print 2 \n ')
obj.stdin.write('print 3 \n ')
obj.stdin.write('print 4 \n ')

out_error_list = obj.communicate()
print out_error_list
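Under Python 3 the second case looks slightly different: stdin/stdout carry bytes unless universal_newlines (or text) is set, and communicate() both sends the input and collects the output. A minimal sketch:

import subprocess

obj = subprocess.Popen(["python3"], stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       universal_newlines=True)      # work with str instead of bytes

out, err = obj.communicate("print(1)\nprint(2)\n")   # send the commands, close stdin, read the result
print(out)
print(err)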
17.10 shutil
Provides high-level operations on files and directories.
shutil.copyfileobj(fsrc, fdst[, length]): copy the contents of one file object into another; length sets the chunk size, so partial copies are possible
# Library implementation
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
shutil.copyfile(src, dst): copy a file's contents
# Library implementation
def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)
shutil.copymode(src, dst): copy only the permission bits; contents, owner, and group are unchanged
# Library implementation
def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)
shutil.copystat(src, dst): copy the stat info: mode bits, atime, mtime, flags
# Library implementation (Python 2 source)
def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError, why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise
shutil.copy(src, dst): copy a file and its permission bits
# Library implementation
def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)
shutil.copy2(src, dst): copy a file and its stat info
# Library implementation
def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)
shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None): recursively copy a directory tree
For example: copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))
# Library implementation (Python 2 source)
def ignore_patterns(*patterns):
    """Function that can be used as copytree() ignore parameter.

    Patterns is a sequence of glob-style patterns
    that are used to exclude files"""
    def _ignore_patterns(path, names):
        ignored_names = []
        for pattern in patterns:
            ignored_names.extend(fnmatch.filter(names, pattern))
        return set(ignored_names)
    return _ignore_patterns

def copytree(src, dst, symlinks=False, ignore=None):
    """Recursively copy a directory tree using copy2().

    The destination directory must not already exist.
    If exception(s) occur, an Error is raised with a list of reasons.

    If the optional symlinks flag is true, symbolic links in the
    source tree result in symbolic links in the destination tree; if
    it is false, the contents of the files pointed to by symbolic
    links are copied.

    The optional ignore argument is a callable. If given, it
    is called with the `src` parameter, which is the directory
    being visited by copytree(), and `names` which is the list of
    `src` contents, as returned by os.listdir():

        callable(src, names) -> ignored_names

    Since copytree() is called recursively, the callable will be
    called once for each directory that is copied. It returns a
    list of names relative to the `src` directory that should
    not be copied.

    XXX Consider this example code rather than the ultimate tool.

    """
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error, err:
            errors.extend(err.args[0])
        except EnvironmentError, why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError, why:
        if WindowsError is not None and isinstance(why, WindowsError):
            # Copying file access times may fail on Windows
            pass
        else:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error, errors
shutil.rmtree(path[, ignore_errors[, onerror]]): recursively delete a directory tree
# Library implementation (Python 2 source)
def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info().  If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error, err:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error, err:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())
shutil.move(src, dst): recursively move a file or directory (a usage sketch of these functions follows the source below)
# Library implementation (Python 2 source)
def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.

    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error, "Destination path '%s' already exists" % real_dst
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)
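A short usage sketch of the functions described above; all paths are hypothetical:

import shutil

shutil.copyfile('a.txt', 'a_copy.txt')      # contents only
shutil.copy('a.txt', '/tmp/')               # contents + permission bits
shutil.copy2('a.txt', '/tmp/a_full.txt')    # contents + stat info
shutil.copytree('proj', 'proj_bak',
                ignore=shutil.ignore_patterns('*.pyc', 'tmp*'))   # recursive copy
shutil.move('proj_bak', '/tmp/proj_bak')    # recursive move
shutil.rmtree('/tmp/proj_bak')              # recursive delete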
shutil.make_archive(base_name, format, ...): create an archive (e.g. zip or tar) and return its path
base_name: the archive file name, or a path to it; a bare file name is saved in the current directory, a path is saved to that location,
e.g. www => saved in the current directory
e.g. /Users/www => saved under /Users/
format: the archive type, one of "zip", "tar", "bztar", "gztar"
root_dir: the directory to archive (defaults to the current directory)
owner: the owner, defaults to the current user
group: the group, defaults to the current group
logger: used for logging, usually a logging.Logger object
# Archive the files under /Users/Downloads/test into the current program directory
import shutil
ret = shutil.make_archive("www", 'gztar', root_dir='/Users/Downloads/test')

# Archive the files under /Users/Downloads/test into /Users/
import shutil
ret = shutil.make_archive("/Users/www", 'gztar', root_dir='/Users/Downloads/test')
# Library implementation (Python 2 source)
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                 dry_run=0, owner=None, group=None, logger=None):
    """Create an archive file (eg. zip or tar).

    'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar"
    or "gztar".

    'root_dir' is a directory that will be the root directory of the
    archive; ie. we typically chdir into 'root_dir' before creating the
    archive.  'base_dir' is the directory where we start archiving from;
    ie. 'base_dir' will be the common prefix of all files and
    directories in the archive.  'root_dir' and 'base_dir' both default
    to the current directory.  Returns the name of the archive file.

    'owner' and 'group' are used when creating a tar archive. By default,
    uses the current owner and group.
    """
    save_cwd = os.getcwd()
    if root_dir is not None:
        if logger is not None:
            logger.debug("changing into '%s'", root_dir)
        base_name = os.path.abspath(base_name)
        if not dry_run:
            os.chdir(root_dir)

    if base_dir is None:
        base_dir = os.curdir

    kwargs = {'dry_run': dry_run, 'logger': logger}

    try:
        format_info = _ARCHIVE_FORMATS[format]
    except KeyError:
        raise ValueError, "unknown archive format '%s'" % format

    func = format_info[0]
    for arg, val in format_info[1]:
        kwargs[arg] = val

    if format != 'zip':
        kwargs['owner'] = owner
        kwargs['group'] = group

    try:
        filename = func(base_name, base_dir, **kwargs)
    finally:
        if root_dir is not None:
            if logger is not None:
                logger.debug("changing back to '%s'", save_cwd)
            os.chdir(save_cwd)

    return filename
shutil handles archives by delegating to the ZipFile and TarFile classes (the zipfile and tarfile modules).
import zipfile

# Compress files
z = zipfile.ZipFile('test.zip', 'w')
# z.write('a.txt', 'b.txt')   # the second argument is the name stored inside the archive (arcname)
z.write('a.txt')
z.write('b.txt')
z.close()

# Decompress
z = zipfile.ZipFile('test.zip', 'r')
# z.extractall()              # extract everything
z.extract('a.txt')            # extract a single file
z.close()
import tarfile

# Create a tar archive
tar = tarfile.TarFile('test.tar', 'w')
tar.add('a.txt')
tar.add('bbs2.zip', arcname='bbs2.zip')   # arcname renames the file inside the archive
tar.add('cmdb.zip', arcname='cmdb.zip')
tar.close()

# Extract a single member
tar = tarfile.TarFile('test.tar', 'r')
a = tar.getmember('a.txt')    # look up a single member by name
tar.extract(a)
tar.extractall()              # extract everything; an extraction path can also be given
tar.close()
For the library implementations, see tarfile.py and zipfile.py in the standard library.
17.11 xml
XML is a protocol for exchanging data between different languages or programs. An XML file looks like this:
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
1. Parsing XML
# Use ElementTree.XML to parse a string into an XML object
from xml.etree import ElementTree as ET

# Open the file and read the XML content
str_xml = open('xo.xml', 'r').read()

# Parse the string into an XML object; root is the root node of the document
root = ET.XML(str_xml)


# Use ElementTree.parse to parse a file directly into an XML object
from xml.etree import ElementTree as ET

# Parse the xml file directly
tree = ET.parse("xo.xml")

# Get the root node of the document
root = tree.getroot()
2. Working with XML
An XML document is a tree of nested nodes. Every node provides the following methods for operating on it:
# Annotated summary of xml.etree.ElementTree.Element (the full source is in
# Lib/xml/etree/ElementTree.py in the standard library):

class Element:
    """An XML element; its length is its number of sub-elements."""

    tag = None       # the node's tag name
    attrib = None    # the node's attributes (a dict)
    text = None      # the node's text content (before the first sub-element)
    tail = None      # text after this node's end tag, before the next sibling's start tag

    def makeelement(self, tag, attrib): ...        # create a new node of the same type
    def copy(self): ...                            # shallow copy; sub-elements are shared with the original tree
    def append(self, subelement): ...              # append a child node to the current node
    def extend(self, elements): ...                # append n child nodes from a sequence
    def insert(self, index, subelement): ...       # insert a child node at the given position
    def remove(self, subelement): ...              # remove a child node (matched by identity, not by tag or content)
    def getchildren(self): ...                     # return all child nodes (deprecated; use list(elem) instead)
    def find(self, path, namespaces=None): ...     # first matching child node, by tag name or XPath
    def findtext(self, path, default=None, namespaces=None): ...  # text of the first matching child node
    def findall(self, path, namespaces=None): ...  # all matching child nodes
    def iterfind(self, path, namespaces=None): ... # iterator over all matching child nodes (usable in a for loop)
    def clear(self): ...                           # remove all sub-elements and attributes, reset text and tail
    def get(self, key, default=None): ...          # get an attribute value of the current node
    def set(self, key, value): ...                 # set an attribute value on the current node
    def keys(self): ...                            # all attribute names of the current node
    def items(self): ...                           # all attributes as (name, value) pairs
    def iter(self, tag=None): ...                  # iterator over this node and all descendants with a matching tag
    def itertext(self): ...                        # iterator over the inner text of this node and all descendants
Because every node has the methods above, and the parsing step gives us root (the root node of the XML file), we can use these methods to work with the whole document.
Operation 1: iterate over the entire XML document
from xml.etree import ElementTree as ET ############ 解析方式一 ############ """ # 打开文件,读取XML内容 str_xml = open('xo.xml', 'r').read() # 将字符串解析成xml特殊对象,root代指xml文件的根节点 root = ET.XML(str_xml) """ ############ 解析方式二 ############ # 直接解析xml文件 tree = ET.parse("xo.xml") # 获取xml文件的根节点 root = tree.getroot() ### 操作 # 顶层标签 print(root.tag) # 遍历XML文档的第二层 for child in root: # 第二层节点的标签名称和标签属性 print(child.tag, child.attrib) # 遍历XML文档的第三层 for i in child: # 第二层节点的标签名称和内容 print(i.tag,i.text)
Operation 2: iterate over specific nodes in the XML
from xml.etree import ElementTree as ET

############ Parsing method 1 ############
"""
# Open the file and read the XML content
str_xml = open('xo.xml', 'r').read()

# Parse the string into an XML object; root is the root node
root = ET.XML(str_xml)
"""
############ Parsing method 2 ############
# Parse the xml file directly
tree = ET.parse("xo.xml")

# Get the root node
root = tree.getroot()

### Operations

# Top-level tag
print(root.tag)

# Iterate over every year node in the document
for node in root.iter('year'):
    # Tag name and text of the node
    print(node.tag, node.text)
Operation 3: modify node content
Modifications happen only in memory and do not affect the contents of the file, so to keep them you must write the in-memory content back to a file.
Method 1: parse from a string, modify, and save
from xml.etree import ElementTree as ET

############ Parsing method 1 ############
# Open the file and read the XML content
str_xml = open('xo.xml', 'r').read()

# Parse the string into an XML object; root is the root node
root = ET.XML(str_xml)

############ Operations ############

# Top-level tag
print(root.tag)

# Loop over all year nodes
for node in root.iter('year'):
    # Increment the content of the year node by one
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # Set attributes
    node.set('name', 'alex')
    node.set('age', '18')
    # Delete an attribute
    del node.attrib['name']

############ Save the file ############
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')
Method 2: parse from the file, modify, and save
from xml.etree import ElementTree as ET

############ Parsing method 2 ############
# Parse the xml file directly
tree = ET.parse("xo.xml")

# Get the root node
root = tree.getroot()

############ Operations ############

# Top-level tag
print(root.tag)

# Loop over all year nodes
for node in root.iter('year'):
    # Increment the content of the year node by one
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # Set attributes
    node.set('name', 'alex')
    node.set('age', '18')
    # Delete an attribute
    del node.attrib['name']

############ Save the file ############
tree.write("newnew.xml", encoding='utf-8')
Operation 4: delete nodes
Method 1: parse from a string, delete, and save
from xml.etree import ElementTree as ET

############ Parse from a string ############
# Open the file and read the XML content
str_xml = open('xo.xml', 'r').read()

# Parse the string into an XML object; root is the root node
root = ET.XML(str_xml)

############ Operations ############

# Top-level tag
print(root.tag)

# Walk all country nodes under data
for country in root.findall('country'):
    # Read the content of the rank node under each country node
    rank = int(country.find('rank').text)
    if rank > 50:
        # Delete this country node
        root.remove(country)

############ Save the file ############
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')
Method 2: parse from the file, delete, and save
from xml.etree import ElementTree as ET

############ Parse from the file ############
# Parse the xml file directly
tree = ET.parse("xo.xml")

# Get the root node
root = tree.getroot()

############ Operations ############

# Top-level tag
print(root.tag)

# Walk all country nodes under data
for country in root.findall('country'):
    # Read the content of the rank node under each country node
    rank = int(country.find('rank').text)
    if rank > 50:
        # Delete this country node
        root.remove(country)

############ Save the file ############
tree.write("newnew.xml", encoding='utf-8')
3. Creating XML
Method 1:
from xml.etree import ElementTree as ET # 创建根节点 root = ET.Element("famliy") # 创建节点大儿子 son1 = ET.Element('son', {'name': '儿1'}) # 创建小儿子 son2 = ET.Element('son', {"name": '儿2'}) # 在大儿子中创建两个孙子 grandson1 = ET.Element('grandson', {'name': '儿11'}) grandson2 = ET.Element('grandson', {'name': '儿12'}) son1.append(grandson1) son1.append(grandson2) # 把儿子添加到根节点中 root.append(son1) root.append(son1) tree = ET.ElementTree(root) tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False)
Method 2:
from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element("family")

# Create the elder son
# son1 = ET.Element('son', {'name': '儿1'})
son1 = root.makeelement('son', {'name': '儿1'})
# Create the younger son
# son2 = ET.Element('son', {"name": '儿2'})
son2 = root.makeelement('son', {"name": '儿2'})

# Create two grandsons under the elder son
# grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson1 = son1.makeelement('grandson', {'name': '儿11'})
# grandson2 = ET.Element('grandson', {'name': '儿12'})
grandson2 = son1.makeelement('grandson', {'name': '儿12'})

son1.append(grandson1)
son1.append(grandson2)

# Attach the sons to the root node
root.append(son1)
root.append(son2)

tree = ET.ElementTree(root)
tree.write('oooo.xml', encoding='utf-8', short_empty_elements=False)
Method 3:
from xml.etree import ElementTree as ET

# Create the root node
root = ET.Element("family")

# Create the elder son
son1 = ET.SubElement(root, "son", attrib={'name': '儿1'})
# Create the younger son
son2 = ET.SubElement(root, "son", attrib={"name": "儿2"})

# Create a grandson under the elder son
grandson1 = ET.SubElement(son1, "age", attrib={'name': '儿11'})
grandson1.text = '孙子'

et = ET.ElementTree(root)   # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True, short_empty_elements=False)
Because ElementTree writes XML without indentation by default, getting indented output requires changing how the document is saved:
from xml.etree import ElementTree as ET
from xml.dom import minidom


def prettify(elem):
    """Convert the node to a string and add indentation."""
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

# Create the root node
root = ET.Element("family")

# Create the elder son
# son1 = ET.Element('son', {'name': '儿1'})
son1 = root.makeelement('son', {'name': '儿1'})
# Create the younger son
# son2 = ET.Element('son', {"name": '儿2'})
son2 = root.makeelement('son', {"name": '儿2'})

# Create two grandsons under the elder son
# grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson1 = son1.makeelement('grandson', {'name': '儿11'})
# grandson2 = ET.Element('grandson', {'name': '儿12'})
grandson2 = son1.makeelement('grandson', {'name': '儿12'})

son1.append(grandson1)
son1.append(grandson2)

# Attach the sons to the root node
root.append(son1)
root.append(son2)

raw_str = prettify(root)

f = open("xxxoo.xml", 'w', encoding='utf-8')
f.write(raw_str)
f.close()
4. Namespaces
http://www.w3school.com.cn/xml/xml_namespaces.asp
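Beyond the reference above, a minimal sketch of reading namespaced XML with ElementTree; the URI and tag names are made up for illustration:

from xml.etree import ElementTree as ET

xml = """<root xmlns:f="http://example.com/fruit">
           <f:apple>red</f:apple>
         </root>"""

root = ET.XML(xml)
ns = {'f': 'http://example.com/fruit'}       # map a prefix to the namespace URI
for node in root.findall('f:apple', ns):     # the prefix in the path is resolved through the map
    print(node.tag, node.text)               # {http://example.com/fruit}apple red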
17.12 requests
Requests is an Apache2-licensed HTTP library written in Python. It is a high-level wrapper over Python's built-in modules that makes sending network requests much more pleasant; with Requests you can easily do anything a browser can do.
1. Installation
pip3 install requests
2. Usage
# GET requests

# Without parameters
import requests

ret = requests.get('https://github.com/timeline.json')
print(ret.url)
print(ret.text)

# With parameters
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
print(ret.url)
print(ret.text)
# POST requests

# 1. Basic POST
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
print(ret.text)

# 2. Sending headers and data
import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

ret = requests.post(url, data=json.dumps(payload), headers=headers)
print(ret.text)
print(ret.cookies)
# Other request methods
requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)

# All of the above are built on this method
requests.request(method, url, **kwargs)
More documentation: http://cn.python-requests.org/zh_CN/latest/
3. HTTP request and XML examples
Example 1: check whether a QQ account is online
import urllib import requests from xml.etree import ElementTree as ET # 使用内置模块urllib发送HTTP请求,或者XML格式内容 """ f = urllib.request.urlopen('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508') result = f.read().decode('utf-8') """ # 使用第三方模块requests发送HTTP请求,或者XML格式内容 r = requests.get('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508') result = r.text # 解析XML格式内容 node = ET.XML(result) # 获取内容 if node.text == "Y": print("在线") else: print("离线")
Example 2: look up train stop information
import urllib
import requests
from xml.etree import ElementTree as ET

# Send the HTTP request with the built-in urllib module and get the XML response
"""
f = urllib.request.urlopen('http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=')
result = f.read().decode('utf-8')
"""

# Send the HTTP request with the third-party requests module and get the XML response
r = requests.get('http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=')
result = r.text

# Parse the XML response
root = ET.XML(result)
for node in root.iter('TrainDetailInfo'):
    print(node.find('TrainStation').text, node.find('StartTime').text, node.tag, node.attrib)
More examples: http://www.cnblogs.com/wupeiqi/archive/2012/11/18/2776014.html
17.13 ConfigParser
configparser handles configuration files in a specific (INI-style) format; under the hood it simply reads and writes the file with open().
# comment 1
; comment 2

[section1]  # section
k1 = v1     # value
k2:v2       # value

[section2]  # section
k1 = v1     # value
# Get all sections
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
ret = config.sections()
print(ret)

# Check whether a section exists
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
has_se = config.has_section('section1')
print(has_se)

# Add a section
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
config.add_section('section3')
config.write(open(config_file, 'w'))

# Remove a section
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
config.remove_section('section3')
config.write(open(config_file, 'w'))

# Get all key/value pairs of a section
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
kv = config.items('section2')

# Get all keys of a section
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
ret = config.options('section1')

# Get the value of a given key in a section
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
v = config.get('section1', 'k1')
# v = config.getint('section1', 'k1')
# v = config.getfloat('section1', 'k1')
# v = config.getboolean('section1', 'k1')
print(v)

# Check whether a key exists in a section
has_opt = config.has_option('section1', 'k1')
print(has_opt)

# Remove a key from a section
config.remove_option('section1', 'k1')
config.write(open('xxxooo', 'w'))

# Set a key in a section
config.set('section1', 'k10', "123")
config.write(open('xxxooo', 'w'))
17.14 Converting between XML and JSON
#!/usr/bin/env python
# encoding: utf-8
# Convert between XML and JSON with xmltodict
import xmltodict
import json


def python_conver_xml_to_json():
    """
    demo Python conversion from xml to json
    :return:
    """
    xml = """
    <data>
        <country name="Liechtenstein">
            <rank updated="yes">2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
        <country name="Singapore">
            <rank updated="yes">5</rank>
            <year>2026</year>
            <gdppc>59900</gdppc>
            <neighbor direction="N" name="Malaysia" />
        </country>
        <country name="Panama">
            <rank updated="yes">69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    """
    dic = xmltodict.parse(xml)
    # json_str = json.dumps(dic)          # not formatted by default
    json_str = json.dumps(dic, indent=1)  # indent pretty-prints the output
    print(json_str)


def python_conver_json_to_xml():
    """
    demo Python conversion from json to xml
    :return:
    """
    dic = {
        'page': {
            'title': 'King Crimson',
            'ns': 0,
            'revision': {
                'id': 547909091,
            }
        }
    }
    xml = xmltodict.unparse(dic)
    print(xml)


if __name__ == '__main__':
    python_conver_xml_to_json()
    python_conver_json_to_xml()
Exercises
1. Use an HTTP request and XML to fetch TV program listings
API:http://www.webxml.com.cn/webservices/ChinaTVprogramWebService.asmx
2. Use an HTTP request and JSON to fetch the weather
API:http://wthrcdn.etouch.cn/weather_mini?city=北京
Source: http://www.cnblogs.com/madsnotes/
Notice: the copyright of this article belongs jointly to the author and cnblogs (博客园). You are welcome to repost it, but unless the author agrees otherwise you must keep this notice and give a clearly visible link to the original article on the page.