Python之模块(一)

什么是Python模块?

1.1 Python模块简介

  模块让你能够有逻辑地组织你的Python代码段。把相关的代码分配到一个 模块里能让你的代码更好用,更易懂。模块也是Python对象,具有随机的名字属性用来绑定或引用。简单地说,模块就是一个保存了Python代码的文件。模块能定义函数,类和变量。模块里也能包含可执行的代码。

一个叫做aname的模块里的Python代码一般都能在一个叫aname.py的文件中找到。下例是个简单的模块support.py。

def print_func( par ):
   print "Hello : ", par
   return

1.2 Python模块导入

  当解释器遇到import语句,如果模块在当前的搜索路径就会被导入。搜索路径是一个解释器会先进行搜索的所有目录的列表。一个模块只会被导入一次,不管你执行了多少次import。这样可以防止导入模块被一遍又一遍地执行。

# 单模块
import xxx
# 嵌套在文件夹下
from xxx import xxx
from xxx import *
from xxx import xxx as xxx

导入模块其实就是告诉Python解释器去解释那个py文件

  • 导入一个py文件,解释器解释该py文件
  • 导入一个包,解释器解释该包下的 __init__.py 文件

那么问题来了,导入模块时是根据那个路径作为基准来进行的呢?当你导入一个模块,Python解析器对模块位置的搜索顺序是:

  • 当前目录
  • 如果不在当前目录,Python 则搜索在 shell 变量 PYTHONPATH 下的每个目录。
  • 如果都找不到,Python会察看默认路径。UNIX下,默认路径一般为/usr/local/lib/python/。

模块搜索路径存储在system模块的sys.path变量中。变量里包含当前目录,PYTHONPATH和由安装过程决定的默认目录。

模块路径:

# 获取路径
>>> import sys
>>> for i in sys.path:
...     print(i)
...

C:\Python35-32\python35.zip
C:\Python35-32\DLLs
C:\Python35-32\lib                         # 存放标准库
C:\Python35-32
C:\Python35-32\lib\site-packages        # 存放第三方库,扩展库

>>> import sys
>>> import os
>>> pre_path = os.path.abspath('../')
>>> sys.path.append(pre_path)
>>> for i in sys.path:
...     print(i)
...

C:\Python35-32\python35.zip
C:\Python35-32\DLLs
C:\Python35-32\lib
C:\Python35-32
C:\Python35-32\lib\site-packages
C:\Users

1.3 dir()函数

dir()函数一个排好序的字符串列表,内容是一个模块里定义过的名字。返回的列表容纳了在一个模块里定义的所有模块,变量和函数。如下一个简单的实例:

>>> for i in dir(os):
...     print(i)
...
.....省略
sys
system
urandom
utime
waitpid
walk
write

1.4 reload()函数

  当一个模块被导入到一个脚本,模块顶层部分的代码只会被执行一次。因此,如果你想重新执行模块里顶层部分的代码,可以用reload()函数。该函数会重新导入之前导入过的模块。语法如下:

reload(module_name)

在这里,module_name要直接放模块的名字,而不是一个字符串形式。比如想重载hello模块,如下:

reload(hello)

1.5 模块分类

自定义模块
内置模块
开源模块

开源模块

1.6.1 模块安装

1)使用Python包管理工具

pip
# 生成依赖包列表
pip freeze > requirements.txt
pip install -r requirements.txt
pip官网资料:https://pip.pypa.io/en/stable/quickstart/
easy_install

2)源码安装

# 在使用源码安装时,需要使用到gcc编译和python开发环境,所以,需要先执行:
yum install gcc
yum install python-devel
或
apt-get python-dev

# 源码包安装流程
下载源码
解压源码
进入目录
编译源码    python setup.py build
安装源码    python setup.py install

# 安装成功后,模块会自动安装到 sys.path中的某个目录中,如:
/usr/lib/python2.7/site-packages/

16.2 模块导入

  导入方式与自定义模块相同

16.3 案例讲解

  paramiko是一个用于做远程控制的模块,使用该模块可以对远程服务器进行命令或文件操作,值得一说的是,fabric和ansible内部的远程管理就是使用的paramiko来现实。

官网:http://www.paramiko.org/

1)模块安装

# pip 安装
python -m pip install paramiko   #  Python 2.x
python3 -m pip install paramiko  #  Python 3.x

# 源码安装
# pycrypto,由于 paramiko 模块内部依赖pycrypto,所以先下载安装pycrypto
# 下载安装 pycrypto
wget https://files.cnblogs.com/files/wupeiqi/pycrypto-2.6.1.tar.gz
tar -xvf pycrypto-2.6.1.tar.gz
cd pycrypto-2.6.1
python setup.py build
python setup.py install

# 进入python环境,导入Crypto检查是否安装成功
# 下载安装 paramiko
wget https://files.cnblogs.com/files/wupeiqi/paramiko-1.10.1.tar.gz
tar -xvf paramiko-1.10.1.tar.gz
cd paramiko-1.10.1
python setup.py build
python setup.py install
# 进入python环境,导入paramiko检查是否安装成功
python3 -c "import paramiko"

2)模块应用

# 通过账号密码连接服务器
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('192.168.1.108', 22, 'alex', '123')
stdin, stdout, stderr = ssh.exec_command('df')
print stdout.read()
ssh.close();

# SSHClient封装Transport
import paramiko

transport = paramiko.Transport(('hostname', 22))
transport.connect(username='wupeiqi', password='123')

ssh = paramiko.SSHClient()
ssh._transport = transport
stdin, stdout, stderr = ssh.exec_command('df')
print(stdout.read())
transport.close()

 

# 通过密钥链接服务器
import paramiko

private_key_path = '/home/auto/.ssh/id_rsa'
key = paramiko.RSAKey.from_private_key_file(private_key_path)

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('主机名 ', 端口, '用户名', key)

stdin, stdout, stderr = ssh.exec_command('df')
print stdout.read()
ssh.close()

#  SSHClient 封装 Transport
import paramiko

private_key = paramiko.RSAKey.from_private_key_file('/home/auto/.ssh/id_rsa')
transport = paramiko.Transport(('hostname', 22))
transport.connect(username='wupeiqi', pkey=private_key)
ssh = paramiko.SSHClient()
ssh._transport = transport
stdin, stdout, stderr = ssh.exec_command('df')
transport.close()

 

# 上传或者下载文件 - 通过用户名和密码
import os,sys
import paramiko

t = paramiko.Transport(('192.168.1.1',22))
t.connect(username='root',password='123456')
sftp = paramiko.SFTPClient.from_transport(t)
sftp.put('/tmp/test.py','/tmp/test.py') 
# sftp.get('/tmp/test.py','/tmp/test2.py')
t.close()

 

 

# 上传或下载文件 - 通过密钥
import paramiko

pravie_key_path = '/home/auto/.ssh/id_rsa'
key = paramiko.RSAKey.from_private_key_file(pravie_key_path)

t = paramiko.Transport(('192.168.1.100',22))
t.connect(username='root',pkey=key)

sftp = paramiko.SFTPClient.from_transport(t)
sftp.put('/tmp/test3.py','/tmp/test3.py') 
# sftp.get('/tmp/test3.py','/tmp/test4.py') 
t.close()
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import paramiko
import uuid

class SSHConnection(object):

    def __init__(self, host='172.16.103.191', port=22, username='wupeiqi',pwd='123'):
        self.host = host
        self.port = port
        self.username = username
        self.pwd = pwd
        self.__k = None

    def create_file(self):
        file_name = str(uuid.uuid4())
        with open(file_name,'w') as f:
            f.write('sb')
        return file_name

    def run(self):
        self.connect()
        self.upload('/home/wupeiqi/tttttttttttt.py')
        self.rename('/home/wupeiqi/tttttttttttt.py', '/home/wupeiqi/ooooooooo.py)
        self.close()

    def connect(self):
        transport = paramiko.Transport((self.host,self.port))
        transport.connect(username=self.username,password=self.pwd)
        self.__transport = transport

    def close(self):

        self.__transport.close()

    def upload(self,target_path):
        # 连接,上传
        file_name = self.create_file()

        sftp = paramiko.SFTPClient.from_transport(self.__transport)
        # 将location.py 上传至服务器 /tmp/test.py
        sftp.put(file_name, target_path)

    def rename(self, old_path, new_path):

        ssh = paramiko.SSHClient()
        ssh._transport = self.__transport
        # 执行命令
        cmd = "mv %s %s" % (old_path, new_path,)
        stdin, stdout, stderr = ssh.exec_command(cmd)
        # 获取命令结果
        result = stdout.read()

    def cmd(self, command):
        ssh = paramiko.SSHClient()
        ssh._transport = self.__transport
        # 执行命令
        stdin, stdout, stderr = ssh.exec_command(command)
        # 获取命令结果
        result = stdout.read()
        return result
        


ha = SSHConnection()
ha.run()
案例1
import paramiko
import uuid

class SSHConnection(object):

    def __init__(self, host='192.168.11.61', port=22, username='alex',pwd='alex3714'):
        self.host = host
        self.port = port
        self.username = username
        self.pwd = pwd
        self.__k = None

    def run(self):
        self.connect()
        pass
        self.close()

    def connect(self):
        transport = paramiko.Transport((self.host,self.port))
        transport.connect(username=self.username,password=self.pwd)
        self.__transport = transport

    def close(self):
        self.__transport.close()

    def cmd(self, command):
        ssh = paramiko.SSHClient()
        ssh._transport = self.__transport
        # 执行命令
        stdin, stdout, stderr = ssh.exec_command(command)
        # 获取命令结果
        result = stdout.read()
        return result

    def upload(self,local_path, target_path):
        # 连接,上传
        sftp = paramiko.SFTPClient.from_transport(self.__transport)
        # 将location.py 上传至服务器 /tmp/test.py
        sftp.put(local_path, target_path)

ssh = SSHConnection()
ssh.connect()
r1 = ssh.cmd('df')
ssh.upload('s2.py', "/home/alex/s7.py")
ssh.close()
案例2

内置模块

  Python 带有一个内置模块库,并发布有单独的文档叫Python 库参考手册(以下简称"库参考手册")。有些模块被直接构建在解析器里;这些操作虽然不是语言核心的部分,但是依然被内建进来,一方面是效率的原因,另一方面是为了提供访问操作系统原语如系统调用的功能。这些模块是可配置的,也取决于底层的平台。例如,winreg 模块只在 Windows 系统上提供。有一个特别的模块需要注意: sys,它内置在每一个 Python 解释器中。变量 sys.ps1 和 sys.ps2 定义了主提示符和辅助提示符使用的字符串:

>>> import sys
>>> sys.ps1                      # 只有在交互式模式中,这两个变量才有定义
'>>> '
>>> sys.ps2
'... '
>>> sys.ps1 = 'mads>>>'          # 重新设置ps1,ps2的值
mads>>>

模块中的特殊变量

1)__doc__ 将文档的注释封装到该方法  """这种注释才会被封装"""

#!/usr/bin/env python
# encoding: utf-8

"""
@version: v1.0
@author: XXX
@license: Apache Licence
@contact: xxx@qq.com
@site: http://madsstudy.blog.51cto.com/
@software: PyCharm
@file: mod_index.py
@time: 2016/6/11 16:22
"""

print(__doc__)

结果:
@version: v1.0
@author: xxx
@license: Apache Licence
@contact: xxx@qq.com
@site: http://madsstudy.blog.51cto.com/
@software: PyCharm
@file: mod_index.py
@time: 2016/6/11 16:22

2)__file__ 获取当前文件的所在路径

print(__file__)
E:/开源中国/192.168.21.140/trunk/Python/scripts/S13/s13day06/模块/mod_index.py

# 应用案例
import sys
import os
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

3) __name__只有执行当前文件时候,当前文件的特殊变量__name__ == "__main__"

4)__cached__

5)__package__指定的函数从哪个模块导入

from bin import admin
print(__package__)
print(admin.__package__)

# 结果:
None
bin

 

17.5 json和pickel模块

用于序列化的两个模块
1)json
功能:用于字符串和 python数据类型间进行转换
特点:更加适合跨语言,处理Python的基本数据类型
方法:
loads:将字符串形式转化为Python基本数据类型
dumps:将Python基本数据类型转化为字符串
load :读文件,进行反序列化
dump :序列化写文件

2)pickel
功能:用于python特有的类型 和 python的数据类型间进行转换
特点:Python所有类型的序列化,仅适用Python,还有可能发生版本兼容问题,应用场景:游戏自自定义玩家数据
方法:
dumps:
loads:
 dump:只支持写字符串类型,不能是二进制
 load:

3)应用案例

import json

dic = {"name": "alex", "age": 29}  # 字典
print(dic, type(dic))
st = json.dumps(dic)
print(st, type(st))


li = ["alex", 29]  # 列表
print(li, type(li))
st = json.dumps(li)
print(st, type(st))


st = '{"k1": "v1" ,"k2": 23}'   # 最外层必须使用单引号,里面使用双引号 否则loads时报错
print(st, type(st))
dic = json.loads(st)
print(dic, type(dic))


li = ["alex", 29]  # 列表
print(li, type(li))
st = json.dump(li, open('user_info.txt', 'w'))

st = json.load(open('user_info.txt', 'r'))
print(st, type(st))
import pickle

li = [11, 22, 33, ]
r = pickle.dumps(li)
print(r)

res = pickle.loads(r)
print(res)

li = [11, 22, 33, ]
pickle.dump(li, open('user_info.pickle', 'wb'))

res = pickle.load(open('user_info.pickle', 'rb'))
print(res)

17.6 logging模块

  很多程序都有记录日志的需求,并且日志中包含的信息即有正常的程序访问日志,还可能有错误、警告等信息输出,python的logging模块提供了标准的日志接口,你可以通过它存储各种格式的日志,logging的日志可以分为 debug(), info(), warning(), error() and critical() 5个级别
简单的将日志打印到屏幕

>>> import logging
>>>
>>> logging.debug('This is debug message')
>>> logging.info('This is info message')
>>> logging.warning('This is warning message')
WARNING:root:This is warning message
>>>

默认情况下,logging将日志打印到屏幕,日志级别为WARNING;
日志级别大小关系为:CRITICAL > ERROR > WARNING > INFO > DEBUG > NOTSET,当然也可以自己定义日志级别。

将日志写入文件

import logging

logging.basicConfig(level=logging.DEBUG,                       # 设置日志级别为debug
                format='%(asctime)s %(filename)s [line:%(lineno)d] %(levelname)s %(message)s',
                datefmt='%a, %d %b %Y %H:%M:%S',
                filename='my_app.log',
                filemode='w')

logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')

# 日志文件内容
E:\开源中国\192.168.21.140\trunk\Python\scripts\S13\s13day05>cat my_app.log                                                                                                                                          
Mon, 06 Jun 2016 22:34:31 logging_v1.py [line:24] DEBUG This is debug message
Mon, 06 Jun 2016 22:34:31 logging_v1.py [line:25] INFO This is info message
Mon, 06 Jun 2016 22:34:31 logging_v1.py [line:26] WARNING This is warning message

 参数讲解:

filename: 指定日志文件名
filemode: 和file函数意义相同,指定日志文件的打开模式,'w'或'a'
format: 指定输出的格式和内容,format可以输出很多有用信息,如上例所示:
%(levelno)s: 打印日志级别的数值
%(levelname)s: 打印日志级别名称
%(pathname)s: 打印当前执行程序的路径,其实就是sys.argv[0]
%(filename)s: 打印当前执行程序名
%(funcName)s: 打印日志的当前函数
%(lineno)d: 打印日志的当前行号
%(asctime)s: 打印日志的时间
%(thread)d: 打印线程ID
%(threadName)s: 打印线程名称
%(process)d: 打印进程ID
%(message)s: 打印日志信息
datefmt: 指定时间格式,同time.strftime()
level: 设置日志级别,默认为logging.WARNING
stream: 指定将日志的输出流,可以指定输出到sys.stderr,sys.stdout或者文件,默认输出到sys.stderr,当stream和filename同时指定时,stream被忽略

同时将日志输出到屏幕和日志

import logging

logging.basicConfig(level=logging.DEBUG,
                format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                datefmt='%a, %d %b %Y %H:%M:%S',
                filename='my_app.log',
                filemode='w')


# 定义一个StreamHandler,将INFO级别或更高的日志信息打印到标准错误,并将其添加到当前的日志处理对象
console = logging.StreamHandler()
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
console.setFormatter(formatter)
logging.getLogger('').addHandler(console)


logging.debug('This is debug message')
logging.info('This is info message')
logging.warning('This is warning message')

# 屏幕上打印:
# root        : INFO     This is info message
# root        : WARNING  This is warning message
#
# ./my_app.log文件中内容为:
# Sun, 24 May 2009 21:48:54 demo2.py[line:11] DEBUG This is debug message
# Sun, 24 May 2009 21:48:54 demo2.py[line:12] INFO This is info message
# Sun, 24 May 2009 21:48:54 demo2.py[line:13] WARNING This is warning message

日志回滚

import logging
from logging.handlers import RotatingFileHandler

# 定义一个RotatingFileHandler,最多备份5个日志文件,每个日志文件最大10M
Rthandler = RotatingFileHandler('my_app.log', maxBytes=10*1024*1024, backupCount=5)
Rthandler.setLevel(logging.INFO)
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
Rthandler.setFormatter(formatter)
logging.getLogger('').addHandler(Rthandler)

从上例和本例可以看出,logging有一个日志处理的主对象,其它处理方式都是通过addHandler添加进去的。
logging的几种handle方式如下:
logging.StreamHandler: 日志输出到流,可以是sys.stderr、sys.stdout或者文件
logging.FileHandler: 日志输出到文件

日志回滚方式,实际使用时用RotatingFileHandler和TimedRotatingFileHandler
logging.handlers.BaseRotatingHandler
logging.handlers.RotatingFileHandler
logging.handlers.TimedRotatingFileHandler

logging.handlers.SocketHandler: 远程输出日志到TCP/IP sockets
logging.handlers.DatagramHandler:  远程输出日志到UDP sockets
logging.handlers.SMTPHandler:  远程输出日志到邮件地址
logging.handlers.SysLogHandler: 日志输出到syslog
logging.handlers.NTEventLogHandler: 远程输出日志到Windows NT/2000/XP的事件日志
logging.handlers.MemoryHandler: 日志输出到内存中的制定buffer
logging.handlers.HTTPHandler: 通过"GET"或"POST"远程输出到HTTP服务器
  由于StreamHandler和FileHandler是常用的日志处理方式,所以直接包含在logging模块中,而其他方式则包含在logging.handlers模块中,
上述其它处理方式的使用请参见python2.5手册!

通过logging.config模块配置日志

#logger.conf

###############################################

[loggers]
keys=root,example01,example02

[logger_root]
level=DEBUG
handlers=hand01,hand02

[logger_example01]
handlers=hand01,hand02
qualname=example01
propagate=0

[logger_example02]
handlers=hand01,hand03
qualname=example02
propagate=0

###############################################

[handlers]
keys=hand01,hand02,hand03

[handler_hand01]
class=StreamHandler
level=INFO
formatter=form02
args=(sys.stderr,)

[handler_hand02]
class=FileHandler
level=DEBUG
formatter=form01
args=('myapp.log', 'a')

[handler_hand03]
class=handlers.RotatingFileHandler
level=INFO
formatter=form02
args=('myapp.log', 'a', 10*1024*1024, 5)

###############################################

[formatters]
keys=form01,form02

[formatter_form01]
format=%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s
datefmt=%a, %d %b %Y %H:%M:%S

[formatter_form02]
format=%(name)-12s: %(levelname)-8s %(message)s
datefmt=

例3:

import logging
import logging.config

logging.config.fileConfig("logger.conf")
logger = logging.getLogger("example01")

logger.debug('This is debug message')
logger.info('This is info message')
logger.warning('This is warning message')

例4:

import logging
import logging.config

logging.config.fileConfig("logger.conf")
logger = logging.getLogger("example02")

logger.debug('This is debug message')
logger.info('This is info message')
logger.warning('This is warning message')

logging是线程安全的

17.7 hashlib

  用于加密相关的操作,代替了md5模块和sha模块,主要提供 SHA1, SHA224, SHA256, SHA384, SHA512 ,MD5 算法

# Python2.x 在Python3.x中已经废弃
import md5

hash = md5.new()
hash.update('admin')
print hash.hexdigest()

# Python2.x 在Python3.x中已经废弃
import sha

hash = sha.new()
hash.update('admin')
print hash.hexdigest()
import hashlib

# ######## md5 ########

obj = hashlib.md5()
obj.update(bytes('123', encoding='utf-8'))
print("md5:", obj.hexdigest())

# ######## sha1 ########

obj = hashlib.sha1()
obj.update(bytes('123', encoding='utf-8'))
print("sha1:", obj.hexdigest())

# ######## sha256 ########

obj = hashlib.sha256()
obj.update(bytes('123', encoding='utf-8'))
print("sha256:", obj.hexdigest())

# ######## sha384 ########

obj = hashlib.sha384()
obj.update(bytes('123', encoding='utf-8'))
print("sha384:", obj.hexdigest())

# ######## sha512 ########

obj = hashlib.sha512()
obj.update(bytes('123', encoding='utf-8'))
print("sha512:", obj.hexdigest())

# 结果:
# md5: 202cb962ac59075b964b07152d234b70
# sha1: 40bd001563085fc35165329ea1ff5c5ecbdbbeef
# sha256: a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3
# sha384: 9a0a82f0c0cf31470d7affede3406cc9aa8410671520b727044eda15b4c25532a9b5cd8aaf9cec4919d76255b6bfb00f
# sha512: 3c9909afec25354d551dae21590bb26e38d53f2173b8d3dc3eee4c047e7ab1c1eb8b85103e3be7ba613b31bb5c9c36214dc9f14a42fd7a2fdb84856bca5c44c2
import hashlib

print(hashlib.algorithms_available)  # 可以使用的加密算法
结果:{'SHA', 'SHA1', 'SHA512', 'md4', 'DSA', 'SHA224', 'sha1', 'sha256', 'md5', 'sha224', 'ecdsa-with-SHA1', 'sha', 'RIPEMD160', 'dsaEncryption', 'ripemd160', 'MD4', 'DSA-SHA', 'whirlpool', 'MD5', 'SHA256', 'dsaWithSHA', 'sha384', 'SHA384', 'sha512'}
print(hashlib.algorithms_guaranteed)  # python在所有平台上都可以使用的函数,也就是比较稳定的函数
结果:{'sha384', 'sha1', 'sha256', 'md5', 'sha224', 'sha512'}

     以上加密算法虽然依然非常厉害,但是还存在撞库的问题。有必要对加密算法中添加自定义key再来做加密。

import hashlib

obj = hashlib.md5(bytes('gyyx', encoding='utf-8'))
obj.update(bytes('123', encoding='utf-8'))
result = obj.hexdigest()
print("md5:", result)

#结果:
# md5: 83a2156b15c1376ae1e93980d7d1af56
#Python 还有一个hmac模块,它内部对创建key和内容再进行处理然后再加密
import hmac

obj = hmac.new(b'mykey')
obj.update(b'mymessage')
print(obj.hexdigest())

结果:
# hmac:d811630c4e62c6ef90d1bfe540212aaf

应用案例:

def md5(arg):
    """
    md5加密
    :param arg: 接收用户输入的加密字符串
    :return: 返回字符串的加密结果
    """
    obj = hashlib.md5(bytes('oldboy', encoding='utf-8'))
    obj.update(bytes(arg, encoding='utf-8'))
    return obj.hexdigest()

user_name = 'admin'
pass_word = '1e5413879f3e972653fdcf9101698b29'  # 明文:123456
for i in range(3):
    username = input("请输入用户名:")
    password = input("请输入密码:")
    if username == user_name and md5(password) == pass_word:
        print("登录成功")
        break
    else:
        print("账号或者密码错误")

结果:
 

17.8 random

17.9 执行系统命令

可执行shell命令的相关模块和函数:

  • os.system
  • os.spawn*
  • os.popen*  --废弃
  • popen2.*    --废弃
  • commands.*  --废弃,3.x中被异常
import commands

result = commands.getoutput('cmd')
result = commands.getstatus('cmd')
result = commands.getstatusoutput('cmd')

  从Python 2.4开始,Python引入subprocess模块来管理子进程,以取代一些旧模块的方法。以上执行shell命令的相关的模块和函数的功能均在 subprocess 模块中实现,并提供了更丰富的功能。

subprocess.call:执行命令,返回状态码(shell=True,允许shell命令是字符串形式)

import subprocess

res = subprocess.call('ls -l', shell=True)
print(res)

res = subprocess.call(["ls", "-l"], shell=False)
print(res)

subprocess.check_call:执行命令,如果执行状态是0,则返回0,否则抛异常

import subprocess

res = subprocess.check_call(["ls", "-l"])
print(res)

res = subprocess.check_call("lss", shell=True)  # 执行一个非shell命令
print(res)

subprocess.check_output:执行命令,如果状态码是0,则返回执行结果,否则抛异常

import subprocess

res = subprocess.check_output(["echo", "Hello World!"])
print(res)

res = subprocess.check_output("exit 1", shell=True)  # 抛异常
print(res)

subprocess.Popen():用于执行复杂的系统命令

参数:

  • args:shell命令,可以是字符串或者序列类型(如:list,元组)
  • bufsize:指定缓冲。0 无缓冲,1 行缓冲,其他 缓冲区大小,负值 系统缓冲
  • stdin, stdout, stderr:分别表示程序的标准输入、输出、错误句柄
  • preexec_fn:只在Unix平台下有效,用于指定一个可执行对象(callable object),它将在子进程运行之前被调用
  • close_sfs:在windows平台下,如果close_fds被设置为True,则新创建的子进程将不会继承父进程的输入、输出、错误管道。所以不能将close_fds设置为True同时重定向子进程的标准输入、输出与错误(stdin, stdout, stderr)。
  • shell:同上
  • cwd:用于设置子进程的当前目录
  • env:用于指定子进程的环境变量。如果env = None,子进程的环境变量将从父进程中继承。
  • universal_newlines:不同系统的换行符不同,True -> 同意使用 \n
  • startupinfo与createionflags只在windows下有效,将被传递给底层的CreateProcess()函数,用于设置子进程的一些属性,如:主窗口的外观,进程的优先级等等
import subprocess
ret1 = subprocess.Popen(["mkdir","t1"])
ret2 = subprocess.Popen("mkdir t2", shell=True)

终端输入命令分为两种:

  • 输入命令即可得到输入,如:ifconfig
  • 输入命令进入某种环境,再输入特定的命令,如:Python
# 第一种情况:输入命令即可得到输出结果
import subprocess
obj = subprocess.Popen("mkdir t3", shell=True, cwd='/home/dev',)

# 第二种情况:进入某种环境之下特定命令,返回结果
import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
obj.stdin.write('print 1 \n ')
obj.stdin.write('print 2 \n ')
obj.stdin.write('print 3 \n ')
obj.stdin.write('print 4 \n ')
obj.stdin.close()

cmd_out = obj.stdout.read()
obj.stdout.close()
cmd_error = obj.stderr.read()
obj.stderr.close()

print cmd_out
print cmd_error

# 第二种情况:进入某种环境之下特定命令,返回结果,相对上一种 将输出和错误输出通过一个管道输出
import subprocess

obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
obj.stdin.write('print 1 \n ')
obj.stdin.write('print 2 \n ')
obj.stdin.write('print 3 \n ')
obj.stdin.write('print 4 \n ')

out_error_list = obj.communicate()
print out_error_list

17.11 shutil

提供高级的文件、文件夹等操作

shutil.copyfileobj(fsrc, fdst[, length])将文件内容拷贝到另一个文件中,可以部分内容

# 源代码实现
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

shutil.copyfile(src, dst)拷贝文件

# 源代码实现
def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)

shutil.copymode(src, dst)仅拷贝权限。内容、组、用户均不变

# 源代码实现
def copymode(src, dst):
    """Copy mode bits from src to dst"""
    if hasattr(os, 'chmod'):
        st = os.stat(src)
        mode = stat.S_IMODE(st.st_mode)
        os.chmod(dst, mode)

shutil.copystat(src, dst)拷贝状态的信息,包括:mode bits, atime, mtime, flags

# 源代码实现
def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError, why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise

shutil.copy(src, dst)拷贝文件和权限

# 源代码实现
def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

shutil.copy2(src, dst)拷贝文件和状态信息

# 源代码实现
def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)递归的去拷贝文件

例如:copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

# 源代码实现
def ignore_patterns(*patterns):
    """Function that can be used as copytree() ignore parameter.

    Patterns is a sequence of glob-style patterns
    that are used to exclude files"""
    def _ignore_patterns(path, names):
        ignored_names = []
        for pattern in patterns:
            ignored_names.extend(fnmatch.filter(names, pattern))
        return set(ignored_names)
    return _ignore_patterns

def copytree(src, dst, symlinks=False, ignore=None):
    """Recursively copy a directory tree using copy2().

    The destination directory must not already exist.
    If exception(s) occur, an Error is raised with a list of reasons.

    If the optional symlinks flag is true, symbolic links in the
    source tree result in symbolic links in the destination tree; if
    it is false, the contents of the files pointed to by symbolic
    links are copied.

    The optional ignore argument is a callable. If given, it
    is called with the `src` parameter, which is the directory
    being visited by copytree(), and `names` which is the list of
    `src` contents, as returned by os.listdir():

        callable(src, names) -> ignored_names

    Since copytree() is called recursively, the callable will be
    called once for each directory that is copied. It returns a
    list of names relative to the `src` directory that should
    not be copied.

    XXX Consider this example code rather than the ultimate tool.

    """
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error, err:
            errors.extend(err.args[0])
        except EnvironmentError, why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError, why:
        if WindowsError is not None and isinstance(why, WindowsError):
            # Copying file access times may fail on Windows
            pass
        else:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error, errors

 

shutil.rmtree(path[, ignore_errors[, onerror]])递归的去删除文件

 

# 源代码实现
def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info().  If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error, err:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error, err:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())

shutil.move(src, dst)递归的去移动文件

# 递归的去移动文件
def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.

    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error, "Destination path '%s' already exists" % real_dst
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)

shutil.make_archive(base_name, format,...)创建压缩包并返回文件路径,例如:zip、tar

base_name: 压缩包的文件名,也可以是压缩包的路径。只是文件名时,则保存至当前目录,否则保存至指定路径,
如:www                        =>保存至当前路径
如:/Users/www =>保存至/Users/
format: 压缩包种类,“zip”, “tar”, “bztar”,“gztar”
root_dir: 要压缩的文件夹路径(默认当前目录)
owner: 用户,默认当前用户
group: 组,默认当前组
logger: 用于记录日志,通常是logging.Logger对象

#将 /Users/Downloads/test 下的文件打包放置当前程序目录
 
import shutil
ret = shutil.make_archive("www", 'gztar', root_dir='/Users/Downloads/test')
 
 
#将 /Users/Downloads/test 下的文件打包放置 /Users/目录
import shutil
ret = shutil.make_archive("/Users/www", 'gztar', root_dir='/Users/Downloads/test')
# 源代码实现
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                 dry_run=0, owner=None, group=None, logger=None):
    """Create an archive file (eg. zip or tar).

    'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar"
    or "gztar".

    'root_dir' is a directory that will be the root directory of the
    archive; ie. we typically chdir into 'root_dir' before creating the
    archive.  'base_dir' is the directory where we start archiving from;
    ie. 'base_dir' will be the common prefix of all files and
    directories in the archive.  'root_dir' and 'base_dir' both default
    to the current directory.  Returns the name of the archive file.

    'owner' and 'group' are used when creating a tar archive. By default,
    uses the current owner and group.
    """
    save_cwd = os.getcwd()
    if root_dir is not None:
        if logger is not None:
            logger.debug("changing into '%s'", root_dir)
        base_name = os.path.abspath(base_name)
        if not dry_run:
            os.chdir(root_dir)

    if base_dir is None:
        base_dir = os.curdir

    kwargs = {'dry_run': dry_run, 'logger': logger}

    try:
        format_info = _ARCHIVE_FORMATS[format]
    except KeyError:
        raise ValueError, "unknown archive format '%s'" % format

    func = format_info[0]
    for arg, val in format_info[1]:
        kwargs[arg] = val

    if format != 'zip':
        kwargs['owner'] = owner
        kwargs['group'] = group

    try:
        filename = func(base_name, base_dir, **kwargs)
    finally:
        if root_dir is not None:
            if logger is not None:
                logger.debug("changing back to '%s'", save_cwd)
            os.chdir(save_cwd)

    return filename

shutil对压缩包的处理是调用ZipFile和TarFile两个模块来完成的

import zipfile

# 文件压缩
z = zipfile.ZipFile('test.zip', 'w')
# z.write('a.txt', 'b.txt')  # 这样压缩只会压缩b.txt
z.write('a.txt')
z.write('b.txt')
z.close()

# 文件解压
z = zipfile.ZipFile('test.zip', 'r')
# z.extractall()    # 解压所有
z.extract('a.txt')  # 解压指定文件
z.close()
import tarfile

# tar打包文件
tar = tarfile.TarFile('test.tar', 'w')
tar.add('a.txt')
tar.add('bbs2.zip', arcname='bbs2.zip')  # 将被打包文件重命名
tar.add('cmdb.zip', arcname='cmdb.zip')
tar.close()

# 解压单个文件
tar = tarfile.TarFile('test.tar', 'r')
a = tar.getmembers('a.txt')
tar.extract(a)
tar.extractall()  # 解压所有文件,还可以设置解压地址
tar.close()

源代码实现:参看tarfile.py及zipfile.py

17.11 xml

XML是实现不同语言或程序之间进行数据交换的协议,XML文件格式如下:

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>

1、解析XML

# 利用ElementTree.XML将字符串解析成xml对象
from xml.etree import ElementTree as ET

# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read()
# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)


# 利用ElementTree.parse将文件直接解析成xml对象
from xml.etree import ElementTree as ET

# 直接解析xml文件
tree = ET.parse("xo.xml")
# 获取xml文件的根节点
root = tree.getroot()

2、操作XML

XML格式类型是节点嵌套节点,对于每一个节点均有以下功能,以便对当前节点进行操作:

class Element:
    """An XML element.

    This class is the reference implementation of the Element interface.

    An element's length is its number of subelements.  That means if you
    want to check if an element is truly empty, you should check BOTH
    its length AND its text attribute.

    The element tag, attribute names, and attribute values can be either
    bytes or strings.

    *tag* is the element name.  *attrib* is an optional dictionary containing
    element attributes. *extra* are additional element attributes given as
    keyword arguments.

    Example form:
        <tag attrib>text<child/>...</tag>tail

    """

    当前节点的标签名
    tag = None
    """The element's name."""

    当前节点的属性

    attrib = None
    """Dictionary of the element's attributes."""

    当前节点的内容
    text = None
    """
    Text before first subelement. This is either a string or the value None.
    Note that if there is no text, this attribute may be either
    None or the empty string, depending on the parser.

    """

    tail = None
    """
    Text after this element's end tag, but before the next sibling element's
    start tag.  This is either a string or the value None.  Note that if there
    was no text, this attribute may be either None or an empty string,
    depending on the parser.

    """

    def __init__(self, tag, attrib={}, **extra):
        if not isinstance(attrib, dict):
            raise TypeError("attrib must be dict, not %s" % (
                attrib.__class__.__name__,))
        attrib = attrib.copy()
        attrib.update(extra)
        self.tag = tag
        self.attrib = attrib
        self._children = []

    def __repr__(self):
        return "<%s %r at %#x>" % (self.__class__.__name__, self.tag, id(self))

    def makeelement(self, tag, attrib):
        创建一个新节点
        """Create a new element with the same type.

        *tag* is a string containing the element name.
        *attrib* is a dictionary containing the element attributes.

        Do not call this method, use the SubElement factory function instead.

        """
        return self.__class__(tag, attrib)

    def copy(self):
        """Return copy of current element.

        This creates a shallow copy. Subelements will be shared with the
        original tree.

        """
        elem = self.makeelement(self.tag, self.attrib)
        elem.text = self.text
        elem.tail = self.tail
        elem[:] = self
        return elem

    def __len__(self):
        return len(self._children)

    def __bool__(self):
        warnings.warn(
            "The behavior of this method will change in future versions.  "
            "Use specific 'len(elem)' or 'elem is not None' test instead.",
            FutureWarning, stacklevel=2
            )
        return len(self._children) != 0 # emulate old behaviour, for now

    def __getitem__(self, index):
        return self._children[index]

    def __setitem__(self, index, element):
        # if isinstance(index, slice):
        #     for elt in element:
        #         assert iselement(elt)
        # else:
        #     assert iselement(element)
        self._children[index] = element

    def __delitem__(self, index):
        del self._children[index]

    def append(self, subelement):
        为当前节点追加一个子节点
        """Add *subelement* to the end of this element.

        The new element will appear in document order after the last existing
        subelement (or directly after the text, if it's the first subelement),
        but before the end tag for this element.

        """
        self._assert_is_element(subelement)
        self._children.append(subelement)

    def extend(self, elements):
        为当前节点扩展 n 个子节点
        """Append subelements from a sequence.

        *elements* is a sequence with zero or more elements.

        """
        for element in elements:
            self._assert_is_element(element)
        self._children.extend(elements)

    def insert(self, index, subelement):
        在当前节点的子节点中插入某个节点,即:为当前节点创建子节点,然后插入指定位置
        """Insert *subelement* at position *index*."""
        self._assert_is_element(subelement)
        self._children.insert(index, subelement)

    def _assert_is_element(self, e):
        # Need to refer to the actual Python implementation, not the
        # shadowing C implementation.
        if not isinstance(e, _Element_Py):
            raise TypeError('expected an Element, not %s' % type(e).__name__)

    def remove(self, subelement):
        在当前节点在子节点中删除某个节点
        """Remove matching subelement.

        Unlike the find methods, this method compares elements based on
        identity, NOT ON tag value or contents.  To remove subelements by
        other means, the easiest way is to use a list comprehension to
        select what elements to keep, and then use slice assignment to update
        the parent element.

        ValueError is raised if a matching element could not be found.

        """
        # assert iselement(element)
        self._children.remove(subelement)

    def getchildren(self):
        获取所有的子节点(废弃)
        """(Deprecated) Return all subelements.

        Elements are returned in document order.

        """
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'list(elem)' or iteration over elem instead.",
            DeprecationWarning, stacklevel=2
            )
        return self._children

    def find(self, path, namespaces=None):
        获取第一个寻找到的子节点
        """Find first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return the first matching element, or None if no element was found.

        """
        return ElementPath.find(self, path, namespaces)

    def findtext(self, path, default=None, namespaces=None):
        获取第一个寻找到的子节点的内容
        """Find text for first matching element by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *default* is the value to return if the element was not found,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return text content of first matching element, or default value if
        none was found.  Note that if an element is found having no text
        content, the empty string is returned.

        """
        return ElementPath.findtext(self, path, default, namespaces)

    def findall(self, path, namespaces=None):
        获取所有的子节点
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Returns list containing all matching elements in document order.

        """
        return ElementPath.findall(self, path, namespaces)

    def iterfind(self, path, namespaces=None):
        获取所有指定的节点,并创建一个迭代器(可以被for循环)
        """Find all matching subelements by tag name or path.

        *path* is a string having either an element tag or an XPath,
        *namespaces* is an optional mapping from namespace prefix to full name.

        Return an iterable yielding all matching elements in document order.

        """
        return ElementPath.iterfind(self, path, namespaces)

    def clear(self):
        清空节点
        """Reset element.

        This function removes all subelements, clears all attributes, and sets
        the text and tail attributes to None.

        """
        self.attrib.clear()
        self._children = []
        self.text = self.tail = None

    def get(self, key, default=None):
        获取当前节点的属性值
        """Get element attribute.

        Equivalent to attrib.get, but some implementations may handle this a
        bit more efficiently.  *key* is what attribute to look for, and
        *default* is what to return if the attribute was not found.

        Returns a string containing the attribute value, or the default if
        attribute was not found.

        """
        return self.attrib.get(key, default)

    def set(self, key, value):
        为当前节点设置属性值
        """Set element attribute.

        Equivalent to attrib[key] = value, but some implementations may handle
        this a bit more efficiently.  *key* is what attribute to set, and
        *value* is the attribute value to set it to.

        """
        self.attrib[key] = value

    def keys(self):
        获取当前节点的所有属性的 key

        """Get list of attribute names.

        Names are returned in an arbitrary order, just like an ordinary
        Python dict.  Equivalent to attrib.keys()

        """
        return self.attrib.keys()

    def items(self):
        获取当前节点的所有属性值,每个属性都是一个键值对
        """Get element attributes as a sequence.

        The attributes are returned in arbitrary order.  Equivalent to
        attrib.items().

        Return a list of (name, value) tuples.

        """
        return self.attrib.items()

    def iter(self, tag=None):
        在当前节点的子孙中根据节点名称寻找所有指定的节点,并返回一个迭代器(可以被for循环)。
        """Create tree iterator.

        The iterator loops over the element and all subelements in document
        order, returning all elements with a matching tag.

        If the tree structure is modified during iteration, new or removed
        elements may or may not be included.  To get a stable set, use the
        list() function on the iterator, and loop over the resulting list.

        *tag* is what tags to look for (default is to return all elements)

        Return an iterator containing all the matching elements.

        """
        if tag == "*":
            tag = None
        if tag is None or self.tag == tag:
            yield self
        for e in self._children:
            yield from e.iter(tag)

    # compatibility
    def getiterator(self, tag=None):
        # Change for a DeprecationWarning in 1.4
        warnings.warn(
            "This method will be removed in future versions.  "
            "Use 'elem.iter()' or 'list(elem.iter())' instead.",
            PendingDeprecationWarning, stacklevel=2
        )
        return list(self.iter(tag))

    def itertext(self):
        在当前节点的子孙中根据节点名称寻找所有指定的节点的内容,并返回一个迭代器(可以被for循环)。
        """Create text iterator.

        The iterator loops over the element and all subelements in document
        order, returning all inner text.

        """
        tag = self.tag
        if not isinstance(tag, str) and tag is not None:
            return
        if self.text:
            yield self.text
        for e in self:
            yield from e.itertext()
            if e.tail:
                yield e.tail

由于每个节点都具有以上的方法,并且在上一步骤中解析时均得到了root(xml文件的根节点),所有可以利用以上方法进行操作xml文件

操作1:遍历XML文档的所有内容

from xml.etree import ElementTree as ET

############ 解析方式一 ############
"""
# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read()

# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)
"""
############ 解析方式二 ############

# 直接解析xml文件
tree = ET.parse("xo.xml")

# 获取xml文件的根节点
root = tree.getroot()


### 操作

# 顶层标签
print(root.tag)


# 遍历XML文档的第二层
for child in root:
    # 第二层节点的标签名称和标签属性
    print(child.tag, child.attrib)
    # 遍历XML文档的第三层
    for i in child:
        # 第二层节点的标签名称和内容
        print(i.tag,i.text)

操作2:遍历XML中指定的节点

from xml.etree import ElementTree as ET

############ 解析方式一 ############
"""
# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read()

# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)
"""
############ 解析方式二 ############

# 直接解析xml文件
tree = ET.parse("xo.xml")

# 获取xml文件的根节点
root = tree.getroot()

### 操作

# 顶层标签
print(root.tag)

# 遍历XML中所有的year节点
for node in root.iter('year'):
    # 节点的标签名称和内容
    print(node.tag, node.text)

操作3:修改节点内容

  由于修改的节点时,均是在内存中进行,其不会影响文件中的内容。所以,如果想要修改,则需要重新将内存中的内容写到文件。

方法1:解析字符串方式,修改并保存

from xml.etree import ElementTree as ET

############ 解析方式一 ############

# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read()

# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)

############ 操作 ############

# 顶层标签
print(root.tag)

# 循环所有的year节点
for node in root.iter('year'):
    # 将year节点中的内容自增一
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # 设置属性
    node.set('name', 'alex')
    node.set('age', '18')
    # 删除属性
    del node.attrib['name']


############ 保存文件 ############
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')

解析字符串方式,修改,保存

方法2:解析文件方式,修改并保存

from xml.etree import ElementTree as ET

############ 解析方式二 ############

# 直接解析xml文件
tree = ET.parse("xo.xml")

# 获取xml文件的根节点
root = tree.getroot()

############ 操作 ############

# 顶层标签
print(root.tag)

# 循环所有的year节点
for node in root.iter('year'):
    # 将year节点中的内容自增一
    new_year = int(node.text) + 1
    node.text = str(new_year)

    # 设置属性
    node.set('name', 'alex')
    node.set('age', '18')
    # 删除属性
    del node.attrib['name']


############ 保存文件 ############
tree.write("newnew.xml", encoding='utf-8')

解析文件方式,修改,保存

操作4:删除节点

方法1:解析字符串方式打开,删除并保存

from xml.etree import ElementTree as ET

############ 解析字符串方式打开 ############

# 打开文件,读取XML内容
str_xml = open('xo.xml', 'r').read()

# 将字符串解析成xml特殊对象,root代指xml文件的根节点
root = ET.XML(str_xml)

############ 操作 ############

# 顶层标签
print(root.tag)

# 遍历data下的所有country节点
for country in root.findall('country'):
    # 获取每一个country节点下rank节点的内容
    rank = int(country.find('rank').text)

    if rank > 50:
        # 删除指定country节点
        root.remove(country)

############ 保存文件 ############
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')

解析字符串方式打开,删除,保存

方法2:解析文件方式打开,删除并保存

from xml.etree import ElementTree as ET

############ 解析文件方式 ############

# 直接解析xml文件
tree = ET.parse("xo.xml")

# 获取xml文件的根节点
root = tree.getroot()

############ 操作 ############

# 顶层标签
print(root.tag)

# 遍历data下的所有country节点
for country in root.findall('country'):
    # 获取每一个country节点下rank节点的内容
    rank = int(country.find('rank').text)

    if rank > 50:
        # 删除指定country节点
        root.remove(country)

############ 保存文件 ############
tree.write("newnew.xml", encoding='utf-8')

解析文件方式打开,删除,保存

3、创建XML

创建方法1:

from xml.etree import ElementTree as ET


# 创建根节点
root = ET.Element("famliy")


# 创建节点大儿子
son1 = ET.Element('son', {'name': '儿1'})
# 创建小儿子
son2 = ET.Element('son', {"name": '儿2'})

# 在大儿子中创建两个孙子
grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson2 = ET.Element('grandson', {'name': '儿12'})
son1.append(grandson1)
son1.append(grandson2)


# 把儿子添加到根节点中
root.append(son1)
root.append(son1)

tree = ET.ElementTree(root)
tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False)

创建方法2:

from xml.etree import ElementTree as ET

# 创建根节点
root = ET.Element("famliy")


# 创建大儿子
# son1 = ET.Element('son', {'name': '儿1'})
son1 = root.makeelement('son', {'name': '儿1'})
# 创建小儿子
# son2 = ET.Element('son', {"name": '儿2'})
son2 = root.makeelement('son', {"name": '儿2'})

# 在大儿子中创建两个孙子
# grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson1 = son1.makeelement('grandson', {'name': '儿11'})
# grandson2 = ET.Element('grandson', {'name': '儿12'})
grandson2 = son1.makeelement('grandson', {'name': '儿12'})

son1.append(grandson1)
son1.append(grandson2)


# 把儿子添加到根节点中
root.append(son1)
root.append(son1)

tree = ET.ElementTree(root)
tree.write('oooo.xml',encoding='utf-8', short_empty_elements=False)

创建方法3:

from xml.etree import ElementTree as ET


# 创建根节点
root = ET.Element("famliy")


# 创建节点大儿子
son1 = ET.SubElement(root, "son", attrib={'name': '儿1'})
# 创建小儿子
son2 = ET.SubElement(root, "son", attrib={"name": "儿2"})

# 在大儿子中创建一个孙子
grandson1 = ET.SubElement(son1, "age", attrib={'name': '儿11'})
grandson1.text = '孙子'


et = ET.ElementTree(root)  #生成文档对象
et.write("test.xml", encoding="utf-8", xml_declaration=True, short_empty_elements=False)

由于原生保存的XML时默认无缩进,如果要设置缩进的话,需要修改保存方式:

from xml.etree import ElementTree as ET
from xml.dom import minidom


def prettify(elem):
    """将节点转换成字符串,并添加缩进。
    """
    rough_string = ET.tostring(elem, 'utf-8')
    reparsed = minidom.parseString(rough_string)
    return reparsed.toprettyxml(indent="\t")

# 创建根节点
root = ET.Element("famliy")


# 创建大儿子
# son1 = ET.Element('son', {'name': '儿1'})
son1 = root.makeelement('son', {'name': '儿1'})
# 创建小儿子
# son2 = ET.Element('son', {"name": '儿2'})
son2 = root.makeelement('son', {"name": '儿2'})

# 在大儿子中创建两个孙子
# grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson1 = son1.makeelement('grandson', {'name': '儿11'})
# grandson2 = ET.Element('grandson', {'name': '儿12'})
grandson2 = son1.makeelement('grandson', {'name': '儿12'})

son1.append(grandson1)
son1.append(grandson2)


# 把儿子添加到根节点中
root.append(son1)
root.append(son1)


raw_str = prettify(root)

f = open("xxxoo.xml",'w',encoding='utf-8')
f.write(raw_str)
f.close()

4、命名空间

http://www.w3school.com.cn/xml/xml_namespaces.asp

17.12 requests

  Requests 是使用 Apache2 Licensed 许可证的 基于Python开发的HTTP 库,其在Python内置模块的基础上进行了高度的封装,从而使得Pythoner进行网络请求时,变得美好了许多,使用Requests可以轻而易举的 完成浏览器可有的任何操作。

1、模块安装

pip3 install requests

2、模块使用

# GET请求
# 无参数案例
import requests
 
ret = requests.get('https://github.com/timeline.json')
 
print(ret.url)
print(ret.text)

# 有参数实例
import requests
 
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
 
print(ret.url)
print(ret.text)
# POST请求

# 1、基本POST实例
import requests
 
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
 
print(ret.text)
 
# 2、发送请求头和数据实例
import requests
import json
 
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
 
ret = requests.post(url, data=json.dumps(payload), headers=headers)
 
print(ret.text)
print(ret.cookies)
# POST请求

# 1、基本POST实例
import requests
 
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
 
print(ret.text)
 
# 2、发送请求头和数据实例
import requests
import json
 
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
 
ret = requests.post(url, data=json.dumps(payload), headers=headers)
 
print(ret.text)
print(ret.cookies)
# 其他请求
requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)
 
# 以上方法均是在此方法的基础上构建
requests.request(method, url, **kwargs)

其他资料:http://cn.python-requests.org/zh_CN/latest/

3、http请求和xml实例

实例1:检测QQ账号是否在线

import urllib
import requests
from xml.etree import ElementTree as ET

# 使用内置模块urllib发送HTTP请求,或者XML格式内容
"""
f = urllib.request.urlopen('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508')
result = f.read().decode('utf-8')
"""


# 使用第三方模块requests发送HTTP请求,或者XML格式内容
r = requests.get('http://www.webxml.com.cn//webservices/qqOnlineWebService.asmx/qqCheckOnline?qqCode=424662508')
result = r.text

# 解析XML格式内容
node = ET.XML(result)

# 获取内容
if node.text == "Y":
    print("在线")
else:
    print("离线")

实例2:查看火车停靠信息

import urllib
import requests
from xml.etree import ElementTree as ET

# 使用内置模块urllib发送HTTP请求,或者XML格式内容
"""
f = urllib.request.urlopen('http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=')
result = f.read().decode('utf-8')
"""

# 使用第三方模块requests发送HTTP请求,或者XML格式内容
r = requests.get('http://www.webxml.com.cn/WebServices/TrainTimeWebService.asmx/getDetailInfoByTrainCode?TrainCode=G666&UserID=')
result = r.text

# 解析XML格式内容
root = ET.XML(result)
for node in root.iter('TrainDetailInfo'):
    print(node.find('TrainStation').text,node.find('StartTime').text,node.tag,node.attrib)

更多实例:http://www.cnblogs.com/wupeiqi/archive/2012/11/18/2776014.html

17.13 ConfigParser

configparser用来处理特定格式的文件,其本质是利用open来操作文件

# 注释1
; 注释2
 
[section1]    # 节点
k1 = v1       #
k2:v2         #
 
[section2]    # 节点
k1 = v1       #
# 获取所有节点
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
ret = config.sections()
print(ret)

# 判断某个节点是否存在
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
has_se = config.has_section('section1')
print(has_se)

# 添加节点
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
config.add_section('section3')
config.write(open(config_file, 'w'))

# 删除节点
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
config.remove_section('section3')
config.write(open(config_file, 'w'))

# 获取指定节点下所有的键值对
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
kv = config.items('section2')

# 获取指定节点下所有键
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
ret = config.options('section1')

# 获取指定节点下指定key的值
import configparser

config_file = 'test.ini'
config = configparser.ConfigParser()
config.read(config_file, encoding='utf-8')
v = config.get('section1', 'k1')
# v = config.getint('section1', 'k1')
# v = config.getfloat('section1', 'k1')
# v = config.getboolean('section1', 'k1')

# 检查指定节点内的键值对
has_opt = config.has_option('section1', 'k1')
print(has_opt)

# 删除指定节点内的键值对
config.remove_option('section1', 'k1')
config.write(open('xxxooo', 'w'))

# 设置指定节点内的键值对
config.set('section1', 'k10', "123")
config.write(open('xxxooo', 'w'))
print(v)

17.14 xml convert json

#!/usr/bin/env python
# encoding: utf-8

# json转换xml
import xmltodict
import json


def python_conver_xml_to_json():
    """
    demo Python conversion between xml and json
    :return:
    """
    xml = """
    <data>
        <country name="Liechtenstein">
            <rank updated="yes">2</rank>
            <year>2023</year>
            <gdppc>141100</gdppc>
            <neighbor direction="E" name="Austria" />
            <neighbor direction="W" name="Switzerland" />
        </country>
        <country name="Singapore">
            <rank updated="yes">5</rank>
            <year>2026</year>
            <gdppc>59900</gdppc>
            <neighbor direction="N" name="Malaysia" />
        </country>
        <country name="Panama">
            <rank updated="yes">69</rank>
            <year>2026</year>
            <gdppc>13600</gdppc>
            <neighbor direction="W" name="Costa Rica" />
            <neighbor direction="E" name="Colombia" />
        </country>
    </data>
    """
    dic = xmltodict.parse(xml)
    # json_str = json.dumps(dic)  # 默认没有进行格式化
    json_str = json.dumps(dic, indent=1)  # 默认没有进行格式化
    print(json_str)


def python_conver_json_to_xml():
    """
    demo Python conversion between xml and json
    :return:
    """
    dic =  {
        'page': {
            'title': 'King Crimson',
            'ns': 0,
            'revision': {
                'id': 547909091,
            }
        }
    }
    xml = xmltodict.unparse(dic)
    print(xml)


if __name__ == '__main__':
    python_conver_xml_to_json()
    python_conver_json_to_xml()

练习题

1、通过HTTP请求和XML实现获取电视节目

     API:http://www.webxml.com.cn/webservices/ChinaTVprogramWebService.asmx  

2、通过HTTP请求和JSON实现获取天气状况

     API:http://wthrcdn.etouch.cn/weather_mini?city=北京

posted @ 2016-05-28 17:33  每天进步一点点!!!  阅读(6292)  评论(0编辑  收藏  举报