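Python fundamentals review (4): modules sys, os, random, hashlib, re, serialization with json and pickle, xml, shutil, configparser, logging, datetime and time, and others
Preliminaries: dir, __all__, help, __doc__, __file__
dir: lists all the attributes of a module (functions, classes, variables, and so on)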
>>> import copy
>>> dir(copy)
['Error', 'PyStringMap', '_EmptyClass', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_copy_dispatch', '_copy_immutable', '_copy_with_constructor', '_copy_with_copy_method', '_deepcopy_atomic', '_deepcopy_dict', '_deepcopy_dispatch', '_deepcopy_list', '_deepcopy_method', '_deepcopy_tuple', '_keep_alive', '_reconstruct', 'builtins', 'copy', 'deepcopy', 'dispatch_table', 'error', 'name', 't', 'weakref']
>>> [x for x in dir(copy) if not x.startswith('_')]
['Error', 'PyStringMap', 'builtins', 'copy', 'deepcopy', 'dispatch_table', 'error', 'name', 't', 'weakref']
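__all__: (it shows up in the dir output) this variable holds a list of names. The result resembles what we got from dir plus the list comprehension, except that the module author chose the names explicitly.
>>> copy.__all__
['Error', 'copy', 'deepcopy']
It defines the module's public interface. In other words, it tells the interpreter which names get imported when we write
from copy import *
When writing a module, __all__ lets you filter out the many names that callers do not need. If a module has no __all__, import * imports every global name that does not start with an underscore.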
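To see the effect of __all__ without writing files, here is a minimal sketch. It builds a throwaway module in memory and checks which names a star import brings in. The module name demo_mod and the helper star_import are made up for this illustration.

```python
import sys
import types

def star_import(source):
    """Build a throwaway module from source and return the names that
    `from demo_mod import *` would bring in."""
    mod = types.ModuleType("demo_mod")  # hypothetical module name
    exec(source, mod.__dict__)
    sys.modules["demo_mod"] = mod       # make it importable by name
    try:
        namespace = {}
        exec("from demo_mod import *", namespace)
    finally:
        del sys.modules["demo_mod"]
    return sorted(name for name in namespace if name != "__builtins__")

with_all = star_import("__all__ = ['visible']\nvisible = 1\nextra = 2\n_private = 3\n")
without_all = star_import("visible = 1\nextra = 2\n_private = 3\n")

print(with_all)     # ['visible']
print(without_all)  # ['extra', 'visible']
```

With __all__ only the listed name comes through. Without it, everything that does not start with an underscore does.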
help: get help, the information you need day to day
>>> help(copy)
Help on module copy:

NAME
    copy - Generic (shallow and deep) copying operations.

DESCRIPTION
    Interface summary:
    ....
>>> help(copy.copy)
Help on function copy in module copy:

copy(x)
    Shallow copy operation on arbitrary Python objects.

    See the module's __doc__ string for more info.
help draws on the __doc__ attribute, that is, the docstring written at the top of a module or a function
__file__: the location of the module's file, handy for finding its source code:
>>> copy.__file__
'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python35\\lib\\copy.py'
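I. sys
sys.argv: the list of command-line arguments; the first element is the path of the script itself
import sys

args = sys.argv[1:]  # index 0 is the script name
args.reverse()
print(','.join(args))

D:\MyPython\day24\基础回顾\01装饰器>python test.py ag1 ag2 ag3
ag3,ag2,ag1
sys.exit(n): exit the program; a normal exit is exit(0), which is also what sys.exit() with no argument does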
>>> import sys
>>> sys.exit()
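sys.version: version information of the Python interpreter (compare python --version on the command line)
>>> sys.version
'3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
sys.path: the module search path; it is initialized from the script's directory ('' in an interactive session), the PYTHONPATH environment variable, and installation-dependent defaults
>>> sys.path
['', 'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python35\\python35.zip', 'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python35\\DLLs', 'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python35\\lib', 'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python35', 'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python35\\lib\\site-packages']
sys.platform: the name of the operating system platform
>>> sys.platform
'win32'
sys.stdin: standard input, opened for reading (mode 'r'); by default it reads from the console
>>> var = sys.stdin.read()
aasddsa
^Z
>>> var
'aasddsa\n'
>>> var = sys.stdin.read(5)
dsad
>>> var
'dsad\n'
sys.stdout: standard output, opened for writing (mode 'w'); by default it writes to the console. write returns the number of characters written, which the interactive shell echoes right after the text (the 4 below)
>>> sys.stdout.write('dasf')
dasf4
>>> sys.stdout.flush()  # flush the output buffer; no visible effect in the interactive shell
sys.stderr: standard error, opened for writing (mode 'w'); write likewise returns the number of characters written
print(sys.stderr)
print(sys.stderr.write("errfawfa"))

<_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
8
errfawfa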
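The number that trails the text in the shell sessions above is the return value of write. A small sketch makes that visible by temporarily pointing sys.stdout at an in-memory buffer. The use of contextlib.redirect_stdout and io.StringIO is my choice for the illustration, not something the session above relies on.

```python
import io
import sys
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):            # sys.stdout now points at the buffer
    written = sys.stdout.write("dasf")   # write returns the character count

print(repr(buffer.getvalue()))  # 'dasf'
print(written)                  # 4
```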
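II. os
os.getcwd()                  get the current working directory, i.e. the directory the script is running in
os.chdir("dirname")          change the current working directory; like cd in a shell
os.curdir                    the current directory as a string: '.'
os.pardir                    the parent directory as a string: '..'
os.makedirs('dir1/dir2')     create nested directories recursively
os.removedirs('dirname1')    remove the directory if it is empty, then move up and remove the parent if that is empty too, and so on
os.mkdir('dirname')          create a single directory; like mkdir dirname in a shell
os.rmdir('dirname')          remove a single empty directory; raises an error if it is not empty; like rmdir dirname in a shell
os.listdir('dirname')        return a list of all files and subdirectories in the directory, hidden files included
os.remove()                  delete a file
os.rename("oldname","new")   rename a file or directory
os.stat('path/filename')     get information about a file or directory
os.sep                       the OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep                   the line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep                   the separator between entries of a search path such as PATH: ";" on Windows, ":" on Linux
os.name                      a string naming the current platform: 'nt' on Windows, 'posix' on Linux
>>> sys.platform
'win32'
>>> os.name
'nt'
os.system("bash command"): run a shell command and show its output directly; used for running external programs
>>> os.system('ls -al')
total 50565
os.environ: the environment variables of the current process (on Windows this includes what is set under the system's advanced environment variable settings, such as Path)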
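os.path.abspath(path): return the normalized absolute version of path
os.path.split(path): split path into a (directory, file name) pair
os.path.dirname(path): return the directory part of path, i.e. the first element of os.path.split(path); in effect the parent directory
>>> os.path.dirname("c:/sys")
'c:/'
>>> os.path.dirname("c:/sys/windows/1.txt")
'c:/sys/windows'
os.path.basename(path): return the final component of path, i.e. the second element of os.path.split(path); if path ends with / or \, the result is an empty string
os.path.exists(path): True if path exists, False otherwise
os.path.isabs(path): True if path is an absolute path
os.path.isfile(path): True if path is an existing file, False otherwise
os.path.isdir(path): True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]]): join the components into one path; any components before the last absolute path are discarded
os.path.getatime(path): the last access time of the file or directory that path points to, as a timestamp
os.path.getmtime(path): the last modification time of the file or directory that path points to, as a timestamp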
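The pure string functions in os.path can be tried without touching the file system. The sketch below calls posixpath and ntpath directly (the POSIX and Windows implementations behind os.path) so that the results do not depend on the machine running it. The sample paths are made up.

```python
import ntpath      # the Windows flavour of os.path
import posixpath   # the POSIX flavour of os.path

path = "/usr/lib/python3/copy.py"

head, tail = posixpath.split(path)
print(head, tail)                              # /usr/lib/python3 copy.py
print(posixpath.dirname(path) == head)         # True
print(posixpath.basename(path) == tail)        # True
print(repr(posixpath.basename("/usr/lib/")))   # '' because the path ends with a separator

# Components before the last absolute path are discarded.
joined = posixpath.join("/home", "user", "/etc", "hosts")
print(joined)                                  # /etc/hosts

# The Windows rules reproduce the result shown in the session above.
win_dir = ntpath.dirname("c:/sys/windows/1.txt")
print(win_dir)                                 # c:/sys/windows
```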
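III. random
This module contains functions that return random numbers. It is useful for simulations and for programs that produce random output.
>>> random.__all__
['Random', 'seed', 'random', 'uniform', 'randint', 'choice', 'sample', 'randrange', 'shuffle', 'normalvariate', 'lognormvariate', 'expovariate', 'vonmisesvariate', 'gammavariate', 'triangular', 'gauss', 'betavariate', 'paretovariate', 'weibullvariate', 'getstate', 'setstate', 'getrandbits', 'SystemRandom']
Note: the numbers produced are actually pseudo-random. They look completely random, but they come from a predictable, deterministic system. That is good enough for most purposes. When unpredictability matters (for security, say), use urandom from os or SystemRandom from random, both of which draw on the operating system's randomness source.
>>> random.random()  # a random float in [0, 1)
0.5134022843262868
>>> help(random.random)
Help on built-in function random:

random(...) method of random.Random instance
    random() -> x in the interval [0, 1).

>>> random.randint(1,100)  # a random integer from 1 to 100, both ends included
20
>>> random.randrange(1,100)  # like randint, but the upper bound 100 is excluded
80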
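The "predictable system" remark can be checked directly: two generators given the same seed produce the same sequence. The sketch also shows the endpoint difference between randint and randrange. The seeds and ranges are arbitrary choices for the illustration.

```python
import random

# Two generators seeded identically produce the same "random" sequence.
first = random.Random(42)
second = random.Random(42)
seq_a = [first.randint(1, 100) for _ in range(5)]
seq_b = [second.randint(1, 100) for _ in range(5)]
print(seq_a == seq_b)  # True

# randint includes both endpoints; randrange excludes the upper bound.
rng = random.Random(0)
draws_randint = {rng.randint(1, 3) for _ in range(1000)}
draws_randrange = {rng.randrange(1, 3) for _ in range(1000)}
print(sorted(draws_randint))    # [1, 2, 3]
print(sorted(draws_randrange))  # [1, 2]

# SystemRandom draws from the operating system's randomness source
# (os.urandom) and ignores seeding, so its output is not reproducible.
secure = random.SystemRandom()
token = secure.randint(1, 100)
```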
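IV. hashlib
Used for hashing (message digests), which is often loosely called encryption. It replaces the old md5 and sha modules and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms.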
import hashlib
######### md5 ########
h2 = hashlib.md5()
h2.update(bytes('123456', encoding='utf-8'))
print(h2.hexdigest())  # a str of hexadecimal digits
print(h2.digest())     # bytes; .hex() converts it to the hexadecimal string above
Supplement:
digest()
>>> help(hashlib._hashlib.HASH.digest)
Help on method_descriptor:

digest(...)
    Return the digest value as a string of binary data.

It returns a bytes object (8 bits per byte). When a bytes object is displayed, bytes that correspond to printable ASCII characters are shown as those characters and all others as \x escapes:
b'\xeaHWo0\xbe\x16i\x97\x16\x99\xc0\x9a\xd0\\\x94'
The same display rule applies to any bytes object, for example an encoded string:
>>> b = bytes("a",encoding="utf-8")
>>> b
b'a'
>>> b = bytes("a你",encoding="utf-8")
>>> b
b'a\xe4\xbd\xa0'
hexdigest()
>>> help(hashlib._hashlib.HASH.hexdigest)
Help on method_descriptor:

hexdigest(...)
    Return the digest value as a string of hexadecimal digits.

It returns a str of hexadecimal digits:
'ea48576f30be1669971699c09ad05c94'
-------------------------------------------------------------------------------------
digest() to hexdigest()
>>> h2.digest().hex()
-------------------------------------------------------------------------------------
hexdigest() to digest()
This needs the binascii module
>>> help(binascii)
Help on built-in module binascii:

NAME
    binascii - Conversion between binary data and ASCII

It converts between binary data and ASCII
a2b_hex in binascii
>>> help(binascii.a2b_hex)
Help on built-in function a2b_hex in module binascii:

a2b_hex(hexstr, /)
    Binary data of hexadecimal representation.

    hexstr must contain an even number of hex digits (upper or lower case).
    This function is also available as "unhexlify()".

It converts a hexadecimal string to binary data, returned as bytes
The hexadecimal string must contain an even number of digits
Usually we pass in the hexadecimal digest string directly; for MD5 it is 32 characters long
Successful conversion:
>>> binascii.a2b_hex(h2.hexdigest())
b'\xeaHWo0\xbe\x16i\x97\x16\x99\xc0\x9a\xd0\\\x94'
>>> h2.digest()
b'\xeaHWo0\xbe\x16i\x97\x16\x99\xc0\x9a\xd0\\\x94'
>>> h2.hexdigest()
'ea48576f30be1669971699c09ad05c94'
>>> binascii.a2b_hex(h2.hexdigest())
b'\xeaHWo0\xbe\x16i\x97\x16\x99\xc0\x9a\xd0\\\x94'
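Both directions of the conversion can be checked in one place. This sketch hashes the same '123456' input used in this section and also shows bytes.fromhex, a built-in alternative to binascii that the text above does not mention.

```python
import binascii
import hashlib

h = hashlib.md5(b"123456")
raw = h.digest()       # 16 raw bytes
text = h.hexdigest()   # 32 hexadecimal characters

print(text)                                     # e10adc3949ba59abbe56e057f20f883e
print(raw.hex() == text)                        # True: bytes -> hex string
print(binascii.hexlify(raw).decode() == text)   # True: the same via binascii
print(bytes.fromhex(text) == raw)               # True: hex string -> bytes
print(binascii.a2b_hex(text) == raw)            # True: the same via binascii
```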
-------------------------------------------------------------------------------------
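An MD5 digest cannot be reversed, but it can still be looked up in tables of precomputed hashes of common passwords, which recovers the password.
A safer approach is to mix a custom key (a salt) into the data before hashing:
Without a key:
>>> h1 = hashlib.md5(bytes("123456",encoding="utf-8"))
>>> h1.hexdigest()
'e10adc3949ba59abbe56e057f20f883e'
A digest like the one above is easy to look up, especially for simple passwords such as this one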
-------------------------------------------------------------------------------------
With a custom key
>>> h2 = hashlib.md5(bytes("asd",encoding="utf-8"))
>>> h2.update(bytes("123456",encoding="utf-8"))
>>> h2.hexdigest()
'1e55dbf412cb74d5e2c21fb6452408c7'
This is equivalent to calling update twice:
>>> h3 = hashlib.md5()
>>> h3.update(bytes("asd",encoding="utf-8"))
>>> h3.update(bytes("123456",encoding="utf-8"))
>>> h3.hexdigest()
'1e55dbf412cb74d5e2c21fb6452408c7'
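The equivalence above holds because a hash object simply consumes bytes in order. The sketch below checks it, then adds one thing that is my own suggestion and not part of the original post: for real password storage, a slow salted key-derivation function such as hashlib.pbkdf2_hmac is the more usual choice than a single MD5 round. The iteration count is an arbitrary illustrative value.

```python
import hashlib
import os

salt = b"asd"
password = b"123456"

# Passing initial data to the constructor is the same as a first update().
one_shot = hashlib.md5(salt + password).hexdigest()
h = hashlib.md5(salt)
h.update(password)
print(h.hexdigest() == one_shot)  # True

# Assumption, not from the original post: a random salt per password and a
# deliberately slow derivation make table lookups impractical.
random_salt = os.urandom(16)
derived = hashlib.pbkdf2_hmac("sha256", password, random_salt, 100000)
print(len(derived))  # 32
```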
-------------------------------------------------------------------------------------
######## sha1 ######## (these algorithms are used the same way as md5)
h = hashlib.sha1()
h.update(bytes('123456', encoding='utf-8'))
print(h.hexdigest())
SHA1, SHA224, SHA256, SHA384 and SHA512 are all used in the same way
-------------------------------------------------------------------------------------
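Python also ships an hmac module. Internally it processes the key and the message further before hashing them
import hmac

h = hmac.new(bytes('asd',encoding="utf-8"), digestmod='md5')  # digestmod is required since Python 3.8; md5 was the old default
h.update(bytes('123456',encoding="utf-8"))
print(h.hexdigest())
# 548b23c538c78d7053e3231919f78f36  differs from the custom-key digest above, which shows that the key and the message are processed further internally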
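A short sketch of the same idea, confirming that the HMAC differs from a plain md5 of key plus message. It also uses hmac.compare_digest to check a tag, which is an addition of mine that the text above does not cover.

```python
import hashlib
import hmac

key = b"asd"
message = b"123456"

mac = hmac.new(key, message, digestmod=hashlib.md5).hexdigest()
plain = hashlib.md5(key + message).hexdigest()
print(mac != plain)  # True: HMAC is not just md5(key + message)

# Verify a received tag in constant time rather than with ==.
received = hmac.new(key, message, digestmod=hashlib.md5).hexdigest()
print(hmac.compare_digest(mac, received))  # True
```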
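V. The re regular expression module
Background: an introduction to regular expressions
(1) Some important functions in the re module:
compile: create a pattern object from a string containing a regular expression
The function re.compile turns a regular expression written as a string into a pattern object, which makes matching more efficient. If you pass a string pattern to search or match, they also convert it into a pattern object internally. Rather than going through that on every call, compile the pattern once up front and reuse the object. The call changes from re.search(pattern_string, text) to pattern_object.search(text).
import fileinput, re

pat = re.compile('From: (.*) <.*?>$')  # the * and + quantifiers are greedy and match as much text as possible; appending ? makes them non-greedy (minimal)
for line in fileinput.input():
    m = pat.search(line)
    if m:
        print(m.group(1))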
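To try the compiled pattern without feeding files through fileinput, here is a minimal sketch that runs it over a small list of lines. The sample lines are invented for the illustration.

```python
import re

# Compiled once, reused for every line.
pat = re.compile(r"From: (.*) <.*?>$")

lines = [  # hypothetical sample input
    "From: Foo Fie <foo@bar.baz>",
    "Subject: hello",
    "From: Magnus Lie Hetland <magnus@example.org>",
]
names = [m.group(1) for m in map(pat.search, lines) if m]
print(names)  # ['Foo Fie', 'Magnus Lie Hetland']

# The module-level function gives the same result; it has to resolve the
# pattern string on every call instead of reusing one object.
same = [m.group(1)
        for m in (re.search(r"From: (.*) <.*?>$", line) for line in lines)
        if m]
```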
search: look for the pattern anywhere in the string (scans the whole string and returns the first match)
The function re.search scans the given string for the first place where the pattern matches. If it finds one it returns a match object; otherwise it returns None
>>> import re
>>> pat = re.compile("<(.*)>")
>>> st = "a email from <ld@qq.com>"
>>> ret = pat.search(st)
>>> ret
<_sre.SRE_Match object; span=(13, 24), match='<ld@qq.com>'>
No match gives None, so the shell prints nothing:
>>> st2 = "a email from ld@qq.com"
>>> ret2 = pat.search(st2)
>>> ret2
>>>
Only the first match:
pat2 = re.compile(r"\*\*(.+?)\*\*")
st = "**this** is **book**"
res = pat2.search(st)  # finds the first match and stops looking, so only **this** is matched
print(res.group(0))  # **this**
print(res.group(1))  # this
# Anything beyond this raises an error: the pattern has only one group, so only indexes 0 and 1 exist
# Traceback (most recent call last):
#   File "D:/MyPython/day24/基础回顾/01装饰器/test.py", line 59, in <module>
#     print(res.group(2))
# IndexError: no such group
print(res.group(2))
print(res.group(3))
Note: findall returns all matches, and sub replaces all matches
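The difference between the first match and all matches, side by side. finditer is included as well because it yields match objects (with positions) rather than plain strings; it is not discussed in the text above.

```python
import re

pat2 = re.compile(r"\*\*(.+?)\*\*")
st = "**this** is **book**"

first = pat2.search(st).group(1)
every = pat2.findall(st)
spans = [m.span() for m in pat2.finditer(st)]

print(first)  # this
print(every)  # ['this', 'book']
print(spans)  # [(0, 8), (12, 20)]
```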
match: match only at the start of the string; returns a match object on success and None on failure
import re

st = 'a email from <ld@qq.com>'
pat = re.compile("<(.*)>")
ret3 = pat.match(st)
print(ret3)  # None, because nothing matches at the start of the string

st = '<ld@qq.com> a email from '
pat = re.compile("<(.*)>")
ret3 = pat.match(st)  # the text to match must be at the very start
print(ret3)  # <_sre.SRE_Match object; span=(0, 11), match='<ld@qq.com>'>
split: split a string at every match of the pattern; similar to the split method of strings
some_text = "alpha, fawfgwa,,,,,fwafaw fwafaaaa"
pat = re.compile('[, ]+')  # [] matches any one of the characters inside the brackets
ret = pat.split(some_text)
print(ret)  # ['alpha', 'fawfgwa', 'fwafaw', 'fwafaaaa']
findall: return all matches of the pattern as a list
some_text = "[faw非f]服务[wfw发a]adawf[你faw]"
pat = re.compile(r'\[(.*?)\]')  # remember to switch off greedy matching by adding ? after * or +
ret = pat.findall(some_text)
print(ret)  # ['faw非f', 'wfw发a', '你faw']
sub: replace the matches (takes the replacement string and an optional maximum count). Compared with str.replace, the search is more flexible and the replacement is just as convenient
some_text = "[faw非f]服务[wfw发a]adawf[你faw]"
pat = re.compile('faw')
ret = pat.sub(" haha ",some_text,2)  # replace at most 2 matches
print(ret)  # [ haha 非f]服务[wfw发a]adawf[你 haha ]
escape(string): escape all the special regular expression characters in string (of limited practical use)
some_text = r"\. *+?"
ret = re.escape(some_text)
print(ret)  # \\\.\ \*\+\?
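(2) Match objects and groups
First, what is a "group"?
In a pattern, a subpattern enclosed in parentheses is a group (see the examples in the supplement below for details). Group 0 is always the whole pattern. For the other groups, the number is determined by counting opening parentheses from the left
st = "https://www.baidu.com"
pat = re.compile("https://((.*?)\.(.*?)\.(.*))")
# Note: do not add ? after the .* in the last parentheses to make it non-greedy. A non-greedy quantifier stops as soon as the rest of the pattern can match. For the earlier groups the \. that follows sets that boundary, but nothing follows the last group, so it would stop immediately and capture an empty string
https://((www).(baidu).(com))
group 0: https://www.baidu.com
group 1: www.baidu.com
group 2: www
group 3: baidu
group 4: com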
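The warning about a non-greedy final group is easy to verify. The sketch compares the pattern from the text with a variant whose last group is non-greedy, and with a third variant that anchors the pattern with $ so that the non-greedy group has a boundary again. The $ variant is my addition.

```python
import re

st = "https://www.baidu.com"

greedy_tail = re.search(r"https://((.*?)\.(.*?)\.(.*))", st)
lazy_tail = re.search(r"https://((.*?)\.(.*?)\.(.*?))", st)
anchored = re.search(r"https://((.*?)\.(.*?)\.(.*?))$", st)

print(greedy_tail.groups())  # ('www.baidu.com', 'www', 'baidu', 'com')
print(lazy_tail.groups())    # ('www.baidu.', 'www', 'baidu', '')
print(anchored.groups())     # ('www.baidu.com', 'www', 'baidu', 'com')
```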
Important methods of match objects:
group: return the text matched by the given subpattern (group). With no group number it defaults to 0. Older versions of Python allowed at most 99 groups besides group 0, i.e. 1-99; that limit was lifted in Python 3.5
st = 'https://www.baidu.com'
pat = re.compile("https://((.*?)\.(.*?)\.(.*))")
ret3 = pat.search(st)
print(ret3)  # <_sre.SRE_Match object; span=(0, 21), match='https://www.baidu.com'>
print(ret3.group(0))  # https://www.baidu.com
print(ret3.group(1))  # www.baidu.com
print(ret3.group(2))  # www
print(ret3.group(3))  # baidu
print(ret3.group(4))  # com
start: the start position of the given group
end: the end position of the given group
span: the start and end positions of a group
st = 'https://www.baidu.com'
pat = re.compile("https://((.*?)\.(.*?)\.(.*))")
ret3 = pat.search(st)
print(ret3.group(2))  # www
print(ret3.span(2))   # (8, 11)
print(ret3.start(2))  # 8
print(ret3.end(2))    # 11
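Supplement: groups and sub
This is often used for things like page template rendering: the replacement can refer to groups by number
st = 'https://www.baidu.com'
pat = re.compile("https://((.*?)\.(.*?)\.(.*))")
ret = pat.sub(r"<h1>\4</h1>",st)
# ret = re.sub(pat,r'<h1>\4</h1>',st)
print(ret)  # only the replacement built from group 4 comes back: <h1>com</h1>
Note: the reason for this result lies in
how sub performs the replacement:
sub(pat, repl, string) replaces every substring matched by the whole pattern pat with repl. Here pat, https://((.*?)\.(.*?)\.(.*)), matches the entire string https://www.baidu.com, and repl expands to <h1>com</h1>, so the entire string is replaced and nothing else is left
Here is one more example to get a better feel for groups:
pat2 = re.compile(r"\*\*(.+?)\*\*")
st = "**this** is **book**"
ret = pat2.sub(r'<em>\1</em>',st)
# Note: sub replaces every match of pat. **this** and **book** both match the pattern. Each is a separate match with its own group 0 and group 1
# \1 refers to group 1 of whichever match is being replaced, and the whole match (group 0) is what gets replaced:
# **this** --> <em>this</em>
# **book** --> <em>book</em>
print(ret)  # <em>this</em> is <em>book</em>
Note that group 0 cannot be referenced as \0 in the replacement string; <em>\0</em> does not give the intended result
pat2 = re.compile(r"\*\*(.+?)\*\*")
st = "**this** is **book**"
ret = pat2.sub(r'<em>\0</em>',st)
print(ret)  # <em>\x00</em> is <em>\x00</em>  wrong: \0 produced a NUL character, which may display as a blank
Why this happens:
Putting parentheses around a pattern or part of a pattern captures the matching text, and the captures are numbered from left to right in the order their opening parentheses appear.
Backreference numbers start at 1; the numeric form \N covers groups 1 to 99.
In a replacement string, \1 through \99 are therefore group references, but \0 is not one of them: the re module parses it as an octal escape, so it yields the character with code 0 instead of the matched text
group(0) works because a match object always knows the full matched string. In a replacement string the whole match has to be written as \g<0>
Alternatively, to refer to the whole match by number, wrap the entire pattern in parentheses so that it becomes group 1 (the example below also uses a greedy .+, so the single match spans the entire string)
pat2 = re.compile(r"(\*\*(.+)\*\*)")
st = "**this** is **book**"
ret = pat2.sub(r'<em>\1</em>',st)
print(ret)  # <em>**this** is **book**</em>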
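The three kinds of reference in a replacement string can be compared directly. In the re module, \g<0> stands for the entire match, \1 for the first group, and \0 is read as an octal escape for the character with code 0.

```python
import re

pat2 = re.compile(r"\*\*(.+?)\*\*")
st = "**this** is **book**"

whole = pat2.sub(r"<em>\g<0></em>", st)   # \g<0> is the entire match
inner = pat2.sub(r"<em>\1</em>", st)      # \1 is the first group
nul = pat2.sub(r"<em>\0</em>", st)        # \0 is an octal escape: chr(0)

print(whole)      # <em>**this**</em> is <em>**book**</em>
print(inner)      # <em>this</em> is <em>book</em>
print(repr(nul))  # '<em>\x00</em> is <em>\x00</em>'
```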
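For more, see the posts under the regular expression category tag
VI. Serialization: json and pickle
Serializing an object lets us store it in a variable or a file. That preserves the object's state at that moment and extends its lifetime, and the object can be read back whenever it is needed.
The two main modules are pickle and json. Both offer the same set of functions: dumps and dump (serialize), loads and load (deserialize). dumps and loads convert to and from an in-memory value (bytes for pickle, str for json), while dump and load write to and read from a file-like object.
pickle: converts between Python-specific types and Python's basic data types (it works only within Python)
cPickle and pickle do the same job, but cPickle is written in C and is faster. cPickle exists only in Python 2; in Python 3 the pickle module uses its C implementation automatically when it is available. Code that has to run on both can use:
try:
    import cPickle as pickle
except ImportError:
    import pickle
Serialization: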
>>> d = dict(name='Bob', age=20, score=88)
>>> id(d)
17302664
>>> s = pickle.dumps(d)
>>> s
b'\x80\x03}q\x00(X\x03\x00\x00\x00ageq\x01K\x14X\x04\x00\x00\x00nameq\x02X\x03\x00\x00\x00Bobq\x03X\x05\x00\x00\x00scoreq\x04KXu.'
# These bytes are pickle's own encoding of the object held in memory
Deserialization:
>>> d2 = pickle.loads(s)
>>> d2
{'age': 20, 'name': 'Bob', 'score': 88}
>>> id(d2)
17302472
Note: pickle serializes the whole object, but what deserialization gives back is a new object with equal contents, not the original one. The two different ids above show this
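The file-based pair dump/load works the same way as dumps/loads. In this sketch io.BytesIO stands in for a file opened in binary mode, which is an assumption made so the example runs without touching the disk.

```python
import io
import pickle

d = dict(name="Bob", age=20, score=88)

# dumps/loads work on bytes in memory.
blob = pickle.dumps(d)
d2 = pickle.loads(blob)

# dump/load work on a binary file-like object; BytesIO stands in for a file
# opened with open(path, "wb") / open(path, "rb").
stream = io.BytesIO()
pickle.dump(d, stream)
stream.seek(0)
d3 = pickle.load(stream)

print(type(blob).__name__)  # bytes
print(d2 == d, d2 is d)     # True False
print(d3 == d, d3 is d)     # True False
```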
json: converts between strings and Python's basic data types (used for passing objects between different programming languages)
Similar to pickle, except that it turns basic data types (list, dict, str, int, float, bool, None, and so on) into a string
>>> d = dict(name='Bob', age=20, score=88)
>>> json.dumps(d)
'{"age": 20, "name": "Bob", "score": 88}'
Supplement: the basic data types that json serializes are themselves objects (everything in Python is an object)
>>> dict(name='Bob', age=20, score=88)
{'age': 20, 'name': 'Bob', 'score': 88}
>>> type(dict)
<class 'type'>
So can json serialize an instance of our own class directly?
class MyDict:
    def __init__(self,name,age):
        self.name = name
        self.age = age

d = MyDict("mk",6)
json.dumps(d)  # fails
# TypeError: <__main__.MyDict instance at 0x0000000002577BC8> is not JSON serializable
How can such an object be made serializable? The basic types are objects too, and they serialize fine. Let's look at what json.dumps offers and find which parameter is involved in the TypeError (the signature below is the Python 2.7 one; Python 3 has no encoding parameter)
def dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        encoding='utf-8', default=None, sort_keys=False, **kw):
    """Serialize ``obj`` to a JSON formatted ``str``.

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.
    ......
    ``default(obj)`` is a function that should return a serializable version
    of obj or raise TypeError. The default simply raises TypeError.
    ......"""
    (implementation)
skipkeys: with the default skipkeys=False, dict keys must be of a basic type
(``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
d = {'a':6,True:10,11:12,'d':[1,2,3]}  # values have no such restriction, as long as they are basic Python types such as dict or list
json.dumps(d,skipkeys=False)
A key of another hashable type, such as a tuple, raises TypeError
With skipkeys=True, keys of other (hashable) types are accepted, but those key-value pairs are skipped:
d = {10:'faw',(12,2,):12}
ret = json.dumps(d,skipkeys=True)  # works
{"10": "faw"}  # the tuple key is gone
default can turn any object into something JSON can serialize, but we have to
write a conversion function for that object and pass it in:
class MyDict:
    def __init__(self,name,age):
        self.name = name
        self.age = age

d = MyDict("mk",6)

def conv(obj):
    return {
        'name':obj.name,
        'age':obj.age
    }

ret = json.dumps(d,default=conv)
print(ret)  # {"age": 6, "name": "mk"}  (key order may differ between Python versions)
Writing a separate conversion function for every class is tedious. Instances already keep their attributes in __dict__, so we can serialize obj.__dict__ instead
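A minimal sketch of the __dict__ approach. It assumes an ordinary class whose instances have a __dict__ (so no __slots__) and whose attributes are themselves JSON-friendly. sort_keys is set only to make the output order predictable.

```python
import json

class MyDict:
    def __init__(self, name, age):
        self.name = name
        self.age = age

d = MyDict("mk", 6)

# default is called only for objects json cannot serialize by itself.
ret = json.dumps(d, default=lambda obj: obj.__dict__, sort_keys=True)
print(ret)  # {"age": 6, "name": "mk"}
```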
Supplement: the conversion function is applied to the special values that json cannot serialize on its own
def conv(date_obj):
    return date_obj.strftime("%Y-%m-%d %H:%M:%S")

def batch_task_mgr(request):
    task_log_obj = models.TaskLogDetail.objects.filter(task=task_obj.task_obj).values("id","status","result",'date')
    # date is a datetime, so it has to be converted
    for task_log in task_log_obj:
        task_log['date2'] = task_log['date']  # add one more special value to the dict
    log_data = json.dumps(list(task_log_obj),default=conv)
# the data is now a list
# [
# {'id':1,'status':0,'result':'ddd', 'date': datetime.datetime(2018, 6, 14, 23, 11, 33, 719467, tzinfo=<UTC>),'date2': datetime.datetime(2018, 6, 14, 23, 11, 33, 719467, tzinfo=<UTC>)},
# {'id':2,'status':0,'result':'ddd', 'date': datetime.datetime(2018, 6, 14, 23, 11, 33, 719467, tzinfo=<UTC>),'date2': datetime.datetime(2018, 6, 14, 23, 11, 33, 719467, tzinfo=<UTC>)},
# ]
The conversion function is called for every value in the list that cannot be JSON-serialized directly. task_log_obj[0]['date'], task_log_obj[0]['date2'], task_log_obj[1]['date'] and task_log_obj[1]['date2'] are each passed to conv as its argument
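The same behaviour can be observed without Django. In this sketch a plain list of dicts stands in for list(task_log_obj), and a small wrapper named traced_conv (a name made up here) records every value that json hands to the conversion function.

```python
import json
from datetime import datetime

def conv(date_obj):
    return date_obj.strftime("%Y-%m-%d %H:%M:%S")

rows = [  # hypothetical stand-in for list(task_log_obj)
    {"id": 1, "status": 0, "result": "ddd", "date": datetime(2018, 6, 14, 23, 11, 33)},
    {"id": 2, "status": 0, "result": "ddd", "date": datetime(2018, 6, 14, 23, 11, 33)},
]
for row in rows:
    row["date2"] = row["date"]

calls = []
def traced_conv(obj):
    calls.append(obj)   # record every value json hands over
    return conv(obj)

log_data = json.dumps(rows, default=traced_conv)
print(len(calls))                        # 4: date and date2 of each row
print(json.loads(log_data)[0]["date"])   # 2018-06-14 23:11:33
```

Only the datetime values reach the conversion function; the ints and strings are serialized by json itself.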
Deserializing objects with json
We passed in an "object" but get a dict back, which is not quite what we want. Again, let's check the documentation to see whether deserialization can produce an object directly (Python 2.7 signature)
def loads(s, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).
object_hook: a callback, our own decoding function. It receives each decoded dict, and we can use that data to build an object
class MyDict:
    def __init__(self,name,age):
        self.name = name
        self.age = age

def dconvo(dic):
    return MyDict(dic['name'],dic['age'])

lret = json.loads(ret,object_hook=dconvo)
print(lret)  # <__main__.MyDict instance at 0x00000000024E6EC8>
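Putting default and object_hook together gives a full round trip. One caveat worth keeping in mind: object_hook is called for every JSON object in the input, nested ones included, so a hook written like dconvo only suits data where every object has these keys.

```python
import json

class MyDict:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def conv(obj):
    return {"name": obj.name, "age": obj.age}

def dconvo(dic):
    return MyDict(dic["name"], dic["age"])

ret = json.dumps(MyDict("mk", 6), default=conv)
lret = json.loads(ret, object_hook=dconvo)
print(type(lret).__name__, lret.name, lret.age)  # MyDict mk 6
```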
VII. xml
XML is a format for exchanging data between different languages or programs. Its tags resemble HTML
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
There are three ways to work with XML: ElementTree, DOM and SAX.
DOM reads the whole XML document into memory and parses it into a tree, so it uses a lot of memory and parses slowly; the advantage is that any node of the tree can be visited freely.
SAX is a streaming model that parses while reading, so it uses little memory and parses quickly; the drawback is that we have to handle the events ourselves.
ElementTree is like a lightweight DOM with a convenient, friendly API. The code is easy to use, it is fast, and it uses little memory. ElementTree is the one covered here.
ElementTree
1. Parse XML and get the root node:
from xml.etree import ElementTree as ET

# open the file and read the XML content
str_xml = open('ts.xml','r').read()
# parse the string into an XML document object and get the root node
root = ET.XML(str_xml)
print(root)  # <Element 'data' at 0x253b7b8>
from xml.etree import ElementTree as ET

tree = ET.parse('ts.xml')  # the whole document tree: <xml.etree.ElementTree.ElementTree object at 0x00000000024C6EB8>
root = tree.getroot()  # the root node: <Element 'data' at 0x24ce7f0>
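2. Working with XML: nodes can be nested in XML, and every node supports the same operations, which makes it easy to work with
Node type: <class 'xml.etree.ElementTree.Element'>
What a node offers for lookup and manipulation:
class Element(object):
    tag = None     # the node's tag name, a str
    attrib = None  # the node's attributes, a dict that can hold several names and values
    text = None    # the node's text content, a str
    tail = None    # the text between this node's closing tag and the start of the next node (not a comment); see the comments in the full source

    def __init__(self, tag, attrib={}, **extra):...   # initialise the data above
    def __repr__(self):...   # return a string in a fixed format

An aside on __repr__ and __str__:
class Test(object):
    def __init__(self,name,age):
        self.name = name
        self.age = age

t = Test("asde",16)
print(t)  # <__main__.Test object at 0x0000000001E449E8>
Printing the object directly like this is not very friendly: it shows the object's memory address
class Test(object):
    def __init__(self,name,age):
        self.name = name
        self.age = age
    def __str__(self):
        return "%s is %s"%(self.name,self.age)

t = Test("asde",16)
print(t)  # asde is 16
print first tries __str__ (the method behind the built-in str, which is what print uses internally); it should normally return a friendly display
But evaluating the object at the interactive prompt still shows its address
>>> class Test(object):
...     def __init__(self,name,age):
...         self.name = name
...         self.age = age
...     def __repr__(self):
...         return "%s is %s"%(self.name,self.age)
...
>>> t = Test("asde",16)
>>> t
asde is 16
__repr__ is used at the interactive prompt, and print falls back to it when __str__ is not defined
__repr__ is aimed at developers, __str__ at users

    def makeelement(self, tag, attrib):...   # create a new node
    def copy(self):...   # return a copy of this node
    def __len__(self):...   # return the number of child nodes; this counts only full child elements, so to check whether a node has any content, check both the length and text
    def __nonzero__(self):...   # truth value of the node in Python 2 (__bool__ in Python 3)
    def __getitem__(self, index):...   # list-style access: return the child node at index
    # only positional indexes are allowed here; a tag name may match several children, so an int index is the reliable way to pick one
    def __setitem__(self, index, element):...   # replace the child node at index
    def __delitem__(self, index):...   # delete the child node at index
    def append(self, element):...   # append one child node to this node
    def extend(self, elements):...   # append several child nodes to this node
    def insert(self, index, element):...   # insert a child node at the given position
    def remove(self, element):...   # remove a child node from this node
    def getchildren(self):...   # get all child nodes (deprecated: use list(node) or iterate over the node directly)
    def find(self, path, namespaces=None):...   # get the first matching child node
    def findtext(self, path, default=None, namespaces=None):...   # get the text of the first matching child node
    def findall(self, path, namespaces=None):...   # get all matching child nodes
    def iterfind(self, path, namespaces=None):...   # get all matching child nodes as an iterator that can be used in a for loop
    def clear(self):...   # clear the node
    def get(self, key, default=None):...   # get the value of one attribute of this node
    def set(self, key, value):...   # set the value of one attribute of this node
    def keys(self):...   # get the names of all attributes of this node
    def items(self):...   # get all attributes of this node as (name, value) pairs
    def iter(self, tag=None):...   # find all nodes with the given tag among this node and its descendants; returns an iterator (for)
    def getiterator(self, tag=None):...   # similar to iter, see the full source (deprecated)
    def itertext(self):...   # iterate over all the text inside this node and its descendants, in document order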
class Element(object): # <tag attrib>text<child/>...</tag>tail ## # (Attribute) Element tag. tag = None ## # (Attribute) Element attribute dictionary. Where possible, use # {@link #Element.get}, # {@link #Element.set}, # {@link #Element.keys}, and # {@link #Element.items} to access # element attributes. attrib = None ## # (Attribute) Text before first subelement. This is either a # string or the value None. Note that if there was no text, this # attribute may be either None or an empty string, depending on # the parser. text = None ## # (Attribute) Text after this element's end tag, but before the # next sibling element's start tag. This is either a string or # the value None. Note that if there was no text, this attribute # may be either None or an empty string, depending on the parser. tail = None # text after end tag, if any # constructor def __init__(self, tag, attrib={}, **extra): attrib = attrib.copy() attrib.update(extra) self.tag = tag self.attrib = attrib self._children = [] def __repr__(self): return "<Element %s at 0x%x>" % (repr(self.tag), id(self)) ## # Creates a new element object of the same type as this element. # # @param tag Element tag. # @param attrib Element attributes, given as a dictionary. # @return A new element instance. def makeelement(self, tag, attrib): return self.__class__(tag, attrib) ## # (Experimental) Copies the current element. This creates a # shallow copy; subelements will be shared with the original tree. # # @return A new element instance. def copy(self): elem = self.makeelement(self.tag, self.attrib) elem.text = self.text elem.tail = self.tail elem[:] = self return elem ## # Returns the number of subelements. Note that this only counts # full elements; to check if there's any content in an element, you # have to check both the length and the <b>text</b> attribute. # # @return The number of subelements. 
def __len__(self): return len(self._children) def __nonzero__(self): warnings.warn( "The behavior of this method will change in future versions. " "Use specific 'len(elem)' or 'elem is not None' test instead.", FutureWarning, stacklevel=2 ) return len(self._children) != 0 # emulate old behaviour, for now ## # Returns the given subelement, by index. # # @param index What subelement to return. # @return The given subelement. # @exception IndexError If the given element does not exist. def __getitem__(self, index): return self._children[index] ## # Replaces the given subelement, by index. # # @param index What subelement to replace. # @param element The new element value. # @exception IndexError If the given element does not exist. def __setitem__(self, index, element): # if isinstance(index, slice): # for elt in element: # assert iselement(elt) # else: # assert iselement(element) self._children[index] = element ## # Deletes the given subelement, by index. # # @param index What subelement to delete. # @exception IndexError If the given element does not exist. def __delitem__(self, index): del self._children[index] ## # Adds a subelement to the end of this element. In document order, # the new element will appear after the last existing subelement (or # directly after the text, if it's the first subelement), but before # the end tag for this element. # # @param element The element to add. def append(self, element): # assert iselement(element) self._children.append(element) ## # Appends subelements from a sequence. # # @param elements A sequence object with zero or more elements. # @since 1.3 def extend(self, elements): # for element in elements: # assert iselement(element) self._children.extend(elements) ## # Inserts a subelement at the given position in this element. # # @param index Where to insert the new subelement. def insert(self, index, element): # assert iselement(element) self._children.insert(index, element) ## # Removes a matching subelement. 
Unlike the <b>find</b> methods, # this method compares elements based on identity, not on tag # value or contents. To remove subelements by other means, the # easiest way is often to use a list comprehension to select what # elements to keep, and use slice assignment to update the parent # element. # # @param element What element to remove. # @exception ValueError If a matching element could not be found. def remove(self, element): # assert iselement(element) self._children.remove(element) ## # (Deprecated) Returns all subelements. The elements are returned # in document order. # # @return A list of subelements. # @defreturn list of Element instances def getchildren(self): warnings.warn( "This method will be removed in future versions. " "Use 'list(elem)' or iteration over elem instead.", DeprecationWarning, stacklevel=2 ) return self._children ## # Finds the first matching subelement, by tag name or path. # # @param path What element to look for. # @keyparam namespaces Optional namespace prefix map. # @return The first matching element, or None if no element was found. # @defreturn Element or None def find(self, path, namespaces=None): return ElementPath.find(self, path, namespaces) ## # Finds text for the first matching subelement, by tag name or path. # # @param path What element to look for. # @param default What to return if the element was not found. # @keyparam namespaces Optional namespace prefix map. # @return The text content of the first matching element, or the # default value no element was found. Note that if the element # is found, but has no text content, this method returns an # empty string. # @defreturn string def findtext(self, path, default=None, namespaces=None): return ElementPath.findtext(self, path, default, namespaces) ## # Finds all matching subelements, by tag name or path. # # @param path What element to look for. # @keyparam namespaces Optional namespace prefix map. 
# @return A list or other sequence containing all matching elements, # in document order. # @defreturn list of Element instances def findall(self, path, namespaces=None): return ElementPath.findall(self, path, namespaces) ## # Finds all matching subelements, by tag name or path. # # @param path What element to look for. # @keyparam namespaces Optional namespace prefix map. # @return An iterator or sequence containing all matching elements, # in document order. # @defreturn a generated sequence of Element instances def iterfind(self, path, namespaces=None): return ElementPath.iterfind(self, path, namespaces) ## # Resets an element. This function removes all subelements, clears # all attributes, and sets the <b>text</b> and <b>tail</b> attributes # to None. def clear(self): self.attrib.clear() self._children = [] self.text = self.tail = None ## # Gets an element attribute. Equivalent to <b>attrib.get</b>, but # some implementations may handle this a bit more efficiently. # # @param key What attribute to look for. # @param default What to return if the attribute was not found. # @return The attribute value, or the default value, if the # attribute was not found. # @defreturn string or None def get(self, key, default=None): return self.attrib.get(key, default) ## # Sets an element attribute. Equivalent to <b>attrib[key] = value</b>, # but some implementations may handle this a bit more efficiently. # # @param key What attribute to set. # @param value The attribute value. def set(self, key, value): self.attrib[key] = value ## # Gets a list of attribute names. The names are returned in an # arbitrary order (just like for an ordinary Python dictionary). # Equivalent to <b>attrib.keys()</b>. # # @return A list of element attribute names. # @defreturn list of strings def keys(self): return self.attrib.keys() ## # Gets element attributes, as a sequence. The attributes are # returned in an arbitrary order. Equivalent to <b>attrib.items()</b>. 
# # @return A list of (name, value) tuples for all attributes. # @defreturn list of (string, string) tuples def items(self): return self.attrib.items() ## # Creates a tree iterator. The iterator loops over this element # and all subelements, in document order, and returns all elements # with a matching tag. # <p> # If the tree structure is modified during iteration, new or removed # elements may or may not be included. To get a stable set, use the # list() function on the iterator, and loop over the resulting list. # # @param tag What tags to look for (default is to return all elements). # @return An iterator containing all the matching elements. # @defreturn iterator def iter(self, tag=None): if tag == "*": tag = None if tag is None or self.tag == tag: yield self for e in self._children: for e in e.iter(tag): yield e # compatibility def getiterator(self, tag=None): # Change for a DeprecationWarning in 1.4 warnings.warn( "This method will be removed in future versions. " "Use 'elem.iter()' or 'list(elem.iter())' instead.", PendingDeprecationWarning, stacklevel=2 ) return list(self.iter(tag)) ## # Creates a text iterator. The iterator loops over this element # and all subelements, in document order, and returns all inner # text. # # @return An iterator containing all inner text. # @defreturn iterator def itertext(self): tag = self.tag if not isinstance(tag, basestring) and tag is not None: return if self.text: yield self.text for e in self: for s in e.itertext(): yield s if e.tail: yield e.tail
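Note: iter walks all matching descendants (and the node itself). The other lookup methods search only the direct children when given a plain tag name; find, findall, findtext and iterfind reach deeper only when given a path such as .//rank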
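A small in-memory document shows the difference. The XML string below is a cut-down version of the sample data, parsed with ET.fromstring so that no file is needed.

```python
from xml.etree import ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>2</rank></country>"
    "<country name='Singapore'><rank>5</rank></country>"
    "</data>"
)

direct = root.findall("rank")       # only looks at children of <data>
by_path = root.findall(".//rank")   # './/' searches all descendants
by_iter = list(root.iter("rank"))   # iter walks the whole subtree

print(len(direct), len(by_path), len(by_iter))  # 0 2 2
print(root.find("country").get("name"))         # Liechtenstein
```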
a: Read a node's attributes and use iter to walk all matching descendants
tree = ET.parse('ts.xml')  # the whole document tree: <xml.etree.ElementTree.ElementTree object at 0x00000000024C6EB8>
root = tree.getroot()  # the root node: <Element 'data' at 0x24ce7f0>
first_rank = root.iter('rank')
for item in first_rank:
    print(item.tag,item.attrib,item.text,item.tail)
Output:
('rank', {'updated': 'yes'}, '2', '\n ')
('rank', {'updated': 'yes'}, '5', '\n ')
('rank', {'updated': 'yes'}, '69', '\n ')
b: Walk the entire XML document (3 levels)
from xml.etree import ElementTree as ET

tree = ET.parse('ts.xml')  # the whole document tree: <xml.etree.ElementTree.ElementTree object at 0x00000000024C6EB8>
root = tree.getroot()  # the root node: <Element 'data' at 0x24ce7f0>
# the document is known to have 3 levels
# walk the second level
for child in root:
    print(child.tag,child.attrib)
    # walk the third level
    for item in child:
        print('---%s-%s'%(item.tag,item.text),item.attrib)
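c: Modify node content
Note: modifications happen only in memory and do not affect the file. To keep the changes, write the in-memory content back to a file.
# Way 1: parse the XML from a string
from xml.etree import ElementTree as ET

with open("ts.xml", "r") as f:
    data = f.read()
root = ET.XML(data)
for item in root.iter("rank"):
    # increment the rank
    new_rank = int(item.text) + 1
    item.text = str(new_rank)    # must be a str, otherwise writing the tree raises TypeError
    # add or update an attribute
    item.set('name', 'dsa')
    # delete an attribute
    del item.attrib['name']      # attrib is an ordinary dict

# Everything above happened in memory, so it still has to be saved to a file.
# Saving needs a tree object; wrapping the root node in an ElementTree brings
# all of its descendants along (much like holding the head of a linked list).
tree = ET.ElementTree(root)
tree.write("newts.xml", encoding="utf-8")

# Way 2: parse the file directly
from xml.etree import ElementTree as ET

tree = ET.parse('ts.xml')   # the whole document tree: <xml.etree.ElementTree.ElementTree object at 0x00000000024C6EB8>
root = tree.getroot()       # the root node: <Element 'data' at 0x24ce7f0>
for item in root.iter("rank"):
    # increment the rank
    new_rank = int(item.text) + 1
    item.text = str(new_rank)    # must be a str, otherwise writing the tree raises TypeError
    # add or update an attribute
    item.set('name', 'dsa')
    # delete an attribute
    del item.attrib['name']      # attrib is an ordinary dict

# The tree was created when the file was parsed and already holds all the data (including the root)
tree.write("newts.xml", encoding="utf-8")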
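d: Delete nodes (only one version is shown; the other works the same way as above)
from xml.etree import ElementTree as ET

tree = ET.parse('ts.xml')   # the whole document tree: <xml.etree.ElementTree.ElementTree object at 0x00000000024C6EB8>
root = tree.getroot()       # the root node: <Element 'data' at 0x24ce7f0>
for item in root.findall("country"):
    # delete every country whose rank is greater than 50
    rank = int(item.find("rank").text)
    if rank > 50:
        root.remove(item)   # remove is called on the parent to drop one of its children

# Everything above happened in memory, so it still has to be saved to a file.
# Saving needs a tree object, and every tree needs a root node.
tree = ET.ElementTree(root)
tree.write("newts.xml", encoding="utf-8")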
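The same removal can be tried without touching any file. This is a minimal sketch on hypothetical data shaped like the article's example; it relies on findall returning a list, so removing children from root inside the loop does not disturb the iteration.

```python
from xml.etree import ElementTree as ET

# Hypothetical data in the same shape as the article's example.
SAMPLE = """
<data>
    <country name="A"><rank>2</rank></country>
    <country name="B"><rank>69</rank></country>
    <country name="C"><rank>70</rank></country>
    <country name="D"><rank>5</rank></country>
</data>
"""

root = ET.fromstring(SAMPLE)

# findall() returns a list, so removing from root inside the loop is safe.
for country in root.findall("country"):
    if int(country.find("rank").text) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root.findall("country")]
print(remaining)  # ['A', 'D']
```

Looping over root directly while removing from it is best avoided, because the underlying child list changes during iteration.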
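3. Creating an XML document
Creating a document is similar to saving after parsing a string, as above. Both need a tree structure (the document object) and the XML data, and the data has to be attached to the tree through the root node before it can be saved.
There are 3 ways to create one:
Way 1: create the element nodes first (unrelated to each other), append each one to its parent with append, then put the root node into the document object and write it to a file. Uses Element and append.
from xml.etree import ElementTree as ET

# create the elements
root = ET.Element("School")
major1 = ET.Element("major", {'name': "材料"})      # 材料: materials
major2 = ET.Element("major", {'name': "计算机"})    # 计算机: computer science
classes1 = ET.Element("class")
classes1.text = "3"
classes2 = ET.Element("class")
classes2.text = "5"

# link them together
# each major gets its number of classes
major1.append(classes1)
major2.append(classes2)
# the root gets the majors
root.append(major1)
root.append(major2)

# save
tree = ET.ElementTree(root)
tree.write("school.xml", encoding="utf-8", xml_declaration=True)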
Note: under Python 2.7 the Chinese values above cannot be written as-is and need extra handling. The reason: Python 2's default string encoding is ASCII, while a Chinese-locale Windows uses GBK (we normally still save the file as UTF-8).
Python 2.7 still provides reload and sys.setdefaultencoding (Python 3 strings are Unicode throughout, so this is unnecessary there and setdefaultencoding no longer exists).
# Python 2.7 only
from xml.etree import ElementTree as ET
import sys

# create the elements
root = ET.Element("School")
print(sys.getdefaultencoding())   # ascii
reload(sys)
sys.setdefaultencoding("utf-8")
major1 = ET.Element("major", {'name': "材料"})
major2 = ET.Element("major", {'name': "计算机"})
classes1 = ET.Element("class")
classes1.text = "3"
classes2 = ET.Element("class")
classes2.text = "5"

# link them together
# each major gets its number of classes
major1.append(classes1)
major2.append(classes2)
# the root gets the majors
root.append(major1)
root.append(major2)

# save
tree = ET.ElementTree(root)
tree.write("school.xml", encoding="utf-8", xml_declaration=True)
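Way 2: create nodes with makeelement. This is much like creating them directly. The method does not behave as its name suggests: one would expect it to create a child of the current node, but it works just like way 1 and only creates a standalone node with no link to the caller. The Python 3 docstring also says to use SubElement instead, so a passing acquaintance with this method is enough.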
"""Create a new element with the same type. *tag* is a string containing the element name. *attrib* is a dictionary containing the element attributes. Do not call this method, use the SubElement factory function instead. """
from xml.etree import ElementTree as ET

# create the elements
root = ET.Element("School")
major1 = root.makeelement("major", {'name': "材料"})
major2 = root.makeelement("major", {'name': "计算机"})
classes1 = major1.makeelement("class", {})
classes1.text = "3"
classes2 = major2.makeelement("class", {})
classes2.text = "5"

# link them together
# each major gets its number of classes
major1.append(classes1)
major2.append(classes2)
# the root gets the majors
root.append(major1)
root.append(major2)

# save
tree = ET.ElementTree(root)
tree.write("school2.xml", encoding="utf-8", xml_declaration=True)

major1 = root.makeelement("major", {'name': "材料"})
major2 = root.makeelement("major", {'name': "计算机"})
classes1 = major1.makeelement("class", {})
classes1.text = "3"
classes2 = major2.makeelement("class", {})
classes2.text = "5"
# link them together
# each major gets its number of classes, deliberately swapped
major1.append(classes2)
major2.append(classes1)

This example makes it clear that makeelement only creates a standalone node; it keeps no link to the node that created it. Above, the "材料" major creates the class node with text 3 and the "计算机" major creates the one with text 5, yet appending them the other way round still succeeds. So the method is of little use. The same result can be had by creating nodes with the Element class directly, or by using SubElement.
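Way 3: use SubElement (a function of the ElementTree module) to create a child of the given node. The child is attached to its parent when it is created, so no append call is needed.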
def SubElement(parent, tag, attrib={}, **extra)
from xml.etree import ElementTree as ET

# create the elements
root = ET.Element("School")
major1 = ET.SubElement(root, "major", {'name': "材料"})
major2 = ET.SubElement(root, "major", {'name': "计算机"})
classes1 = ET.SubElement(major1, "class", {})
classes1.text = "3"
classes2 = ET.SubElement(major2, "class", {})
classes2.text = "5"

# save
tree = ET.ElementTree(root)
tree.write("school3.xml", encoding="utf-8", xml_declaration=True)
Note: documents created this way have no indentation; everything is written on one line, which is hard for people to read. To add indentation we can use minidom from xml.dom.
from xml.etree import ElementTree as ET
from xml.dom import minidom

def prettify(elem):
    '''
    Convert an XML node to a string with indentation.
    :param elem: the Element to serialize
    :return: the indented XML as a str
    '''
    elem_str = ET.tostring(elem, encoding="utf-8")   # serialize to bytes first
    reparsed = minidom.parseString(elem_str)         # re-parse into a DOM object, which knows how to indent
    return reparsed.toprettyxml(indent='\t')         # serialize the DOM object back to a string, with indentation

# create the elements
root = ET.Element("School")
major1 = ET.SubElement(root, "major", {'name': "材料"})
major2 = ET.SubElement(root, "major", {'name': "计算机"})
classes1 = ET.SubElement(major1, "class", {})
classes1.text = "3"
classes2 = ET.SubElement(major2, "class", {})
classes2.text = "5"

raw_string = prettify(root)

# save
with open("school5.xml", "w", encoding="utf-8") as fp:
    fp.write(raw_string)
4. Namespaces:
Working with namespaces in Python:
from xml.etree import ElementTree as ET

ET.register_namespace('com', "http://www.company.com")   # some name

# build a tree structure
root = ET.Element("{http://www.company.com}STUFF")
body = ET.SubElement(root, "{http://www.company.com}MORE_STUFF", attrib={"{http://www.company.com}hhh": "123"})
body.text = "STUFF EVERYWHERE!"

# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)
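To see what the registered prefix changes, the tree can be serialized to a string instead of a file. A minimal sketch using the same namespace URI and tag names as above; the variable names NS, xml_text and found are only for illustration.

```python
from xml.etree import ElementTree as ET

NS = "http://www.company.com"
ET.register_namespace("com", NS)   # serializer will use the prefix "com"

root = ET.Element("{%s}STUFF" % NS)
body = ET.SubElement(root, "{%s}MORE_STUFF" % NS, attrib={"{%s}hhh" % NS: "123"})
body.text = "STUFF EVERYWHERE!"

# Serialize to a str so the result can be inspected directly.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Lookups can use a prefix map instead of spelling out {uri}tag.
found = root.find("com:MORE_STUFF", {"com": NS})
print(found.text)  # STUFF EVERYWHERE!
```

Without register_namespace, ElementTree still writes valid XML but invents prefixes such as ns0.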
VIII: shutil (high-level module for files, directories and archives)
Public interface:
__all__ = ["copyfileobj", "copyfile", "copymode", "copystat", "copy", "copy2", "copytree", "move", "rmtree", "make_archive","unpack_archive",
...]
1. copyfileobj(fsrc, fdst, length=16*1024): copy the contents of one file object into another (both files must be opened first; mind the encoding)
import shutil

with open('school3.xml', 'r', encoding="utf-8") as fsrc, \
     open('new_sch.xml', 'w', encoding="utf-8") as fdst:
    shutil.copyfileobj(fsrc, fdst)
2. copyfile(src, dst, *, follow_symlinks=True): copy a file's contents
import shutil
shutil.copyfile('school3.xml', 'new_sch3.xml')
3. copymode(src, dst, *, follow_symlinks=True): copy only the permission bits; everything else is left unchanged
import shutil
shutil.copymode('newts.xml', '2.xml')
4. copystat(src, dst, *, follow_symlinks=True): copy only the status information (mode bits, atime, mtime, flags), mainly the access and modification times. The creation time is not changed, and the dst file must already exist.
import shutil
shutil.copystat('newts.xml', '2.xml')
5. copy(src, dst, *, follow_symlinks=True): copy file data and permission bits ("cp src dst")
import shutil
ret = shutil.copy('newts.xml', '2.xml')   # returns the destination file name
print(ret)   # 2.xml
6. copy2(src, dst, *, follow_symlinks=True): copy file data and all stat info ("cp -p src dst")
import shutil
ret = shutil.copy2('newts.xml', '2.xml')   # returns the destination file name
print(ret)   # 2.xml
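The difference between copy and copy2 shows up in the modification time. This is a minimal sketch that works in a temporary directory; the file names and the artificially old timestamp are assumptions made for the demonstration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("hello")

    # Give the source an old access/modification time (September 2001).
    old_time = 1000000000
    os.utime(src, (old_time, old_time))

    plain = shutil.copy(src, os.path.join(tmp, "copy.txt"))    # data + mode bits
    full = shutil.copy2(src, os.path.join(tmp, "copy2.txt"))   # data + stat info

    plain_mtime = os.stat(plain).st_mtime
    full_mtime = os.stat(full).st_mtime

# copy() leaves the new file with a fresh mtime; copy2() carries the old one over.
print(plain_mtime - old_time > 3600)
print(abs(full_mtime - old_time) < 2)
```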
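7. copytree(src, dst, symlinks=False, ignore=None, copy_function=copy2, ignore_dangling_symlinks=False): recursively copy a directory tree; the destination directory must not already exist
Note: everything under src is copied. Do not put dst inside src, or the copy recurses into itself, fails with an error and can use up a lot of disk space.
import shutil
ret = shutil.copytree("D:/MyPython/day24/基础回顾", "D:/tree/")
8. move(src, dst, copy_function=copy2): move a file or directory. If dst is an existing directory, src is moved inside it; otherwise src is moved (renamed) to the path dst.
import shutil
ret = shutil.move("1.xml", "D:/MyPython/day24/基础回顾")
9. rmtree(path, ignore_errors=False, onerror=None): recursively delete a directory and everything in it
import shutil
shutil.rmtree("D:/MyPython/day24/基础回顾/02")   # returns None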
10. make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0, dry_run=0, owner=None, group=None, logger=None):
Create an archive file and return its path:
'format' is the archive format: one of "zip", "tar", "gztar", "bztar", or "xztar". Or any other registered format.
def register_archive_format(name, function, extra_args=None, description=''):
    """Registers an archive format.

    name is the name of the format. function is the callable that will be
    used to create archives. If provided, extra_args is a sequence of
    (name, value) tuples that will be passed as arguments to the callable.
    description can be provided to describe the format, and will be returned
    by the get_archive_formats() function.
    """
    if extra_args is None:
        extra_args = []
    if not callable(function):
        raise TypeError('The %s object is not callable' % function)
    if not isinstance(extra_args, (tuple, list)):
        raise TypeError('extra_args needs to be a sequence')
    for element in extra_args:
        if not isinstance(element, (tuple, list)) or len(element) !=2:
            raise TypeError('extra_args elements are : (arg_name, value)')

    _ARCHIVE_FORMATS[name] = (function, extra_args, description)
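base_name: name of the archive to create, without the format extension (it may include a path)
format: archive format (its extension is appended to the file name)
root_dir: the directory that becomes the root of the archive; it must be a directory and defaults to the current directory. Note: without base_dir, the archive holds the contents of root_dir directly, with no enclosing folder.
base_dir: the directory where archiving starts; its path is kept inside the archive (the root directory itself is not), and it may also name a single file. It is normally given relative to root_dir; as the third example below shows, an absolute base_dir effectively overrides root_dir.
owner: owning user (used for tar archives), defaults to the current user
group: owning group (used for tar archives), defaults to the current group
logger: used for logging, usually a logging.Logger object
import shutil
ret = shutil.make_archive("3", "zip", base_dir="D:/MyPython/day24/基础回顾/03/1.xml")   # returns the archive file name
print(ret)   # 3.zip
# directory structure inside 3.zip: MyPython/day24/基础回顾/03/1.xml

import shutil
ret = shutil.make_archive("3", "zip", root_dir="D:/MyPython/day24/基础回顾/03/")   # returns the archive file name
print(ret)   # D:\MyPython\day24\基础回顾\01装饰器\3.zip
# 3.zip holds the files and directories found under root_dir

import shutil
ret = shutil.make_archive("3", "zip", root_dir="D:/MyPython/day24/基础回顾/02/", base_dir="D:/MyPython/day24/基础回顾/03/2.xml")   # returns the archive file name
print(ret)   # D:\MyPython\day24\基础回顾\01装饰器\3.zip
# the content is MyPython/day24/基础回顾/03/2.xml, so root_dir hardly matters here, except that with root_dir the full path of the archive is returned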
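The interplay of root_dir and a relative base_dir is easier to see on a throwaway directory. A minimal sketch, assuming a hypothetical layout with two sub-folders; only the folder named by base_dir should end up in the archive, with paths relative to root_dir.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/site/project/a.txt and <tmp>/site/other/b.txt
    root_dir = os.path.join(tmp, "site")
    os.makedirs(os.path.join(root_dir, "project"))
    os.makedirs(os.path.join(root_dir, "other"))
    with open(os.path.join(root_dir, "project", "a.txt"), "w") as f:
        f.write("a")
    with open(os.path.join(root_dir, "other", "b.txt"), "w") as f:
        f.write("b")

    archive = shutil.make_archive(
        os.path.join(tmp, "backup"),   # base_name, without extension
        "zip",
        root_dir=root_dir,             # paths inside the archive are relative to this
        base_dir="project",            # only this sub-folder is archived
    )

    # List the files stored in the archive (directory entries are skipped).
    with zipfile.ZipFile(archive) as zf:
        names = sorted(n for n in zf.namelist() if not n.endswith("/"))

print(os.path.basename(archive))  # backup.zip
print(names)                      # ['project/a.txt']
```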
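11. unpack_archive(filename, extract_dir=None, format=None): unpack an archive. filename: the archive file; extract_dir: target directory, defaults to the current directory; format: archive format, by default inferred from the file extension
import shutil
shutil.unpack_archive("3.zip")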
.....
IX: configparser
configparser handles files in a specific format (configuration files); under the hood it simply uses open to work with the file.
A configuration file consists of sections, lead by a "[section]" header, and followed by "name: value" entries, with continuations and such in the style of RFC 822.
For example, my.ini:
[client]
port=3306
[mysql]
default-character-set=utf8
The key:val form is also supported.
# Example MySQL config file for small systems.
#
# This is for a system with little memory (<= 64M) where MySQL is only used
# from time to time and it's important that the mysqld daemon
# doesn't use much resources.
#
# MySQL programs look for option files in a set of
# locations which depend on the deployment platform.
# You can copy this option file to one of those
# locations. For information about these locations, see:
# http://dev.mysql.com/doc/mysql/en/option-files.html
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
#password = your_password
port = 3306
socket = /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
port = 3306
socket = /tmp/mysql.sock
;skip-external-locking
key_buffer_size = 16K
max_allowed_packet = 1M
table_open_cache = 4
sort_buffer_size = 64K
read_buffer_size = 256K
read_rnd_buffer_size = 256K
net_buffer_length = 2K
thread_stack = 128K

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (using the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking
server-id = 1

# Uncomment the following if you want to log updates
#log-bin=mysql-bin

# binary logging format - mixed recommended
#binlog_format=mixed

# Causes updates to non-transactional engines using statement format to be
# written directly to binary log. Before using this option make sure that
# there are no dependencies between transactional and non-transactional
# tables such as in the statement INSERT INTO t_myisam SELECT * FROM
# t_innodb; otherwise, slaves may diverge from the master.
#binlog_direct_non_transactional_updates=TRUE

# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = C:\\mysql\\data\\
#innodb_data_file_path = ibdata1:10M:autoextend
#innodb_log_group_home_dir = C:\\mysql\\data\\
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 16M
#innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 5M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50

[mysqldump]
;quick
max_allowed_packet = 16M

[mysql]
;no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates

[myisamchk]
key_buffer_size = 8M
sort_buffer_size = 8M

[mysqlhotcopy]
;interactive-timeout
1. Get all sections:
import configparser

config = configparser.ConfigParser()
config.read("my.ini", encoding="utf-8")
ret = config.sections()   # all section names
print(ret)   # ['client', 'mysqld', 'mysqldump', 'mysql', 'myisamchk', 'mysqlhotcopy']
2. Get all key/value pairs of a section:
import configparser

config = configparser.ConfigParser()
config.read("my.ini", encoding="utf-8")
ret = config.items("client")
print(ret)   # [('port', '3306'), ('socket', '/tmp/mysql.sock')]
3. Get all keys of a section:
ret = config.options("client")
print(ret)   # ['port', 'socket']
4. Get the value of one key in a section:
ret = config.get("client", "port")
print(ret)   # 3306
5. Check, add and remove sections (note: after changing sections, remember to write the result back to the file with write)
import configparser

config = configparser.ConfigParser()
config.read("my.ini", encoding="utf-8")

# check
has_sec = config.has_section("client")
print(has_sec)   # True
has_sec = config.has_section("client2")
print(has_sec)   # False

# add
config.add_section("client2")   # adding a section that already exists raises an error
with open("my.ini", 'w') as f:
    config.write(f)

# remove
config.remove_section("client2")
with open("my.ini", 'w') as f:
    config.write(f)
6. Check, add and remove key/value pairs in a section
import configparser

config = configparser.ConfigParser()
config.read("my.ini", encoding="utf-8")

# check
has_opt = config.has_option("client", "port")
print(has_opt)   # True
has_opt = config.has_option("client", "port2")
print(has_opt)   # False

# add or update
config.set("client", "port2", "1223")
with open("my.ini", 'w') as f:
    config.write(f)

# remove
config.remove_option("client", "port2")
with open("my.ini", 'w') as f:
    config.write(f)
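The same operations can be tried without a file on disk by reading from a string and writing into a buffer. A minimal sketch: the config text is a hypothetical stand-in for my.ini, and the "timeout" key is deliberately absent to show the fallback argument.

```python
import configparser
import io

# Hypothetical config text standing in for my.ini.
TEXT = """
[client]
port = 3306
socket = /tmp/mysql.sock
"""

config = configparser.ConfigParser()
config.read_string(TEXT)

port = config.getint("client", "port")                      # typed getter, returns an int
timeout = config.get("client", "timeout", fallback="30")    # key is missing, fallback is used

config.set("client", "port2", "1223")     # values must be strings
if not config.has_section("client2"):     # avoids DuplicateSectionError
    config.add_section("client2")

buffer = io.StringIO()
config.write(buffer)                      # same call as writing to an open file
print(buffer.getvalue())
```

Note that write emits only sections and key/value pairs; comments in the original file are not preserved.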
X: logging
A thread-safe module for conveniently recording logs
1. Logging to a single destination
a. Log to the console stream:
import logging
import sys

logging.basicConfig(stream=sys.stdout,
                    format="%(asctime)s - %(name)s -%(levelname)s -%(module)s: %(message)s",
                    datefmt='%Y-%m-%d %H:%M:%S %p',
                    level=10)   # 10 == logging.DEBUG
logging.debug("debug")
logging.info("info")
logging.warning("warning")
logging.error('error')
logging.critical('critical')
logging.log(9, 'log')   # level 9 is below 10, so this record is not output
b. Log to a file:
import logging

logging.basicConfig(filename='log.log',
                    format="%(asctime)s - %(name)s -%(levelname)s -%(module)s: %(message)s",
                    datefmt='%Y-%m-%d %H:%M:%S %p',
                    level=10)   # 10 == logging.DEBUG
logging.debug("debug")
logging.info("info")
logging.warning("warning")
logging.error('error')
logging.critical('critical')
logging.log(10, 'log')   # note the signature: log(level, msg, ...)
CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
INFO = 20
DEBUG = 10
NOTSET = 0
A record is written only when its level is greater than or equal to the level configured for the logger.
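The threshold rule can be observed by sending the output to an in-memory stream. A minimal sketch: the logger name "level_demo" and the StringIO target are assumptions for illustration, and propagate is switched off so the root logger configured elsewhere does not interfere.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

# "level_demo" is a hypothetical logger name used only for this sketch.
logger = logging.getLogger("level_demo")
logger.setLevel(logging.DEBUG)    # DEBUG == 10
logger.propagate = False          # keep the root logger out of it
logger.addHandler(handler)

logger.log(9, "below the threshold")     # 9 < 10: dropped
logger.debug("at the threshold")         # 10 >= 10: kept
logger.warning("above the threshold")    # 30 >= 10: kept

lines = stream.getvalue().splitlines()
print(lines)  # ['DEBUG:at the threshold', 'WARNING:above the threshold']
```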
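Parameters of logging.basicConfig():
filename: create a FileHandler with this file name, so that the log is stored in that file;
filemode: how the file is opened, used when filename is given; the default is "a" and "w" can also be specified;
format: the log format used by the handler;
datefmt: the date/time format; see the strftime format codes (below);
level: the log level of the root logger;
stream: create a StreamHandler with this stream. It can be sys.stderr, sys.stdout or a file; the default is sys.stderr.
filename and stream should not be given together: Python 2 ignores stream in that case, while Python 3.3 and later raise ValueError.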
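Format | Meaning
%(name)s | Name of the logger
%(levelno)s | Log level as a number
%(levelname)s | Log level as text
%(pathname)s | Full path of the module that made the logging call (may be unavailable)
%(filename)s | File name of the module that made the logging call
%(module)s | Name of the module that made the logging call
%(funcName)s | Name of the function that made the logging call
%(lineno)d | Line number of the statement that made the logging call
%(created)f | Current time as a UNIX timestamp (floating-point number)
%(relativeCreated)d | Milliseconds between the loading of the logging module and the creation of this record
%(asctime)s | Current time as a string. The default format is "2003-07-08 16:49:45,896"; the part after the comma is milliseconds
%(thread)d | Thread ID (may be unavailable)
%(threadName)s | Thread name (may be unavailable)
%(process)d | Process ID (may be unavailable)
%(message)s | The message supplied by the user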
2. To log to a stream and a file at the same time, add another handler to the logger object: logging.getLogger().addHandler(console)
To also show the log on the console, add:
import logging

logging.basicConfig(filename='log.log',
                    format="%(asctime)s - %(name)s -%(levelname)s -%(module)s: %(message)s",
                    datefmt='%Y-%m-%d %H:%M:%S %p',
                    level=10)
console = logging.StreamHandler()   # a stream handler for console output
console.setLevel(logging.INFO)      # log level for the console
formatter = logging.Formatter('%(asctime)s %(filename)s : %(levelname)s %(message)s')   # format for this handler
console.setFormatter(formatter)     # apply the format
print(logging.getLogger())   # <logging.RootLogger object at 0x0000000001151DA0>, the root logger object
logging.getLogger().addHandler(console)   # get the root logger and add the handler to it (it is stored in the self.handlers list); when a record is output, the Logger loops over self.handlers
def callHandlers(self, record):
    """
    Pass a record to all relevant handlers.

    Loop through all handlers for this logger and its parents in the
    logger hierarchy. If no handler was found, output a one-off error
    message to sys.stderr. Stop searching up the hierarchy whenever a
    logger with the "propagate" attribute set to zero is found - that
    will be the last logger whose handlers are called.
    """
    c = self
    found = 0
    while c:
        for hdlr in c.handlers:
            found = found + 1
            if record.levelno >= hdlr.level:
                hdlr.handle(record)
        if not c.propagate:
            c = None    #break out
        else:
            c = c.parent
    if (found == 0):
        if lastResort:
            if record.levelno >= lastResort.level:
                lastResort.handle(record)
        elif raiseExceptions and not self.manager.emittedNoHandlerWarning:
            sys.stderr.write("No handlers could be found for logger"
                             " \"%s\"\n" % self.name)
            self.manager.emittedNoHandlerWarning = True
logging.debug("debug")
logging.info("info")
logging.warning("warning")
logging.error('error')
logging.critical('critical')
logging.log(9, 'log')
To log to several files:
import logging
import sys

logging.basicConfig(stream=sys.stdout,
                    format="%(asctime)s - %(name)s -%(levelname)s -%(module)s: %(message)s",
                    datefmt='%Y-%m-%d %H:%M:%S %p',
                    level=10)

file1 = logging.FileHandler("f1.log", 'a', encoding="utf-8")
fmt = logging.Formatter("%(asctime)s - %(name)s -%(levelname)s -%(module)s: %(message)s")
file1.setFormatter(fmt)

file2 = logging.FileHandler("f2.log", 'a', encoding="utf-8")
fmt = logging.Formatter("%(asctime)s - %(name)s -%(levelname)s -%(module)s: %(message)s")
file2.setFormatter(fmt)

logging.getLogger().addHandler(file1)
logging.getLogger().addHandler(file2)

logging.debug("debug")
logging.info("info")
logging.warning("warning")
logging.error('error')
logging.critical('critical')
logging.log(9, 'log')
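Each handler also has its own level, which is what the `record.levelno >= hdlr.level` test in callHandlers checks. A minimal sketch with two handlers on one logger: in-memory streams stand in for the two log files, and the logger name "multi_handler_demo" is hypothetical.

```python
import io
import logging

# In-memory streams stand in for f1.log and f2.log.
everything = io.StringIO()
errors_only = io.StringIO()

fmt = logging.Formatter("%(levelname)s:%(message)s")

h_all = logging.StreamHandler(everything)
h_all.setLevel(logging.DEBUG)      # accepts every record the logger passes on
h_all.setFormatter(fmt)

h_err = logging.StreamHandler(errors_only)
h_err.setLevel(logging.ERROR)      # accepts ERROR and above only
h_err.setFormatter(fmt)

# Hypothetical logger name; propagate is off so the root logger is not involved.
logger = logging.getLogger("multi_handler_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(h_all)
logger.addHandler(h_err)

logger.info("info")
logger.error("error")

all_lines = everything.getvalue().splitlines()
err_lines = errors_only.getvalue().splitlines()
print(all_lines)  # ['INFO:info', 'ERROR:error']
print(err_lines)  # ['ERROR:error']
```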
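XI: datetime and time
datetime: the standard library module for working with dates and times.
1. Get the current date and time:
>>> import datetime
>>> dt = datetime.datetime.now()
>>> dt
datetime.datetime(2018, 4, 20, 21, 0, 29, 308843)
>>> print(dt)
2018-04-20 21:00:29.308843
>>> type(dt)
<class 'datetime.datetime'>
2. Build a specific date and time:
>>> dt = datetime.datetime(2018,4,12,12,24)
>>> dt
datetime.datetime(2018, 4, 12, 12, 24)
>>> print(dt)
2018-04-12 12:24:00
3. Convert a datetime to a timestamp
Inside a computer, time is actually represented as a number. The instant 1970-01-01 00:00:00 in the UTC+00:00 time zone is called the epoch time and is recorded as 0 (times before 1970 have negative timestamps).
The current time is the number of seconds relative to the epoch time, and is called the timestamp.
timestamp = 0 = 1970-1-1 00:00:00 UTC+0:00
>>> dt.timestamp()   # a datetime without tzinfo is treated as local time (UTC+8 on the machine used here)
1523507040.0
4. Convert a timestamp to a datetime
>>> ts = dt.timestamp()
>>> datetime.datetime.fromtimestamp(ts)
datetime.datetime(2018, 4, 12, 12, 24)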
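The value 1523507040.0 above depends on the machine's time zone. Attaching an explicit time zone makes the conversion reproducible anywhere; this is a minimal sketch that assumes the article's output was produced in UTC+8.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the article's machine was in UTC+8.
tz_cn = timezone(timedelta(hours=8))

aware = datetime(2018, 4, 12, 12, 24, tzinfo=tz_cn)
ts = aware.timestamp()
print(ts)  # 1523507040.0

# The same instant, shown in UTC and converted back to UTC+8.
back_utc = datetime.fromtimestamp(ts, tz=timezone.utc)
back_cn = datetime.fromtimestamp(ts, tz=tz_cn)
print(back_utc)  # 2018-04-12 04:24:00+00:00
print(back_cn)   # 2018-04-12 12:24:00+08:00
```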
5. Convert a str to a datetime: this is done with datetime.strptime(), which needs a format string describing the date and time:
>>> datetime.datetime.strptime("2016-4-12 18:12:59",'%Y-%m-%d %H:%M:%S')
datetime.datetime(2016, 4, 12, 18, 12, 59)
6. Convert a datetime to a str
Default format: simply print it, or call str(), because datetime implements __str__
>>> st = str(datetime.datetime.now())
>>> st
'2018-04-20 21:09:29.068715'
>>> type(st)
<class 'str'>
For a custom format, use strftime()
from datetime import datetime

now = datetime.now()
st = now.strftime("%Y-%m-%d")
print(st)   # 2018-04-20
7. Adding to and subtracting from a datetime
This needs the timedelta class
from datetime import datetime, timedelta

now = datetime.now()
now = now + timedelta(days=10)
st = now.strftime("%Y-%m-%d")
print(st)   # 2018-04-30
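Subtracting two datetime objects gives a timedelta back, so the arithmetic works in both directions. A minimal sketch with a fixed, made-up start time so that the printed values are reproducible.

```python
from datetime import datetime, timedelta

# Fixed example value (an assumption), so the result does not depend on "now".
start = datetime(2018, 4, 20, 21, 0, 0)

later = start + timedelta(days=10, hours=5)
print(later.strftime("%Y-%m-%d %H:%M"))  # 2018-05-01 02:00

# datetime - datetime yields a timedelta.
gap = later - start
print(gap.days, gap.seconds)   # 10 18000
print(gap.total_seconds())     # 882000.0
```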
time
Time-related operations. Time can be represented in three ways:
- Timestamp: seconds since 1970-01-01, i.e. time.time()
- Formatted string: 2018-04-20, i.e. time.strftime('%Y-%m-%d')
- Structured time: a tuple holding year, month, day, weekday and so on (time.struct_time), i.e. time.localtime()
Main functions of the time module
localtime([secs])   convert seconds to a time tuple; without an argument, the current time is used
asctime([tuple])    convert a time tuple to a string
mktime(tuple)       convert a time tuple to seconds, the inverse of localtime
sleep(secs)         sleep
strptime            parse a string into a time tuple
strftime            format a time as a string
time()              current timestamp
>>> import time
>>> time.time()
1524230368.0729766
>>> time.strftime("%Y-%m-%d")
'2018-04-20'
>>> time.localtime()   # returns a struct_time, which asctime below can convert
time.struct_time(tm_year=2018, tm_mon=4, tm_mday=20, tm_hour=21, tm_min=20, tm_sec=21, tm_wday=4, tm_yday=110, tm_isdst=0)
>>> time.asctime()
'Fri Apr 20 21:21:47 2018'
Convert a time tuple to a string, e.g. 'Sat Jun 06 16:26:11 1998'. When the time tuple is not present, current time as returned by localtime() is used.
>>> time.strptime("2017-01-02","%Y-%m-%d")   # parse a string into a time tuple
time.struct_time(tm_year=2017, tm_mon=1, tm_mday=2, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=2, tm_isdst=-1)
%Y  Year with century as a decimal number.
%m  Month as a decimal number [01,12].
%d  Day of the month as a decimal number [01,31].
%H  Hour (24-hour clock) as a decimal number [00,23].
%M  Minute as a decimal number [00,59].
%S  Second as a decimal number [00,61].
%z  Time zone offset from UTC.
%a  Locale's abbreviated weekday name.
%A  Locale's full weekday name.
%b  Locale's abbreviated month name.
%B  Locale's full month name.
%c  Locale's appropriate date and time representation.
%I  Hour (12-hour clock) as a decimal number [01,12].
%p  Locale's equivalent of either AM or PM.
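The three representations convert into one another. A minimal sketch that sticks to UTC so the results do not depend on the local time zone; calendar.timegm is used here as the UTC counterpart of time.mktime, which interprets the tuple as local time instead.

```python
import calendar
import time

# Timestamp -> structured time (UTC) -> formatted string.
epoch = time.gmtime(0)
epoch_text = time.strftime("%Y-%m-%d %H:%M:%S", epoch)
print(epoch_text)  # 1970-01-01 00:00:00

# Formatted string -> structured time.
parsed = time.strptime("2017-01-02 03:04:05", "%Y-%m-%d %H:%M:%S")
print(parsed.tm_year, parsed.tm_wday, parsed.tm_yday)  # 2017 0 2

# Structured time (read as UTC) -> timestamp, and back again.
seconds = calendar.timegm(parsed)
round_trip = time.gmtime(seconds)
print(tuple(round_trip)[:6])  # (2017, 1, 2, 3, 4, 5)
```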
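Note:
A datetime needs time zone information to identify one specific instant; without it, it can only be treated as local time.
To store a datetime, the best approach is to convert it to a timestamp first, because the value of a timestamp does not depend on the time zone at all.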
XII: paramiko
paramiko is a module for remote control: with it you can run commands on a remote server and transfer files to and from it. It is worth mentioning that Fabric's remote management is built on paramiko, and Ansible can also use it as a connection backend.
# run a command, logging in with user name and password
import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('192.168.218.128', 22, 'adminld', 'adminld')
stdin, stdout, stderr = ssh.exec_command('df')
print(stdout.read())
ssh.close()

# run a command, logging in with a private key
import paramiko

private_key_path = '/home/auto/.ssh/id_rsa'
key = paramiko.RSAKey.from_private_key_file(private_key_path)
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
# 'hostname', port and 'username' are placeholders for real values;
# the key must be passed as pkey, since the fourth positional argument is the password
ssh.connect('hostname', port, 'username', pkey=key)
stdin, stdout, stderr = ssh.exec_command('df')
print(stdout.read())
ssh.close()
Note: for the upload and download below, both arguments must be file paths, not directories.
# upload, logging in with user name and password
import os, sys
import paramiko

t = paramiko.Transport(('192.168.218.128', 22))
t.connect(username='adminld', password='adminld')
sftp = paramiko.SFTPClient.from_transport(t)
sftp.put('/tmp/test.py', '/tmp/test.py')
t.close()

# download, logging in with user name and password
import os, sys
import paramiko

t = paramiko.Transport(('192.168.218.128', 22))
t.connect(username='adminld', password='adminld')
sftp = paramiko.SFTPClient.from_transport(t)
sftp.get('/tmp/test.py', '/tmp/test2.py')
t.close()

# upload, logging in with a private key
import paramiko

private_key_path = 'pwd'
key = paramiko.RSAKey.from_private_key_file(private_key_path, password="adminld")
t = paramiko.Transport(('192.168.218.128', 22))
t.connect(username='adminld', pkey=key)
sftp = paramiko.SFTPClient.from_transport(t)
sftp.put('/tmp/test.py', '/tmp/test.py')
t.close()

# download, logging in with a private key
import paramiko

private_key_path = 'pwd'
key = paramiko.RSAKey.from_private_key_file(private_key_path, password="adminld")
t = paramiko.Transport(('192.168.218.128', 22))
t.connect(username='adminld', pkey=key)
sftp = paramiko.SFTPClient.from_transport(t)
sftp.get('/tmp/test.py', '/tmp/test2.py')
t.close()
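Recommended reading: password-less SSH login in three steps with ssh-keygen and ssh-copy-id
Recommended reading: connecting to a Linux server from Windows with key authentication
Recommended reading: public and private keys, illustrated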
https://www.linuxprobe.com/public-private-key.html
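XIII: requests
Using the built-in urllib module:
urllib lets a program carry out all kinds of HTTP requests. To imitate a browser for a particular task, the request has to be disguised as a browser's. The way to do that is to watch the requests the browser sends and then copy its request headers; the User-Agent header is the one that identifies the browser.
GET: a Douban URL, https://api.douban.com/v2/book/2129650
from urllib import request

with request.urlopen('https://api.douban.com/v2/book/2129650') as f:
    data = f.read()
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():   # response headers
        print('%s: %s' % (k, v))
    print('Data:', data.decode('utf-8'))   # response body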
Status: 200 OK Date: Fri, 20 Apr 2018 14:30:44 GMT Content-Type: application/json; charset=utf-8 Content-Length: 2138 Connection: close Vary: Accept-Encoding X-Ratelimit-Remaining2: 99 X-Ratelimit-Limit2: 100 Expires: Sun, 1 Jan 2006 01:00:00 GMT Pragma: no-cache Cache-Control: must-revalidate, no-cache, private Set-Cookie: bid=sRgrHMHx5oM; Expires=Sat, 20-Apr-19 14:30:44 GMT; Domain=.douban.com; Path=/ X-DOUBAN-NEWBID: sRgrHMHx5oM X-DAE-Node: dis8 X-DAE-App: book Server: dae Data: {"rating":{"max":10,"numRaters":16,"average":"7.4","min":0},"subtitle":"","author":["廖雪峰"],"pubdate":"2007","tags":[{"count":21,"name":"spring","title":"spring"},{"count":13,"name":"Java","title":"Java"},{"count":6,"name":"javaee","title":"javaee"},{"count":5,"name":"j2ee","title":"j2ee"},{"count":4,"name":"计算机","title":"计算机"},{"count":4,"name":"编程","title":"编程"},{"count":3,"name":"藏书","title":"藏书"},{"count":3,"name":"POJO","title":"POJO"}],"origin_title":"","image":"https://img3.doubanio.com\/view\/subject\/m\/public\/s2552283.jpg","binding":"平装","translator":[],"catalog":"","pages":"509","images":{"small":"https://img3.doubanio.com\/view\/subject\/s\/public\/s2552283.jpg","large":"https://img3.doubanio.com\/view\/subject\/l\/public\/s2552283.jpg","medium":"https://img3.doubanio.com\/view\/subject\/m\/public\/s2552283.jpg"},"alt":"https:\/\/book.douban.com\/subject\/2129650\/","id":"2129650","publisher":"电子工业出版社","isbn10":"7121042622","isbn13":"9787121042621","title":"Spring 2.0核心技术与最佳实践","url":"https:\/\/api.douban.com\/v2\/book\/2129650","alt_title":"","author_intro":"","summary":"本书注重实践而又深入理论,由浅入深且详细介绍了Spring 2.0框架的几乎全部的内容,并重点突出2.0版本的新特性。本书将为读者展示如何应用Spring 2.0框架创建灵活高效的JavaEE应用,并提供了一个真正可直接部署的完整的Web应用程序——Live在线书店(http:\/\/www.livebookstore.net)。\n在介绍Spring框架的同时,本书还介绍了与Spring相关的大量第三方框架,涉及领域全面,实用性强。本书另一大特色是实用性强,易于上手,以实际项目为出发点,介绍项目开发中应遵循的最佳开发模式。\n本书还介绍了大量实践性极强的例子,并给出了完整的配置步骤,几乎覆盖了Spring 2.0版本的新特性。\n本书适合有一定Java基础的读者,对JavaEE开发人员特别有帮助。本书既可以作为Spring 2.0的学习指南,也可以作为实际项目开发的参考手册。","price":"59.8"}
Request headers can be set to fake the client information (here, pretending to be a phone)
from urllib import request

req = request.Request('http://www.douban.com/')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
POST: to send a request as POST, just pass the data argument as bytes.
Here we simulate a Weibo login: read the login e-mail and password first, then pass them in the username=xxx&password=xxx encoding used by the weibo.cn login page:
from urllib import request, parse

print('Login to weibo.cn...')
email = input('Email: ')
passwd = input('Password: ')
login_data = parse.urlencode([
    ('username', email),
    ('password', passwd),
    ('entry', 'mweibo'),
    ('client_id', ''),
    ('savestate', '1'),
    ('ec', ''),
    ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
])

req = request.Request('https://passport.weibo.cn/sso/login')
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')

with request.urlopen(req, data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
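What urlencode produces, and how a Request turns into a POST, can be inspected without sending anything. A minimal sketch: the form fields, the example.com URL and the "demo-agent" header value are hypothetical, and no network call is made.

```python
from urllib import parse, request

# Hypothetical form fields; special characters show the escaping rules.
form = [("username", "me@example.com"), ("password", "p w&d")]
body = parse.urlencode(form)
print(body)  # username=me%40example.com&password=p+w%26d

# Supplying data (as bytes) is what switches the request method to POST.
req = request.Request("https://example.com/login", data=body.encode("utf-8"))
req.add_header("User-Agent", "demo-agent")
method = req.get_method()
print(method)  # POST

# The encoding is reversible.
decoded = parse.parse_qsl(body)
print(decoded)  # [('username', 'me@example.com'), ('password', 'p w&d')]
```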
However, urllib is not very pleasant to work with, so requests is used here
Using requests:
requests is an HTTP library written in Python and released under the Apache2 License. It wraps Python's built-in modules at a much higher level, which makes network requests far more pleasant for Python programmers; with Requests it is easy to do almost anything a browser can do.
import requests

# without parameters
ret = requests.get('https://github.com/timeline.json')

# with parameters
payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.get("http://httpbin.org/get", params=payload)
print(ret.url)
print(ret.text)

# 1. Basic POST example
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
ret = requests.post("http://httpbin.org/post", data=payload)
print(ret.text)

# 2. Sending request headers together with data
import requests
import json

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}
ret = requests.post(url, data=json.dumps(payload), headers=headers)
print(ret.text)
print(ret.cookies)

requests.get(url, params=None, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.head(url, **kwargs)
requests.delete(url, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.options(url, **kwargs)
# all of the above are built on top of this function
requests.request(method, url, **kwargs)