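Common Modules

I. The time module

1. Timestamp: a timestamp is the offset, counted in seconds, from 00:00:00 UTC on January 1, 1970. Running "type(time.time())" returns the float type.

import time
print(time.time())  # timestamp

2. Formatted time string (Format String)

# formatted strings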
# print(time.strftime('%Y-%m-%d %H:%M:%S'))
# print(time.strftime('%Y-%m-%d %X'))

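3. Structured time (struct_time): a struct_time tuple has nine elements: (year, month, day, hour, minute, second, day of the week, day of the year, daylight saving time flag)

Structured time
# print(time.localtime())
# print(time.localtime().tm_year)
# print(time.gmtime())
print(time.localtime(1496807630))  # convert a Unix timestamp to a struct_time in local time
Result (on a machine in the UTC+8 time zone):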
time.struct_time(tm_year=2017, tm_mon=6, tm_mday=7, tm_hour=11, tm_min=53, tm_sec=50, tm_wday=2, tm_yday=158, tm_isdst=0)

print(time.strftime('%Y %X',time.localtime()))

Result:
2017 12:01:41

print(time.strptime('2017-06-04 11:59:59','%Y-%m-%d %X'))
Result:
time.struct_time(tm_year=2017, tm_mon=6, tm_mday=4, tm_hour=11, tm_min=59, tm_sec=59, tm_wday=6, tm_yday=155, tm_isdst=-1)

print(time.ctime(123123132))
Result: Mon Nov 26 08:52:12 1973
print(time.asctime(time.localtime()))
Result: Wed Jun  7 12:03:44 2017
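The three representations above (timestamp, struct_time, formatted string) can be converted into one another. The following is a minimal sketch of a round trip. It works in UTC through time.gmtime and calendar.timegm so that the output does not depend on the local time zone; the helper names timestamp_to_string and string_to_timestamp are hypothetical, chosen for illustration.

```python
import calendar
import time

FMT = '%Y-%m-%d %H:%M:%S'

def timestamp_to_string(ts):
    """timestamp -> struct_time (UTC) -> formatted string"""
    return time.strftime(FMT, time.gmtime(ts))

def string_to_timestamp(text):
    """formatted string -> struct_time -> timestamp, reading the string as UTC"""
    return calendar.timegm(time.strptime(text, FMT))

text = timestamp_to_string(1496807630)
print(text)                       # 2017-06-07 03:53:50
print(string_to_timestamp(text))  # 1496807630
```

For local time instead of UTC, time.localtime and time.mktime play the same roles as time.gmtime and calendar.timegm.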

II. The random module

Pick a random IP address from a list; a typical use case is a web crawler
import random
proxy_ip=[
    '1.1.1.1',
    '1.1.1.2',
    '1.1.1.3',
    '1.1.1.4',
]
ip=random.choice(proxy_ip)
print(ip)

# import random
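# print(random.sample([1,'23',[4,5]],2))  # pick 2 distinct elements at random
# Example result: [1, '23']

# verification code
def v_code(n=5):
    res=''
    for i in range(n):  # n is the length of the code
        num=random.randint(0,9)  # a digit from 0 to 9
        s=chr(random.randint(65,90))  # 65-90 are the ASCII codes of 'A' to 'Z'
        add=random.choice([num,s])  # pick either the digit or the letter
        res+=str(add)  # append it to the result
    return res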

print(v_code(6))

import random
 
print(random.random())  # [0,1) ---- float: 0 <= x < 1

print(random.randint(1,3))  # [1,3]: an integer with 1 <= n <= 3

print(random.randrange(1,3))  # [1,3): an integer with 1 <= n < 3

print(random.choice([1,'23',[4,5]]))  # 1, '23' or [4,5]

print(random.sample([1,'23',[4,5]],2))  # any 2 elements of the list

print(random.uniform(1,3))  # a float between 1 and 3, e.g. 1.927109612082716


item=[1,3,5,7,9]
random.shuffle(item)  # shuffle item in place, like shuffling cards
print(item)
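When testing code that relies on random, it can help to make the results reproducible. The sketch below is one possible variation of the verification-code function: it takes a random.Random instance as a parameter so that a fixed seed yields the same code every time. The name make_code and the seed 42 are assumptions made for illustration. Unlike v_code, it draws uniformly from all 36 characters rather than first choosing between a digit and a letter.

```python
import random
import string

def make_code(length=6, rng=random):
    """Return a code made of uppercase letters and digits."""
    alphabet = string.ascii_uppercase + string.digits
    return ''.join(rng.choice(alphabet) for _ in range(length))

# Two generators created with the same seed produce the same sequence
rng_a = random.Random(42)
rng_b = random.Random(42)
code_a = make_code(6, rng_a)
code_b = make_code(6, rng_b)
print(code_a == code_b)  # True

# Without an explicit generator the module-level functions are used
print(len(make_code(8)))  # 8
```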

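III. The os module

Interacting with the operating system

os.getcwd()  Get the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")  Change the current working directory; equivalent to cd in the shell
os.curdir  The string for the current directory: ('.')
os.pardir  The string for the parent directory: ('..')
os.makedirs('dirname1/dirname2')    Create nested directories recursively
os.removedirs('dirname1')    Remove the directory if it is empty, then move up to the parent and remove it too if it is empty, and so on
os.mkdir('dirname')    Create a single directory; equivalent to mkdir dirname in the shell
os.rmdir('dirname')    Remove a single empty directory; raises an error if the directory is not empty; equivalent to rmdir dirname in the shell
os.listdir('dirname')    Return a list of all files and subdirectories in the given directory, including hidden files
os.remove('filename')  Delete a file
os.rename("oldname","newname")  Rename a file/directory
os.stat('path/filename')  Get information about a file/directory
os.sep    The OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep    The line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep    The separator used between entries of a search path: ; on Windows, : on Linux
os.name    A string naming the current platform. Windows->'nt'; Linux->'posix'
os.system("bash command")  Run a shell command; its output goes straight to the terminal
os.environ  The system environment variables
os.path.abspath(path)  Return the normalized absolute version of path
os.path.split(path)  Split path into a (directory, file name) tuple
os.path.dirname(path)  Return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  Return the final file name of path, i.e. the second element of os.path.split(path). If path ends with a path separator, an empty string is returned
os.path.exists(path)  Return True if path exists, False otherwise
os.path.isabs(path)  Return True if path is an absolute path
os.path.isfile(path)  Return True if path is an existing file, False otherwise
os.path.isdir(path)  Return True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  Join several path components; if a component is an absolute path, all components before it are discarded
os.path.getatime(path)  Return the last access time (a timestamp) of the file or directory path points to
os.path.getmtime(path)  Return the last modification time (a timestamp) of the file or directory path points to
os.path.getsize(path)  Return the size of path in bytes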

Path handling with os
# Approach 1: recommended
# Practical use: locating the project root directory (two approaches)
import os,sys
possible_topdir = os.path.normpath(os.path.join(
    os.path.abspath(__file__),
    os.pardir,  # up one level
    os.pardir,
    os.pardir
))
sys.path.insert(0,possible_topdir)


# Approach 2: not recommended
os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
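Several of the functions listed above can be tried together without touching real project files. The following is a small sketch that works inside a temporary directory created with tempfile (an addition not used in the original notes); the directory and file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Create nested directories in one call
    nested = os.path.join(base, 'dirname1', 'dirname2')
    os.makedirs(nested)

    # Create a small file inside the innermost directory
    file_path = os.path.join(nested, 'a.txt')
    with open(file_path, 'w') as f:
        f.write('hello')

    listing = os.listdir(nested)            # names inside the directory
    head, tail = os.path.split(file_path)   # (directory, file name)
    size = os.path.getsize(file_path)       # size in bytes
    is_file = os.path.isfile(file_path)

print(listing)         # ['a.txt']
print(tail)            # a.txt
print(size, is_file)   # 5 True
```

The temporary directory and everything in it are removed when the with block ends.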

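IV. The sys module

sys.argv           List of command-line arguments; the first element is the path of the program itself
sys.exit(n)        Exit the program; use exit(0) for a normal exit
sys.version        Version information of the Python interpreter
sys.maxsize        Largest value a Py_ssize_t can hold (sys.maxint, the largest int, exists only in Python 2)
sys.path           The module search path; initialized in part from the PYTHONPATH environment variable
sys.platform       Name of the operating system platform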

import sys,time

for i in range(50):
    sys.stdout.write('%s\r' %('#'*i))
    sys.stdout.flush()
    time.sleep(0.1)

'''
Note: this has no visible effect when run inside PyCharm; run it as a script from the command line
'''
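The progress bar above relies on '\r' moving the cursor back to the start of the line, so each write overwrites the previous one. A slightly extended sketch that also shows a percentage might look as follows; render_bar and show_progress are hypothetical helper names, and the bar width of 20 is an arbitrary choice.

```python
import sys
import time

def render_bar(done, total, width=20):
    """Build the text of a progress bar such as '[#####     ]  50%'."""
    filled = width * done // total
    percent = 100 * done // total
    return '[%s%s] %3d%%' % ('#' * filled, ' ' * (width - filled), percent)

def show_progress(total, delay=0.0):
    """Redraw the bar on a single terminal line for each completed step."""
    for done in range(total + 1):
        sys.stdout.write('\r' + render_bar(done, total))
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write('\n')

show_progress(10, delay=0.01)
```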

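V. The shutil module

1. Copy the contents of one file into another

import shutil

with open('old.xml','r') as src, open('new.xml','w') as dst:
    shutil.copyfileobj(src, dst)

2. Copy a file

shutil.copyfile('f1.log', 'f2.log')  # the destination file does not need to exist

3. Copy permission bits only; the content, group and owner are unchanged

shutil.copymode('f1.log', 'f2.log')  # the destination file must exist

4. shutil.copystat(src, dst)
Copy status information only, including: mode bits, atime, mtime, flags

shutil.copystat('f1.log', 'f2.log')  # the destination file must exist

5. shutil handles archives by calling the zipfile and tarfile modules. In detail:

import tarfile

# create an archive (mode 'w' writes an uncompressed tar; 'w:gz' would add gzip compression)
>>> t=tarfile.open('/tmp/egon.tar','w')
>>> t.add('/test1/a.py',arcname='a.bak')
>>> t.add('/test1/b.py',arcname='b.bak')
>>> t.close()


# extract
>>> t=tarfile.open('/tmp/egon.tar','r')
>>> t.extractall('/egon')
>>> t.close()

Archiving and extracting with tarfile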
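shutil also offers higher-level helpers on top of tarfile and zipfile. The following sketch uses shutil.make_archive and shutil.unpack_archive inside a temporary directory; the names src, backup and restored are hypothetical and exist only for this example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as work:
    # Prepare a directory with one file to archive
    src_dir = os.path.join(work, 'src')
    os.mkdir(src_dir)
    with open(os.path.join(src_dir, 'a.py'), 'w') as f:
        f.write("print('a')\n")

    # Pack the contents of src_dir into backup.tar.gz
    archive = shutil.make_archive(os.path.join(work, 'backup'), 'gztar', root_dir=src_dir)
    archive_name = os.path.basename(archive)

    # Unpack it again into another directory
    out_dir = os.path.join(work, 'restored')
    shutil.unpack_archive(archive, out_dir)
    restored = sorted(os.listdir(out_dir))

print(archive_name)  # backup.tar.gz
print(restored)      # ['a.py']
```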

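VI. The json & pickle modules

1. json

1.1 Using json in Python

In Python, eval can sometimes do the same job as json.loads, but it cannot handle JSON-specific literals such as null. In that case the json module must be used.

import json
x="[null,true,false,1]"
# print(eval(x))  # raises NameError: eval cannot parse null, true or false, but json can
print(json.loads(x))

1.2 What is serialization

Turning an object (variable) in memory into something that can be stored or transmitted is called serialization. In Python it is called pickling; other languages call it serialization, marshalling, flattening and so on. They all mean the same thing.

import json
# the serialization process: dic ----> res=json.dumps(dic) ----> f.write(res)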
dic={
    'name':'alex',
    'age':9000,
    'height':'150cm',
}

res=json.dumps(dic)
print(res,type(res))
with open('a.json','w') as f:
    f.write(res)

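1.3 Why serialize

1: Persisting state

Running a program means processing a series of state changes. In a programming language, "state" is kept in memory in the form of various structured data types (which can simply be thought of as variables).

Memory cannot keep data permanently. When a program has been running for a while and the power is cut or the program is restarted, the (structured) data it held in memory is lost.

Saving all the data currently in the program's memory (to a file) before a power cut or restart, so that the next run can load it from the file and carry on, is serialization.

For example, you reach level 13 in Call of Duty, save the game, shut down and leave; next time you can continue from where you stopped. Suspending the state of a virtual machine is another example.

2: Cross-platform data exchange

Serialized content can be written to disk, and it can also be sent over a network to other machines. If sender and receiver agree on a serialization format, the limits imposed by platform and language differences disappear, and cross-platform data exchange becomes possible.

Conversely, reading the content of a variable back into memory from its serialized form is called deserialization, i.e. unpickling.

json

To pass objects between different programming languages, they must be serialized into a standard format such as XML. A better choice is JSON: a JSON document is just a string, it can be read by all languages, and it is easy to store on disk or send over a network. JSON is not only a standard format, it is also faster than XML and can be read directly in a web page, which is very convenient.

The objects JSON can represent are standard JavaScript objects. JSON types map to Python's built-in data types as follows:

JSON type	Python type
{}	dict
[]	list
"string"	str
1234.56	int or float
true/false	True/False
null	None

import json
# the deserialization process: res=f.read() ----> res=json.loads(res) ----> dic=res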
with open('a.json','r') as f:
    dic=json.loads(f.read())
    print(dic,type(dic))
    print(dic['name'])

json operations

# the json shortcut: dump directly to a file
import json
dic={
    'name':'alex',
    'age':9000,
    'height':'150cm',
}
with open('b.json','w') as f:
    json.dump(dic,f)
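A round trip through json.dumps and json.loads makes the type mapping between Python and JSON visible. The sketch below uses a hypothetical dictionary; note that a tuple comes back as a list, because JSON has only one array type.

```python
import json

data = {'name': 'alex', 'scores': (90, 85.5), 'active': True, 'nickname': None}

text = json.dumps(data)
print(text)
# {"name": "alex", "scores": [90, 85.5], "active": true, "nickname": null}

restored = json.loads(text)
print(restored['scores'], type(restored['scores']))  # [90, 85.5] <class 'list'>
print(restored['nickname'])                          # None
```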

2. pickle

import pickle

dic={'name':'alvin','age':23,'sex':'male'}

print(type(dic))  # <class 'dict'>

j=pickle.dumps(dic)
print(type(j))  # <class 'bytes'>


f=open('序列化对象_pickle','wb')  # note: mode 'w' writes str, 'wb' writes bytes; j is bytes
f.write(j)  # ------------------- equivalent to pickle.dump(dic,f)

f.close()
# ------------------------- deserialization

import pickle
f=open('序列化对象_pickle','rb')

data=pickle.loads(f.read())  # equivalent to data=pickle.load(f)
 
print(data['age'])

pickle.load(open('c.pkl','rb'))

json vs. pickle

import pickle

# dic={'name':'alex','age':13}

# print(pickle.dumps(dic))
# with open('a.pkl','wb') as f:
#     f.write(pickle.dumps(dic))

# with open('a.pkl','rb') as f:
#     d=pickle.loads(f.read())
#     print(d,type(d))



# dic={'name':'alex','age':13}
# pickle.dump(dic,open('b.pkl','wb'))
# res=pickle.load(open('b.pkl','rb'))
# print(res,type(res))


#
import json
import pickle
# def func():
#     print('from func')

# json.dumps(func)  # raises TypeError: json does not support Python function objects
# f=pickle.dumps(func)
# print(f)

# pickle.dump(func,open('c.pkl','wb'))
# res=pickle.load(open('c.pkl','rb'))
# print(res)
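# res()

The problem with pickle, as with the serialization mechanism specific to any other programming language, is that it only works with Python, and different Python versions may not even be compatible with each other. Pickle should therefore only be used for unimportant data, where it does not matter if deserialization fails.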
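The difference between the two modules shows up as soon as the data contains Python-specific types. In the sketch below, a hypothetical dictionary holding a set, a tuple and a bytes object survives a pickle round trip unchanged, while json.dumps rejects it.

```python
import json
import pickle

data = {'tags': {'a', 'b'}, 'point': (1, 2), 'raw': b'\x00\x01'}

# pickle keeps Python-specific types intact
restored = pickle.loads(pickle.dumps(data))
print(restored == data)          # True
print(type(restored['point']))   # <class 'tuple'>

# json only knows the types that exist in JSON
try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False
print(json_ok)                   # False
```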

VII. The shelve module

import shelve

f=shelve.open(r'sheve.txt')
f['fff']={'name':'egon','age':18,'height':'180cm'}
print(f['fff']['name'])
f.close()
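One detail worth knowing when using shelve as above: reading a key returns a fresh copy of the stored object, so changing a nested value in place is not saved unless the shelf is opened with writeback=True. The following sketch demonstrates this in a temporary directory; the shelf name demo_shelf is hypothetical.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_shelf')

    with shelve.open(path) as db:
        db['fff'] = {'name': 'egon', 'age': 18}
        db['fff']['age'] = 19          # changes a temporary copy only
        age_without_writeback = db['fff']['age']

    with shelve.open(path, writeback=True) as db:
        db['fff']['age'] = 19          # cached and written back on close

    with shelve.open(path) as db:
        age_with_writeback = db['fff']['age']

print(age_without_writeback)  # 18
print(age_with_writeback)     # 19
```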

VIII. The xml module

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

XML data

# print(root.iter('year'))  # search the whole document
# print(root.find('country'))  # search root's direct children and return the first match
# print(root.findall('country'))  # search root's direct children and return all matches

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# walk the XML document
for child in root:
    print('========>',child.tag,child.attrib,child.attrib['name'])
    for i in child:
        print(i.tag,i.attrib,i.text)

# iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag,node.text)
#---------------------------------------

import xml.etree.ElementTree as ET
 
tree = ET.parse("xmltest.xml")
root = tree.getroot()
 
# modify
for node in root.iter('year'):
    new_year=int(node.text)+1
    node.text=str(new_year)
    node.set('updated','yes')
    node.set('version','1.0')
tree.write('test.xml')



# delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')

# append a year2 node inside country
import xml.etree.ElementTree as ET
tree = ET.parse("a.xml")
root=tree.getroot()
for country in root.findall('country'):
    for year in country.findall('year'):
        if int(year.text) > 2000:
            year2=ET.Element('year2')
            year2.text='新年'
            year2.attrib={'update':'yes'}
            country.append(year2)  # add a child node under country

tree.write('a.xml.swap')

Creating XML
import xml.etree.ElementTree as ET
 
 
new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml,"name",attrib={"enrolled":"yes"})
age = ET.SubElement(name,"age",attrib={"checked":"no"})
sex = ET.SubElement(name,"sex")
sex.text = '33'
name2 = ET.SubElement(new_xml,"name",attrib={"enrolled":"no"})
age = ET.SubElement(name2,"age")
age.text = '19'
 
et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8",xml_declaration=True)

ET.dump(new_xml)  # print the generated XML
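find and findall also accept a limited XPath syntax, which can replace some of the manual loops above. The sketch below parses a shortened copy of the country data from a string with ET.fromstring, so it runs without any file; the shortened XML is an assumption made to keep the example small.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<data>
    <country name="Liechtenstein"><rank>2</rank><year>2008</year></country>
    <country name="Singapore"><rank>5</rank><year>2011</year></country>
    <country name="Panama"><rank>69</rank><year>2011</year></country>
</data>"""

root = ET.fromstring(XML_TEXT)

# All country names, in document order
names = [c.get('name') for c in root.findall('country')]

# Countries whose <year> child has the text 2011
years_2011 = [c.get('name') for c in root.findall("country[year='2011']")]

# The rank of the country whose name attribute is Panama
panama_rank = int(root.find("country[@name='Panama']/rank").text)

print(names)        # ['Liechtenstein', 'Singapore', 'Panama']
print(years_2011)   # ['Singapore', 'Panama']
print(panama_rank)  # 69
```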

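IX. The configparser module

import configparser

config=configparser.ConfigParser()
config.read('a.cfg')


# remove the whole section section2
config.remove_section('section2')

# remove the options k1 and k2 from section1
config.remove_option('section1','k1')
config.remove_option('section1','k2')

# check whether a section exists
print(config.has_section('section1'))

# check whether section1 has the option user
print(config.has_option('section1','user'))


# add a section
config.add_section('egon')

# under the section egon, add name=egon and age=18
config.set('egon','name','egon')
# config.set('egon','age',18)  # raises TypeError: option values must be strings
config.set('egon','age','18')


# finally write the changes to the file to complete the modification
with open('a.cfg','w') as f:
    config.write(f)

The test.ini configuration file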

# comment 1
; comment 2

[section1]
k1 = v1
k2:v2
db=pymysql+mysql://egon:123@192.168.2.3/db1
max_conn=30
enable=0
[section2]
k1 = v1

1. Get all sections

import configparser
config=configparser.ConfigParser()
config.read('test.ini')
print(config.sections())  # get all sections
Result: ['section1', 'section2']

2. Get all key-value pairs in a section

import configparser
config=configparser.ConfigParser()
config.read('a.ini',encoding='utf-8')
res=config.items('section1')
print(res)
Result: [('k1', 'v1'), ('k2', 'v2'), ('db', 'pymysql+mysql://egon:123@192.168.2.3/db1'), ('max_conn', '30'), ('enable', '0')]

3. Get all keys in a given section

config=configparser.ConfigParser()
config.read('a.ini',encoding='utf-8')
res=config.options('section1')
print(res)
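Result: ['k1', 'k2', 'db', 'max_conn', 'enable']

4. Get the value of a given key in a given section (this example assumes a test.ini that contains [bitbucket.org] and [topsecret.server.com] sections, not the file shown above)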

import configparser
config=configparser.ConfigParser()
config.read('test.ini',encoding='utf-8')
res1=config.get('bitbucket.org','user')

res2=config.getint('topsecret.server.com','port')
res3=config.getfloat('topsecret.server.com','port')
res4=config.getboolean('topsecret.server.com','ForwardX11')

print(res1)
print(res2)
print(res3)
print(res4)

'''
Output:
hg
50022
50022.0
False
'''

5. Checking, removing and adding sections

import configparser
# check a section
config=configparser.ConfigParser()
config.read('a.ini',encoding='utf-8')
has_sec=config.has_section('section1')  # check whether the section exists

print(config.has_option('section1','enable'))  # check whether a key exists in a section

print(has_sec)

# add / modify
config.add_section('egon')
config.set('egon','han','18')  # overwrites the value if the key already exists
config['egon']['han']='28'
config.write(open('m.ini','w'))

# remove a section
config.remove_section('egon')
config.write(open('test.ini','w'))

6. Checking, removing and setting key-value pairs inside a section

import configparser
config=configparser.ConfigParser()
config.read('m.ini',encoding='utf-8')

# check
has_sec=config.has_option('section2','k1')  # section2 contains a key k1
print(has_sec)  # prints True

# remove a key from a section
config.remove_option('section2','k1')  # remove k1 from section2
config.write(open('m.ini','w'))

# set
config.set('section1','k1','han')  # set the value of k1 in section1 to han
config.write(open('m.ini','w'))
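The same operations can be tried without any file on disk by loading the configuration from a string with read_string and writing it to an in-memory buffer. The sketch below uses a shortened version of section1; the key timeout does not exist in the data and is there only to show the fallback argument.

```python
import configparser
import io

INI_TEXT = """
[section1]
k1 = v1
max_conn = 30
enable = 0
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Typed getters convert the stored strings
max_conn = config.getint('section1', 'max_conn')
enabled = config.getboolean('section1', 'enable')

# fallback is returned when the key is missing
timeout = config.getint('section1', 'timeout', fallback=10)

# Modify a value and write the result to a buffer instead of a file
config['section1']['k1'] = 'han'
buffer = io.StringIO()
config.write(buffer)

print(max_conn, enabled, timeout)  # 30 False 10
print(buffer.getvalue())
```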

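X. The hashlib module

hash: a family of algorithms. In Python 3.x hashlib replaces the md5 and sha modules; it mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512 and MD5 algorithms
Three properties:
1. The same content always gives the same hash value; a small change in the content changes the hash value
2. The hash cannot be reversed to recover the content
3. For a given algorithm, the hash value has a fixed length no matter how long the input is.

1. Hashing a string

import hashlib
m=hashlib.md5()
m.update('123456'.encode('utf-8'))
print(m.hexdigest())

The resulting md5 value: e10adc3949ba59abbe56e057f20f883e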

import hashlib
m=hashlib.md5()
m.update('12'.encode('utf-8'))  # successive updates accumulate
m.update('3456'.encode('utf-8'))
print(m.hexdigest()) #e10adc3949ba59abbe56e057f20f883e

2. Hashing the contents of a file

import hashlib
m=hashlib.md5()
with open('md','rb') as f:
    for line in f:
        print(line.strip())
        m.update(line.strip())  # each update appends to the data hashed so far
        md5_num=m.hexdigest()
        print(md5_num)

Result:
b'1'
c4ca4238a0b923820dcc509a6f75849b
b'2'
c20ad4d76fe97759aa27a0c99bff6710
b'3'
202cb962ac59075b964b07152d234b70
b'4'
81dc9bdb52d04dc20036dbd8313ed055
b'5'
827ccb0eea8a706c4c34a16891f84e7b
b'6'
e10adc3949ba59abbe56e057f20f883e
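Because update accumulates, a file can also be hashed in fixed-size chunks, which keeps memory use constant and, unlike the line-based loop above, does not strip anything from the content. A minimal sketch follows; the helper name file_md5 and the chunk size are assumptions, and the sample file is created in a temporary directory.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=8192):
    """Return the md5 hex digest of a file, reading it chunk by chunk."""
    m = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() keeps calling f.read until it returns b'' at end of file
        for chunk in iter(lambda: f.read(chunk_size), b''):
            m.update(chunk)
    return m.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'md')
    with open(path, 'wb') as f:
        f.write(b'123456')
    digest = file_md5(path, chunk_size=2)  # tiny chunks to force several updates

print(digest)  # e10adc3949ba59abbe56e057f20f883e
```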

3. sha256: a stronger algorithm that produces a longer digest than md5

import hashlib
s=hashlib.sha256()
s.update('123456'.encode('utf-8'))
print(s.hexdigest())

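The algorithms above are strong, but they still have a weakness: a hash can be reversed with a dictionary attack, that is, by comparing it against the precomputed hashes of common passwords. It is therefore necessary to add a custom key when hashing.

4. Dictionary attack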

import hashlib
passwod=['123456','12345678','123456789']
def make_pass(passwod):
    dic={}
    for pwd in passwod:
        m=hashlib.md5()
        m.update(pwd.encode('utf-8'))
        dic[pwd]=m.hexdigest()
    return dic

def code(c,pwd_dic):
    for k,v in pwd_dic.items():
        if v ==c:
            print('the password is %s'%k)
code('e10adc3949ba59abbe56e057f20f883e',make_pass(passwod))
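One way to add the custom key mentioned above is the standard library's hmac module, which combines a secret key with the message before hashing. The following is a sketch, not a complete password storage scheme; SECRET_KEY is a hypothetical placeholder value and keyed_digest is a name chosen for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b'hypothetical-secret-key'  # placeholder; keep a real key out of source code

def keyed_digest(password):
    """Hash the password together with the secret key using HMAC-SHA256."""
    return hmac.new(SECRET_KEY, password.encode('utf-8'), hashlib.sha256).hexdigest()

plain = hashlib.md5('123456'.encode('utf-8')).hexdigest()
keyed = keyed_digest('123456')

print(plain)           # e10adc3949ba59abbe56e057f20f883e, found in any md5 lookup table
print(keyed != plain)  # True: the keyed digest is not in such a table

# compare_digest compares two digests in constant time
print(hmac.compare_digest(keyed, keyed_digest('123456')))  # True
```

An attacker who does not know the key cannot precompute a table of digests the way make_pass does.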

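XI. The subprocess module

Whenever we run Python, we create and run a process. As with Linux processes, a process can fork a child process and have that child exec another program. In Python, the subprocess package of the standard library is used to fork a child process and run an external program.
The subprocess package defines several functions for creating child processes, each of which creates them in a different way, so we can pick the one that fits our needs. subprocess also provides tools for managing standard streams and pipes, so that processes can communicate through text.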

# import subprocess
#
# res=subprocess.Popen('dir',shell=True,stdout=subprocess.PIPE)
# print(res)
# print(res.stdout.read().decode('gbk'))


import subprocess

# res=subprocess.Popen('diasdfasdfr',shell=True,
#                      stderr=subprocess.PIPE,
#                      stdout=subprocess.PIPE)

# print('=====>',res.stdout.read())
# print('=====>',res.stderr.read().decode('gbk'))


# the equivalent of: ls | grep txt$ (here with the Windows commands dir and findstr)
res1=subprocess.Popen(r'dir E:\wupeiqi\s17\day06',shell=True,stdout=subprocess.PIPE)
# print(res1.stdout.read())

res=subprocess.Popen(r'findstr txt*',shell=True,
                     stdin=res1.stdout,
                     stderr=subprocess.PIPE,
                     stdout=subprocess.PIPE)

print('===>',res.stdout.read().decode('gbk'))  # a pipe can be read only once; after that it is empty
print('===>',res.stdout.read().decode('gbk'))  # prints an empty string
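The examples above depend on Windows commands and the gbk encoding. A platform-independent sketch can use the Python interpreter itself (sys.executable) as the external program. The first part uses subprocess.run, the second connects two child processes with a pipe in the same way as the dir and findstr example; the one-line child programs are hypothetical.

```python
import subprocess
import sys

# Run a child process and capture its output as text
result = subprocess.run(
    [sys.executable, '-c', "print('hello from child')"],
    capture_output=True, text=True)
print(result.returncode)       # 0
print(result.stdout.strip())   # hello from child

# producer | consumer: the consumer keeps only the lines ending in .txt
producer = subprocess.Popen(
    [sys.executable, '-c', "print('a.txt'); print('b.py')"],
    stdout=subprocess.PIPE)
consumer = subprocess.Popen(
    [sys.executable, '-c',
     "import sys; [sys.stdout.write(l) for l in sys.stdin if l.strip().endswith('.txt')]"],
    stdin=producer.stdout, stdout=subprocess.PIPE)
producer.stdout.close()        # the consumer now owns the read end of the pipe
out, _ = consumer.communicate()
producer.wait()

matches = out.decode().split()
print(matches)                 # ['a.txt']
```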

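XII. The logging module

import logging
'''
1: if filename is not given, logs are printed to the terminal by default
2: setting the log level:
    ways to specify it:
        1:level=10
        2:level=logging.ERROR

    log levels:
        CRITICAL = 50
        FATAL = CRITICAL
        ERROR = 40
        WARNING = 30
        WARN = WARNING
        INFO = 20
        DEBUG = 10
        NOTSET = 0

3: if the level is set to ERROR, only logs at level ERROR or above are emitted
'''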


logging.basicConfig(filename='access.log',
                    format='%(asctime)s - %(name)s - %(levelname)s -%(module)s:  %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S %p',
                    level=10)

logging.debug('debug')
logging.info('info')
logging.warning('warning')
logging.error('error')
logging.critical('critical')
logging.log(10,'log')  # with level=40, only the logging.critical and logging.error messages would be emitted

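The default behaviour of the logging module can be changed through the parameters of logging.basicConfig(). The available parameters are:
filename: create a FileHandler with the given file name, so that logs are stored in that file.
filemode: the file open mode, used when filename is given; the default is "a", and "w" can also be used.
format: the log format used by the handler.
datefmt: the date/time format.
level: the log level of the root logger.
stream: create a StreamHandler with the given stream. The stream can be sys.stderr, sys.stdout or a file object; the default is sys.stderr. filename and stream are mutually exclusive: in Python 3.3 and later, passing both raises ValueError.

Log format

%(name)s	Name of the logger (not a user name)
%(levelno)s	Log level as a number
%(levelname)s	Log level as text
%(pathname)s	Full path of the module that issued the logging call, if available
%(filename)s	File name of the module that issued the logging call
%(module)s	Name of the module that issued the logging call
%(funcName)s	Name of the function that issued the logging call
%(lineno)d	Line number of the statement that issued the logging call
%(created)f	Time the record was created, as a UNIX timestamp (float)
%(relativeCreated)d	Time in milliseconds when the record was created, relative to when the logging module was loaded
%(asctime)s	Time the record was created, as a string. The default format is "2003-07-08 16:49:45,896"; the part after the comma is milliseconds
%(thread)d	Thread ID, if available
%(threadName)s	Thread name, if available
%(process)d	Process ID, if available
%(message)s	The message passed by the user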
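Instead of configuring the root logger through basicConfig, a named logger can be given its own handler and formatter. The sketch below writes to an in-memory stream so that the result can be inspected; the logger name demo_app and the format string are assumptions chosen for illustration.

```python
import io
import logging

# A handler decides where records go; a formatter decides how they look
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s'))

logger = logging.getLogger('demo_app')
logger.setLevel(logging.ERROR)   # only ERROR and above pass
logger.addHandler(handler)
logger.propagate = False         # do not pass records on to the root logger

logger.debug('debug')
logger.warning('warning')
logger.error('error')
logger.critical('critical')

print(stream.getvalue())
# demo_app - ERROR - error
# demo_app - CRITICAL - critical
```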

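XIII. The re module

A: What is a regular expression?

A regular expression is a way of describing characters or strings by combining symbols that have special meanings. Put differently, a regular expression is a rule that describes a class of things. It is embedded in Python and implemented by the re module. Regular expression patterns are compiled into a series of bytecodes, which are then executed by a matching engine written in C.

Such rules are everywhere in daily life:

Say we describe: four legs

You might think of four-legged animals, or of tables, chairs and so on

Refine the description: four legs, alive

Now only four-legged animals remain

B: Common matching patterns (metacharacters)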

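import re

1. \w matches letters, digits and the underscore
print(re.findall(r'\w','as213df_*|'))
Result:
['a', 's', '2', '1', '3', 'd', 'f', '_']

2. \W matches anything that is not a letter, digit or underscore
print(re.findall(r'\W','as213df_*|'))
Result:
['*', '|']

3. \w between two literal characters
print(re.findall(r'a\wb','a_b a3b aEb a*b'))  # a letter, digit or underscore between a and b
Result:
['a_b', 'a3b', 'aEb']

4. \s matches any whitespace character; equivalent to [ \t\n\r\f\v]
print(re.findall(r'\s','a b\nc\td'))
Result:
[' ', '\n', '\t']
5. \S matches any non-whitespace character
print(re.findall(r'\S','a b\nc\td'))
Result:
['a', 'b', 'c', 'd']
6. \d matches any digit; equivalent to [0-9]
print(re.findall(r'\d','a123bcdef'))
Result:
['1', '2', '3']
7. \D matches any non-digit
print(re.findall(r'\D','a123bcdef'))
Result:
['a', 'b', 'c', 'd', 'e', 'f']
8. \n matches a newline character
print(re.findall('\n','a123\nbc\ndef'))
Result:
['\n', '\n']
9. \t matches a tab character
print(re.findall('\t','a123\tbc\td\tef'))
Result:
['\t', '\t', '\t']
10. ^ matches at the start of the string
print(re.findall('^h','hello egon hao123'))
Result:
['h']

print(re.findall('^h','egon hello hao123'))  # the string does not start with h
Result:
[]
11. $ matches at the end of the string
print(re.findall('3$','e3ll3o e3gon hao123'))
Result:
['3']
12. . matches any character except a newline
# any character except a newline between a and c
print(re.findall('a.c','abc a1c a*c a|c abd aed ac'))
Result:
['abc', 'a1c', 'a*c', 'a|c']
print(re.findall('a.c','abc a1c a*c a|c abd aed a\nc',re.S))  # re.S lets the dot match a newline too
Result:
['abc', 'a1c', 'a*c', 'a|c', 'a\nc']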

13. [] matches one character from a set
# a, then one of "1", ",", "2" or "\n" (the comma inside the brackets is a literal character as well), then c
print(re.findall('a[1,2\n]c','a2c a,c abc a1c a*c a|c abd aed a\nc'))
Result:
['a2c', 'a,c', 'a1c', 'a\nc']
14. a, then one digit 0-9, then c

print(re.findall('a[0-9]c','a2c a11c abc a1c a*c a|c abd aed a\nc'))

Result: ['a2c', 'a1c']

15. a, then one of 0-9, a-z, A-Z, * or -, then c

print(re.findall('a[0-9a-zA-Z*-]c','a1c abc a*c a-c aEc a-1c'))
Result: ['a1c', 'abc', 'a*c', 'a-c', 'aEc']

16. a, then one character that is not a digit 0-9, then c

print(re.findall('a[^0-9]c','a1c a2c a*c a-c aEc'))
Result: ['a*c', 'a-c', 'aEc']

17. * : a followed by zero or more b

print(re.findall('ab*','a ab b'))

Result: ['a', 'ab']

print(re.findall('ab*','a ab baaa abbbbbbbbbbbbbbbb  ac'))

print(re.findall('ab*','bbbbbb'))
Result: []

18. + : a followed by at least one b

print(re.findall('ab+','a'))
Result: []

print(re.findall('ab+','bbbbbb'))
Result: []

19. ab followed by one of 1, 2 or 3

print(re.findall('ab[123]','ab1 ab2 ab3 bbbbb1'))
Result: ['ab1', 'ab2', 'ab3']

20. ab followed by one or more of 1, 2, 3

print(re.findall('ab[123]+','ab11111111 ab2 ab3 abc1'))
Result: ['ab11111111', 'ab2', 'ab3']

print(re.findall('ab[123]+','ab1 ab2 ab3 ab4 ab123'))

Result: ['ab1', 'ab2', 'ab3', 'ab123']

print(re.findall('ab[123][123][123]','ab1 ab2 ab3 ab4 ab123'))

Result:

['ab123']

21. {n} and {n,} : explicit repetition counts

print(re.findall('ab{3}','ab1 abbbbbbbb2 abbbbb3 ab4 ab122'))  # exactly three b
Result: ['abbb', 'abbb']

print(re.findall('ab{0,}','a123123123 ab1 abbb123 abbbbbbb123 abbbbbtb'))  # zero or more b

Result: ['a', 'ab', 'abbb', 'abbbbbbb', 'abbbbb']

print(re.findall('ab{2,}','a123123123 ab1 abb123 abbbb123 abbbbbt'))  # two or more b
Result: ['abb', 'abbbb', 'abbbbb']

22. .* : greedy matching

print(re.findall('a.*c','a2c abc aec a1c'))  # greedy: any characters except newlines from the first a to the last c
Result: ['a2c abc aec a1c']

23. .*? : non-greedy matching

print(re.findall('a.*?c','ac abc aec a1c'))  # the shortest run of characters between a and c
Result: ['ac', 'abc', 'aec', 'a1c']
print(re.findall('a.*?c','ac abc a111111111c a\nc a1c',re.S))  # re.S is needed for the dot to match \n
Result: ['ac', 'abc', 'a111111111c', 'a\nc', 'a1c']

24. () grouping

# (?:y|ies) matches words ending in y or ies; ?: makes the group non-capturing

print(re.findall('compan(?:y|ies)','Too many companies have gone bankrupt, and the next one is my company'))
Result: ['companies', 'company']

print(re.findall('(ab)+123','ababab123'))  # with a capturing group findall returns the group: the last ab before 123
Result: ['ab']
print(re.findall('(?:ab)+123','ababab123'))

Result: ['ababab123']

print(re.findall(r'a\\c',r'a\c'))  # r marks a raw string: Python leaves the backslashes alone, so the regex engine receives a\\c, which matches a literal backslash
Result: ['a\\c']

print(re.findall('a\\\\c',r'a\c'))  # without r, every backslash in the pattern must be doubled once more; the result is the same ['a\\c']

re.search: scans the whole string and returns the first match, wherever it occurs; re.match only matches at the beginning of the string

print(re.match('a','aleaa make love').group())  # a

content='Extra strings Hello 123 456 World_This is a Regex Demo Extra strings'
res=re.search(r'Hello.*?(\d+).*?Demo',content)
print(res.group(1))  # prints 123

res=re.findall('a','aadddacccaa')  # findall returns every match
for i in res:
    print(i)

print(re.split('[ab]','abcd'))  # ['', '', 'cd']

# re.sub: string replacement
import re
content='Extra strings Hello 123 456 World_This is a Regex Demo Extra strings'

# content=re.sub(r'\d+','',content)
# print(content)

print(re.sub('^a','A','alex make love'))  # replace an a at the start of the string

print(re.sub('a','A','alex make love'))  # replace every a

print(re.sub(r'^(\w+)(\s)(\w+)(\s)(\w+)',r'\5\2\3\4\1','alex make love'))  # love make alex
# print(re.sub(r'^(\w+)(\s+)(\w+)(\s+)(\w+)',r'\5','alex    make       love'))
# print(re.sub(r'^(\w+)(\W+)(\w+)(\W+)(\w+)',r'\5\2\3\4\1','alex " \ + = make ----/==     love'))

Unlike findall, search stops at the first match it finds and returns only that one
# print(re.search('companies|company','my company is already done,all companies will be done').group())

# use \1 to refer to the contents of the first group
# usage: swap 123 and 456
# import re
# content='Extra strings Hello 123 456 World_This is a Regex Demo Extra strings'
#
# # content=re.sub(r'(Extra.*?)(\d+)(\s)(\d+)(.*?strings)',r'\1\4\3\2\5',content)
# content=re.sub(r'(\d+)(\s)(\d+)',r'\3\2\1',content)
# print(content)

# print(re.findall(r'\-?\d+\.\d+|(\-?\d+)',"1-2*(60+(-40.35/5.3+1.2)-(-4*3))"))

s='''
你好啊 alex 大傻逼
alex的邮箱是3781231123@qq.com
010-18211413111
378533872

'''

print(re.findall(r"[1-9][0-9]{4,}",s))
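Groups can also be given names, and a pattern that is used repeatedly can be compiled once with re.compile. The sketch below applies both to a shortened, hypothetical version of the sample text; the simplified email pattern is an assumption for illustration and is not a complete address validator.

```python
import re

TEXT = 'alex: 3781231123@qq.com, tel 010-18211413111, qq 378533872'

# (?P<name>...) creates a named group; compile the pattern once for reuse
EMAIL = re.compile(r'(?P<user>[\w.]+)@(?P<domain>[\w.]+)')
match = EMAIL.search(TEXT)
user, domain = match.group('user'), match.group('domain')
print(user, domain)   # 3781231123 qq.com

# The same digit pattern as above: runs of at least 5 digits not starting with 0
numbers = re.findall(r'[1-9][0-9]{4,}', TEXT)
print(numbers)        # ['3781231123', '18211413111', '378533872']

# Named groups can be referenced in a replacement with \g<name>
masked = EMAIL.sub(r'***@\g<domain>', TEXT)
print(masked)         # alex: ***@qq.com, tel 010-18211413111, qq 378533872
```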

posted @ 2017-06-07 15:17 hanjialong