Python re: regex matching, split, and sub

import re

Compiling:

motif = '([ST])Q'

seq = "SQAAAATQ"

regrex = re.compile(motif)  # compile into a regex object

regrex = re.compile(motif, re.IGNORECASE)  # compile into a regex object, ignoring case
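
To see what the flag changes, here is a minimal sketch that compiles the same motif both ways and searches a lowercase sequence. The names strict, relaxed, and lower_seq are made up for this illustration.

```python
import re

motif = '([ST])Q'
lower_seq = "sqaaaatq"  # hypothetical lowercase version of seq

strict = re.compile(motif)                  # case-sensitive
relaxed = re.compile(motif, re.IGNORECASE)  # case-insensitive

print(strict.search(lower_seq))           # None
print(relaxed.search(lower_seq).group())  # sq
```

The same flag can also be passed directly to module-level functions such as re.search.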

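Matching:

sea = regrex.search(seq)  # returns a match object for the first match anywhere in seq, or None

mat = regrex.match(seq)  # looks for a match only at the start of seq

# findall returns a list of all matches.
# With two or more parenthesized groups, it returns a list of tuples
# [(group1, group2), (...)], one tuple per match.
# With exactly one group, it returns a list of that group's strings (see the examples below).
all = regrex.findall(seq)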

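it = regrex.finditer(seq)  # returns an iterator over all match objects

for i in it:
    print(i.group())   # group() returns the full matched string
    print(i.group(1))  # returns the first captured subgroup
    print(i.span())    # returns the (start, end) tuple of the match
    print(i.start())   # returns the start position of the match
    print(i.end())     # returns the end position of the match (exclusive)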
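
The calls above can be compared side by side in a short sketch. It reuses the motif and seq from this post; shifted is a hypothetical sequence added here so that the motif no longer sits at position 0, which is where search and match differ.

```python
import re

regrex = re.compile('([ST])Q')
seq = "SQAAAATQ"
shifted = "AA" + seq  # hypothetical: motif no longer at the start

print(regrex.search(shifted).span())  # (2, 4)
print(regrex.match(shifted))          # None
print(regrex.match(seq).group())      # SQ
print(regrex.findall(seq))            # ['S', 'T']

for m in regrex.finditer(seq):
    print(m.group(), m.group(1), m.span(), m.start(), m.end())
# SQ S (0, 2) 0 2
# TQ T (6, 8) 6 8
```

Because the pattern has exactly one group, findall returns only the captured letter, while finditer still gives access to the full match through group().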

Modifying a string after matching

1. Splitting a string

separator = re.compile(r'\|')

anno = "A|B|C"

col = separator.split(anno)
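
As a small sketch of what split returns for this input, the snippet below also shows the optional maxsplit argument, which limits the number of splits.

```python
import re

separator = re.compile(r'\|')
anno = "A|B|C"

print(separator.split(anno))              # ['A', 'B', 'C']
print(separator.split(anno, maxsplit=1))  # ['A', 'B|C']
```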

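2. Replacing content

new = separator.sub("@", anno)  # sub(repl, string[, count]): replace the first count matches of '|' in string with '@'; replaces all matches by default

sublist = separator.subn("@", anno)  # subn(repl, string[, count]): returns a tuple (new string, number of replacements)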
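
A minimal sketch of the two calls on the same input, including the count argument:

```python
import re

separator = re.compile(r'\|')
anno = "A|B|C"

print(separator.sub("@", anno))           # A@B@C
print(separator.sub("@", anno, count=1))  # A@B|C
print(separator.subn("@", anno))          # ('A@B@C', 2)
```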

findall example with two or more groups:

  text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
  pairs = re.findall(r'([\w\.-]+)@([\w\.-]+)', text)
  print(pairs)  ## [('alice', 'google.com'), ('bob', 'abc.com')]
  for pair in pairs:
    print(pair[0])  ## username
    print(pair[1])  ## host

Output:

[('alice', 'google.com'), ('bob', 'abc.com')]
alice
google.com
bob
abc.com

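findall with one group:

import re
text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
names = re.findall(r'([\w\.-]+)@[\w\.-]+', text)
print(names)  ## a list of strings, not tuples
for name in names:
    print(name[0])  ## first character of the username, not the whole username
    print(name[1])  ## second character of the username, not the host

Output: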

['alice', 'bob']
a
l
b
o
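
Since a single group makes findall return plain strings, indexing an item yields characters rather than fields. The sketch below, on the same sample text, shows the two usual ways around that: drop the group to get whole matches, or iterate over the strings directly.

```python
import re

text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

# No capturing group: findall returns the full matches.
print(re.findall(r'[\w.-]+@[\w.-]+', text))
# ['alice@google.com', 'bob@abc.com']

# One capturing group: each item is already the captured string.
for name in re.findall(r'([\w.-]+)@[\w.-]+', text):
    print(name)
# alice
# bob
```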

Ignoring case:

re.search(pat, text, re.IGNORECASE)  # pat is the pattern, text is the string to search

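SOAPnuke example:

p = re.compile(r'\S+_(1|2)\.(fq|fastq)\.gz')  # dots escaped so they match a literal '.'
m = p.search(i)  # i: the file name being tested (defined elsewhere)
if m:
    key = 'fq' + m.group(1)  # 'fq1' or 'fq2'
    fqs[key] = i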
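
The fragment above depends on a loop variable and a dictionary that are not shown. A self-contained sketch, assuming a hypothetical list of file names (files and its contents are invented for illustration) and using escaped dots so they match literal periods, could look like this:

```python
import re

# Hypothetical inputs; the original loop and its data are not shown.
files = ['sample_1.fq.gz', 'sample_2.fastq.gz', 'sample.txt']

p = re.compile(r'\S+_(1|2)\.(fq|fastq)\.gz')
fqs = {}
for i in files:
    m = p.search(i)
    if m:
        fqs['fq' + m.group(1)] = i  # key is 'fq1' or 'fq2'

print(fqs)  # {'fq1': 'sample_1.fq.gz', 'fq2': 'sample_2.fastq.gz'}
```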

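Identifying the sample name (starting with HG or NA):

# (?:HG|NA) matches the literal prefix "HG" or "NA"; [HG|NA] would be a character class matching a single character
p = re.compile(r'\S+/((?:HG|NA)[^/]*)/\S+/(\S+\.fq\.gz)')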
p.search("/zfssz6/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT5/P17H10200N0283_Temp/HG00403*1/200713_SEQ012_FP100001181BR_L01_SP2007010096/FP100001181BR_L01_595_1.fq.gz").group(2)
'FP100001181BR_L01_595_1.fq.gz'
p.search("/zfssz6/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT5/P17H10200N0283_Temp/HG00403*1/200713_SEQ012_FP100001181BR_L01_SP2007010096/FP100001181BR_L01_595_1.fq.gz").group(1)
'HG00403*1'
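
One way to package this lookup is a small helper that returns both groups at once, or None when the path does not match. This is only a sketch: parse_fastq_path and SAMPLE_RE are hypothetical names, and the pattern is a stricter variant that uses (?:HG|NA) for the literal prefix and [^/] to keep each part inside a single path component.

```python
import re

SAMPLE_RE = re.compile(r'/((?:HG|NA)[^/]*)/[^/]+/([^/]+\.fq\.gz)$')

def parse_fastq_path(path):
    """Return (sample, file_name), or None if the path does not match."""
    m = SAMPLE_RE.search(path)
    return m.groups() if m else None

path = ("/zfssz6/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT5/P17H10200N0283_Temp/"
        "HG00403*1/200713_SEQ012_FP100001181BR_L01_SP2007010096/"
        "FP100001181BR_L01_595_1.fq.gz")

print(parse_fastq_path(path))
# ('HG00403*1', 'FP100001181BR_L01_595_1.fq.gz')
print(parse_fastq_path("/data/other/run1/reads_1.fq.gz"))  # None
```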

String replacement:

>text = "JGood is a handsome boy, he is cool, clever, and so on..."

>print(re.sub(r'\s+', '-', text))
JGood-is-a-handsome-boy,-he-is-cool,-clever,-and-so-on...

posted on 2017-12-19 23:24 BioinformaticsMaster
