Learning Python: pyfastx, a handy tool for FASTA/FASTQ processing

 

001. Iterating over FASTA sequences

 

(base) root@PC1:/home/test2# cat a.fasta         ## test FASTA file
>gene1 myc
AGCTGCCTAAGC
GGCATAGCTAATCG
>gene2 jun
ACCGAATCGGAGCGATG
GGCATTAAAGATCTAGCT
>gene3 malat1
AGGCTAGCGAG
GCGCGAG
GATTAGGCG
>>> import pyfastx                             ## import the package
>>> fa = pyfastx.Fastx('a.fasta')              ## read the FASTA file
>>> type(fa)
<class 'Fastx'>
>>> for i,j,k in fa:                           ## iterate: i is the name, j the sequence, k the comment
...     print(i)
...     print(j)
...     print(k)
...
gene1
AGCTGCCTAAGCGGCATAGCTAATCG
myc
gene2
ACCGAATCGGAGCGATGGGCATTAAAGATCTAGCT
jun
gene3
AGGCTAGCGAGGCGCGAGGATTAGGCG
malat1
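
To make the meaning of the three fields concrete, here is a minimal pure-Python sketch of the same record layout. It is not how pyfastx is implemented (pyfastx is a C extension); `iter_fasta` is a hypothetical helper written only for illustration. It assumes the name is the header text up to the first whitespace, the comment is the rest of the header, and the sequence is all following lines joined together, which matches the output shown above. Also note that the three-field unpacking reflects the pyfastx version used in this post; newer releases appear to return only (name, sequence) unless `comment=True` is passed, so check the documentation for your installed version.

```python
import io

def iter_fasta(handle):
    """Yield (name, sequence, comment) tuples from an open FASTA text handle."""
    name, comment, chunks = None, "", []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks), comment
            header = line[1:].split(None, 1)  # split on the first whitespace run
            name = header[0] if header else ""
            comment = header[1] if len(header) > 1 else ""
            chunks = []
        else:
            chunks.append(line)  # sequence lines are concatenated
    if name is not None:
        yield name, "".join(chunks), comment

# Same content as the a.fasta test file above.
FASTA_TEXT = """>gene1 myc
AGCTGCCTAAGC
GGCATAGCTAATCG
>gene2 jun
ACCGAATCGGAGCGATG
GGCATTAAAGATCTAGCT
>gene3 malat1
AGGCTAGCGAG
GCGCGAG
GATTAGGCG
"""

records = list(iter_fasta(io.StringIO(FASTA_TEXT)))
for name, seq, comment in records:
    print(name, seq, comment)
```

For real files pyfastx is the better choice, since it also handles gzip-compressed input and is much faster; the sketch only shows how a header line and several sequence lines map onto one tuple.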

 

002. If the sequences contain lowercase letters, force uppercase output

(base) root@PC1:/home/test2# cat a.fasta          ## test FASTA file
>JZ822577.1 contig1 cDNA library of flower petals in tree peony by suppression subtractive hybridization Paeonia suffruticosa cDNA, mRNA sequence
CTctagcttaaaTTACTTCTTCACATTCCAGATCACTCAGGCTCTTTGTCATTTTAGTTTGACTAGGATATCGAGTATTCAAGCTCATCGCTTTTGGTAATCTTTGCGGTGCATGCCTTTGCATGCTGTATTGCTGCTTCATCATCCCCTTTGACTTGTGTGGCGGTGGCAAGACATCCGAAGAGTTAAGCGATGCTTGTCTAGTCAATTTCCCCATGTACAGAATCATTGTTGTCAATTGGTTGTTTCCTTGATGGTGAAGGGGCTTCAATACATGAGTTCCAAACTAACATTTCTTGACTAACACTTGAGGAAGAAGGACAAGGGTCCCCATGT
>>> for item in pyfastx.Fastx('a.fasta', uppercase=True):      ## read the data; all sequences are output in uppercase
...     print(item)
...
('JZ822577.1', 'CTCTAGCTTAAATTACTTCTTCACATTCCAGATCACTCAGGCTCTTTGTCATTTTAGTTTGACTAGGATATCGAGTATTCAAGCTCATCGCTTTTGGTAATCTTTGCGGTGCATGCCTTTGCATGCTGTATTGCTGCTTCATCATCCCCTTTGACTTGTGTGGCGGTGGCAAGACATCCGAAGAGTTAAGCGATGCTTGTCTAGTCAATTTCCCCATGTACAGAATCATTGTTGTCAATTGGTTGTTTCCTTGATGGTGAAGGGGCTTCAATACATGAGTTCCAAACTAACATTTCTTGACTAACACTTGAGGAAGAAGGACAAGGGTCCCCATGT', 'contig1 cDNA library of flower petals in tree peony by suppression subtractive hybridization Paeonia suffruticosa cDNA, mRNA sequence')
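
Judging from the output above, `uppercase=True` has the same visible effect as calling `str.upper()` on the sequence field of each tuple while leaving the name and comment untouched. The sketch below shows that equivalence on plain tuples; `uppercase_records` is a hypothetical helper, and the record is a shortened stand-in (first 24 bases, truncated comment) for the real one.

```python
def uppercase_records(records):
    """Yield each record with its sequence field (index 1) converted to uppercase."""
    for record in records:
        name, seq, *rest = record
        # Only the sequence changes; name and any trailing fields are kept as-is.
        yield (name, seq.upper(), *rest)

# Shortened, illustrative record modeled on the example above.
mixed = [("JZ822577.1", "CTctagcttaaaTTACTTCTTCAC", "contig1 cDNA library")]
result = list(uppercase_records(mixed))
print(result[0][1])
```

Because the helper only relies on the sequence being the second element, it should also work on the tuples yielded by `pyfastx.Fastx` when opened without `uppercase=True`, for example when the original case needs to be inspected first.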

 

003. Iterating over FASTQ reads

(base) root@PC1:/home/test2# cat b.fastq                           ## test FASTQ file
@WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
CTGNCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAANG
+WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
BCC#4ADDHHBFHIJJIIJJJIIIIJHIJIJIIJGGIJJJJIGJJJJJJ##
>>> fq = pyfastx.Fastx('b.fastq')      ## read the data
>>> for i,j,k,l in fq:                 ## iterate: i is the name, j the sequence, k the quality string, l the comment
...     print(i)
...     print(j)
...     print(k)
...     print(l)
...
WT_rep1_BAF155.1
CTGNCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAANG
BCC#4ADDHHBFHIJJIIJJJIIIIJHIJIJIIJGGIJJJJIGJJJJJJ##
SALLY:291:C149WACXX:2:1101:2579:1951 length=51
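
As a rough illustration of what the four fields contain, the following pure-Python sketch parses the same read and converts its quality string to numeric scores. `iter_fastq` and `phred_scores` are hypothetical helpers, not part of pyfastx. The sketch assumes strict four-line records and a Phred+33 quality encoding; the encoding is an assumption based on the characters seen in the quality line, not something stated in the file.

```python
import io

def iter_fastq(handle):
    """Yield (name, sequence, quality, comment) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input (or a trailing blank line)
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': " + header)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        parts = header[1:].split(None, 1)  # name, then the rest as comment
        name = parts[0] if parts else ""
        comment = parts[1] if len(parts) > 1 else ""
        yield name, seq, qual, comment

def phred_scores(qual, offset=33):
    """Convert a quality string to integer scores (assumes Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]

# Same content as the b.fastq test file above.
FASTQ_TEXT = """@WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
CTGNCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAANG
+WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
BCC#4ADDHHBFHIJJIIJJJIIIIJHIJIJIIJGGIJJJJIGJJJJJJ##
"""

reads = list(iter_fastq(io.StringIO(FASTQ_TEXT)))
name, seq, qual, comment = reads[0]
scores = phred_scores(qual)
print(name, comment)
print(scores[:5])
```

Under the Phred+33 assumption, the `#` characters at the low-quality positions (including the `N` bases) decode to a score of 2, while `J` decodes to 41. As with FASTA, newer pyfastx releases appear to return the comment only when `comment=True` is passed, so the four-field unpacking above may need adjusting depending on the installed version.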

 

posted @ 2022-08-12 12:39  小鲨鱼2018