Python learning: pyfastx, a handy tool for processing fasta/fastq files

 

001. Iterating over fasta sequences

 

(base) root@PC1:/home/test2# cat a.fasta         ## test fasta file
>gene1 myc
AGCTGCCTAAGC
GGCATAGCTAATCG
>gene2 jun
ACCGAATCGGAGCGATG
GGCATTAAAGATCTAGCT
>gene3 malat1
AGGCTAGCGAG
GCGCGAG
GATTAGGCG
>>> import pyfastx                             ## import the package
>>> fa = pyfastx.Fastx('a.fasta')              ## read the fasta file
>>> type(fa)
<class 'Fastx'>
>>> for i,j,k in fa:                           ## iterate: i is the name; j the sequence; k the comment
...     print(i)
...     print(j)
...     print(k)
...
gene1
AGCTGCCTAAGCGGCATAGCTAATCG
myc
gene2
ACCGAATCGGAGCGATGGGCATTAAAGATCTAGCT
jun
gene3
AGGCTAGCGAGGCGCGAGGATTAGGCG
malat1
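The tuples above follow a simple rule: the header line is split at its first whitespace into a name and a comment, and all sequence lines that follow it are joined into one string. Below is a minimal pure-Python sketch of that rule, written only to illustrate the output; it is not pyfastx's implementation, and `iter_fasta`, `_make_record` and `SAMPLE` are hypothetical names. Depending on the installed pyfastx version, the comment may only be returned when `comment=True` is passed to `pyfastx.Fastx`, so it is worth checking the documentation for your version.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def _make_record(header: str, chunks: List[str]) -> Tuple[str, str, str]:
    # Name is everything before the first whitespace; the rest is the comment.
    parts = header.split(None, 1)
    name = parts[0] if parts else ""
    comment = parts[1] if len(parts) > 1 else ""
    return name, "".join(chunks), comment


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, seq, comment) tuples; hypothetical helper, not part of pyfastx."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may span several lines
        # sequence lines before the first header are ignored
    if header is not None:
        yield _make_record(header, chunks)  # emit the last record


SAMPLE = """>gene1 myc
AGCTGCCTAAGC
GGCATAGCTAATCG
>gene2 jun
ACCGAATCGGAGCGATG
GGCATTAAAGATCTAGCT
>gene3 malat1
AGGCTAGCGAG
GCGCGAG
GATTAGGCG
"""

records = list(iter_fasta(SAMPLE.splitlines()))
for name, seq, comment in records:
    print(name, seq, comment, sep="\n")
```

Run on the sample, this prints the same nine lines as the pyfastx session above. To read a file instead, pass an open handle, e.g. `with open('a.fasta') as fh: records = list(iter_fasta(fh))`. The sketch only handles plain text, so for gzip-compressed or very large files pyfastx itself remains the practical choice.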

 

002. If the sequence contains lowercase letters, output it in uppercase

(base) root@PC1:/home/test2# cat a.fasta          ## test fasta file
>JZ822577.1 contig1 cDNA library of flower petals in tree peony by suppression subtractive hybridization Paeonia suffruticosa cDNA, mRNA sequence
CTctagcttaaaTTACTTCTTCACATTCCAGATCACTCAGGCTCTTTGTCATTTTAGTTTGACTAGGATATCGAGTATTCAAGCTCATCGCTTTTGGTAATCTTTGCGGTGCATGCCTTTGCATGCTGTATTGCTGCTTCATCATCCCCTTTGACTTGTGTGGCGGTGGCAAGACATCCGAAGAGTTAAGCGATGCTTGTCTAGTCAATTTCCCCATGTACAGAATCATTGTTGTCAATTGGTTGTTTCCTTGATGGTGAAGGGGCTTCAATACATGAGTTCCAAACTAACATTTCTTGACTAACACTTGAGGAAGAAGGACAAGGGTCCCCATGT
>>> for item in pyfastx.Fastx('a.fasta', uppercase=True):      ## read the file; output all sequences in uppercase
...     print(item)
...
('JZ822577.1', 'CTCTAGCTTAAATTACTTCTTCACATTCCAGATCACTCAGGCTCTTTGTCATTTTAGTTTGACTAGGATATCGAGTATTCAAGCTCATCGCTTTTGGTAATCTTTGCGGTGCATGCCTTTGCATGCTGTATTGCTGCTTCATCATCCCCTTTGACTTGTGTGGCGGTGGCAAGACATCCGAAGAGTTAAGCGATGCTTGTCTAGTCAATTTCCCCATGTACAGAATCATTGTTGTCAATTGGTTGTTTCCTTGATGGTGAAGGGGCTTCAATACATGAGTTCCAAACTAACATTTCTTGACTAACACTTGAGGAAGAAGGACAAGGGTCCCCATGT', 'contig1 cDNA library of flower petals in tree peony by suppression subtractive hybridization Paeonia suffruticosa cDNA, mRNA sequence')
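In the output above, `uppercase=True` gives the same result as calling `str.upper()` on the sequence, and the long description after `JZ822577.1` lands entirely in the comment field because only the first whitespace splits the header. A short sketch of both operations on the first 30 bases of this record; `header`, `seq_prefix` and the other variable names are illustrative only, and this is not how pyfastx does it internally:

```python
# Header line from a.fasta, without the leading '>'
header = ("JZ822577.1 contig1 cDNA library of flower petals in tree peony "
          "by suppression subtractive hybridization Paeonia suffruticosa cDNA, mRNA sequence")
# First 30 bases of the record, including the lowercase stretch
seq_prefix = "CTctagcttaaaTTACTTCTTCACATTCCA"

name, comment = header.split(None, 1)  # split only once, at the first whitespace
upper_seq = seq_prefix.upper()         # same bases, all uppercase

print(name)       # JZ822577.1
print(upper_seq)  # CTCTAGCTTAAATTACTTCTTCACATTCCA
```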

 

003. Iterating over fastq sequences

(base) root@PC1:/home/test2# cat b.fastq                           ## test fastq file
@WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
CTGNCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAANG
+WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
BCC#4ADDHHBFHIJJIIJJJIIIIJHIJIJIIJGGIJJJJIGJJJJJJ##
>>> fq = pyfastx.Fastx('b.fastq')      ## read the file
>>> for i,j,k,l in fq:                 ## iterate: i name; j sequence; k quality; l comment
...     print(i)
...     print(j)
...     print(k)
...     print(l)
...
WT_rep1_BAF155.1
CTGNCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAANG
BCC#4ADDHHBFHIJJIIJJJIIIIJHIJIJIIJGGIJJJJIGJJJJJJ##
SALLY:291:C149WACXX:2:1101:2579:1951 length=51
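For fastq the tuple order is name, sequence, quality, comment: the comment comes last even though it sits on the header line in the file, and the `+` separator line is not returned. A minimal pure-Python sketch of that layout, assuming plain four-line records with no line-wrapped sequences; `iter_fastq` and `SAMPLE` are hypothetical names, not pyfastx API, and the sketch only performs basic sanity checks:

```python
from typing import Iterable, Iterator, List, Tuple


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (name, seq, qual, comment) from four-line fastq records.

    Hypothetical helper for illustration; not part of pyfastx.
    """
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line and not record:
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) < 4:
            continue
        header, seq, plus, qual = record  # the '+' line is checked, then dropped
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting with {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"{header!r}: sequence and quality lengths differ")
        # Name is everything before the first whitespace; the rest is the comment.
        parts = header[1:].split(None, 1)
        name = parts[0] if parts else ""
        comment = parts[1] if len(parts) > 1 else ""
        yield name, seq, qual, comment  # comment comes last, after the quality
        record = []
    if record:
        raise ValueError("truncated fastq record at end of input")


SAMPLE = """@WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
CTGNCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAANG
+WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
BCC#4ADDHHBFHIJJIIJJJIIIIJHIJIJIIJGGIJJJJIGJJJJJJ##
"""

records = list(iter_fastq(SAMPLE.splitlines()))
for name, seq, qual, comment in records:
    print(name, seq, qual, comment, sep="\n")
```

Reading a fixed four lines per record, rather than looking for `@`, avoids confusing a quality string that happens to start with `@` with a new header.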

 

posted @ 小鲨鱼2018