Learning Python: pyfastx, a handy tool for FASTA/FASTQ processing
001. Iterating over FASTA sequences
(base) root@PC1:/home/test2# cat a.fasta    ## test FASTA file
>gene1 myc
AGCTGCCTAAGC
GGCATAGCTAATCG
>gene2 jun
ACCGAATCGGAGCGATG
GGCATTAAAGATCTAGCT
>gene3 malat1
AGGCTAGCGAG
GCGCGAG
GATTAGGCG
>>> import pyfastx    ## import the package
>>> fa = pyfastx.Fastx('a.fasta')    ## read the FASTA file
>>> type(fa)
<class 'Fastx'>
>>> for i,j,k in fa:    ## iterate: i is the name, j the sequence, k the comment
...     print(i)
...     print(j)
...     print(k)
...
gene1
AGCTGCCTAAGCGGCATAGCTAATCG
myc
gene2
ACCGAATCGGAGCGATGGGCATTAAAGATCTAGCT
jun
gene3
AGGCTAGCGAGGCGCGAGGATTAGGCG
malat1
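To make the (name, sequence, comment) layout from the session above concrete, here is a minimal pure-Python sketch of the same kind of iteration. It is not how pyfastx works internally; `iter_fasta` is a hypothetical helper written only for illustration, and it assumes the header line is split at the first whitespace into a name and a comment.

```python
import io


def iter_fasta(handle):
    """Yield (name, sequence, comment) tuples from a FASTA file-like object.

    Hypothetical helper for illustration only; not part of pyfastx.
    """
    name, comment, chunks = None, "", []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if name is not None:
                yield name, "".join(chunks), comment
            # Assumption: text before the first whitespace is the name,
            # everything after it is the comment
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            comment = parts[1] if len(parts) > 1 else ""
            chunks = []
        else:
            # Sequence lines of one record are joined into a single string
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks), comment


# Two of the records from a.fasta, held in memory instead of on disk
example = io.StringIO(
    ">gene1 myc\nAGCTGCCTAAGC\nGGCATAGCTAATCG\n"
    ">gene2 jun\nACCGAATCGGAGCGATG\nGGCATTAAAGATCTAGCT\n"
)
records = list(iter_fasta(example))
for name, seq, comment in records:
    print(name, seq, comment)
```

As in the pyfastx session, the wrapped sequence lines of each record come back as one continuous string.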
002. If the sequences contain lowercase letters, force uppercase output
(base) root@PC1:/home/test2# cat a.fasta    ## test FASTA file
>JZ822577.1 contig1 cDNA library of flower petals in tree peony by suppression subtractive hybridization Paeonia suffruticosa cDNA, mRNA sequence
CTctagcttaaaTTACTTCTTCACATTCCAGATCACTCAGGCTCTTTGTCATTTTAGTTTGACTAGGATATCGAGTATTCAAGCTCATCGCTTTTGGTAATCTTTGCGGTGCATGCCTTTGCATGCTGTATTGCTGCTTCATCATCCCCTTTGACTTGTGTGGCGGTGGCAAGACATCCGAAGAGTTAAGCGATGCTTGTCTAGTCAATTTCCCCATGTACAGAATCATTGTTGTCAATTGGTTGTTTCCTTGATGGTGAAGGGGCTTCAATACATGAGTTCCAAACTAACATTTCTTGACTAACACTTGAGGAAGAAGGACAAGGGTCCCCATGT
>>> for item in pyfastx.Fastx('a.fasta', uppercase=True):    ## read the data and output every sequence in uppercase
...     print(item)
...
('JZ822577.1', 'CTCTAGCTTAAATTACTTCTTCACATTCCAGATCACTCAGGCTCTTTGTCATTTTAGTTTGACTAGGATATCGAGTATTCAAGCTCATCGCTTTTGGTAATCTTTGCGGTGCATGCCTTTGCATGCTGTATTGCTGCTTCATCATCCCCTTTGACTTGTGTGGCGGTGGCAAGACATCCGAAGAGTTAAGCGATGCTTGTCTAGTCAATTTCCCCATGTACAGAATCATTGTTGTCAATTGGTTGTTTCCTTGATGGTGAAGGGGCTTCAATACATGAGTTCCAAACTAACATTTCTTGACTAACACTTGAGGAAGAAGGACAAGGGTCCCCATGT', 'contig1 cDNA library of flower petals in tree peony by suppression subtractive hybridization Paeonia suffruticosa cDNA, mRNA sequence')
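The visible effect of `uppercase=True` in the output above matches what `str.upper()` does to a sequence string. The sketch below shows that on the first few bases of the record, and also counts how many bases were lowercase beforehand. `normalize_case` is a hypothetical helper, not a pyfastx function.

```python
def normalize_case(seq):
    """Return the uppercased sequence and the number of bases that were lowercase.

    Hypothetical helper for illustration only; not part of pyfastx.
    """
    lowercase_count = sum(1 for base in seq if base.islower())
    return seq.upper(), lowercase_count


# First 19 bases of the JZ822577.1 record shown above
upper_seq, n_lower = normalize_case("CTctagcttaaaTTACTTC")
print(upper_seq)
print(n_lower)
```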
003. Iterating over FASTQ sequences
(base) root@PC1:/home/test2# cat b.fastq    ## test FASTQ file
@WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
CTGNCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAANG
+WT_rep1_BAF155.1 SALLY:291:C149WACXX:2:1101:2579:1951 length=51
BCC#4ADDHHBFHIJJIIJJJIIIIJHIJIJIIJGGIJJJJIGJJJJJJ##
>>> fq = pyfastx.Fastx('b.fastq')    ## read the data
>>> for i,j,k,l in fq:    ## iterate: i is the name, j the sequence, k the quality string, l the comment
...     print(i)
...     print(j)
...     print(k)
...     print(l)
...
WT_rep1_BAF155.1
CTGNCCAAGGTAATTTATAGATTCAATGCCATCCCCATCAAGCTACCAANG
BCC#4ADDHHBFHIJJIIJJJIIIIJHIJIJIIJGGIJJJJIGJJJJJJ##
SALLY:291:C149WACXX:2:1101:2579:1951 length=51
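The four values returned per FASTQ record above (name, sequence, quality, comment) can be mirrored with a small pure-Python sketch. `iter_fastq` is a hypothetical helper, not pyfastx code, and it assumes the simple layout of exactly four lines per record with no wrapped sequence or quality lines. The two reads in the example are made-up toy data.

```python
import io


def iter_fastq(handle):
    """Yield (name, sequence, quality, comment) from a four-line-per-record FASTQ stream.

    Hypothetical helper for illustration only; not part of pyfastx.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # ignore blank lines between records
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ in: " + header)
        # Assumption: name is the text before the first whitespace,
        # the rest of the header line is the comment
        parts = header[1:].split(None, 1)
        name = parts[0] if parts else ""
        comment = parts[1] if len(parts) > 1 else ""
        # Same field order as in the pyfastx loop above
        yield name, seq, qual, comment


# Toy input, invented for this example
example = io.StringIO(
    "@read1 lane1 length=5\nACGTN\n+\nIIII#\n"
    "@read2\nGGCA\n+read2\nJJJJ\n"
)
records = list(iter_fastq(example))
for name, seq, qual, comment in records:
    print(name, seq, qual, comment)
```

The length check reflects the one structural guarantee of the format that is easy to verify: every base has exactly one quality character.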