Splitting a FASTA file into separate files by scaffold with the shell in Linux

 

001.

(base) root@PC1:/home/test2# ls
a.fasta
(base) root@PC1:/home/test2# sed -i '$a tag_tag' a.fasta       ## append a marker line, tag_tag, to the end of the fasta file
(base) root@PC1:/home/test2# cat a.fasta                       ## the fasta file
>scaffold_1
CCCGGGTAAAACGGGTCTTCAAGAAAACGCTCCTCCGTTAATGCCGGCCGATTCAAATAA
CCTCTGGCAACACCCGCTCCGGCAATGTATAGTTCACCGATACATCCAACAGGCAGCATC
CGCTGATTCTGATTCAGGATATACAATCTGACATGATGAACAGGTTTTCCAATTGGAATC
CGTTCAAGTTTTTCTTGCGGCGGACAATCAAAGAATGCAGCTTCTACGGTTGCTTCCGTT
GGCCCATAGGAATTGGTTATTGAAACATTTGGAAGCAACACGTGAAATCGGGAGACAAGA
>scaffold_2
CACGCCGCCAGCGTTCGTCCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTT
AAGTTCACCGCATCCTGCGGCGACACCTGTGTGGCCTGCGTCGTGCAGGCCCTAGTTTGA
CTGACTACGCACATCGCTGTGCGATTTATAAAAATGAATTAACAGGTACGTTTTGTCTTG
TTTAGTTTTCAAAGAACTTTGCGTGCTTCTCTCGAAGCGACTACTTAATAGTAACATTTT
TAGTTAACTAGGTCAATACTTTTTTGAAAAAGTTTTTACTAGTCATAATGGTCATGTTTG
>scaffold_3
TTGATCCAGTGGCTCCGGTTACTCCAGTTGATCCTGTTGCGCCTGTTGCTCCAGTTTCTC
CGGTTGGTCCGGTTGATCCGGTTGCACCTGTTACTCCAGTGGCTCCGGTTACTCCCGTCG
CACCAGTTTCTCCTGTCGCACCAGTTGATCCTGTTGCGCCTGTTGGTCCTGTATCTCCAG
TTGCACCAGTTACTCCCGTTACTCCTGTTGGACCGGTTGCGCCTGTTACTCCGGTTGCGC
CTGTTGCTCCTGTTGCTCCTGTTGATCCCGTTGCACCTGTTGGTCCAGTCGGTCCAATTC
>scaffold_4
CCTGAGCCAGGATCAAACTCTCCGATAAATGGATCACAGGTTAAGTTCACCGCATCCTGC
GGCGACACCTGTGTGACCTGCGTCGTGCAGGCCCTAGTTTGACTGACTACGCACATCGCT
GTGCGATTTTTAAAAACTGAATTAACAGGTACGTTTTGTCTTGTTTAGTTTTCAAAGATC
ATTTTCGCTTCTTGTTGAAGCGACTTTATTAATATAACATTTTGACTTTCTTTTGTCAAA
TGTTTTTTTGATTTATTTTCCCGCCGCTGTGAGCTTGTTTTCTCAGAAGCGCATCAGCGA
>scaffold_5
TCACCCCGGAATCAGCTGACATAGAAGCACTGAAATCAGCACTGAAGGAAACCCTGCCGG
tag_tag
(base) root@PC1:/home/test2# grep ">" a.fasta | paste - <(grep ">" a.fasta | sed -n '2,$p' | sed '$a tag_tag') | awk '{split($1, a, ">"); print a[2], $0}' | while read -r i j k; do sed -n "/^$j\$/,/^$k\$/{/^$k\$/b; p}" a.fasta > "$i"; done    ## split script (^...$ anchors keep >scaffold_1 from also matching >scaffold_10)
(base) root@PC1:/home/test2# ls
a.fasta  scaffold_1  scaffold_2  scaffold_3  scaffold_4  scaffold_5
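To see what the `while read` loop in the one-liner actually receives, the pipeline can be run stage by stage. The sketch below uses a tiny made-up input (demo.fasta, next_headers.txt and pairs.txt are hypothetical names) and writes the second stream to a file instead of using `<(...)`, so it also runs under plain `sh`. It assumes GNU sed for the one-line `$a text` syntax and for `b;` inside braces.

```shell
#!/bin/sh
# Stage-by-stage sketch of the split one-liner on a tiny, made-up input.
# demo.fasta, next_headers.txt and pairs.txt are hypothetical file names.
# Assumes GNU sed (one-line "$a text" syntax and "b;" inside braces).
set -e
cd "$(mktemp -d)"    # work in a scratch directory; files are left there

# Tiny FASTA with the tag_tag sentinel already appended, as in the post.
printf '>scaffold_1\nACGT\nACGT\n>scaffold_2\nGGCC\n>scaffold_3\nTTAA\ntag_tag\n' > demo.fasta

# Second stream: every header except the first, followed by the sentinel.
grep ">" demo.fasta | sed -n '2,$p' | sed '$a tag_tag' > next_headers.txt

# paste puts each header next to the one that follows it (tab-separated);
# awk prepends the bare scaffold name, later used as the output file name.
grep ">" demo.fasta | paste - next_headers.txt |
    awk '{split($1, a, ">"); print a[2], $0}' > pairs.txt
cat pairs.txt

# For each pair, print from the start header up to (not including) the end
# line. The ^...$ anchors make each pattern match a whole line only.
while read -r name start end; do
    sed -n "/^$start\$/,/^$end\$/{/^$end\$/b; p}" demo.fasta > "$name"
done < pairs.txt
```

Each line of pairs.txt holds the output name, the header that opens the range and the line that closes it. The tag_tag sentinel exists only so that the last scaffold also has a closing line to stop at.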
(base) root@PC1:/home/test2# cat scaffold_1                         ## check the result
>scaffold_1
CCCGGGTAAAACGGGTCTTCAAGAAAACGCTCCTCCGTTAATGCCGGCCGATTCAAATAA
CCTCTGGCAACACCCGCTCCGGCAATGTATAGTTCACCGATACATCCAACAGGCAGCATC
CGCTGATTCTGATTCAGGATATACAATCTGACATGATGAACAGGTTTTCCAATTGGAATC
CGTTCAAGTTTTTCTTGCGGCGGACAATCAAAGAATGCAGCTTCTACGGTTGCTTCCGTT
GGCCCATAGGAATTGGTTATTGAAACATTTGGAAGCAACACGTGAAATCGGGAGACAAGA
(base) root@PC1:/home/test2# cat scaffold_5
>scaffold_5
TCACCCCGGAATCAGCTGACATAGAAGCACTGAAATCAGCACTGAAGGAAACCCTGCCGG
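Appending tag_tag modifies a.fasta in place, and the loop rereads the whole file once per scaffold. A different technique, not the one used in this post, is a single awk pass that switches output files at every header line. This is a minimal sketch assuming every header starts with `>` and that the first word of each header is unique and safe to use as a file name; in.fasta is made-up demo data, and scaffold_10 is included on purpose because exact file names cannot confuse it with scaffold_1.

```shell
#!/bin/sh
# Alternative sketch (awk, single pass, no sentinel): one output file per
# FASTA record, named after the first word of its header without the ">".
# in.fasta is a made-up demo file; scaffold_10 is included on purpose.
set -e
cd "$(mktemp -d)"    # work in a scratch directory; files are left there

printf '>scaffold_1\nACGT\nACGT\n>scaffold_2\nGGCC\n>scaffold_10\nTTAA\n' > in.fasta

awk '
    /^>/ {
        if (out) close(out)    # finish the previous record file
        out = substr($1, 2)    # "scaffold_1" from ">scaffold_1 ..."
    }
    out { print > out }        # header and sequence lines go to the current file
' in.fasta

ls
```

For the file in this post, run the awk command on a.fasta before the tag_tag line is appended; otherwise that line ends up at the end of scaffold_5. Closing each file when its record ends keeps only one output file open at a time, which can matter for assemblies with many thousands of scaffolds.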

 

posted @ 2022-08-10 01:05  小鲨鱼2018