[笔记] sam文件挑取uniq reads
本来呢,samtools本来就可以一句话就能提取,只是没得网络没得下载samtools,只能保留自传脚本。
输入文件为raw.sam
输出为提取uniq reads后的uniq.sam(仍然保留着sam格式)
终端里会打印出简单的报告
1 use strict; 2 use warnings; 3 4 my ($in, $out) = @ARGV; 5 die <DATA> unless (@ARGV == 2); 6 open IN, "$in"; 7 open OUT, ">$out"; 8 9 my ($sum, $uniq, $xnd, $other) = (0,0,0,0); 10 my $header; 11 while (<IN>){ 12 13 my ($a, $b) = (0, 0); 14 chomp; 15 if ($_ =~ /^\@/){ 16 print OUT "$_\n"; 17 $header ++; 18 } 19 else {$sum ++}; 20 $a = 1 if (/AS:i/); 21 $b = 1 if (/XS:i/); 22 if ($a ==1 ){ 23 if ($b == 0){ 24 $uniq ++; 25 print OUT "$_\n"; 26 } 27 if ($b == 1){ 28 $xnd ++; 29 } 30 } 31 elsif ($a == 0){ 32 $other ++; 33 #header number will counted in this list 34 } 35 } 36 print "input sam is : $in\n"; 37 print "output uniq sam is : $out\n"; 38 print "all reads num: $sum\n"; 39 print "uniq reads num: $uniq\n"; 40 print "2nd reads num: $xnd\n"; 41 print "error reads num: " . ($other-$header) . "\n"; 42 my $ratio = $uniq/$sum; 43 printf "uniq percentage: %.1f\n", ($ratio*100); 44 45 __DATA__ 46 usage: ./sam_uniq_reads.pl <sam_input.sam> <uniq_reads_output.sam>