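Using a multi-dimensional hash to find SNPs located on genes ($ling_1[1] <= $ling_2[1] <= $ling_1[2])
I want to compare two files whose rows do not correspond one-to-one. The second column of the second file ($ling_2[1]) and the second and third columns of the first file ($ling_1[1] and $ling_1[2]) are numbers. The first column of both files is the chromosome ID, and each chromosome ID is associated with multiple sets of numbers.
What I need: find the records where the chromosome ID is the same in both files and the number from the second file lies between the two numbers from the first file, i.e. $ling_1[1] <= $ling_2[1] <= $ling_1[2]. The records that satisfy this condition should be written to a new file. How can I do this?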
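As a minimal sketch of the condition itself (not a full solution), the check for one gene row against one SNP row could look like the following. The sub name `snp_in_interval` is hypothetical, the rows are assumed to be array references laid out like the two files (chromosome first, coordinates after), and the coordinates are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the SNP row is on the same chromosome
# as the gene row and its position falls inside [start, end], inclusive.
sub snp_in_interval {
    my ($ling_1, $ling_2) = @_;    # array refs: gene row, SNP row
    return $ling_1->[0] eq $ling_2->[0]
        && $ling_1->[1] <= $ling_2->[1]
        && $ling_2->[1] <= $ling_1->[2];
}

# Made-up rows, for illustration only.
my @gene = ('chr9', 100, 200);
print snp_in_interval(\@gene, ['chr9', 150]) ? "inside\n" : "outside\n";
print snp_in_interval(\@gene, ['chr1', 150]) ? "inside\n" : "outside\n";
```

The first call prints "inside"; the second prints "outside" because the chromosome IDs differ even though the position is numerically in range.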
————————————————————————————————————————————————————————————————————————
My input files:
1. filter_allgene.txt
chr1 107543234 107546786 (ABCA1|NM_005502|-|3-UTR3).u-50d+50e;(ABCA1|NM_005502|-|CDS49).u-50d+50e
chr1 107547627 107547970 (ABCA1|NM_005502|-|CDS48).u-50d+50e
chr1 107548529 107548721 (ABCA1|NM_005502|-|CDS47).u-50d+50e
chr1 107549104 107549307 (ABCA1|NM_005502|-|CDS46).u-50d+50e
chr9 107550151 107550385 (ABCA1|NM_005502|-|CDS45).u-50d+50e
chr9 107550657 107550898 (ABCA1|NM_005502|-|CDS44).u-50d+50e
chr9 107553153 107553359 (ABCA1|NM_005502|-|CDS43).u-50d+50e
chr9 107554167 107554329 (ABCA1|NM_005502|-|CDS42).u-50d+50e
chr9 107555017 107555237 (ABCA1|NM_005502|-|CDS41).u-50d+50e

2. dbSNP_hg19.chr.All
chr1 93617546 1 1 0 0 0.888 0.112 0 rs546
chr1 15546825 1 1 0 0.261 0 0 0.739 rs549
chr1 203713133 1 1 0 0.181 0 0 0.819 rs568
chr1 24181041 1 1 0 0.007 0 0 0.993 rs665
chr1 53679329 0 0 0 0.01 0 0 0.01 rs672
chr1 173876561 1 1 0 0 0.5 0 0.5 rs677
chr1 161191522 1 1 0 0 0.302 0.698 0 rs685
chr1 230845794 0 1 0 0.01 0 0 0.01 rs699
chr1 233971983 1 1 0 0 0.01 1.0 0 rs701
chr1 32372139 1 0 0 0 1.0 0.01 0 rs717
chr1 34061688 0 1 0 0 0.01 0 0.01 rs737
chr1 173120583 1 1 0 0 0 0.75 0.25 rs750
chr1 87857969 1 1 0 0.162 0 0 0.838 rs751
chr1 214859676 1 1 0 0.433 0 0 0.567 rs759
chr9 107543239 0 1 0 0 0.01 0.01 0 rs171
chr9 107549144 1 1 0 0 0.833 0.167 0 rs538
________________________________________________________________________________________________
Code:
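use strict;
use warnings;

# Both input files are assumed to be tab-delimited.
open my $in1, '<', 'filter_allgene.txt' or die "Cannot open filter_allgene.txt: $!";
open my $in2, '<', 'dbSNP_hg19.chr.All' or die "Cannot open dbSNP_hg19.chr.All: $!";
open my $out, '>', 'result.txt' or die "Cannot open result.txt: $!";

# Index the gene intervals as $ref->{chromosome}{start}{end} = line number.
my $ref = {};
while (<$in1>) {
    chomp;
    my @arr = split /\t/;
    $ref->{$arr[0]}->{$arr[1]}->{$arr[2]} = $.;
}
close $in1;

local $" = "\t";    # join array elements with tabs when interpolated
while (<$in2>) {
    chomp;
    my @arr = split(/\t/, $_, 3);    # chromosome, position, rest of the line
    next unless defined $ref->{$arr[0]};
    foreach my $start (sort { $a <=> $b } keys %{ $ref->{$arr[0]} }) {
        last if $arr[1] < $start;    # starts are sorted, so no later interval can match
        foreach my $end (sort { $a <=> $b } keys %{ $ref->{$arr[0]}->{$start} }) {
            if ($arr[1] >= $start and $arr[1] <= $end) {
                print $out "@arr\n";    # printed once per matching interval
            }
        }
    }
}
close $in2;
close $out;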
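
The script above re-sorts the hash keys for every SNP line. One possible alternative, sketched here under assumptions rather than offered as the definitive fix, is to store each chromosome's intervals in an array that is sorted once by start coordinate. The sub names `build_index` and `find_hits` are hypothetical, the input is assumed to be tab-delimited lines that have already been chomped, and the sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group gene intervals by chromosome and sort each
# group once by start coordinate.
sub build_index {
    my @gene_lines = @_;
    my %index;
    for my $line (@gene_lines) {
        my ($chr, $start, $end) = split /\t/, $line;
        push @{ $index{$chr} }, [$start, $end];
    }
    for my $chr (keys %index) {
        @{ $index{$chr} } = sort { $a->[0] <=> $b->[0] } @{ $index{$chr} };
    }
    return \%index;
}

# Hypothetical helper: return the SNP lines whose position lies inside at
# least one interval on the same chromosome (each line at most once).
sub find_hits {
    my ($index, @snp_lines) = @_;
    my @hits;
    for my $line (@snp_lines) {
        my ($chr, $pos) = split /\t/, $line;
        my $intervals = $index->{$chr} or next;    # chromosome not in gene file
        for my $iv (@$intervals) {
            last if $pos < $iv->[0];    # sorted by start: nothing further can match
            if ($pos <= $iv->[1]) {
                push @hits, $line;
                last;                   # report each SNP only once
            }
        }
    }
    return @hits;
}

# Made-up rows, for illustration only.
my $index = build_index("chr9\t100\t200\tgeneA", "chr9\t300\t400\tgeneB");
print "$_\n" for find_hits($index,
    "chr9\t150\trs_a", "chr9\t250\trs_b", "chr1\t150\trs_c");
```

With these made-up rows only the `rs_a` line is printed. Unlike the script above, this sketch reports a SNP once even when it falls inside several overlapping intervals; drop the inner `last` after `push` if one output line per matching interval is wanted.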