多维哈希寻找gene上的SNP($ling_1[1]<=$ling_2[1]<=$ling_1[2])
我想比较两个文件,它们的元素是不对等的,第二个文件中第二列($ling_2[1])和第一个文件的第二第三列为数字($ling_1[1]\$ling_1[2]),两个文件的第一列均为染色体编号,一个编号对应多组数字。
我的需求是:我想找到第一个文件上和第二个文件的染色体编号相同,且第二个文件的数字是在第一个文件两个数字之间的,即$ling_1[1]<=$ling_2[1]<=$ling_1[2].把符合条件的输出到新的文件,怎么搞?
————————————————————————————————————————————————————————————————————————
我的输入文件:
1、filter_allgene.txt chr1 107543234 107546786 (ABCA1|NM_005502|-|3-UTR3).u-50d+50e;(ABCA1|NM_005502|-|CDS49).u-50d+50e chr1 107547627 107547970 (ABCA1|NM_005502|-|CDS48).u-50d+50e chr1 107548529 107548721 (ABCA1|NM_005502|-|CDS47).u-50d+50e chr1 107549104 107549307 (ABCA1|NM_005502|-|CDS46).u-50d+50e chr9 107550151 107550385 (ABCA1|NM_005502|-|CDS45).u-50d+50e chr9 107550657 107550898 (ABCA1|NM_005502|-|CDS44).u-50d+50e chr9 107553153 107553359 (ABCA1|NM_005502|-|CDS43).u-50d+50e chr9 107554167 107554329 (ABCA1|NM_005502|-|CDS42).u-50d+50e chr9 107555017 107555237 (ABCA1|NM_005502|-|CDS41).u-50d+50e 2、dbSNP_hg19.chr.All chr1 93617546 1 1 0 0 0.888 0.112 0 rs546 chr1 15546825 1 1 0 0.261 0 0 0.739 rs549 chr1 203713133 1 1 0 0.181 0 0 0.819 rs568 chr1 24181041 1 1 0 0.007 0 0 0.993 rs665 chr1 53679329 0 0 0 0.01 0 0 0.01 rs672 chr1 173876561 1 1 0 0 0.5 0 0.5 rs677 chr1 161191522 1 1 0 0 0.302 0.698 0 rs685 chr1 230845794 0 1 0 0.01 0 0 0.01 rs699 chr1 233971983 1 1 0 0 0.01 1.0 0 rs701 chr1 32372139 1 0 0 0 1.0 0.01 0 rs717 chr1 34061688 0 1 0 0 0.01 0 0.01 rs737 chr1 173120583 1 1 0 0 0 0.75 0.25 rs750 chr1 87857969 1 1 0 0.162 0 0 0.838 rs751 chr1 214859676 1 1 0 0.433 0 0 0.567 rs759 chr9 107543239 0 1 0 0 0.01 0.01 0 rs171 chr9 107549144 1 1 0 0 0.833 0.167 0 rs538 ________________________________________________________________________________________________
代码:
use strict; use warnings; open IN1,"< filter_allgene.txt" or die"$!"; open IN2,"< dbSNP_hg19.chr.All" or die"$!"; open OUT,"> result.txt" or die"$!"; my $ref = {}; while () { chomp; my @arr = split/\t/; $ref->{$arr[0]}->{$arr[1]}->{$arr[2]} = $.; } close IN1; local $" = "\t"; while () { chomp; my @arr = split(/\t/,$_,3); next unless (defined($ref->{$arr[0]})); foreach my $start (sort {$a<=>$b} keys %{$ref->{$arr[0]}}) { last if ($arr[1] < $start); foreach my $end (sort {$a<=>$b} keys %{$ref->{$arr[0]}->{$start}}) { if ($arr[1]>=$start and $arr[1]<=$end) { print OUT "@arr\n"; } } } } close IN2; close OUT;