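Using awk and perl to aggregate multiple text files (take their union; when the first three columns match, sum the fourth column and merge the fifth)
I have three files, all in the same format with 5 columns each. I want to take their union: when the first three columns are identical, the numbers in the fourth column should be summed and the fifth-column entries merged. I tried doing this with a multi-dimensional hash, but the results were incomplete. How should I do it?
The file format is as follows: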
____________________________________________________________________________________
1. 505.txt
WINGS 1000 4000 3 3/20_505
WINGS 5000 6000 8 8/20_505
SANLY 2000 4000 9 9/20_505
TINAG 8000 10000 11 11/20_505
2. 707.txt
WINGS 1000 4000 3 3/18_707
ANNY 4000 7000 4 4/18_707
MOLLY 3000 4300 5 5/18_707
TINAG 8000 10000 6 6/18_707
3. 808.txt
VEELY 2000 4000 4 4/20_808
WINGS 5000 6000 5 5/20_808
ANNY 4000 7000 9 9/20_808
TINAG 8000 10000 4 4/20_808
__________________________________________________________________________________________
The expected result is:
WINGS 1000 4000 6 3/20_505;3/18_707
WINGS 5000 6000 13 8/20_505;5/20_808
SANLY 2000 4000 9 9/20_505
TINAG 8000 10000 21 11/20_505;6/18_707;4/20_808
ANNY 4000 7000 13 4/18_707;9/20_808
MOLLY 3000 4300 5 5/18_707
VEELY 2000 4000 4 4/20_808
_____________________________________________________________________________________________
# Usage: perl comb.pl 505.txt 707.txt 808.txt

use strict;
use warnings;

my %hSeg;    # key "col1 col2 col3" => { val => sum of col 4, str => merged col 5 }
my @aKey;    # keys in first-seen order, so the output follows the input order

my $sCnt = 0;
while(<>){
    $sCnt++;
    chomp;
    my @aData = split;
    # Report and skip any line that does not have exactly 5 columns
    if(@aData != 5){
        print "Line $sCnt error: $_\n";
        next;
    }
    my $sKey = "@aData[0..2]";
    if(exists $hSeg{$sKey}){
        # Key seen before: add to the sum and append the fifth column
        $hSeg{$sKey}{val} += $aData[3];
        $hSeg{$sKey}{str} .= ";$aData[4]";
    }
    else{
        # New key: record its position and initialise both values
        push @aKey, $sKey;
        $hSeg{$sKey}{val} = $aData[3];
        $hSeg{$sKey}{str} = $aData[4];
    }
}

foreach(@aKey){
    print "$_ $hSeg{$_}{val} $hSeg{$_}{str}\n";
}
-----------------------------------------------------------------------
Alternatively:
awk '{n=$1 FS $2 FS $3; a[n]+=$4; b[n]=(b[n]=="" ? $5 : b[n] ";" $5)} END{for (i in a) print i, a[i], b[i]}' 505.txt 707.txt 808.txt
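One thing to watch with the awk version: "for (i in a)" visits keys in an unspecified order, so the rows may not come out in the order shown in the expected result. Below is a minimal sketch of an awk variant that remembers the first-seen order, the same way the Perl script does with @aKey. The function name merge_sum and the array names sum, info and order are my own choices for illustration, and unlike the Perl script it skips malformed lines silently.

```shell
# merge_sum: hypothetical helper name, not from the original thread.
# Sums column 4 and joins column 5 with ";" for rows sharing columns 1-3,
# printing keys in the order they were first seen.
merge_sum() {
    awk '
    NF != 5 { next }                      # skip malformed lines (silently)
    {
        key = $1 FS $2 FS $3
        if (key in sum) {
            sum[key] += $4                # key seen before: accumulate
            info[key] = info[key] ";" $5
        } else {
            order[++n] = key              # new key: remember its position
            sum[key] = $4
            info[key] = $5
        }
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i], sum[order[i]], info[order[i]]
    }' "$@"
}
```

Called as "merge_sum 505.txt 707.txt 808.txt", this should print the seven rows in first-seen order, starting with the two WINGS rows from 505.txt.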