awk classification and matching problem: comparing each subclass with its parent class

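Given text like the sample below, I want to group rows by the first column: entries that start with * followed by digits, optionally followed by letters, belong to the same class. For example, *1, *1XN and *1A form one class, and *2, *2A and *2D form another. Here *1 is the parent class, and *1XN and *1A are its subclasses (a parent with no subclasses is ignored). Each subclass row is compared with its parent row column by column; wherever a column differs, output that column's title from the header row together with the change. For example, the second column of *1 is C, while the second column of its subclasses *1XN and *1A is C and A respectively, so the *1A line gets rs769258:C>A at that position and the *1XN line gets nothing for that column.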

----------------------------------------------------------------------------

CYP2D6        rs769258                rs28371696        rs1065852
*1                C                      C                      G
*1XN              C                      G                      T
*1A               A                      C                      G
*2                A                      C                      G
*2A               C                      T                      G
*2D               C                      C                      G
*3                C                      C                      G
*4                C                      C                      G

--------------------------------------------------------------------------------

[root@localhost ~]# awk '
NR==1 { for (i=2; i<=NF; i++) a[i]=$i; next }              # a[]: column titles
$1 ~ /^\*[0-9]+$/ { for (i=2; i<=NF; i++) b[i]=$i; next }  # b[]: current parent row
NF {                                                       # subclass row (blank lines skipped)
    printf "%s", $1
    for (i=2; i<=NF; i++) if ($i != b[i]) printf "\t%s:%s>%s", a[i], b[i], $i
    print ""
}' i
*1XN    rs28371696:C>G  rs1065852:G>T
*1A     rs769258:C>A
*2A     rs769258:A>C    rs28371696:C>T
*2D     rs769258:A>C
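
The one-liner above relies on every parent row (*1, *2, ...) appearing before its subclasses. If that ordering is not guaranteed, one possible variation is to read the file twice and look up the parent by its numeric prefix. The following is only a sketch under that assumption: the function name diff_subtypes and the array names title and parent are made up for illustration, and the header is assumed to be the first line of the file.

```shell
# Hypothetical helper: diff_subtypes FILE
# Reads FILE twice so that a subclass may appear before its parent row.
diff_subtypes() {
  awk '
    # Pass 1: remember the header titles and every parent row (e.g. "*1", "*2").
    NR == FNR {
      if (FNR == 1) { for (i = 2; i <= NF; i++) title[i] = $i }
      else if ($1 ~ /^\*[0-9]+$/) { for (i = 2; i <= NF; i++) parent[$1, i] = $i }
      next
    }
    # Pass 2: handle only non-empty subclass rows (skip header and parent rows).
    FNR > 1 && NF > 0 && $1 !~ /^\*[0-9]+$/ {
      if (!match($1, /^\*[0-9]+/)) next      # first column is not of the form *<digits>...
      key = substr($1, RSTART, RLENGTH)      # "*1XN" -> "*1"
      if (!((key, 2) in parent)) next        # no parent row for this subclass: skip it
      printf "%s", $1
      for (i = 2; i <= NF; i++)
        if ($i != parent[key, i])
          printf "\t%s:%s>%s", title[i], parent[key, i], $i
      print ""
    }
  ' "$1" "$1"
}
```

Called as diff_subtypes i on the sample file, this should print the same four lines as the command above. The trade-off is that the file is read twice and all parent rows are kept in memory, which hardly matters for a table of this size.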

 

Posted on 2013-12-13 10:46 by 三川