在SAS中,使用 SET 语句进行数据集纵向合并,用 MERGE 语句进行横向合并:

DATA new_dataset;
  SET dataset_1 dataset_n;
DATA new_dataset;   MERGE dataset_1 dataset_n;
BY variable_list;

 纵向合并后,new_dataset 的行数等于每个数据集行数的加总。If one of the data sets has a variable not contained in the other data sets, then the observations from the other data sets will have missing values for that variable. 

横向合并中的 by variable list 是所有数据集共同的变量。

一、纵向合并 

例一:合并两个数据集 southentrance 和 northentrance, 合并后数据集观测值的顺序维持各自不变

DATA both;
SET southentrance northentrance;
    IF Age = . THEN AmountPaid = .;
    ELSE IF Age < 3 THEN AmountPaid = 0;
    ELSE IF Age < 65 THEN AmountPaid = 35;
    ELSE AmountPaid = 27;
    PROC PRINT DATA = both;
    TITLE 'Both Entrances';
RUN;

                      

例二:合并后数据集观测值按照 PassNumer 排序

DATA interleave;
    SET northentrance southentrance;
    BY PassNumber;
    PROC PRINT DATA = interleave;
    TITLE 'Both Entrances, By Pass Number';
RUN;

 

 

二、横向合并

 例一:合并两个数据集 salesdata 和 descriptions,合并后的数据集包含两个数据集的所有观测值,相当于 full join

/*Merge之前必须先对两个数据集按照 By variables 排序*/
DATA chocolates; MERGE sales descriptions; BY CodeNum; PROC PRINT DATA = chocolates; TITLE ”Today's Chocolate Sales”; RUN;

                

合并后的数据集 chocolates 包含两个数据集的所有 observations, 如果某条 observation 在另外一个数据集中没有,则对应的variable展示为缺失值。

 

 例二:一对多数据集合并,仍然是 full join

DATA prices;
    MERGE shoes discount;
    BY ExerciseType;
    NewPrice = ROUND(RegularPrice - (RegularPrice * Adjustment), .01);
PROC PRINT DATA = prices;
TITLE ’Price List for May’;
RUN;

                       

 

例三:数据集合并 - Merge vs. left join/ right join/ inner join

假设有两个数据集 ICF 和 DM:

ICF data:                             DM data:

             

left join :

data New;
    merge ICF(in=a) DM(in=b);  
    by cn dn;
    if a;    
run;      

相当于
ICF a
left join DM b
on a.cn=b.cn and a.dn=b.dn      

 

right join :

data New;
    merge ICF(in=a) DM(in=b);
    by cn dn;
    if b  ;
run;

  

 

 inner join :

data New;
    merge ICF(in=a) DM(in=b);
    by cn dn;
    if a and b ;
run;

相当于
ICF a
inner join DM b
on a.cn=b.cn and a.dn=b.dn

 

IN= option 生成临时变量,满足条件时等于1,否则为0。"If a;" equal to "if a=1". "If a=0" means to select observations not in dataset a but only in dataset b.  这个选项可以用在任何数据集合并中(SET, MERGE, UPDATE)。

posted on 2022-02-20 16:57  MOZY  阅读(2966)  评论(0编辑  收藏  举报