Combine vectors of different length in R

当我们用R进行数据合并的时候,有时候会遇到最复杂的情况:被合并的两个数据在行上不完全匹配,在列上也不完全匹配。为了保证合并后的数据不遗漏不重复,我们需要对数据进行拆分,打散,再分步合并。

 

思路如下:

1 找出两个table中共有的定位变量。

2 按照定位变量是否共有把其中一个table打散。

3 对于sub-table1进行横向合并,只保留新增列。

4 对于sub-table2进行纵向合并,保留所有列。

 

写了一个函数。

 

 1 yt=function(data1,data2,id="id"){
 2   #columns in data1
 3   col1=unique(colnames(data1))
 4   row1=unique(data1[[id]])
 5   #columns in data2
 6   col2=unique(colnames(data2))
 7   row2=unique(data2[[id]])
 8   #subset the data2 into 2 tables:
 9   #1.id in data1, keep only significant columns (could be whatever columns)
10   #2.id not in data1, keep all columns
11   inter.col=intersect(col1,col2)
12   inter.row=intersect(row1,row2)
13   
14   data2.1=data.frame(data2[(data2[[id]] %in% inter.row),!(colnames(data2) %in% inter.col)],
15                      data2[(data2[[id]] %in% inter.row),id])
16   colnames(data2.1)=c(col2[-match(inter.col,col2)],id)
17   data2.2=data2[!(data2[[id]] %in% inter.row),]
18   
19   #merge sub-data1
20   want1.1=merge(data1,data2.1,by=id,all.x=T)
21   
22   #merge sub-data2
23   ##fill all the uncommon columns with NA
24   want1.1.na=colnames(data2.2)[-match(intersect(colnames(want1.1),colnames(data2)),colnames(data2.2))]
25   data2.2.na=colnames(want1.1)[-match(intersect(colnames(want1.1),colnames(data2)),colnames(want1.1))]
26   if (length(want1.1.na)>=1){
27     for (i in 1:length(want1.1.na)){
28       want1.1[[want1.1.na[i]]]=NA
29     }
30   }
31   if (length(data2.2.na)>=1){
32     for (j in 1:length(data2.2.na)){
33       data2.2[[data2.2.na[[j]]]]=NA
34     } 
35   }
36   want=rbind(want1.1,data2.2)
37 
38 
39   print(data2.2.na)
40   print(want1.1.na)
41   return(want)
42 }
43 
44 
45 #testing
46 test=data.frame(         id=c(paste0("a",seq(100))),
47                          x1=rnorm(100,mean=0,sd=1),
48                          x2=rexp(100,rate=1),
49                          x3=rpois(100,lambda=50)
50 )
51 test1=test[1:50,c("id","x1")]
52 test2=test[30:100,c("id","x2")]
53 
54 
55       id           x1         x2
56 1     a1  0.133454784         NA
57 2    a10 -0.240987935         NA
58 3    a11 -0.898845789         NA
59 ...
60 23    a3 -2.340300232         NA
61 24   a30  0.083681099 0.06912847
62 25   a31  0.750677968 2.64913789
63 26   a32 -0.184669801 1.16523179
64 ...

 

by yant07

posted @ 2017-10-29 21:38  yant07  阅读(315)  评论(0编辑  收藏  举报