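Our project ran into a sample size that is not a multiple of 4. Searching Baidu and CNKI turned up no ready-made "wheel", so we tried building our own. What follows is the conclusion several statisticians reached after discussion; criticism and corrections are welcome.
First, conventional block randomization uses proc plan, which nobody disputes. However, its factors statement can only set one common block length for all blocks. For example, with 24 subjects: factors block=6 length=4;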
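Reference: internally, SAS proc plan first puts the 6 blocks in random order and then generates a random permutation of 1 to 4 within each block. The quoted passage below describes the same logic for an example with 4 and 3 levels:
(The procedure first generates a random permutation of the integers 1 to 4 and then, for each of these, generates a random permutation of the integers 1 to 3. You can think of factor Two as being nested within factor One, where the levels of factor One are to be randomly assigned to 4 units.)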
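For comparison, the equal-length case can be sketched as follows. This is a minimal sketch, not the authors' production code: the seed 2020, the data set names plan24 and rand24, the SUBJid numbering, and the rule "slots 1-2 get A, slots 3-4 get B" are assumptions for illustration, and the second factor is named slot instead of length only to avoid confusion with the DATA step LENGTH statement.

```sas
/* Sketch: 24 subjects in 6 blocks of 4 (all names and the seed are illustrative) */
proc plan seed=2020;
   /* block: random order of 1..6; slot: random permutation of 1..4 within each block */
   factors block=6 slot=4;
   output out=plan24;
run;

data rand24;
   set plan24;
   length g $6;
   SUBJid = 100 + _n_;                 /* subjects numbered in output order */
   if slot <= 2 then g = 'A(T-R)';     /* two slots per block go to A */
   else g = 'B(R-T)';                  /* the other two go to B */
run;
```

Because every block must have the same number of slots, this pattern cannot mix blocks of 4 and 6, which is what motivates the approach below.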
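Our approach is instead based on the idea of complete randomization: each id = 1 2 3 4 5 ... gets a random number rand = uniform(seed) and is then ranked;
see the code.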
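For example, if the sample size is 30, the logic is:
- Set the block lengths to 4 4 4 4 4 4 6
- Put the lengths in random order, e.g. 4 4 4 4 6 4 4
- Create block_ID by joining the block's position and its length: the four subjects in the second block get 24 24 24 24, and the six subjects in the fifth block get 56 56 56 56 56 56 (this step makes the BY statement in the later proc rank convenient)
- Give each subject a random number
- Rank within each block and assign groups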
The code is as follows:
data core;
input blockl blockID;
cards;
4 1
4 2
4 3
4 4
4 5
4 6
6 7
; /* make sure sum(blockl) = 30 */
run;
%macro rd_blk;
/* first, put the blocks in random order */
data random;
set core;
rand=uniform(2020);
output;
run;
proc rank data=random out=rank;
var rand;
ranks r_rank;
run;
proc sort data=rank out=result;
by r_rank;
run;
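/* step 3 above: expand each block into one row per subject and build block_id */
data core1;
set result;
do i=1 to blockl;
lent=blockl;
/* block_id = block position followed by block length, e.g. 24 = 2nd block, length 4 */
block_id=input(cats(r_rank,lent),8.);
output;
end;
run;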
data random1;
set core1;
rand1=uniform(111);
run;
proc rank data=random1 out=rank1;
by block_id ;
var rand1;
ranks r_rank1;
run;
%mend rd_blk;
%rd_blk;
data result1(keep=blockID r_rank1 SUBJid g);
set rank1;
retain SUBJid 100;
SUBJid+1;
id=input(compress(lent)||compress(r_rank1),8.);
if id in (41 42 61 62 63) then g='A(T-R)';
if id in (43 44 64 65 66) then g='B(R-T)';
label blockID='Block number' r_rank1='Random number' SUBJid='Subject ID' g='Group';
run;
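/* count subjects per group; the WHERE values must match the labels assigned to g above */
proc sql;
select count(*) as numA from result1
where g='A(T-R)';
quit;
proc sql;
select count(*) as numB from result1
where g='B(R-T)';
quit;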
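What you need to customize: the block lengths (which must sum to the sample size), the seeds, and the final group assignment. For example, if blockl=8, add:
if id in (81 82 83 84) then g='A(T-R)';
if id in (85 86 87 88) then g='B(R-T)';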
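As a possible alternative to listing id values for every block length, the group can be derived directly from the within-block rank. The sketch below is our suggestion rather than part of the original program. It assumes the rank1 data set produced by %rd_blk above, that every block length is even, and that the random numbers have no ties within a block; the proc freq check at the end is also an addition.

```sas
/* Sketch: lower half of the ranks in each block -> A, upper half -> B */
data result1(keep=blockID r_rank1 SUBJid g);
   set rank1;
   length g $6;
   retain SUBJid 100;
   SUBJid + 1;                                   /* subject IDs start at 101 */
   if r_rank1 <= lent / 2 then g = 'A(T-R)';     /* ranks 1..lent/2 */
   else g = 'B(R-T)';                            /* ranks lent/2+1..lent */
   label blockID='Block number' r_rank1='Random number'
         SUBJid='Subject ID' g='Group';
run;

/* Check: each block should contain the same number of A and B */
proc freq data=result1;
   tables blockID*g / norow nocol nopercent;
run;
```

With this rule, adding a block of length 8 only requires a new row in core; no extra if statements are needed.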