Contents
5.1 Adding Labels to Your Variables
5.2 Using Formats to Enhance Your Output
5.3 Regrouping Values Using Formats
5.4 More on Format Ranges
5.5 Storing Your Formats in a Format Library
5.6 Permanent Data Set Attributes
5.7 Accessing a Permanent SAS Data Set with User-Defined Formats
5.8 Displaying Your Format Definitions
5.1 Adding Labels to Your Variables
Use the LABEL statement to create labels for your variables. A label can be up to 256 characters long.
libname learn 'c:\books\learning';
data learn.test_scores;
   length ID $ 3 Name $ 15;  /* Name is declared but not read by INPUT, so its values are missing */
   input ID $ Score1-Score3;
   label ID = 'Student ID'
         Score1 = 'Math Score'
         Score2 = 'Science Score'
         Score3 = 'English Score';
datalines;
1 90 95 98
2 78 77 75
3 88 91 92
;
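When you display these data, the labels appear in the output. For example, the results of PROC MEANS list each variable's label next to its name. (PROC PRINT uses labels as column headings only when you add the LABEL option.)
If you add labels in a DATA step, they are stored with the data set and stay attached to the variables; if you add them in a PROC step, they apply only to that procedure.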
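A minimal sketch of the difference, assuming a small hypothetical WORK data set (work.scores) so that it runs without the LEARN library: the DATA-step labels are stored with the data set, the LABEL statement inside PROC PRINT applies to that step only, and the LABEL option tells PROC PRINT to use labels as column headings.

```sas
/* work.scores is a hypothetical data set used only for illustration */
data work.scores;
   input ID $ Score1 Score2;
   /* DATA-step labels are stored with the data set */
   label Score1 = 'Math Score'
         Score2 = 'Science Score';
datalines;
1 90 95
2 78 77
;

/* PROC MEANS shows the stored labels in its output */
proc means data=work.scores;
   var Score1 Score2;
run;

/* LABEL option: use labels as column headings.
   This LABEL statement overrides Score2's label for this step only. */
proc print data=work.scores label;
   label Score2 = 'Science (this step only)';
run;

/* VLABEL returns the label stored in the data set */
data _null_;
   set work.scores(obs=1);
   length StoredLabel $ 256;
   StoredLabel = vlabel(Score2);
   put StoredLabel=;
run;
```

After PROC PRINT finishes, the stored label of Score2 is still 'Science Score'.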
5.2 Using Formats to Enhance Your Output
Use PROC FORMAT to create your own formats, and apply a format by writing its name followed by a period (format_name.).
Figure 1 Raw data format
/* Create the formats */
proc format;
   value $gender 'M' = 'Male'
                 'F' = 'Female'
                 ' ' = 'Not entered'
                 other = 'Miscoded';
   value age low-29  = 'Less than 30'
             30-50   = '30 to 50'
             51-high = '51+';
   value $likert '1' = 'Strongly disagree'
                 '2' = 'Disagree'
                 '3' = 'No opinion'
                 '4' = 'Agree'
                 '5' = 'Strongly agree';
run;
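The VALUE statement defines a format; the name of a format for character values begins with a $. The first format is $gender. A format name can be anything you like (in SAS 9, up to 32 characters including the $, and it cannot end in a digit; older releases limited names to 8 characters); gender is used here only because it is easy to remember. In the data, Gender is stored as M or F. The $gender format displays M as Male, F as Female, and a missing value as Not entered. The keyword OTHER displays every other value (anything except M, F, or missing) as Miscoded.
The second format, age, groups age values into three categories. The keywords LOW and HIGH refer to the lowest nonmissing value and the highest value, respectively.
The third format, $likert, displays the values '1' through '5' as five text descriptions.
A format does not change the values stored in the data set; it changes only how they are displayed.
Use the formats created above in PROC PRINT: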
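To see that only the display changes, here is a small sketch using the $gender format on a hypothetical WORK data set (work.demo and GenderText are illustrative names). The PUT function returns the formatted text as a new value, while Gender itself keeps its stored codes.

```sas
proc format;
   value $gender 'M' = 'Male'
                 'F' = 'Female'
                 ' ' = 'Not entered'
                 other = 'Miscoded';
run;

/* work.demo is a hypothetical data set for illustration */
data work.demo;
   input Gender $1.;
   /* PUT stores the formatted text in a separate variable */
   GenderText = put(Gender, $gender.);
   /* FORMAT only controls how Gender is displayed */
   format Gender $gender.;
datalines;
M
F
X
;

/* Gender prints as Male/Female/Miscoded, but the stored values are M/F/X */
proc print data=work.demo;
run;
```

Comparisons such as `if Gender = 'M'` therefore use the stored code, not the formatted text.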
title "Data Set SURVEY with Formatted Values";
proc print data=learn.survey;
   id ID;
   var Gender Age Salary Ques1-Ques5;
   format Gender $gender.      /* each format name ends with a period */
          Age age.
          Ques1-Ques5 $likert.
          Salary dollar11.2;
run;
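Salary uses DOLLAR11.2, a built-in SAS format that displays a number with a dollar sign and commas in a field 11 positions wide; the width includes the dollar sign, the commas, the decimal point, and the 2 decimal places. The largest value for Salary using the DOLLAR11.2 format would therefore be $999,999.99.
If you are not sure how many positions the largest value will need, make the width generous so that large values still display in full.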
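A quick way to check the width arithmetic is to format the largest expected value with PUT and measure it. This is a sketch; the value 999999.99 is simply the largest salary DOLLAR11.2 is meant to hold.

```sas
/* Check how many positions DOLLAR11.2 uses for the largest salary */
data _null_;
   Salary = 999999.99;
   Shown  = put(Salary, dollar11.2);   /* $999,999.99 */
   Width  = length(Shown);             /* 11 = $ + 6 digits + 1 comma + point + 2 decimals */
   put Shown= Width=;
run;
```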
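In PROC PRINT, the ID statement makes the named variable the first column of the output, replacing the Obs column that SAS prints by default. Do not also list the ID variable in the VAR statement, or it will appear twice. In general, if your data contain a variable that identifies each observation, name it in an ID statement.
Figure 2 Output using formats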
5.3 Regrouping Values Using Formats
You can also use PROC FORMAT to group raw values for display.
proc format;
   value $three '1','2' = 'Disagreement'
                '3'     = 'No opinion'
                '4','5' = 'Agreement';
run;

proc freq data=learn.survey;
   title "Question Frequencies Using the $three Format";
   tables Ques1-Ques5;
   format Ques1-Ques5 $three.;
run;
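In PROC FREQ the format only groups values for that step. If you want the grouped value as a variable of its own, one option is the PUT function with the same format. A sketch, assuming a hypothetical stand-in data set (work.survey_small) and a hypothetical new variable Ques1Group:

```sas
proc format;
   value $three '1','2' = 'Disagreement'
                '3'     = 'No opinion'
                '4','5' = 'Agreement';
run;

/* work.survey_small is a hypothetical stand-in for learn.survey */
data work.survey_small;
   input ID $ Ques1 $;
   /* PUT stores the grouped (formatted) text in a new variable */
   length Ques1Group $ 12;
   Ques1Group = put(Ques1, $three.);
datalines;
001 1
002 2
003 3
004 5
;

proc freq data=work.survey_small;
   tables Ques1Group;
run;
```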
5.4 More on Format Ranges
Formats can be defined in many flexible ways. For example:
Consider that you have a variable called Grade with values of A, B, C, D, F, I, and W. The following VALUE statement creates a format that places these grades into six categories:
value $gradefmt 'A' - 'C' = 'Passing'
                'D'       = 'Borderline'
                'F'       = 'Failing'
                'I','W'   = 'Incomplete or withdrew'
                ' '       = 'Not recorded'
                other     = 'Miscoded';
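To try the format, wrap the VALUE statement in PROC FORMAT and run a few sample grades through PUT. The grades in the DO loop are illustrative values, not data from the text.

```sas
proc format;
   value $gradefmt 'A' - 'C' = 'Passing'
                   'D'       = 'Borderline'
                   'F'       = 'Failing'
                   'I','W'   = 'Incomplete or withdrew'
                   ' '       = 'Not recorded'
                   other     = 'Miscoded';
run;

/* Write each sample grade and its category to the log */
data _null_;
   length Grade $ 1 Category $ 22;
   do Grade = 'A', 'B', 'D', 'F', 'W', ' ', 'Z';
      Category = put(Grade, $gradefmt.);
      put Grade= Category=;
   end;
run;
```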
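In the age format from Section 5.2, the ranges low-29, 30-50, and 51-high leave gaps, so a non-integer age such as 29.5 matches no range and is displayed unformatted. If ages are not whole numbers, use the < symbol in the ranges; < excludes the endpoint it sits next to:
/* Two alternative definitions: use only one of them inside PROC FORMAT */

/* Version 1: < after the hyphen excludes the upper endpoint */
value age low-<30 = 'Less than 30'              /* [low, 30)  */
          30-<51  = '30 to less than 51'        /* [30, 51)   */
          51-high = '51+';                      /* [51, high] */

/* Version 2: < before the hyphen excludes the lower endpoint */
value age low-30   = 'Less than or equal to 30' /* [low, 30]  */
          30<-51   = 'Greater than 30 to 51'    /* (30, 51]   */
          51<-high = 'Greater than 51';         /* (51, high] */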
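A sketch that puts the integer-only ranges and the gap-free ranges side by side. The format names agegap and agefix are hypothetical, chosen so both versions can exist at once.

```sas
/* agegap and agefix are hypothetical names for the two versions */
proc format;
   value agegap low-29  = 'Less than 30'       /* integer-only ranges from 5.2 */
                30-50   = '30 to 50'
                51-high = '51+';
   value agefix low-<30 = 'Less than 30'       /* no gaps between ranges */
                30-<51  = '30 to less than 51'
                51-high = '51+';
run;

data _null_;
   do Age = 29, 29.5, 30, 50.5, 51;
      Gap = put(Age, agegap.);   /* 29.5 and 50.5 match no agegap range */
      Fix = put(Age, agefix.);
      put Age= Gap= Fix=;
   end;
run;
```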
5.5-5.7 Storing Your Formats in a Format Library
You can store formats permanently so that they can be reused later. Here is how:
1. Create a library reference (libref) to indicate where you want to store your SAS formats. This can be the same library where you store your data sets.
2. Use the option LIBRARY=libref when you run PROC FORMAT. (Remember, you have to run this procedure only once.)
libname myfmts 'c:\books\learning\formats';
proc format library=myfmts;
   value $gender 'M' = 'Male'
                 'F' = 'Female'
                 ' ' = 'Not entered'
                 other = 'Miscoded';
   value age low-29  = 'Less than 30'
             30-50   = '30 to 50'
             51-high = '51+';
   value $likert '1' = 'Strongly disagree'
                 '2' = 'Disagree'
                 '3' = 'No opinion'
                 '4' = 'Agree'
                 '5' = 'Strongly agree';
run;
By default, SAS looks for a format among the built-in SAS formats, in the Work library, and in a library with the special libref Library. To apply formats that you stored somewhere else, you must tell SAS where to find them with the FMTSEARCH= system option. For example, if you want to use the formats you placed in the Myfmts library, you would need to submit the following code:
options fmtsearch=(myfmts);
If you do this, SAS first looks in the Work library, then the library called Library, and then the Myfmts library. If you want SAS to look in the Myfmts library before it looks in either of the other two libraries, list all three in the FMTSEARCH= option like this:
options fmtsearch=(myfmts work library);
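A minimal sketch of the search order, assuming a hypothetical format $yn defined twice: once in WORK.FORMATS and once in a second catalog, WORK.ALTFMTS. LIBRARY= and FMTSEARCH= both accept a libref.catalog name, so the sketch needs no extra folder. Whichever catalog SAS searches first supplies the format. Note that it changes the FMTSEARCH= setting for the rest of the session.

```sas
/* Version stored in WORK.FORMATS (the default catalog) */
proc format;
   value $yn 'Y' = 'Yes'
             'N' = 'No';
run;

/* Version stored in the hypothetical catalog WORK.ALTFMTS */
proc format library=work.altfmts;
   value $yn 'Y' = 'YES (alt)'
             'N' = 'NO (alt)';
run;

/* WORK.ALTFMTS is listed first, so its $yn is used */
options fmtsearch=(work.altfmts work library);
data _null_;
   Answer = put('Y', $yn.);
   put Answer=;
run;

/* WORK and LIBRARY are not listed, so WORK.FORMATS is searched
   first and the original $yn is used */
options fmtsearch=(work.altfmts);
data _null_;
   Answer = put('Y', $yn.);
   put Answer=;
run;
```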
/* Attach the stored formats and labels to the data set permanently */
libname learn 'c:\books\learning';
libname myfmts 'c:\books\learning\formats';
options fmtsearch=(myfmts);

data learn.survey;
   infile 'c:\books\learning\survey.txt' pad;
   input ID : $3.
         Gender : $1.
         Age
         Salary
         (Ques1-Ques5)(: $1.);
   format Gender $gender.
          Age age.
          Ques1-Ques5 $likert.
          Salary dollar10.0;
   label ID = 'Subject ID'
         Gender = 'Gender'
         Age = 'Age as of 1/1/2006'
         Salary = 'Yearly Salary'
         Ques1 = 'The governor is doing a good job?'
         Ques2 = 'The property tax should be lowered'
         Ques3 = 'Guns should be banned'
         Ques4 = 'Expand the Green Acre program'
         Ques5 = 'The school needs to be expanded';
run;
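Use PROC CONTENTS to list the formats and labels attached to each variable in the data set. The VARNUM option lists the variables in the order of their position in the data set.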
title "Data set SURVEY";
proc contents data=learn.survey varnum;
run;
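The same attributes are also available in the DICTIONARY.COLUMNS table of PROC SQL, which can be handy when you want them as a data set rather than a report. A sketch, assuming a hypothetical data set work.fmtdemo:

```sas
/* work.fmtdemo is a hypothetical data set with a format and label attached */
data work.fmtdemo;
   Salary = 52000;
   format Salary dollar10.0;
   label Salary = 'Yearly Salary';
run;

/* Collect each variable's name, format, and label */
proc sql;
   create table work.attrs as
   select name, format, label
   from dictionary.columns
   where libname = 'WORK' and memname = 'FMTDEMO';
quit;

proc print data=work.attrs;
run;
```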
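Likewise, to use a data set that has user-defined formats attached, you must first tell SAS where those formats are stored. If you give someone a copy of the data set, also give them a copy of the formats (the Formats catalog in the Myfmts library). For example: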
libname learn 'c:\books\learning';
libname myfmts 'c:\books\learning\formats';
/* Without this option SAS reports an error, because the variables
   in learn.survey use formats stored in the Myfmts library */
options fmtsearch=(myfmts);
title "Using User-defined Formats";
proc freq data=learn.survey;
   tables Ques1-Ques5 / nocum;
run;
Once you submit the FMTSEARCH= option, you can use your own formats just as if they were built-in SAS formats.
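If the format catalog is not available (for example, you received only the data set), the NOFMTERR system option lets SAS substitute default formats and keep going instead of stopping with an error. A sketch, assuming a hypothetical format $sex stored in a hypothetical catalog WORK.TMPFMTS that is deliberately removed from the search path:

```sas
/* Create a format in a catalog that will later be off the search path */
proc format library=work.tmpfmts;
   value $sex 'M' = 'Male' 'F' = 'Female';
run;

/* Attach the format while its catalog is searchable */
options fmtsearch=(work.tmpfmts);
data work.people;
   input Gender $1.;
   format Gender $sex.;
datalines;
M
F
;

/* Remove the catalog from the search path: $sex can no longer be found */
options fmtsearch=(work library);

/* NOFMTERR: print the raw values instead of stopping with an error */
options nofmterr;
proc print data=work.people;
run;

/* Restore the default behavior */
options fmterr;
```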