R for Data Science
R in Action: study notes
chapter 1 Introduction to R
1.1 Help functions
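| Function | Purpose |
| --- | --- |
| help.start() | Open the home page of the help documentation |
| help("foo") or ?foo | Show help for the function foo (quotes optional) |
| help.search("foo") or ??foo | Search the local help documentation using foo as the keyword |
| example("foo") | Run the usage examples for the function foo (quotes optional) |
| RSiteSearch("foo") | Search online documentation and mailing-list archives using foo as the keyword |
| apropos("foo", mode="function") | List all available functions whose names contain foo |
| data() | List all sample datasets available in the currently loaded packages |
| vignette() | List all vignettes available in the currently installed packages |
| vignette("foo") | Show the vignette for the topic foo |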
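As a small illustration of the search helpers in the table above, the sketch below uses apropos() with a regular expression. The search patterns are chosen only for illustration, and the result depends on which packages are attached in the session.

```r
# apropos() treats its first argument as a regular expression.
# List attached functions whose names start with "read."
hits <- apropos("^read\\.", mode = "function")
print(head(hits))

# A plain fragment matches anywhere in the name
q_hits <- apropos("quantile", mode = "function")
print(q_hits)
```

With the default packages attached, read.table and quantile should both appear in these results.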
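1.2 Functions for managing the R workspace

| Function | Purpose |
| --- | --- |
| getwd() | Show the current working directory |
| setwd("mydirectory") | Change the current working directory to mydirectory |
| ls() | List the objects in the current workspace |
| rm(objectlist) | Remove (delete) one or more objects |
| help(options) | Show a description of the available options |
| options() | Show or set the current options |
| history(#) | Show the last # commands (default is 25) |
| savehistory("myfile") | Save the command history to the file myfile (default is .Rhistory) |
| loadhistory("myfile") | Load a command history file (default is .Rhistory) |
| save.image("myfile") | Save the workspace to the file myfile (default is .RData) |
| save(objectlist, file="myfile") | Save the specified objects to a file |
| load("myfile") | Load a workspace into the current session (default is .RData) |
| q() | Quit R. You will be asked whether to save the workspace |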
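The following sketch shows how a few of the workspace functions above fit together. It assumes a temporary directory and a hypothetical file name (objects.RData), so the real working directory is left untouched.

```r
# Work in a temporary directory; setwd() returns the previous directory
old_dir <- setwd(tempdir())

x <- 1:10
y <- letters[1:3]

save(x, y, file = "objects.RData")  # save only the named objects
rm(x, y)                            # remove them from the workspace

load("objects.RData")               # restore them under their original names
print(ls())

setwd(old_dir)                      # go back to the original working directory
```

save() stores only the listed objects, whereas save.image() stores the whole workspace.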
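1.3 Input and output
Input
source("filename") executes a script in the current session.
Text output
sink("filename") redirects output to the file filename. Use append=TRUE to append to the file instead of overwriting it, and split=TRUE to send output to both the screen and the file. Calling sink() with no arguments returns output to the screen only.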
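A minimal sketch of the sink() options described above. The output file is a temporary file (out_file is a hypothetical name used only here).

```r
out_file <- tempfile(fileext = ".txt")

sink(out_file)                       # redirect output to the file
print(summary(c(1, 2, 3, 4)))
sink()                               # return output to the screen

# Append to the same file and echo the output to the screen as well
sink(out_file, append = TRUE, split = TRUE)
cat("second block\n")
sink()

captured <- readLines(out_file)      # the file now holds both blocks
print(captured)
```

Every sink(file) call should be paired with a sink() call, otherwise later output keeps going to the file.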
Graphic output

| Function | Output |
| --- | --- |
| bmp("filename.bmp") | BMP file |
| jpeg("filename.jpg") | JPEG file |
| pdf("filename.pdf") | PDF file |
| png("filename.png") | PNG file |
| postscript("filename.ps") | PostScript file |
| svg("filename.svg") | SVG file |
| win.metafile("filename.wmf") | Windows metafile |
1.4 Packages
Downloading R packages
| |
| install.packages() |
| |
| update.packages() |
| |
| installed.packages() |
| |
| library() |
| |
| |
| lm(mpg~wt, data=mtcars) |
| |
| lmfit <- lm(mpg~wt, data=mtcars) |
| |
| summary(lmfit) |
| |
| plot(lmfit) |
chapter 2 Creating a dataset
| |
| y <- matrix(1:20, nrow=5, ncol=4) |
| |
| cells <- c(1,26,24,68) |
| rnames <- c("R1", "R2") |
| cnames <- c("C1", "C2") |
| |
| mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE, dimnames=list(rnames, cnames)) |
| |
| |
| mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=FALSE, dimnames=list(rnames, cnames)) |
| |
| |
| dim1 <- c("A1", "A2") |
| dim2 <- c("B1", "B2", "B3") |
| dim3 <- c("C1", "C2", "C3", "C4") |
| z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3)) |
| |
| |
| patientID <- c(1, 2, 3, 4) |
| age <- c(25, 34, 28, 52) |
| diabetes <- c("Type1", "Type2", "Type1", "Type1") |
| status <- c("Poor", "Improved", "Excellent", "Poor") |
| patientdata <- data.frame(patientID, age, diabetes, status) |
| |
| |
| patientdata[1:2] |
| patientdata[c("diabetes","status")] |
| |
| |
| patientdata$age |
| |
| |
| table(patientdata$diabetes, patientdata$status) |
| |
| |
| summary(mtcars$mpg) |
| |
| plot(mtcars$mpg, mtcars$disp) |
| |
| plot(mtcars$mpg, mtcars$wt) |
| |
| |
| |
| |
| |
| attach(mtcars) |
| summary(mpg) |
| plot(mpg, disp) |
| plot(mpg, wt) |
| detach(mtcars) |
| |
| |
| with(mtcars, { |
| print(summary(mpg)) |
| plot(mpg, disp) |
| plot(mpg, wt) |
| }) |
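Variables can be classified as nominal, ordinal, or continuous.
- Nominal variables are categorical variables with no inherent order, e.g. patientdata$diabetes.
- Ordinal variables imply order but not amount, e.g. patientdata$status.
- Continuous variables can take any value within some range and imply both order and amount, e.g. patientdata$age.
Categorical (nominal) and ordered categorical (ordinal) variables are called factors in R.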
| diabetes <- c("Type1", "Type2", "Type1", "Type1") |
| status <- c("Poor", "Improved", "Excellent", "Poor") |
| |
| |
| status <- factor(status, ordered=TRUE) |
| |
| status <- factor(status, ordered=TRUE, levels=c("Poor", "Improved", "Excellent")) |
| |
| |
| g <- "My First List" |
| h <- c(25, 26, 18, 39) |
| j <- matrix(1:10, nrow=5) |
| k <- c("one", "two", "three") |
| mylist <- list(title=g, ages=h, j, k) |
| |
| |
| |
| |
| mydata <- data.frame(age=numeric(0), gender=character(0), weight=numeric(0)) |
| |
| mydata <- edit(mydata) |
| |
| mydatatxt <- " |
| age gender weight |
| 25 m 166 |
| 30 f 115 |
| 18 f 120 |
| " |
| mydata <- read.table(header=TRUE, text=mydatatxt) |
| |
| |
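2.1 Options for read.table()

| Option | Description |
| --- | --- |
| header | A logical value indicating whether the file contains the variable names in its first line |
| sep | The delimiter separating data values. The default is sep="", which denotes one or more spaces, tabs, new lines, or carriage returns. Use sep="," to read comma-delimited files and sep="\t" to read tab-delimited files |
| row.names | An optional parameter specifying one or more variables that serve as row identifiers |
| col.names | If the first line of the data file does not contain variable names (header=FALSE), you can use col.names to specify a character vector containing the variable names. If header=FALSE and col.names is omitted, the variables are named V1, V2, and so on |
| na.strings | An optional character vector of missing-value codes. For example, na.strings=c("-9", "?") converts each -9 and ? value to NA as the data is read |
| colClasses | An optional vector of classes to assign to the columns. For example, colClasses=c("numeric", "numeric", "character", "NULL", "numeric") reads the first two columns as numeric, reads the third column as character, skips the fourth column, and reads the fifth column as numeric. If the data has more than five columns, the values in colClasses are recycled. When reading large text files, specifying colClasses can noticeably speed up processing |
| quote | The characters used to delimit strings that contain special characters. The default is either the double quote (") or the single quote (') |
| skip | The number of lines to skip before reading the data. Useful for skipping header comments |
| stringsAsFactors | A logical value indicating whether character vectors should be converted to factors. The default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward, unless overridden by colClasses. When processing large text files, stringsAsFactors=FALSE can speed up processing |
| text | A character string specifying the text to process. If text is specified, leave file blank |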
| |
| StudentID,First,Last,Math,Science,Social Studies |
| 011,Bob,Smith,90,80,67 |
| 012,Jane,Weary,75,,80 |
| 010,Dan,"Thornton, III",65,75,70 |
| 040,Mary,"O'Leary",90,95,92 |
| |
| |
| grades <- read.table("studentgrades.csv", header=TRUE, row.names="StudentID", sep=",") |
| |
| grades <- read.table("studentgrades.csv", header=TRUE, row.names="StudentID", sep=",", colClasses=c("character", "character", "character", "numeric", "numeric", "numeric")) |
| |
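To make the na.strings and colClasses options concrete, here is a small self-contained sketch. The data and the column names (id, name, score, notes, weight) are made up for illustration and are not part of the studentgrades.csv example.

```r
# Inline data: "?" and "-9" stand for missing values
csv_text <- "id,name,score,notes,weight
1,Ann,90,x,-9
2,Bob,?,y,70.5
3,Cy,75,z,64"

df <- read.table(text = csv_text, header = TRUE, sep = ",",
                 na.strings = c("-9", "?"),
                 # keep id as text, skip the notes column with "NULL"
                 colClasses = c("character", "character", "numeric",
                                "NULL", "numeric"),
                 stringsAsFactors = FALSE)
str(df)
```

The resulting data frame has four columns, with NA in place of the "?" score and the "-9" weight.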
2.2 Importing data through connections
For example, the functions file(), gzfile(), bzfile(), xzfile(), unz(), and url() create connections.
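As a rough sketch of how a connection is used, the example below writes a few rows of mtcars to a gzip-compressed file and reads them back through gzfile(). The file path is a temporary, hypothetical one.

```r
gz_path <- tempfile(fileext = ".csv.gz")

con <- gzfile(gz_path, open = "wt")   # gzip connection opened for writing text
write.csv(head(mtcars, 3), con)
close(con)

# read.csv() accepts a connection in place of a file name
cars3 <- read.csv(gzfile(gz_path), row.names = 1)
print(cars3)
```

The same pattern applies to the other connection functions, for example url() for data served over HTTP.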
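2.3 Importing data
Excel: requires the xlsx, xlsxjars, and rJava packages
==> installation problems (to be continued)
XML: see the related documentation
Web scraping: readLines(), grep(), gsub()
The RCurl and XML packages
Programming With R
twitteR for Twitter data
Rfacebook for Facebook data
Rflickr for Flickr data
Web Technologies and Services: a comprehensive list
SPSS: read.spss() in the foreign package; spss.get() in the Hmisc package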
| install.packages("Hmisc") |
| library(Hmisc) |
| mydataframe <- spss.get("mydata.sav", use.value.labels=TRUE) |
SAS: read.ssd() in the foreign package; sas.get() in the Hmisc package; read.sas7bdat() in the sas7bdat package. sas.get() relies on an installed copy of SAS.
| |
| library(Hmisc) |
| datadir <- "C:/mydata" |
| sasexe <- "C:/Program Files/SASHome/SASFoundation/9.4/sas.exe" |
| mydata <- sas.get(libraryName=datadir, member="clients", sasprog=sasexe) |
| library(foreign) |
| mydataframe <- read.dta("mydata.dta") |
| library(ncdf4) |
| nc <- nc_open("mynetCDFfile") |
| myarray <- ncvar_get(nc, myvar) |
| install.packages("BiocManager") |
| BiocManager::install("rhdf5") |
- 9 Accessing database management systems (DBMS)
| library(RODBC) |
| myconn <-odbcConnect("mydsn", uid="Rob", pwd="aardvark") |
| crimedat <- sqlFetch(myconn, "Crime") |
| pundat <- sqlQuery(myconn, "select * from Punishment") |
| close(myconn) |
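- DBI-related packages
RMySQL, ROracle, RPostgreSQL, RSQLite

| Function | Description |
| --- | --- |
| odbcConnect(dsn,uid="",pwd="") | Open a connection to an ODBC database |
| sqlFetch(channel,sqltable) | Read a table from an ODBC database into a data frame |
| sqlQuery(channel,query) | Submit a query to an ODBC database and return the results |
| sqlSave(channel,mydf,tablename=sqltable,append=FALSE) | Write or update (append=TRUE) a data frame to a table in the ODBC database |
| sqlDrop(channel,sqltable) | Remove a table from the ODBC database |
| close(channel) | Close the connection |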
2.4 Annotating datasets
| names(patientdata)[2] <- "Age at hospitalization (in years)" |
| patientdata$gender <- factor(patientdata$gender, levels = c(1,2), labels = c("male", "female")) |
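2.5 Useful functions for working with data objects

| Function | Purpose |
| --- | --- |
| length(object) | Show the number of elements/components in an object |
| dim(object) | Show the dimensions of an object |
| str(object) | Show the structure of an object |
| class(object) | Show the class or type of an object |
| mode(object) | Show the mode of an object |
| names(object) | Show the names of the components in an object |
| c(object, object,...) | Combine objects into a vector |
| cbind(object, object, ...) | Combine objects as columns |
| rbind(object, object, ...) | Combine objects as rows |
| object | Print the object |
| head(object) | List the first six rows of an object |
| tail(object) | List the last six rows of an object |
| ls() | Show the current list of objects |
| rm(object, object, ...) | Delete one or more objects. The statement rm(list = ls()) removes almost all objects from the current working environment |
| newobject <- edit(object) | Edit an object and save it as newobject |
| fix(object) | Edit an object in place |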
chapter 3 Getting started with graphs
| |
| attach(mtcars) |
| plot(wt, mpg) |
| abline(lm(mpg~wt)) |
| title("Regression of MPG on Weight") |
| detach(mtcars) |
| |
| |
| pdf("mygraph.pdf") |
| attach(mtcars) |
| plot(wt, mpg) |
| abline(lm(mpg~wt)) |
| title("Regression of MPG on Weight") |
| detach(mtcars) |
| dev.off() |
| |
| |
| dev.new() |
| # statements to create graph 1 |
| dev.new() |
| # statements to create graph 2 |
| # etc. |
| |
| |
| |
| dose <- c(20, 30, 40, 45, 60) |
| drugA <- c(16, 20, 27, 40, 60) |
| drugB <- c(15, 18, 25, 31, 40) |
| plot(dose, drugA, type="b") |
| |
| |
| |
| opar <- par(no.readonly=TRUE) |
| par(lty=2, pch=17) |
| plot(dose, drugA, type="b") |
| par(opar) |
3.1 Plotting symbols and lines

| Parameter | Description | Illustration |
| --- | --- | --- |
| pch | The symbol to use when plotting points | [image: plotting symbols] |
| cex | The symbol size. cex is a number giving the amount by which plotting symbols are scaled relative to the default. 1 is the default, 1.5 is 50% larger, 0.5 is 50% smaller, and so on | |
| lty | The line type | [image: line types] |
| lwd | The line width. lwd is expressed relative to the default (1). For example, lwd=2 produces a line twice as wide as the default | |
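3.2 Colors

| Parameter | Description |
| --- | --- |
| col | The default plotting color. Some functions (such as lines and pie) accept a vector of colors and recycle it. For example, if col=c("red", "blue") and three lines are plotted, the first is red, the second is blue, and the third is red again |
| col.axis | Color of the axis tick text |
| col.lab | Color of the axis labels (names) |
| col.main | Color of the title |
| col.sub | Color of the subtitle |
| fg | Foreground color of the plot |
| bg | Background color of the plot |

Functions that create vectors of contiguous colors include rainbow(), heat.colors(), terrain.colors(), topo.colors(), and cm.colors().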
| install.packages("RColorBrewer") |
| library(RColorBrewer) |
| n <- 7 |
| mycolors <- brewer.pal(n, "Set1") |
| barplot(rep(1,n), col=mycolors) |
| |
| |
| brewer.pal.info |
| display.brewer.all() |
| |
| n <- 10 |
| mycolors <- rainbow(n) |
| pie(rep(1, n), labels=mycolors, col=mycolors) |
| mygrays <- gray(0:n/n) |
| pie(rep(1, n), labels=mygrays, col=mygrays) |
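3.3 Text attributes

| Parameter | Description |
| --- | --- |
| cex | A number giving the scaling of text relative to the default. 1 is the default, 1.5 is 50% larger, 0.5 is 50% smaller, and so on |
| cex.axis | Scaling of the axis tick text, relative to cex |
| cex.lab | Scaling of the axis labels (names), relative to cex |
| cex.main | Scaling of the title, relative to cex |
| cex.sub | Scaling of the subtitle, relative to cex |
| font | An integer specifying the font style. 1=plain, 2=bold, 3=italic, 4=bold italic, 5=symbol (in Adobe symbol encoding) |
| font.axis | Font style of the axis tick text |
| font.lab | Font style of the axis labels (names) |
| font.main | Font style of the title |
| font.sub | Font style of the subtitle |
| ps | Font point size (1 point is roughly 1/72 inch). The final text size is ps × cex |
| family | Font family used for drawing text. The standard values are serif, sans, and mono |

Graph and margin dimensions:

| Parameter | Description |
| --- | --- |
| pin | Plot dimensions (width, height) in inches |
| mai | Numeric vector of margin sizes in the order bottom, left, top, right, expressed in inches |
| mar | Numeric vector of margin sizes in the order bottom, left, top, right, expressed in lines. The default is c(5, 4, 4, 2) + 0.1 |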
| par(font.lab=3, cex.lab=1.5, font.main=4, cex.main=2) |
| |
| windowsFonts( |
| A=windowsFont("Arial Black"), |
| B=windowsFont("Bookman Old Style"), |
| C=windowsFont("Comic Sans MS") |
| ) |
| |
| opar <- par(no.readonly=TRUE) |
| par(pin=c(2, 3)) |
| par(lwd=2, cex=1.5) |
| par(cex.axis=.75, font.axis=3) |
| plot(dose, drugA, type="b", pch=19, lty=2, col="red") |
| plot(dose, drugB, type="b", pch=23, lty=6, col="blue", bg="green") |
| par(opar) |
| |
| |
| |
| plot(dose, drugA, type="b", col="red", lty=2, pch=2, lwd=2, main="Clinical Trials for Drug A", sub="This is hypothetical data", xlab="Dosage", ylab="Drug Response", xlim=c(0, 60), ylim=c(0, 70)) |
| |
| |
| title(main="main title", sub="subtitle", xlab="x-axis label", ylab="y-axis label") |
| |
| |
| |
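| Option | Description |
| --- | --- |
| side | An integer indicating the side of the graph on which to draw the axis (1=bottom, 2=left, 3=top, 4=right) |
| at | A numeric vector indicating where tick marks should be drawn |
| labels | A character vector of labels to place at the tick marks (if NULL, the at values are used) |
| pos | The coordinate at which the axis line is drawn (that is, the value on the other axis where it crosses) |
| lty | Line type |
| col | Color of the line and tick marks |
| las | Whether labels are parallel (=0) or perpendicular (=2) to the axis |
| tck | Length of each tick mark as a fraction of the plotting region (a negative value is outside the graph, a positive value is inside, 0 suppresses ticks, and 1 draws gridlines); the default is -0.01 |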
| |
| |
| |
| |
| |
| x <- c(1:10) |
| y <- x |
| z <- 10/x |
| opar <- par(no.readonly=TRUE) |
| |
| |
| par(mar=c(5, 4, 4, 8) + 0.1) |
| |
| plot(x, y, type="b", |
| pch=21, col="red", |
| yaxt="n", lty=3, ann=FALSE) |
| |
| lines(x, z, type="b", pch=22, col="blue", lty=2) |
| |
| |
| axis(2, at=x, labels=x, col.axis="red", las=2) |
| axis(4, at=z, labels=round(z, digits=2), col.axis="blue", las=2, cex.axis=0.7, tck=-.01) |
| |
| mtext("y=10/x", side=4, line=3, cex.lab=1, las=2, col="blue") |
| title("An Example of Creative Axes",xlab="X values", ylab="Y=X") |
| par(opar) |
| |
| |
| library(Hmisc) |
| minor.tick(nx=n, ny=n, tick.ratio=n) |
| |
| minor.tick(nx=2, ny=3, tick.ratio=0.5) |
| |
| |
| |
| abline(h=c(1,5,7)) |
| |
| abline(v=seq(1, 10, 2), lty=2, col="blue") |
| |
| |
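| Option | Description |
| --- | --- |
| location | There are several ways to indicate the location of the legend. You can give the x,y coordinates of its upper-left corner, call locator(1) and click with the mouse to indicate the location, or use one of the keywords bottom, bottomleft, left, topleft, top, topright, right, bottomright, or center. If you use a keyword, you can also use inset= to specify how far the legend is moved into the graph (as a fraction of the plot region) |
| title | Character string for the legend title (optional) |
| legend | Character vector with the legend labels |
| ... | Other options. If the legend labels colored lines, specify col= and a vector of colors. If the legend labels point symbols, specify pch= and a vector of symbol codes. If the legend labels line widths or line types, use lwd= or lty= and a vector of widths or types. To create colored boxes for the legend (common in bar charts, box plots, and pie charts), use fill= and a vector of colors |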
| dose <- c(20, 30, 40, 45, 60) |
| drugA <- c(16, 20, 27, 40, 60) |
| drugB <- c(15, 18, 25, 31, 40) |
| opar <- par(no.readonly=TRUE) |
| |
| par(lwd=2, cex=1.5, font.lab=2) |
| |
| |
| plot(dose, drugA, type="b", |
| pch=15, lty=1, col="red", ylim=c(0, 60), |
| main="Drug A vs. Drug B", |
| xlab="Drug Dosage", ylab="Drug Response") |
| lines(dose, drugB, type="b", pch=17, lty=2, col="blue") |
| abline(h=c(30), lwd=1.5, lty=2, col="gray") |
| |
| |
| library(Hmisc) |
| minor.tick(nx=3, ny=3, tick.ratio=0.5) |
| |
| legend("topleft", inset=.05, title="Drug Type", c("A","B"), lty=c(1, 2), pch=c(15, 17), col=c("red", "blue")) |
| par(opar) |
| |
| |
| |
| |
| |
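| Option | Description |
| --- | --- |
| location | Location argument for the text. It can be an x,y coordinate pair, or you can place the text interactively with the mouse by specifying location as locator(1) |
| pos | Position of the text relative to the location argument. 1=below, 2=left, 3=above, 4=right. If you specify pos, you can also specify offset= as an offset, expressed as a fraction of a character width |
| side | The margin in which to place the text. 1=bottom, 2=left, 3=top, 4=right. You can specify line= to move the text inward or outward; the text moves outward as the value increases. You can also use adj=0 for left/bottom alignment or adj=1 for right/top alignment |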
| attach(mtcars) |
| plot(wt, mpg, main="Mileage vs. Car Weight", xlab="Weight", ylab="Mileage", pch=18, col="blue") |
| text(wt, mpg, row.names(mtcars), cex=0.6, pos=4, col="red") |
| detach(mtcars) |
| |
| |
| opar <- par(no.readonly=TRUE) |
| par(cex=1.5) |
| plot(1:7,1:7,type="n") |
| text(3,3,"Example of default text") |
| text(4,4,family="mono","Example of mono-spaced text") |
| text(5,5,family="serif","Example of serif text") |
| par(opar) |
| |
| |
| |
| attach(mtcars) |
| opar <- par(no.readonly=TRUE) |
| par(mfrow=c(2,2)) |
| plot(wt,mpg, main="Scatterplot of wt vs. mpg") |
| plot(wt,disp, main="Scatterplot of wt vs. disp") |
| hist(wt, main="Histogram of wt") |
| boxplot(wt, main="Boxplot of wt") |
| par(opar) |
| detach(mtcars) |
| |
| |
| attach(mtcars) |
| opar <- par(no.readonly=TRUE) |
| par(mfrow=c(3,1)) |
| hist(wt) |
| hist(mpg) |
| hist(disp) |
| par(opar) |
| detach(mtcars) |
| |
| |
| attach(mtcars) |
| layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE)) |
| hist(wt) |
| hist(mpg) |
| hist(disp) |
| detach(mtcars) |
| |
| |
| attach(mtcars) |
| layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE), |
| widths=c(3, 1), heights=c(1, 2)) |
| hist(wt) |
| hist(mpg) |
| hist(disp) |
| detach(mtcars) |
| |
| |
| opar <- par(no.readonly=TRUE) |
| par(fig=c(0, 0.8, 0, 0.8)) |
| plot(mtcars$wt, mtcars$mpg, |
| xlab="Car Weight", |
| ylab="Miles Per Gallon") |
| par(fig=c(0, 0.8, 0.55, 1), new=TRUE) |
| boxplot(mtcars$wt, horizontal=TRUE, axes=FALSE) |
| par(fig=c(0.65, 1, 0, 0.8), new=TRUE) |
| boxplot(mtcars$mpg, axes=FALSE) |
| mtext("Enhanced Scatterplot", side=3, outer=TRUE, line=-3) |
| par(opar) |
| |
chapter 4 Basic data management
| |
| manager <- c(1, 2, 3, 4, 5) |
| date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09") |
| country <- c("US", "US", "UK", "UK", "UK") |
| gender <- c("M", "F", "F", "M", "F") |
| age <- c(32, 45, 25, 39, 99) |
| q1 <- c(5, 3, 3, 3, 2) |
| q2 <- c(4, 5, 5, 3, 2) |
| q3 <- c(5, 2, 5, 4, 1) |
| q4 <- c(5, 5, 5, NA, 2) |
| q5 <- c(5, 5, 2, NA, 1) |
| leadership <- data.frame(manager, date, country, gender, age, |
| q1, q2, q3, q4, q5, stringsAsFactors=FALSE) |
| |
| |
| |
| mydata<-data.frame(x1 = c(2, 2, 6, 4), |
| x2 = c(3, 4, 2, 8)) |
| mydata$sumx <- mydata$x1 + mydata$x2 |
| mydata$meanx <- (mydata$x1 + mydata$x2)/2 |
| |
| attach(mydata) |
| mydata$sumx <- x1 + x2 |
| mydata$meanx <- (x1 + x2)/2 |
| detach(mydata) |
| |
| mydata <- transform(mydata, sumx = x1 + x2, meanx = (x1 + x2)/2) |
| |
| |
| leadership$age[leadership$age == 99] <- NA |
| |
| leadership$agecat[leadership$age > 75] <- "Elder" |
| leadership$agecat[leadership$age >= 55 & |
| leadership$age <= 75] <- "Middle Aged" |
| leadership$agecat[leadership$age < 55] <- "Young" |
| |
| leadership <- within(leadership,{ |
| agecat <- NA |
| agecat[age > 75] <- "Elder" |
| agecat[age >= 55 & age <= 75] <- "Middle Aged" |
| agecat[age < 55] <- "Young" }) |
| |
| |
| |
| names(leadership)[2] <- "testDate" |
| names(leadership)[6:10] <- c("item1", "item2", "item3", "item4", "item5") |
| |
| |
| |
| install.packages("plyr") |
| library(plyr) |
| leadership <- rename(leadership, c(manager="managerID", date="testDate")) |
| |
| |
| |
| is.na(leadership[,6:10]) |
| |
| |
| x <- c(1, 2, NA, 3) |
| y <- sum(x, na.rm=TRUE) |
| |
| |
| |
| newdata <- na.omit(leadership) |
| newdata |
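Date values

| Symbol | Meaning | Example |
| --- | --- | --- |
| %d | Day of the month as a number (01~31) | 01~31 |
| %a | Abbreviated weekday name | Mon |
| %A | Unabbreviated weekday name | Monday |
| %m | Month as a number (01~12) | 01~12 |
| %b | Abbreviated month name | Jan |
| %B | Unabbreviated month name | January |
| %y | Two-digit year | 23 |
| %Y | Four-digit year | 2023 |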
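The format symbols above are also used when parsing text into dates. The sketch below converts two month/day/year strings (taken from the style of the leadership data) and computes the gap between them. Only numeric format codes are used so the output does not depend on the locale.

```r
# Dates stored as text in month/day/two-digit-year form
raw <- c("10/24/08", "5/1/09")
d <- as.Date(raw, format = "%m/%d/%y")
print(d)                          # "2008-10-24" "2009-05-01"

print(format(d, "%Y-%m"))         # "2008-10" "2009-05"
print(as.numeric(d[2] - d[1]))    # 189 days between the two dates
```

Without the format argument, as.Date() expects the yyyy-mm-dd form by default.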
| |
| |
| |
| today <- Sys.Date() |
| format(today, format="%B %d %Y") |
| format(today, format="%A") |
| |
| |
| startdate <- as.Date("2004-02-13") |
| enddate <- as.Date("2011-01-22") |
| days <- enddate - startdate |
| days |
| |
| today <- Sys.Date() |
| dob <- as.Date("1956-10-12") |
| difftime(today, dob, units="weeks") |
| |
| |
| strDates <- as.character(dates) |
Type conversion

| Test | Convert |
| --- | --- |
| is.numeric() | as.numeric() |
| is.character() | as.character() |
| is.vector() | as.vector() |
| is.matrix() | as.matrix() |
| is.data.frame() | as.data.frame() |
| is.factor() | as.factor() |
| is.logical() | as.logical() |
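A short sketch of the test and convert pairs above. The factor case is included because it is a common stumbling block; the values are made up for illustration.

```r
a <- c(1, 2, 3)
print(is.numeric(a))                   # TRUE

b <- as.character(a)
print(b)                               # "1" "2" "3"
print(is.character(b))                 # TRUE

f <- factor(c("10", "20", "10"))
# as.numeric() on a factor returns the internal level codes, not the labels
print(as.numeric(f))                   # 1 2 1
# Convert to character first to recover the numbers shown in the labels
print(as.numeric(as.character(f)))     # 10 20 10
```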
| |
| |
| attach(leadership) |
| newdata <-leadership[order(gender, -age),] |
| detach(leadership) |
| |
| |
| |
| |
| |
| |
| |
| |
| newdata <- leadership[, c(6:10)] |
| |
| myvars <- c("q1", "q2", "q3", "q4", "q5") |
| newdata <-leadership[myvars] |
| |
| myvars <- paste("q", 1:5, sep="") |
| newdata <-leadership[myvars] |
| |
| |
| |
| myvars <- names(leadership) %in% c("q3", "q4") |
| newdata <- leadership[!myvars] |
| |
| |
| newdata <- subset(leadership, age >= 35 | age < 24, select=c(q1, q2, q3, q4)) |
| |
| |
| |
| |
| |
| mysample <- leadership[sample(1:nrow(leadership), 3, replace=FALSE),] |
| |
| |
| install.packages("sqldf") |
| library(sqldf) |
| |
| newdf <- sqldf("select * from mtcars where carb=1 order by mpg", row.names=TRUE) |
| |
| sqldf("select avg(mpg) as avg_mpg, avg(disp) as avg_disp, gear |
| from mtcars where cyl in (4, 6) group by gear") |
| |
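chapter 5 Advanced data management
Mathematical functions

| Function | Description |
| --- | --- |
| abs(x) | Absolute value |
| sqrt(x) | Square root; equivalent to x^(0.5) |
| ceiling(x) | Smallest integer not less than x |
| floor(x) | Largest integer not greater than x |
| trunc(x) | Integer formed by truncating x toward 0 |
| round(x,digits=n) | Round x to the specified number of decimal places |
| signif(x,digits=n) | Round x to the specified number of significant digits |
| cos(x), sin(x), tan(x) | Cosine, sine, and tangent |
| acos(x), asin(x), atan(x) | Arc-cosine, arc-sine, and arc-tangent |
| cosh(x), sinh(x), tanh(x) | Hyperbolic cosine, sine, and tangent |
| acosh(x), asinh(x), atanh(x) | Hyperbolic arc-cosine, arc-sine, and arc-tangent |
| log(x,base=n) | Logarithm of x to the base n; log(x) is the natural logarithm and log10(x) is the common logarithm |
| exp(x) | Exponential function |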
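The rounding functions are easy to confuse, so here is a brief comparison on one negative and one positive value. The input values are arbitrary.

```r
x <- c(-2.567, 3.4712)

print(ceiling(x))             # -2  4
print(floor(x))               # -3  3
print(trunc(x))               # -2  3  (truncates toward 0)
print(round(x, digits = 2))   # -2.57  3.47
print(signif(x, digits = 2))  # -2.6   3.5
print(log(8, base = 2))       # 3
```

Note that floor() and trunc() differ only for negative numbers.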
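Statistical functions

| Function | Description |
| --- | --- |
| mean(x) | Mean |
| median(x) | Median |
| sd(x) | Standard deviation |
| var(x) | Variance |
| mad(x) | Median absolute deviation |
| quantile(x,probs) | Quantiles, where x is the numeric vector for which quantiles are wanted and probs is a numeric vector of probabilities in [0,1] |
| range(x) | Range |
| sum(x) | Sum |
| diff(x, lag=n) | Lagged differences, with lag indicating which lag to use. The default lag is 1 |
| min(x) | Minimum |
| max(x) | Maximum |
| scale(x,center=TRUE, scale=TRUE) | Center (center=TRUE) or standardize (center=TRUE, scale=TRUE) the data object x by column |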
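A few of the statistical functions above applied to a small made-up vector, as a quick check of what each one returns.

```r
x <- c(1, 2, 3, 4, 10)

print(mean(x))                      # 4
print(median(x))                    # 3
print(mad(x))                       # 1.4826 (median absolute deviation times 1.4826)
print(diff(x))                      # 1 1 1 6
print(diff(x, lag = 2))             # 2 2 7
print(quantile(x, c(0.25, 0.75)))   # 25% -> 2, 75% -> 4

z <- scale(x)                       # one-column matrix with mean 0 and sd 1
print(round(c(mean(z), sd(z)), 10))
```

The outlier 10 pulls the mean above the median, while median() and mad() are barely affected by it.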
| |
| newdata <- scale(mydata) |
| |
| newdata <- scale(mydata)*SD + M |
| |
| newdata <- transform(mydata, myvar = scale(myvar)*10+50) |
Probability functions
[dpqr]distribution_abbreviation()
- d = density
- p = distribution function
- q = quantile function
- r = random generation (random deviates)
| Distribution | Abbreviation | Distribution | Abbreviation |
| --- | --- | --- | --- |
| Beta | beta | Logistic | logis |
| Binomial | binom | Multinomial | multinom |
| Cauchy | cauchy | Negative binomial | nbinom |
| Chi-squared (noncentral) | chisq | Normal | norm |
| Exponential | exp | Poisson | pois |
| F | f | Wilcoxon signed rank | signrank |
| Gamma | gamma | t | t |
| Geometric | geom | Uniform | unif |
| Hypergeometric | hyper | Weibull | weibull |
| Lognormal | lnorm | Wilcoxon rank sum | wilcox |
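To show the d/p/q/r naming scheme on a discrete distribution, the sketch below uses the binomial abbreviation (binom) for 10 fair coin flips. The scenario is chosen purely for illustration.

```r
# d: probability of exactly 3 heads in 10 fair coin flips (120/1024)
print(dbinom(3, size = 10, prob = 0.5))    # 0.1171875

# p: probability of at most 3 heads (176/1024)
print(pbinom(3, size = 10, prob = 0.5))    # 0.171875

# q: smallest count whose cumulative probability reaches 0.5
print(qbinom(0.5, size = 10, prob = 0.5))  # 5

# r: five random draws; set.seed() makes them reproducible
set.seed(1234)
draws <- rbinom(5, size = 10, prob = 0.5)
print(draws)
```

Swapping binom for another abbreviation from the table (with that distribution's parameters) gives the same four functions for that distribution.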
| |
| x <- pretty(c(-3,3), 30) |
| y <- dnorm(x) |
| plot(x, y, |
| type = "l", |
| xlab = "Normal Deviate", |
| ylab = "Density", |
| yaxs = "i" |
| ) |
| |
| |
| |
| pnorm(1.96) |
| |
| qnorm(.9, mean=500, sd=100) |
| |
| rnorm(50, mean=50, sd=10) |
| |
| |
| |
| |
| library(MASS) |
| options(digits=3) |
| |
| set.seed(1234) |
| |
| mean <- c(230.7, 146.7, 3.6) |
| sigma <- matrix(c(15360.8, 6721.2, -47.1, |
| 6721.2, 4700.9, -16.5, |
| -47.1, -16.5, 0.3), nrow=3, ncol=3) |
| |
| mydata <- mvrnorm(500, mean, sigma) |
| mydata <- as.data.frame(mydata) |
| names(mydata) <- c("y","x1","x2") |
| |
| |
| dim(mydata) |
| head(mydata, n=10) |
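Character functions

| Function | Description |
| --- | --- |
| nchar(x) | Count the number of characters in x |
| substr(x,start,stop) | Extract or replace substrings in a character vector |
| grep(pattern,x,ignore.case=FALSE, fixed=FALSE) | Search for pattern in x. If fixed=FALSE, pattern is a regular expression. If fixed=TRUE, pattern is a text string. Returns the matching indices |
| sub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE) | Find pattern in x and substitute the text replacement. If fixed=FALSE, pattern is a regular expression. If fixed=TRUE, pattern is a text string |
| strsplit(x, split, fixed=FALSE) | Split the elements of the character vector x at split. If fixed=FALSE, split is a regular expression. If fixed=TRUE, split is a text string |
| paste(…, sep="") | Concatenate strings, using sep as the separator; paste("x", 1:3, sep="") returns c("x1", "x2", "x3") |
| toupper(x) | Convert to uppercase |
| tolower(x) | Convert to lowercase |

Other useful functions

| Function | Description |
| --- | --- |
| length(x) | Length of the object x |
| seq(from, to, by) | Generate a sequence; after indices <- seq(1,10,2), indices is c(1, 3, 5, 7, 9) |
| rep(x, n) | Repeat x n times |
| cut(x, n) | Divide the continuous variable x into a factor with n levels; use ordered_result = TRUE to create an ordered factor |
| pretty(x, n) | Create pretty breakpoints. Divides a continuous variable x into n intervals by selecting n+1 equally spaced rounded values |
| cat(... , file ="myfile", append =FALSE) | Concatenate the objects in ... and output them to the screen or to a file (if one is declared) |

\n is a new line, \t is a tab, \' is a single quote, \b is a backspace, and so on. (Type ?Quotes for more.)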
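The character functions above can be tried on a few small strings. This is only a sketch with made-up inputs.

```r
x <- c("ab", "cde", "fghij")
print(nchar(x))                                    # 2 3 5

print(substr("abcdef", 2, 4))                      # "bcd"
print(grep("A", c("b", "A", "c"), fixed = TRUE))   # 2 (index of the match)
print(sub("\\s", ".", "Hello There"))              # "Hello.There"

parts <- strsplit("abc", "")                       # a list with one element
print(parts[[1]])                                  # "a" "b" "c"

print(paste("x", 1:3, sep = "_"))                  # "x_1" "x_2" "x_3"
print(toupper("abc"))                              # "ABC"

cat("Hello\tWorld\n")                              # \t prints a tab, \n ends the line
```

strsplit() returns a list, which is why the example indexes it with [[1]].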
| |
| mydata <- matrix(rnorm(30), nrow=6) |
| |
| |
| apply(mydata, 1, mean) |
| |
| apply(mydata, 2, mean) |
| |
| apply(mydata, 2, mean, trim=0.2) |
| |
| |
| |
| options(digits=2) |
| |
| Student <- c("John Davis", "Angela Williams", "Bullwinkle Moose", |
| "David Jones", "Janice Markhammer", "Cheryl Cushing", |
| "Reuven Ytzrhak", "Greg Knox", "Joel England", |
| "Mary Rayburn") |
| Math <- c(502, 600, 412, 358, 495, 512, 410, 625, 573, 522) |
| Science <- c(95, 99, 80, 82, 75, 85, 80, 95, 89, 86) |
| English <- c(25, 22, 18, 15, 20, 28, 15, 30, 27, 18) |
| roster <- data.frame(Student, Math, Science, English, stringsAsFactors=FALSE) |
| |
| z <- scale(roster[,2:4]) |
| |
| score <- apply(z, 1, mean) |
| |
| roster <- cbind(roster, score) |
| |
| y <- quantile(score, c(.8,.6,.4,.2)) |
| |
| roster$grade[score >= y[1]] <- "A" |
| roster$grade[score < y[1] & score >= y[2]] <- "B" |
| roster$grade[score < y[2] & score >= y[3]] <- "C" |
| roster$grade[score < y[3] & score >= y[4]] <- "D" |
| roster$grade[score < y[4]] <- "F" |
| |
| name <- strsplit((roster$Student), " ") |
| |
| Lastname <- sapply(name, "[", 2) |
| Firstname <- sapply(name, "[", 1) |
| |
| roster <- cbind(Firstname,Lastname, roster[,-1]) |
| |
| roster <- roster[order(Lastname,Firstname),] |
Control flow
- statement: an R statement
- cond: a condition
- expr: an expression
- seq: a sequence
| |
| |
| for (i in 1:10) print("Hello") |
| |
| i <- 10 |
| while (i > 0) {print("Hello"); i <- i - 1} |
| |
| |
| |
| if (is.character(grade)) grade <- as.factor(grade) |
| if (!is.factor(grade)) grade <- as.factor(grade) else print("Grade already is a factor") |
| |
| ifelse(score > 0.5, print("Passed"), print("Failed")) |
| outcome <- ifelse (score > 0.5, "Passed", "Failed") |
| |
| feelings <- c("sad", "afraid") |
| for (i in feelings) |
| print( |
| switch(i, |
| happy = "I am glad you are happy", |
| afraid = "There is nothing to fear", |
| sad = "Cheer up", |
| angry = "Calm down now" |
| ) |
| ) |
| |
| |
| mystats <- function(x, parametric=TRUE, print=FALSE) { |
| if (parametric) { |
| center <- mean(x); spread <- sd(x) |
| } else { |
| center <- median(x); spread <- mad(x) |
| } |
| if (print & parametric) { |
| cat("Mean=", center, "\n", "SD=", spread, "\n") |
| } else if (print & !parametric) { |
| cat("Median=", center, "\n", "MAD=", spread, "\n") |
| } |
| result <- list(center=center, spread=spread) |
| return(result) |
| } |
| set.seed(1234) |
| x <- rnorm(500) |
| y <- mystats(x) |
| y <- mystats(x, parametric=FALSE, print=TRUE) |
| |
| mydate <- function(type="long") { |
| switch(type, |
| long = format(Sys.time(), "%A %B %d %Y"), |
| short = format(Sys.time(), "%m-%d-%y"), |
| cat(type, "is not a recognized type\n") |
| ) |
| } |
| mydate("long") |
| mydate("short") |
| mydate() |
| mydate("medium") |
| |
| |
| |
| |
| |
| |
| |
| cars <- mtcars[1:5, 1:4] |
| cars |
| t(cars) |
| |
| aggdata <- aggregate(mtcars, by=list(mtcars$cyl, mtcars$gear), FUN=mean, na.rm=TRUE) |
| |
| install.packages("reshape2") |
| library(reshape2) |
| ID <- c(1,1,2,2) |
| Time <- c(1,2,1,2) |
| X1 <- c(5,3,6,2) |
| X2 <- c(6,5,1,4) |
| mydata <- data.frame(ID, Time, X1, X2) |
| |
| md <- melt(mydata,id=c("ID","Time")) |
| |
chapter 6 Basic graphs
| |
| install.packages("vcd") |
| library(vcd) |
| counts <- table(Arthritis$Improved) |
| |
| barplot(counts, |
| main="Simple Bar Plot", |
| xlab="Improvement", ylab="Frequency") |
| |
| barplot(counts, |
| main="Horizontal Bar Plot", |
| xlab="Frequency", ylab="Improvement", |
| horiz=TRUE) |
| |
| counts <- table(Arthritis$Improved, Arthritis$Treatment) |
| |
| barplot(counts, |
| main="Stacked Bar Plot", |
| xlab="Treatment", ylab="Frequency", |
| col=c("red", "yellow","green"), |
| legend=rownames(counts)) |
| |
| barplot(counts, |
| main="Grouped Bar Plot", |
| xlab="Treatment", ylab="Frequency", |
| col=c("red", "yellow", "green"), |
| legend=rownames(counts), beside=TRUE) |
| |
| |
| states <- data.frame(state.region, state.x77) |
| means <- aggregate(states$Illiteracy, by=list(state.region), FUN=mean) |
| |
| means <- means[order(means$x),] |
| barplot(means$x, names.arg=means$Group.1) |
| |
| |
| |
| par(mar=c(5,8,4,2)) |
| |
| par(las=2) |
| counts <- table(Arthritis$Improved) |
| |
| barplot(counts, |
| main="Treatment Outcome", |
| horiz=TRUE, |
| cex.names=0.8, |
| names.arg=c("No Improvement", "Some Improvement", |
| "Marked Improvement")) |
| |
| |
| library(vcd) |
| attach(Arthritis) |
| counts <- table(Treatment, Improved) |
| spine(counts, main="Spinogram Example") |
| detach(Arthritis) |
| |
| |
| install.packages("plotrix") |
| par(mfrow=c(2, 2)) |
| slices <- c(10, 12,4, 16, 8) |
| lbls <- c("US", "UK", "Australia", "Germany", "France") |
| pie(slices, labels = lbls, |
| main="Simple Pie Chart") |
| pct <- round(slices/sum(slices)*100) |
| lbls2 <- paste(lbls, " ", pct, "%", sep="") |
| pie(slices, labels=lbls2, col=rainbow(length(lbls2)), |
| main="Pie Chart with Percentages") |
| library(plotrix) |
| pie3D(slices, labels=lbls,explode=0.1, |
| main="3D Pie Chart ") |
| mytable <- table(state.region) |
| lbls3 <- paste(names(mytable), "\n", mytable, sep="") |
| pie(mytable, labels = lbls3, |
| main="Pie Chart from a Table\n (with sample sizes)") |
| |
| |
| library(plotrix) |
| slices <- c(10, 12,4, 16, 8) |
| lbls <- c("US", "UK", "Australia", "Germany", "France") |
| fan.plot(slices, labels = lbls, main="Fan Plot") |
| |
| |
| par(mfrow=c(2,2)) |
| |
| hist(mtcars$mpg) |
| |
| hist(mtcars$mpg, |
| breaks=12, |
| col="red", |
| xlab="Miles Per Gallon", |
| main="Colored histogram with 12 bins") |
| |
| hist(mtcars$mpg, |
| freq=FALSE, |
| breaks=12, |
| col="red", |
| xlab="Miles Per Gallon", |
| main="Histogram, rug plot, density curve") |
| rug(jitter(mtcars$mpg)) |
| lines(density(mtcars$mpg), col="blue", lwd=2) |
| |
| |
| x <- mtcars$mpg |
| h<-hist(x, |
| breaks=12, |
| col="red", |
| xlab="Miles Per Gallon", |
| main="Histogram with normal curve and box") |
| xfit<-seq(min(x), max(x), length=40) |
| yfit<-dnorm(xfit, mean=mean(x), sd=sd(x)) |
| yfit <- yfit*diff(h$mids[1:2])*length(x) |
| lines(xfit, yfit, col="blue", lwd=2) |
| box() |
| |
| |
| par(mfrow=c(2,1)) |
| d <- density(mtcars$mpg) |
| |
| plot(d) |
| d <- density(mtcars$mpg) |
| |
| plot(d, main="Kernel Density of Miles Per Gallon") |
| |
| polygon(d, col="red", border="blue") |
| |
| rug(mtcars$mpg, col="brown") |
| |
| |
| |
| library(sm) |
| attach(mtcars) |
| |
| cyl.f <- factor(cyl, levels= c(4,6,8), |
| labels = c("4 cylinder", "6 cylinder", |
| "8 cylinder")) |
| |
| sm.density.compare(mpg, cyl, xlab="Miles Per Gallon") |
| title(main="MPG Distribution by Car Cylinders") |
| |
| colfill<-c(2:(1+length(levels(cyl.f)))) |
| legend(locator(1), levels(cyl.f), fill=colfill) |
| detach(mtcars) |
| |
| |
| |
| boxplot(mpg ~ cyl, data=mtcars, |
| main="Car Mileage Data", |
| xlab="Number of Cylinders", |
| ylab="Miles Per Gallon") |
| |
| |
| boxplot(mpg ~ cyl, data=mtcars, |
| notch=TRUE, |
| varwidth=TRUE, |
| col="red", |
| main="Car Mileage Data", |
| xlab="Number of Cylinders", |
| ylab="Miles Per Gallon") |
| |
| |
| |
| mtcars$cyl.f <- factor(mtcars$cyl, |
| levels=c(4,6,8), |
| labels=c("4","6","8")) |
| |
| mtcars$am.f <- factor(mtcars$am, |
| levels=c(0,1), |
| labels=c("auto", "standard")) |
| |
| boxplot(mpg ~ am.f *cyl.f, |
| data=mtcars, |
| varwidth=TRUE, |
| col=c("gold","darkgreen"), |
| main="MPG Distribution by Auto Type", |
| xlab="Auto Type", ylab="Miles Per Gallon") |
| |
| |
| library(zoo) |
| library(vioplot) |
| x1 <- mtcars$mpg[mtcars$cyl==4] |
| x2 <- mtcars$mpg[mtcars$cyl==6] |
| x3 <- mtcars$mpg[mtcars$cyl==8] |
| vioplot(x1, x2, x3, |
| names=c("4 cyl", "6 cyl", "8 cyl"), |
| col="gold") |
| title("Violin Plots of Miles Per Gallon", ylab="Miles Per Gallon", |
| xlab="Number of Cylinders") |
| |
| |
| dotchart(mtcars$mpg, labels=row.names(mtcars), cex=.7, |
| main="Gas Mileage for Car Models", |
| xlab="Miles Per Gallon") |
| |
| |
| |
| x <- mtcars[order(mtcars$mpg),] |
| |
| x$cyl <- factor(x$cyl) |
| |
| x$color[x$cyl==4] <- "red" |
| x$color[x$cyl==6] <- "blue" |
| x$color[x$cyl==8] <- "darkgreen" |
| |
| dotchart(x$mpg, |
| labels = row.names(x), |
| cex=.7, |
| groups = x$cyl, |
| gcolor = "black", |
| color = x$color, |
| pch=19, |
| main = "Gas Mileage for Car Models\ngrouped by cylinder", |
| xlab = "Miles Per Gallon") |
| |
chapter 7 Basic statistics
For background, see McCall (2000) and Kirk (2007).
| myvars <- c("mpg", "hp", "wt") |
| |
| head(mtcars[myvars]) |
| |
| summary(mtcars[myvars]) |
| |
| mystats <- function(x, na.omit=FALSE){ |
| if (na.omit) |
| x <- x[!is.na(x)] |
| m <- mean(x) |
| n <- length(x) |
| s <- sd(x) |
| skew <- sum((x-m)^3/s^3)/n |
| kurt <- sum((x-m)^4/s^4)/n - 3 |
| return(c(n=n, mean=m, stdev=s, skew=skew, kurtosis=kurt)) |
| } |
| sapply(mtcars[myvars], mystats) |
| |
| |
| |
| |
| library(Hmisc) |
| describe(mtcars[myvars]) |
| |
| |
| |
| |
| |
| |
| |
| |
| install.packages("pastecs") |
| library(pastecs) |
| stat.desc(mtcars[myvars]) |
| |
| install.packages("psych") |
| library(psych) |
| describe(mtcars[myvars]) |
| |
| |
| aggregate(mtcars[myvars], by=list(am=mtcars$am), mean) |
| aggregate(mtcars[myvars], by=list(am=mtcars$am), sd) |
| |
| dstats <- function(x)sapply(x, mystats) |
| by(mtcars[myvars], mtcars$am, dstats) |
| |
| |
| install.packages("doBy") |
| |
| library(doBy) |
| summaryBy(mpg+hp+wt~am, data=mtcars, FUN=mystats) |
| |
| myvars <- c("mpg", "hp", "wt") |
| describeBy(mtcars[myvars], list(am=mtcars$am)) |
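Frequencies

| Function | Description |
| --- | --- |
| table(var1, var2, ..., varN) | Create an N-way contingency table from N categorical variables (factors) |
| xtabs(formula, data) | Create an N-way contingency table from a formula and a matrix or data frame |
| prop.table(table, margins) | Express table entries as fractions of the marginal table defined by margins |
| margin.table(table, margins) | Compute the sum of table entries for the marginal table defined by margins |
| addmargins(table, margins) | Put summary margins (sums by default) on a table |
| ftable(table) | Create a compact, "flat" contingency table |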
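The functions above can also be tried without extra packages. This sketch uses the built-in mtcars data instead of the Arthritis data from the vcd package, which is an assumption made only to keep the example self-contained.

```r
# Two-way table of cylinder count by number of gears
tab <- xtabs(~ cyl + gear, data = mtcars)
print(tab)

row_totals <- margin.table(tab, 1)   # cars per cylinder count: 11, 7, 14
print(row_totals)

row_props <- prop.table(tab, 1)      # row proportions, each row sums to 1
print(round(row_props, 2))

print(addmargins(tab))               # counts with an added Sum row and column
```

Passing 2 instead of 1 to margin.table() or prop.table() works on the columns (gear) instead of the rows (cyl).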
| |
| mytable <- with(Arthritis, table(Improved)) |
| |
| prop.table(mytable) |
| |
| |
| mytable <- xtabs(~ Treatment+Improved, data=Arthritis) |
| |
| |
| margin.table(mytable, 1) |
| prop.table(mytable, 1) |
| |
| addmargins(mytable) |
| |
| library(gmodels) |
| CrossTable(Arthritis$Treatment, Arthritis$Improved) |
| |
| |
| mytable <- xtabs(~ Treatment+Sex+Improved, data=Arthritis) |
| |
| ftable(mytable) |
| |
| margin.table(mytable, 1) |
| margin.table(mytable, c(1, 3)) |
| ftable(prop.table(mytable, c(1, 2))) |
| ftable(addmargins(prop.table(mytable, c(1, 2)), 3)) |
| |
| |
| |
| library(vcd) |
| mytable <- xtabs(~Treatment+Improved, data=Arthritis) |
| chisq.test(mytable) |
| |
| mytable <- xtabs(~Improved+Sex, data=Arthritis) |
| chisq.test(mytable) |
| |
| |
| |
| |
| |
| mytable <- xtabs(~Treatment+Improved, data=Arthritis) |
| fisher.test(mytable) |
| |
| |
| |
| mytable <- xtabs(~Treatment+Improved+Sex, data=Arthritis) |
| mantelhaen.test(mytable) |
| |
| |
| |
| library(vcd) |
| mytable <- xtabs(~Treatment+Improved, data=Arthritis) |
| assocstats(mytable) |
| |
| |
| |