Vectorized Operations in R (Data Transformations)
Data Transformations
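split(x, f, drop = FALSE, ...): x is the vector or data frame to be divided, and f is a factor that defines the groups. The result is a list with one element per level of f.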
library(MASS)
g <- split(Cars93$MPG.city, Cars93$Origin)
class(g)
## [1] "list"
names(g)
## [1] "USA" "non-USA"
c(median(g[[1]]), median(g[[2]]))
## [1] 20 22
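The drop argument in the signature above is not exercised by the Cars93 example. The following is a minimal sketch on made-up data (the vector scores and the factor grp are illustrative assumptions) showing how an empty factor level is kept by default and removed with drop = TRUE:

```r
# Toy data (hypothetical): five values in two groups; level "c" is unused
scores <- c(1, 4, 3, 8, 5)
grp <- factor(c("a", "b", "a", "b", "a"), levels = c("a", "b", "c"))

g <- split(scores, grp)                     # keeps the empty level "c"
g_drop <- split(scores, grp, drop = TRUE)   # drops levels with no data

names(g)        # "a" "b" "c"
names(g_drop)   # "a" "b"

# Group medians in one call instead of indexing g[[1]], g[[2]] by hand
med <- sapply(g_drop, median)
med             # a = 3, b = 6
```

Using sapply over the split result scales to any number of groups, whereas indexing each element by position has to be rewritten whenever the factor gains a level.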
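lapply(lst, fun) and sapply(lst, fun): the former always returns a list, while the latter simplifies the result to a vector or matrix whenever it can. The s in sapply stands for simplify.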
Lst <- list(a = rnorm(100), b = rnorm(100), c = rnorm(100))
lapply(Lst, range)
## $a
## [1] -2.859 2.976
##
## $b
## [1] -1.961 2.906
##
## $c
## [1] -2.403 3.363
sapply(Lst, range)
##           a      b      c
## [1,] -2.859 -1.961 -2.403
## [2,]  2.976  2.906  3.363
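What sapply returns depends on the shape of each individual result. The sketch below, on a small made-up list (lst and its contents are illustrative assumptions), walks through the three cases, and also shows vapply, a base R relative of sapply that takes the expected result type as an explicit argument:

```r
# Toy list (hypothetical): two integer vectors of different lengths
lst <- list(a = 1:3, b = 4:9)

lens <- sapply(lst, length)                   # one value each -> named vector
rngs <- sapply(lst, range)                    # two values each -> 2 x 2 matrix
big  <- sapply(lst, function(v) v[v > 2])     # lengths differ -> stays a list

is.matrix(rngs)   # TRUE
is.list(big)      # TRUE

# vapply checks that every result is a single numeric value
avgs <- vapply(lst, mean, numeric(1))
avgs              # a = 2, b = 6.5
```

Because the return type of sapply can change with the data, vapply is often the safer choice inside functions: it raises an error when a result does not match the declared template.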
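Applying a function to the rows or columns of a matrix or data frame
- Rows of a matrix: apply(mat, 1, fun).
- Columns of a matrix: apply(mat, 2, fun).
- Because a data frame is a list whose elements are its columns, lapply(dfrm, fun) and sapply(dfrm, fun) apply fun to each column.
- Suppose resp is a response variable and pred is a data frame in which each column is a predictor. Then cors <- sapply(pred, cor, y = resp) computes the correlation coefficient between each column of pred and resp.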
resp <- rnorm(n = 10, mean = 0, sd = 1)
pred <- as.data.frame(matrix(rnorm(n = 10 * 100, mean = 0, sd = 1), 10, 100))
cors <- sapply(pred, cor, y = resp)
mask <- (rank(-abs(cors)) <= 10) # rank() ranks from smallest to largest, so negating puts the strongest correlations first
best.pred <- pred[, mask]
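To see why the mask negates the absolute correlations, here is a small sketch with four made-up correlation values (the names V1 to V4 and the numbers are illustrative assumptions, not output of the code above). It keeps the top 2 instead of the top 10:

```r
# Hypothetical correlations for four predictors
cors <- c(V1 = 0.1, V2 = -0.9, V3 = 0.5, V4 = -0.3)

r <- rank(-abs(cors))     # rank 1 = largest absolute correlation
r                         # V1 = 4, V2 = 1, V3 = 2, V4 = 3

mask <- r <= 2
top_by_rank <- names(cors)[mask]
top_by_rank               # "V2" "V3"

# An equivalent selection using order(); returns positions, strongest first
top_by_order <- order(abs(cors), decreasing = TRUE)[1:2]
top_by_order              # 2 3
```

One caveat: rank() averages tied ranks by default, so the mask can select fewer than the requested number of columns when values tie at the cutoff, while the order() form always returns exactly that many positions.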
sapply(Cars93, class)
##       Manufacturer              Model               Type
##           "factor"           "factor"           "factor"
##          Min.Price              Price          Max.Price
##          "numeric"          "numeric"          "numeric"
##           MPG.city        MPG.highway            AirBags
##          "integer"          "integer"           "factor"
##         DriveTrain          Cylinders         EngineSize
##           "factor"           "factor"          "numeric"
##         Horsepower                RPM       Rev.per.mile
##          "integer"          "integer"          "integer"
##    Man.trans.avail Fuel.tank.capacity         Passengers
##           "factor"          "numeric"          "integer"
##             Length          Wheelbase              Width
##          "integer"          "integer"          "integer"
##        Turn.circle     Rear.seat.room       Luggage.room
##          "integer"          "numeric"          "integer"
##             Weight             Origin               Make
##           "integer"           "factor"           "factor"
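The row-wise and column-wise forms of apply listed earlier in this section have no example of their own, so here is a minimal sketch on a small made-up matrix and data frame (mat and dfrm are illustrative assumptions):

```r
# Toy matrix (hypothetical), filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
mat <- matrix(1:6, nrow = 2)

row_sums <- apply(mat, 1, sum)   # one result per row
col_sums <- apply(mat, 2, sum)   # one result per column
row_sums                         # 9 12
col_sums                         # 3 7 11

# A data frame is a list of columns, so sapply works column by column
dfrm <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
col_means <- sapply(dfrm, mean)
col_means                        # x = 2, y = 20
```

The second argument of apply is the margin: 1 selects rows and 2 selects columns. For plain sums and means, the base functions rowSums, colSums, rowMeans, and colMeans give the same results and are usually faster.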
tapply: apply a function to a data vector grouped by a factor, tapply(x, f, fun).
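As a minimal sketch of tapply, assuming a made-up vector of weights and a grouping factor sex (both hypothetical), the group means can be computed in a single call:

```r
# Toy data (hypothetical): four weights in two groups
weights <- c(60, 72, 55, 80)
sex <- factor(c("F", "M", "F", "M"))

means <- tapply(weights, sex, mean)
means                              # F = 57.5, M = 76

# The same computation written as split followed by sapply
via_split <- sapply(split(weights, sex), mean)
```

Both forms give the same numbers. tapply returns a one-dimensional array named by the factor levels, while the split and sapply form returns a named vector.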
Applying a function to the rows of a data frame grouped by a factor: by(dfrm, fact, fun).
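by is useful when the function needs several columns of each group at once, which tapply cannot provide. The sketch below assumes a made-up data frame with a value column x, a weight column w, and a grouping factor g (all hypothetical), and computes a weighted mean per group:

```r
# Toy data frame (hypothetical): values, weights, and a grouping factor
dfrm <- data.frame(
  x = c(1, 3, 10, 20),
  w = c(1, 3, 1, 1),
  g = factor(c("a", "a", "b", "b"))
)

# fun receives the sub-data-frame of rows belonging to one group
res <- by(dfrm, dfrm$g, function(d) weighted.mean(d$x, d$w))

res[["a"]]   # (1*1 + 3*3) / 4 = 2.5
res[["b"]]   # (10 + 20) / 2  = 15
```

When the function returns something more complex than a single number, such as a fitted model, the result behaves like a list with one element per group and can be passed to lapply.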
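Vectorizing a non-vectorized function: mapply(f, vec1, vec2, ..., vecN), where f takes N arguments. mapply calls f once per position, passing the i-th element of each vector.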
mapply(rep, 1:4, 4:1)
## [[1]]
## [1] 1 1 1 1
##
## [[2]]
## [1] 2 2 2
##
## [[3]]
## [1] 3 3
##
## [[4]]
## [1] 4
mapply(rep, times = 1:4, x = 4:1)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 3 3
##
## [[3]]
## [1] 2 2 2
##
## [[4]]
## [1] 1 1 1 1
gcd <- function(a, b) {
  # Euclid's algorithm; the if() test needs scalar arguments, hence mapply below
  if (b == 0) a else gcd(b, a %% b)
}
mapply(gcd, c(1, 2, 3), c(9, 6, 3))
## [1] 1 2 3
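Two related base R tools are worth knowing alongside mapply. The sketch below is an illustration rather than part of the original notes: it uses Vectorize to wrap the scalar gcd function once so it can be called like a vectorized function, and the MoreArgs argument of mapply to pass a value that should stay fixed across calls. The input numbers are made up.

```r
# Scalar-only Euclid's algorithm, as defined earlier in this section
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# Vectorize() returns a wrapper that calls mapply internally
vgcd <- Vectorize(gcd)
g <- vgcd(c(12, 18, 7), c(8, 27, 5))
g                                   # 4 9 1

# MoreArgs holds arguments that are not iterated over
sq <- mapply(function(x, p) x^p, 1:3, MoreArgs = list(p = 2))
sq                                  # 1 4 9
```

Vectorize does not make the function faster, since it still loops over the elements; it only hides the mapply call behind a more convenient interface.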
References
R Cookbook, Chapter 6