The apply Family of Functions and Parallel Computing in R
I. The apply family of functions
1. apply: works on matrices and arrays
```r
# apply
# 1 means rows, 2 means columns
# create a matrix of 10 rows x 2 columns
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)
# mean of the rows
apply(m, 1, mean)
# [1]  6  7  8  9 10 11 12 13 14 15
# mean of the columns
apply(m, 2, mean)
# [1]  5.5 15.5
# divide all values by 2
apply(m, 1:2, function(x) x/2)
```
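The heading mentions arrays, but the example above only uses a matrix. As a minimal sketch (the array `a` and the variable names are made up for illustration), the same `MARGIN` argument works on a three-dimensional array, and for plain row or column means of a matrix the built-in `rowMeans()` and `colMeans()` give the same values as `apply`:

```r
# A 2 x 3 x 4 array filled with 1..24 (hypothetical example data)
a <- array(1:24, dim = c(2, 3, 4))

# MARGIN = 3: one sum per 2 x 3 slice along the third dimension
slice_sums <- apply(a, 3, sum)

# MARGIN = c(1, 2): collapse the third dimension, leaving a 2 x 3 matrix
cell_sums <- apply(a, c(1, 2), sum)

# Row means of the matrix from the example above, computed two ways
m <- matrix(c(1:10, 11:20), nrow = 10, ncol = 2)
same_rows <- all.equal(apply(m, 1, mean), rowMeans(m))
```

`MARGIN` names the dimensions that are kept; every other dimension is handed to the function.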
2. eapply: works on the variables in an environment
```r
# a new environment
e <- new.env()
# two environment variables, a and b
e$a <- 1:10
e$b <- 11:20
# mean of the variables
eapply(e, mean)
# $b
# [1] 15.5
#
# $a
# [1] 5.5
```
3. lapply works on a list and returns a list. A data.frame is itself a kind of list: a list of several vectors of equal length, bound together as columns. Usage: lapply(list, function)
```r
sapply(iris[, 1:4], mean)
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width
#     5.843333     3.057333     3.758000     1.199333
lapply(iris[, 1:4], mean)
# $Sepal.Length
# [1] 5.843333
#
# $Sepal.Width
# [1] 3.057333
#
# $Petal.Length
# [1] 3.758
#
# $Petal.Width
# [1] 1.199333
```
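The claim that a data.frame is a list can be checked directly. The following is a small sketch using the built-in `iris` data set; the variable names are chosen here for illustration only:

```r
# A data.frame passes the list test: each column is one list element
df_is_list <- is.list(iris)

# length() of a data.frame counts its columns, not its rows
n_elements <- length(iris)

# Every column (list element) has the same number of entries
col_lengths <- vapply(iris, length, integer(1))

# Because of this, lapply walks over the columns and returns a named list
res <- lapply(iris[, 1:4], mean)
```

This is why `lapply(iris[, 1:4], mean)` yields one mean per column.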
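4. sapply is a user-friendly form of lapply. Both lapply and sapply work on lists and data frames; they differ only in the type of object they return. lapply always returns a list, while sapply depends on the results: if the result for every element has the same length, the output is simplified to a vector or a matrix. Examples follow.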
```r
# The next two calls return the same result, a list in both cases
sapply(iris, unique)
lapply(iris, unique)

# Of the next two, the first returns a vector and the second a list
sapply(iris[, 1:4], mean)
lapply(iris[, 1:4], mean)

# Of the next two, the first returns a matrix and the second a list
sapply(iris[, 1:4], function(x) x/2)
lapply(iris[, 1:4], function(x) x/2)

# sapply picks the most suitable object type for the results,
# while lapply always returns a list
# The following two calls return the same result
library(magrittr)
lapply(iris[, 1:4], mean) %>% unlist()
sapply(iris[, 1:4], mean)
```
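To make the simplification rule concrete, here is a short sketch that inspects what sapply returns in each of the three situations. The variable names `r1` to `r4` are illustrative only:

```r
# Results of different lengths: nothing to simplify, a list comes back
r1 <- sapply(iris, unique)

# Every result has length 1: simplified to a named numeric vector
r2 <- sapply(iris[, 1:4], mean)

# Every result has the same length greater than 1: simplified to a matrix
r3 <- sapply(iris[, 1:4], function(x) x / 2)

# simplify = FALSE switches the simplification off, matching lapply
r4 <- sapply(iris[, 1:4], mean, simplify = FALSE)

shapes <- list(r1 = class(r1), r2 = class(r2), r3 = dim(r3), r4 = class(r4))
```

Since the return type depends on the data, code that relies on a particular shape may be safer with lapply or vapply.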
5. vapply requires a third argument (FUN.VALUE), which specifies the format of the output
```r
l <- list(a = 1:10, b = 11:20)
# fivenum of values using vapply
l.fivenum <- vapply(l, fivenum, c(Min. = 0, "1st Qu." = 0, Median = 0, "3rd Qu." = 0, Max. = 0))
class(l.fivenum)
# [1] "matrix"   (R >= 4.0.0 prints "matrix" "array")
# let's see it
l.fivenum
#            a    b
# Min.     1.0 11.0
# 1st Qu.  3.0 13.0
# Median   5.5 15.5
# 3rd Qu.  8.0 18.0
# Max.    10.0 20.0
```
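The practical benefit of declaring the output format is that vapply fails loudly when a result does not match it, and it returns a predictable type even for empty input. A minimal sketch, with variable names chosen for illustration:

```r
l <- list(a = 1:10, b = 11:20)

# Each mean is a single number, which matches numeric(1)
ok <- vapply(l, mean, numeric(1))

# range() returns two numbers, so the numeric(1) template is violated;
# vapply raises an error, caught here and turned into a marker string
bad <- tryCatch(vapply(l, range, numeric(1)), error = function(e) "error")

# On empty input sapply falls back to an empty list,
# whereas vapply still returns the declared type
empty_s <- sapply(list(), length)
empty_v <- vapply(list(), length, integer(1))
```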
6.replicate
Description: “replicate is a wrapper for the common use of sapply for repeated evaluation of an expression (which will usually involve random number generation).”
```r
replicate(10, rnorm(10))
```
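Since the documentation describes replicate as a wrapper around sapply, the two should produce the same draws from the same seed. The sketch below checks this and shows a typical simulation use; the seed value and variable names are arbitrary choices for illustration:

```r
# Same seed, two ways of repeating the expression 10 times
set.seed(42)
r <- replicate(10, rnorm(10))             # 10 x 10 matrix, one column per run

set.seed(42)
s <- sapply(1:10, function(i) rnorm(10))  # explicit sapply equivalent

same <- isTRUE(all.equal(r, s))

# Typical use: simulate the sampling distribution of a statistic
sample_means <- replicate(1000, mean(rnorm(10)))
```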
7. mapply can take multiple arguments.
mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.
```r
l1 <- list(a = c(1:10), b = c(11:20))
l2 <- list(c = c(21:30), d = c(31:40))
# sum the corresponding elements of l1 and l2
mapply(sum, l1$a, l1$b, l2$c, l2$d)
# [1]  64  68  72  76  80  84  88  92  96 100
# mapply is like an sapply that can take several arguments
mapply(rep, 1:4, 5)
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    1    2    3    4
# [3,]    1    2    3    4
# [4,]    1    2    3    4
# [5,]    1    2    3    4
```
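A few more cases may help show how mapply pairs up its arguments. This is a small sketch; the anonymous functions and the `scale` parameter are made up for illustration:

```r
# Arguments are paired position by position: rep(1, 3), rep(2, 2), rep(3, 1).
# The results differ in length, so a list is returned instead of a matrix.
v <- mapply(rep, 1:3, 3:1)

# Vectors can be matched to the function's parameters by name
p <- mapply(function(x, y) x + y, x = 1:3, y = c(10, 20, 30))

# MoreArgs holds arguments that stay fixed across all calls
q <- mapply(function(x, y, scale) (x + y) * scale,
            1:3, 4:6, MoreArgs = list(scale = 2))
```

In the earlier `mapply(rep, 1:4, 5)` call, the single value 5 is recycled to pair with each of 1 to 4.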
8.rapply
Description: “rapply is a recursive version of lapply.”
```r
# let's start with our usual simple list example
l <- list(a = 1:10, b = 11:20)
# log2 of each value in the list
rapply(l, log2)
#       a1       a2       a3       a4       a5       a6       a7       a8
# 0.000000 1.000000 1.584963 2.000000 2.321928 2.584963 2.807355 3.000000
#       a9      a10       b1       b2       b3       b4       b5       b6
# 3.169925 3.321928 3.459432 3.584963 3.700440 3.807355 3.906891 4.000000
#       b7       b8       b9      b10
# 4.087463 4.169925 4.247928 4.321928
# log2 of each value in each list
rapply(l, log2, how = "list")
# $a
# [1] 0.000000 1.000000 1.584963 2.000000 2.321928 2.584963 2.807355 3.000000
# [9] 3.169925 3.321928
#
# $b
# [1] 3.459432 3.584963 3.700440 3.807355 3.906891 4.000000 4.087463 4.169925
# [9] 4.247928 4.321928
#
# what if the function is the mean?
rapply(l, mean)
#    a    b
#  5.5 15.5
rapply(l, mean, how = "list")
# $a
# [1] 5.5
#
# $b
# [1] 15.5
```
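The list above is flat, so the recursive part of rapply does not really show. The following sketch uses a nested list invented for illustration, together with the `classes` argument, which restricts the function to leaves of a given class:

```r
# A nested list with integer leaves and one character leaf (hypothetical data)
nested <- list(a = 1:3, b = list(c = 4:6, d = "text"))

# Apply sum to the integer leaves only and flatten the result;
# with how = "unlist" the non-matching character leaf is dropped
sums <- rapply(nested, sum, classes = "integer", how = "unlist")

# how = "replace" keeps the nested structure and leaves
# non-matching elements untouched
doubled <- rapply(nested, function(x) x * 2L, classes = "integer", how = "replace")
```

The names of the flattened result are built from the path to each leaf, for example `b.c` for the element `c` inside `b`.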
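II. Parallel computing
Project Euler Problem 14 is used below to demonstrate vectorized programming in R (with the apply family of functions) and parallel computing.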
```r
#----- Longest Collatz sequence, Problem 14
func <- function(x) {
  n <- 1
  raw <- x
  while (x > 1) {
    x <- ifelse(x %% 2 == 0, x / 2, 3 * x + 1)
    n <- n + 1
  }
  return(c(raw, n))
}

# Method 1: vectorized programming
library(magrittr)
system.time({
  x <- 1:1e5
  res1 <- sapply(x, func) %>% t()
})
#   user  system elapsed
# 37.960   0.360  41.315

# Method 2: vectorized programming
system.time({
  x <- 1:1e5
  res2 <- do.call('rbind', lapply(x, func))
})
#   user  system elapsed
# 36.031   0.181  36.769

# Method 3: parallel computing
library(parallel)
# use system.time to report how long the computation takes
system.time({
  x <- 1:1e5
  cl <- makeCluster(4)                  # start a cluster of 4 workers
  results <- parLapply(cl, x, func)     # parallel version of lapply
  res.df <- do.call('rbind', results)   # combine the results
  stopCluster(cl)                       # shut down the cluster
})
#   user  system elapsed
#  0.199   0.064  20.038

# Method 4: for loop
system.time({
  m <- matrix(nrow = 0, ncol = 2)
  for (i in 1:1e5) {
    m <- rbind(m, func(i))
  }
})
# Method 4 takes far too long
```
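Method 4 is most likely slow not because of the for loop itself but because `rbind` copies the whole matrix on every iteration as it grows. The sketch below is one possible fix, not part of the original benchmark: it preallocates the result matrix and fills it row by row. The helper `collatz_len` and the smaller range `N` are assumptions made here to keep the example quick to run.

```r
# Hypothetical helper: number of terms in the Collatz chain starting at x
collatz_len <- function(x) {
  n <- 1
  while (x > 1) {
    x <- if (x %% 2 == 0) x / 2 else 3 * x + 1  # scalar if/else instead of ifelse()
    n <- n + 1
  }
  n
}

N <- 5000  # smaller than 1e5 so the sketch finishes quickly

# for loop with a preallocated matrix: each row is written in place,
# so nothing is copied as the result fills up
m <- matrix(NA_real_, nrow = N, ncol = 2)
for (i in seq_len(N)) {
  m[i, ] <- c(i, collatz_len(i))
}

# The same chain lengths via vapply, for comparison
lens <- vapply(seq_len(N), collatz_len, numeric(1))

# Starting value with the longest chain in 1..N
best_start <- which.max(lens)
```

With preallocation the loop does the same amount of Collatz work as the sapply and lapply versions, so its running time should be of the same order rather than growing quadratically.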
That's all.