Dealing with R being too slow

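R has long been criticized for being slow. I was recently analyzing a fairly large dataset and it ran unbearably slowly. The bottleneck turned out to be that indexing into an R data frame is very slow:

Here is the test: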

gene_list = 1:100000
eQTL_mat = matrix(nrow = length(gene_list), ncol = 7) # create a matrix
eQTL_df = as.data.frame(matrix(nrow = length(gene_list), ncol = 7)) # create a data frame
eQTL_list = replicate(length(gene_list), list()) # create a list

try_func = function() return(1:7)
# Baseline: call try_func() once per gene with sapply (no indexed assignment)
system.time(
        sapply(gene_list, function(x) return (try_func()))
)

### user system elapsed
### 0.108 0.001 0.108

# Assign each result to one row of the matrix
system.time(
        for (gene_ind in 1:length(gene_list)){
                eQTL_mat[gene_ind, ] = try_func()
        }
)

### user system elapsed
### 0.137 0.000 0.138


# Assign each result to one row of the data frame
system.time(
        for (gene_ind in 1:length(gene_list)){
                eQTL_df[gene_ind, ] = try_func()
        }
)

### user system elapsed
### 90.623 165.868 259.065


# Assign each result to one element of the list (here 1:7 directly, not via try_func())
system.time(
        for (gene_ind in 1:length(gene_list)){
                eQTL_list[[gene_ind]] = 1:7
        }
)

### user system elapsed
### 0.089 0.000 0.090

 

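See the result? Shocking! Filling the data frame row by row took about 259 seconds elapsed, while the same loop over a matrix or a list finished in roughly 0.1 seconds. Data frames really are not suited to this kind of row-by-row work on big data!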
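One way to act on these timings is to keep the data frame out of the loop: collect the results in a matrix or a list and convert to a data frame once at the end. The following is only a minimal sketch, assuming every result has the same length and type (as try_func() does here). The names n, res_mat, res_list, res_df and res_df2 are made up for illustration, and n is kept small so the snippet runs quickly.

```r
# Hypothetical, scaled-down version of the test above.
n <- 1000
try_func <- function() 1:7

# Option 1: fill a preallocated matrix row by row, convert once at the end.
res_mat <- matrix(NA_integer_, nrow = n, ncol = 7)
for (i in seq_len(n)) {
  res_mat[i, ] <- try_func()
}
res_df <- as.data.frame(res_mat)

# Option 2: fill a preallocated list, then bind the rows in a single call.
res_list <- vector("list", n)
for (i in seq_len(n)) {
  res_list[[i]] <- try_func()
}
res_df2 <- as.data.frame(do.call(rbind, res_list))
```

Both options end with an ordinary data frame, so later code that expects one does not need to change. Only the per-row assignment inside the loop is moved to a cheaper structure.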

posted on 2015-08-15 02:23 by Forever_YCC