R语言操作mysql上亿数据量(ff包ffbase包和ETLUtils包)
平时都是几百万的数据量,这段时间公司中了个大标,有上亿的数据量。
现在情况是数据已经在数据库里面了,需要用R分析,但是完全加载不进来内存。
面对现在这种情况,R提供了ff, ffbase , ETLUtils 的解决方案。
它可以很简单的加载,转换数据库的数据进入R内存,ETLUtils 包现在已经扩展了read.odbc.ffdf
方法用来查询Oracle, MySQL, PostgreSQL & sqlite databases.。
下面我们就来展示一个例子。
require(ETLUtils) login <- list() login$user <- "bnosac" login$password <- "YourPassword" login$dbname <- "YourDB" login$host <- "localhost/IPaddress" require(RMySQL) x <- read.dbi.ffdf( query = "select * from semetis.keywords_performance_endofday", dbConnect.args = list(drv = dbDriver("MySQL"), dbname = login$dbname, user = login$user, password = login$password, host = login$host), VERBOSE=TRUE) 1> dim(x) [1] 14969674 27 login <- list() login$dsn <- "YourDSN" login$uid <- "bnosac" login$pwd <- "YourPassword" require(RODBC) x <- read.odbc.ffdf( query = "select * from semetis.keywords_performance_endofday where date = CURRENT_DATE-1", odbcConnect.args = list(dsn = login$dsn, uid = login$uid, pwd = login$pwd), x = x, VERBOSE=TRUE) 1> dim(x) [1] 15062904 27
指定本地ff文件路径
save.ffdf(ffdfname, dir=”/PATH/TO/STORE/FF/FILES”)
https://www.rdocumentation.org/packages/ffbase/versions/0.12.3/topics/save.ffdf
load.ffdf(dir=”/PATH/TO/STORE/FF/FILES”)
https://www.rdocumentation.org/packages/ffbase/versions/0.12.3/topics/load.ffdf
read.dbi.ffdf 更详细的介绍
https://www.rdocumentation.org/packages/ETLUtils/versions/1.3/topics/read.dbi.ffdf