攻城狮科学家


Initialize SparkR (point R at the Spark installation, then load the package):

Sys.setenv(SPARK_HOME="/usr/spark")

.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

sc <- sparkR.init(master="spark://Master.Hadoop:7077")

sqlContext <- sparkRSQL.init(sc)

 

people <- read.df(sqlContext, "/people.json", "json")   # read a JSON file
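Once loaded, the DataFrame can be inspected and queried. A minimal sketch using the SparkR 1.x API; the `name` and `age` columns are assumptions taken from the standard `people.json` example file shipped with Spark:

```r
# Inspect the DataFrame returned by read.df (SparkR 1.x API)
printSchema(people)   # show the inferred column names and types
head(people)          # bring the first rows back as a local data.frame

# Register it as a temp table and query it with SQL
# (assumes the example file's "name" and "age" columns)
registerTempTable(people, "people")
teens <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")
head(teens)
```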

 

Read a CSV file (using the spark-csv package):

https://github.com/databricks/spark-csv

In the shell:

sparkR --packages com.databricks:spark-csv_2.10:1.0.3

df <- read.df(sqlContext, "/test.csv", source = "com.databricks.spark.csv", inferSchema = "true")   # read data from HDFS

In RStudio:

 

Sys.setenv(SPARK_HOME="/usr/spark")

.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.3.0" "sparkr-shell"')

sc<-sparkR.init(master="spark://Master.Hadoop:7077")

sqlContext <- sparkRSQL.init(sc)

df <- read.df(sqlContext, "/test.csv", source = "com.databricks.spark.csv", inferSchema = "true")
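The loaded CSV DataFrame supports the usual SparkR operations. A sketch under the assumption of a numeric column named C0 (spark-csv names columns C0, C1, … when the file has no header row):

```r
count(df)                         # row count
printSchema(df)                   # column types chosen by inferSchema

# filter/select on a hypothetical column C0
small <- filter(df, df$C0 > 10)
head(select(small, "C0"))

localdf <- collect(df)            # pull the full result into a local data.frame
```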

 

 

Write a DataFrame back out as CSV:

write.df(df, "newcars.csv", "com.databricks.spark.csv", "overwrite")

Using an R file:
./sparkR --packages com.databricks:spark-csv_2.10:1.0.3 *.R   (sometimes this does not work)

Reading all files in one HDFS directory with SparkR:

df <- read.df(sqlContext, "/tdir/*.csv", source="com.databricks.spark.csv", inferSchema="true")

Or, add a shebang line at the top of the script:

#!/usr/bin/Rscript

then make the file executable and run the R code directly:

./*.R
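Putting the pieces together, a complete standalone script might look like the sketch below (the master URL, paths, and package version are the ones used above; the script file name itself is up to you):

```r
#!/usr/bin/Rscript
# Standalone SparkR script: initialize, load a CSV from HDFS, stop the context.

Sys.setenv(SPARK_HOME="/usr/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

# Pull in the spark-csv package when the context is created
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.3.0" "sparkr-shell"')
sc <- sparkR.init(master="spark://Master.Hadoop:7077")
sqlContext <- sparkRSQL.init(sc)

df <- read.df(sqlContext, "/test.csv", source = "com.databricks.spark.csv", inferSchema = "true")
print(count(df))

sparkR.stop()   # release the Spark context before the script exits
```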

http://thirteen-01.stat.iastate.edu/snoweye/hpsc/?item=rscript

 

posted on 2016-01-07 11:21 by 攻城狮科学家