Post Category - Spark Big Data
Abstract: val rddFromFile = spark.sparkContext.textFile("test.txt").collect().mkString("\n") Note: for a local file, both relative and absolute paths work here, or you can pass an HDFS path directly. Take the first element of the Array[String]: val rddFr…
Read more
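A minimal sketch of this post's snippet, assuming a spark-shell session with a SparkSession named spark and a small local test.txt:

// Read a text file into an RDD[String]; a local path or an hdfs:// URI both work
val rddFromFile = spark.sparkContext.textFile("test.txt")
// Collect to the driver and join the lines (only safe for small files)
val contents: String = rddFromFile.collect().mkString("\n")
// First element of the collected Array[String], i.e. the first line
val firstLine: String = rddFromFile.collect()(0)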
Abstract: df.withColumn("Test", lit(null)).show()
+----+--------+-----+----+
|Hour|Category|Value|Test|
+----+--------+-----+----+
|   0|   cat26| 30.9|null|
|   1|   cat67| 28.5|null|
|   2|   cat56| 39.6|…
Read more
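A hedged sketch of the technique: lit(null) produces a NullType column, so casting it to an explicit type (StringType here is an assumption) keeps the schema predictable:

import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// Add a "Test" column that is null on every row
val withNull = df.withColumn("Test", lit(null).cast(StringType))
withNull.show()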
Abstract: List all tables: list. Scan all rows of a table: scan 'staff'. First 10 rows: scan 'test-table', {'LIMIT' => 10}. Last 10 rows: scan 'test-table', {'LIMIT' => 10, REVERSED => TRUE}. Describe a table: desc 'staff…
Read more
摘要:scan 'test-table',{'LIMIT' => 10, REVERSED => TRUE}
Read more
摘要:scan 'test-table', {'LIMIT' => 10}
Read more
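For the same scans outside the shell, a sketch using the HBase Java client from Scala (assumes HBase 2.x, where Scan.setLimit and Scan.setReversed exist; the table name comes from the posts, everything else is a plain default):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import scala.collection.JavaConverters._

val conf = HBaseConfiguration.create()  // reads hbase-site.xml from the classpath
val conn = ConnectionFactory.createConnection(conf)
val table = conn.getTable(TableName.valueOf("test-table"))

// Equivalent of: scan 'test-table', {'LIMIT' => 10, REVERSED => TRUE}
val scan = new Scan().setLimit(10).setReversed(true)
val scanner = table.getScanner(scan)
scanner.iterator().asScala.foreach(println)

scanner.close(); table.close(); conn.close()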
Abstract: Example of creating a DataFrame:
val df = sc.parallelize(Seq(
  (0,"cat26","cat26"),
  (1,"cat67","cat26"),
  (2,"cat56","cat26"),
  (3,"cat8","cat26"))).toDF("Hour", "…
Read more
Abstract: 1. Create a DataFrame
scala> val df = sc.parallelize(Seq(
  (0,"cat26",30.9),
  (1,"cat67",28.5),
  (2,"cat56",39.6),
  (3,"cat8",35.6))).toDF("Hour", "…
Read more
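The two posts above build the same toy DataFrame; a consolidated sketch, assuming a spark-shell session where sc and the toDF implicits are in scope:

import spark.implicits._

val df = sc.parallelize(Seq(
  (0, "cat26", 30.9),
  (1, "cat67", 28.5),
  (2, "cat56", 39.6),
  (3, "cat8",  35.6)
)).toDF("Hour", "Category", "Value")
df.show()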
Abstract: Construct a DataFrame:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val data = Array(List("Category A", 100, "This is category A"), List("…
Read more
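Mixed-type List rows usually call for an explicit schema; a sketch with Row and StructType (the column names and the second row are assumptions, since the abstract is cut off):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val rows = Seq(
  Row("Category A", 100, "This is category A"),
  Row("Category B", 120, "This is category B")  // hypothetical second row
)
val schema = StructType(Seq(
  StructField("category", StringType, nullable = true),
  StructField("value", IntegerType, nullable = true),
  StructField("description", StringType, nullable = true)
))
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.show()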
摘要:.na.drop("all", Seq("create_time"))
Read more
Abstract: Save the following as small_zipcode.csv:
id,zipcode,type,city,state,population
1,704,STANDARD,,PR,30100
2,704,,PASEO COSTA DEL SUR,PR,
3,709,,BDA SAN LUIS,PR,3700
4,…
Read more
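A sketch of loading that file, assuming it sits in the working directory and the first line is a header:

val df = spark.read
  .option("header", "true")       // first line holds the column names
  .option("inferSchema", "true")  // let Spark guess the column types
  .csv("small_zipcode.csv")
df.printSchema()
df.show()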
Abstract: Drop rows where all columns are NaN/null: df.na.drop("all"). Drop rows where any column is NaN/null: df.na.drop("any"). Example:
scala> df.show
+---+-------+--------+-------------------+-----+----------+
| id|zipcode|    type|               city|state|population|
+---+-------+--------+-------------------+-----+----------+
…
Read more
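A sketch of the common na.drop variants on that DataFrame (the subset columns are taken from the small_zipcode.csv example above):

// Drop rows where every column is null/NaN
df.na.drop("all").show()
// Drop rows where at least one column is null/NaN
df.na.drop("any").show()
// Restrict the check to specific columns
df.na.drop("all", Seq("population", "type")).show()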
Abstract: scala> val a = Seq(("a", 2),("b",3)).toDF("name","score")
a: org.apache.spark.sql.DataFrame = [name: string, score: int]
scala> a.show()
+----+-----+
|name|score|
+----+-----+
…
Read more
Abstract: library(datasets)
summary(iris)
##  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.1…
Read more
Abstract: val goalsDF = Seq(
  ("messi", 2),
  ("messi", 1),
  ("pele", 3),
  ("pele", 1)
).toDF("name", "goals")
goalsDF.show()
+-----+-----+
| name|goals|
+-----+-----+
|messi|    2|
|m…
Read more
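The abstract is cut off before the aggregation itself; a plausible sketch of a per-name aggregate on goalsDF (the max/sum choice is an assumption about where the post goes):

import org.apache.spark.sql.functions.{max, sum}

val goalsDF = Seq(
  ("messi", 2), ("messi", 1),
  ("pele", 3), ("pele", 1)
).toDF("name", "goals")

// Highest single score and total goals per player
goalsDF.groupBy("name")
  .agg(max("goals").as("max_goals"), sum("goals").as("total_goals"))
  .show()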
Abstract: import org.apache.spark.sql.functions.{row_number, max, broadcast}
import org.apache.spark.sql.expressions.Window
val df = sc.parallelize(Seq(
  (0,"cat…
Read more
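Those imports are the standard row_number-over-a-window recipe for keeping the top row per group; a sketch on the df built above (partitioning by Hour and ordering by Value is an assumption based on the column names):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{row_number, desc, col}

val w = Window.partitionBy("Hour").orderBy(desc("Value"))

// Keep only the highest-Value row within each Hour
df.withColumn("rn", row_number().over(w))
  .filter(col("rn") === 1)
  .drop("rn")
  .show()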
Abstract: scala> val df = sc.parallelize(Seq( (0,"cat26",30.9), (1,"cat67",28.5), (2,"cat56",39.6), (3,"cat8",35.6))).toDF("Hour", "Category", "Value")
scala> d…
Read more
Abstract: scala> val df = sc.parallelize(Seq(
  (0,"cat26",30.9),
  (1,"cat67",28.5),
  (2,"cat56",39.6),
  (3,"cat8",35.6))).toDF("Hour", "Category", "Value")
Read more
Abstract: val df = sc.parallelize(Seq( (0,"cat26",30.9), (1,"cat67",28.5), (2,"cat56",39.6), (3,"cat8",35.6))).toDF("Hour", "Category", "Value")
// or read the data from a file into a List…
Read more
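Following the post's comment about reading from a file instead, a sketch that parses a comma-separated text file into the same shape (the file name and the line layout are assumptions):

import spark.implicits._

// Each line is assumed to look like: 0,cat26,30.9
val df = spark.sparkContext.textFile("hours.txt")
  .map(_.split(","))
  .map(a => (a(0).trim.toInt, a(1).trim, a(2).trim.toDouble))
  .toDF("Hour", "Category", "Value")
df.show()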