Word frequency counting (WordCount) with SparkSQL
The word.txt file:
hello hello scala spark
java sql html java hello
jack jack tom tom you he he sql
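The Spark code, written in IDEA:
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object WordCount {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName("wordCount")
      .master("local[*]")
      .getOrCreate()

    // Read the file; each line becomes one row in a single column named "value"
    val ds: Dataset[String] = spark.read.textFile("path/to/word.txt")

    // Import the implicits; they supply the Encoder that flatMap() needs
    import spark.implicits._

    // Split each line on spaces and flatten the result into one word per row
    val ds1: Dataset[String] = ds.flatMap(_.split(" "))

    // Register a temporary view so the data can be queried with SQL
    ds1.createTempView("word")

    // Run the SQL query, sorting by count in descending order
    val df: DataFrame = spark.sql("select value,count(*) count from word group by value order by count desc")

    // Show the result
    df.show()

    // Stop the session
    spark.stop()
  }
}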
Output:
+-----+-----+
|value|count|
+-----+-----+
|hello|    3|
|  tom|    2|
| java|    2|
|  sql|    2|
|   he|    2|
| jack|    2|
|  you|    1|
| html|    1|
|spark|    1|
|scala|    1|
+-----+-----+
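For comparison, the same aggregation can probably be written without a temporary view, using the DataFrame operators directly. This is only a minimal sketch, not part of the original program: the object name WordCountDsl and the helper countWords are hypothetical, and the input lines are built in memory from the contents of word.txt instead of being read from a file.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.desc

object WordCountDsl {
  // Hypothetical helper: same logic as the SQL query above, expressed with
  // groupBy / count / orderBy instead of a temporary view.
  def countWords(spark: SparkSession, lines: Dataset[String]): DataFrame = {
    import spark.implicits._ // Encoder needed by flatMap
    lines
      .flatMap(_.split(" "))   // one word per row, in column "value"
      .groupBy("value")        // group identical words
      .count()                 // adds a Long column named "count"
      .orderBy(desc("count"))  // most frequent words first
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wordCountDsl")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumption: the three lines of word.txt, inlined to keep the sketch self-contained
    val lines: Dataset[String] = Seq(
      "hello hello scala spark",
      "java sql html java hello",
      "jack jack tom tom you he he sql"
    ).toDS()

    countWords(spark, lines).show()
    spark.stop()
  }
}
```

As with the SQL version, the relative order of words that share the same count is not guaranteed and may differ between runs.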
If you spot any mistakes, please point them out.