liudehaos


Scala WordCount Example

Posted on 2022-07-20 20:42  liudehaos
Sample data:
java,spark,hadoop,python,datax
java,spark,hadoop,spark,python,datax
java,spark,hadoop,python,datax
java,spark,hadoop,spark,python
java,spark,hadoop,spark,python,datax
java,spark,hadoop,python,datax
java,spark,hadoop,python,datax
java,spark,hadoop,spark,python,datax
java,spark,hadoop,python,datax
java,spark,hadoop,spark,python,datax
hadoop,spark,spark,python

package com.shujia

import scala.io.Source

object Test1wordcount {
  def main(args: Array[String]): Unit = {
    // Read the file and convert its lines to a List
    val list: List[String] = Source.fromFile("data/words.txt").getLines().toList

    // Split each line on the comma delimiter and flatten the results
    val words: List[String] = list.flatMap((lines: String) => lines.split(","))

    // Group identical words together
    val groupBy: Map[String, List[String]] = words.groupBy((word: String) => word)

    // Count the occurrences of each word
    val wordCount: Map[String, Int] = groupBy.map((kv: (String, List[String])) => {
      // The word this group is keyed by
      val word: String = kv._1
      // Every occurrence of that word
      val value: List[String] = kv._2
      // The length of the list is the word's count
      val count: Int = value.length
      // Return the word together with its count
      (word, count)
    })

    wordCount.foreach(println)
  }
}
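
The same read, split, group, count pipeline can be written more compactly. The following is only a sketch, not the post's original code: the names `WordCountSketch` and `countWords` are made up for illustration, and the input is a two-line in-memory string standing in for data/words.txt so that the example runs without any file. It also closes the `Source` when done.

```scala
import scala.io.Source

object WordCountSketch {
  // Count how often each comma-separated word occurs across all lines.
  def countWords(lines: Iterator[String]): Map[String, Int] =
    lines.toList
      .flatMap(_.split(","))   // split each line and flatten into one list of words
      .groupBy(identity)       // group identical words together
      .map { case (word, occurrences) => (word, occurrences.length) }

  def main(args: Array[String]): Unit = {
    // Hypothetical in-memory stand-in for data/words.txt
    val source = Source.fromString("java,spark,hadoop\nhadoop,spark,spark")
    try {
      countWords(source.getLines()).foreach(println)
    } finally {
      source.close() // release the underlying resource
    }
  }
}
```

Pattern matching with `case (word, occurrences)` replaces the `kv._1` / `kv._2` accessors, and `identity` replaces the explicit `(word: String) => word` function.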
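Output (each count is four times what the 11 sample lines above would produce on their own, so the data/words.txt used for this run presumably contained the sample four times over):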
F:\soft\java\jdk\bin\java.exe "-javaagent:F:\soft\IDEA\IntelliJ 
(datax,36)
(java,40)
(hadoop,44)
(spark,68)
(python,44)

Process finished with exit code 0
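
The pairs in the output appear in no particular order, because a Scala `Map` does not guarantee iteration order. If a stable, readable order is wanted, one option is to convert the map to a list and sort it before printing. The sketch below assumes that goal; `SortedWordCount` and `sortByCount` are illustrative names, and the map literal simply reuses the counts printed above.

```scala
object SortedWordCount {
  // Order by count descending, breaking ties alphabetically by word.
  def sortByCount(counts: Map[String, Int]): List[(String, Int)] =
    counts.toList.sortBy { case (word, count) => (-count, word) }

  def main(args: Array[String]): Unit = {
    // Counts copied from the output shown in this post
    val counts = Map(
      "datax" -> 36, "java" -> 40, "hadoop" -> 44, "spark" -> 68, "python" -> 44
    )
    sortByCount(counts).foreach(println)
    // prints (spark,68), (hadoop,44), (python,44), (java,40), (datax,36), one per line
  }
}
```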