Spark Socket Stream Listening

# Import libraries
from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: network_wordcount.py <hostname> <port>", file=sys.stderr)
        exit(-1)

    # Create the SparkContext and a StreamingContext with a 1-second batch interval
    sc = SparkContext(appName="PythonStreamingNetworkWordCount")
    ssc = StreamingContext(sc, 1)

    # Socket stream: read text lines from the given hostname and port
    lines = ssc.socketTextStream(sys.argv[1], int(sys.argv[2]))

    # Word count: split each line into words, map to (word, 1), reduce by key
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
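One note for running this outside a cluster (an assumption on my part, not part of the original example): the receiver created by socketTextStream occupies one thread, so a purely local run needs a master with at least two threads. A minimal sketch of the context setup under that assumption:

# Assumption: running locally; "local[2]" gives one thread to the socket
# receiver and one thread to the batch computation.
sc = SparkContext("local[2]", appName="PythonStreamingNetworkWordCount")
ssc = StreamingContext(sc, 1)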


Start nc in one terminal to listen on the chosen port:

sudo nc -lk 9999

In a second terminal, change to the script's directory and run it against that same port:

cd /usr/local/spark/mycode/streaming
python3 NetworkWordCount.py localhost 9999
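If pyspark is not importable from the system python3, the same script can be submitted through spark-submit instead; the path below assumes the /usr/local/spark installation used above:

/usr/local/spark/bin/spark-submit NetworkWordCount.py localhost 9999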


Type some words into the first terminal window (the one running nc). The listening program automatically receives the word data stream and prints word-count statistics every second, so output roughly like the following appears on the screen:


-------------------------------------------
Time: 1479431100000 ms
-------------------------------------------
(hello,1)
(world,1)
-------------------------------------------
Time: 1479431120000 ms
-------------------------------------------
(hadoop,1)
-------------------------------------------
Time: 1479431140000 ms
-------------------------------------------
(spark,1)
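For reference, output like the above would be produced by typing lines such as the following into the nc terminal, roughly one batch apart:

hello world
hadoop
spark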
 