A simple example:

Code:
```python
from pyspark.sql import SparkSession

logFile = "G:\\spark\\Spark\\spark-2.2.0-bin-hadoop2.7\\README.md"

# (1) appName sets the application name
# (2) master local[2] runs locally with 2 worker threads
spark = SparkSession.builder.appName('hello').master('local[2]').getOrCreate()

# Read the file as a DataFrame of lines and cache it
logData = spark.read.text(logFile).cache()

numAs = logData.filter(logData.value.contains('a')).count()
print(numAs)  # 61

numBs = logData.filter(logData.value.contains('b')).count()
print(numBs)
```
Screenshot:
![](https://images2017.cnblogs.com/blog/1276964/201802/1276964-20180201131840062-755185942.png)
While the application is running you can open the Spark UI in a browser; by default it is at localhost:4040.