Spark study notes: using PySpark

1. Start the pyspark shell

2.读取文件

>>> from pyspark.sql import SparkSession
>>>
>>> spark = SparkSession.builder.appName("myjob").getOrCreate()
>>> df = spark.read.text("hdfs:///user/lintong/logs/test")
>>> df.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
+-----+
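The table above has a single column named value with one row per line of the input file, because spark.read.text() returns a DataFrame with exactly one string column, value. As a minimal sketch that needs no cluster, the plain-Python helpers below mimic that row shape and the boxed layout df.show() prints. The names rows_like_read_text and render_like_show are hypothetical, and the rendering is a simplification: it ignores details such as show() truncating values longer than 20 characters by default.

```python
from typing import Dict, Iterable, List


def rows_like_read_text(lines: Iterable[str]) -> List[Dict[str, str]]:
    # Hypothetical helper: like spark.read.text(), produce one row per input
    # line in a single string column named "value"; line terminators are
    # not part of the value.
    return [{"value": line.rstrip("\r\n")} for line in lines]


def render_like_show(rows: List[Dict[str, str]], column: str = "value") -> str:
    # Hypothetical helper approximating df.show() for one column: the width
    # is the longest of the header and the cells, and cells are right-aligned.
    width = max([len(column)] + [len(r[column]) for r in rows])
    border = "+" + "-" * width + "+"
    out = [border, "|" + column.rjust(width) + "|", border]
    out += ["|" + r[column].rjust(width) + "|" for r in rows]
    out.append(border)
    return "\n".join(out)


if __name__ == "__main__":
    # Simulate a file containing the lines 1 to 4.
    print(render_like_show(rows_like_read_text(["1\n", "2\n", "3\n", "4\n"])))
```

Feeding it the four lines 1 to 4 prints the same box as the df.show() output in the session above.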

3. Exit the pyspark shell with exit()

4. Submit the PySpark job pi.py with spark-submit (on this Cloudera cluster the Spark 2 launcher is named spark2-submit)

spark2-submit --master local[*] /opt/cloudera/parcels/SPARK2/lib/spark2/examples/src/main/python/pi.py
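pi.py, one of the examples shipped with Spark, estimates π by Monte Carlo sampling: it draws random points in the square [-1, 1] × [-1, 1] and counts how many land inside the unit circle, which covers π/4 of the square's area. The plain-Python sketch below approximates that computation so its result can be checked without a cluster. It assumes the bundled script samples 100000 points per partition with a default of 2 partitions (check the copy under your Spark installation). The built-in map and functools.reduce stand in for the RDD map and reduce that pi.py runs on Spark, and estimate_pi and its seed parameter are hypothetical names added for illustration.

```python
import random
from functools import reduce
from operator import add


def estimate_pi(partitions: int = 2, seed: int = 0) -> float:
    # Assumption: mirrors pi.py's default of 100000 points per partition.
    n = 100000 * partitions
    rng = random.Random(seed)  # hypothetical seed, for reproducible output

    def inside(_: int) -> int:
        # Draw a point in [-1, 1) x [-1, 1); score 1 if it lies in the unit circle.
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        return 1 if x * x + y * y <= 1 else 0

    # pi.py runs this map/reduce over an RDD; here it is a local loop.
    count = reduce(add, map(inside, range(1, n + 1)))
    # Hit ratio approximates pi / 4, so scale by 4.
    return 4.0 * count / n


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi())
```

Running it prints a line in the same "Pi is roughly ..." format as pi.py; the trailing digits depend on the seed and the sample count.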

posted @ 2016-05-05 23:30  tonglin0325