2. Installing Spark and Python Practice
I. Installing Spark
1. Check the base environment: Hadoop and the JDK
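One quick way to confirm that both tools are reachable is to look them up on PATH from Python. This is only a minimal sketch: the helper name check_tools is made up for illustration, and it assumes the executables are called hadoop and java on this machine.

```python
import shutil


def check_tools(tools=("hadoop", "java")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    # shutil.which mirrors the shell's `which`: it returns None when nothing is found.
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

If either entry reports "not found on PATH", fix the Hadoop or JDK setup before continuing with the Spark steps.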
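2. Download Spark (already installed, skipped)
3. Extract the archive, rename the folder, and set permissions (skipped)
4. Edit the configuration file
5. Configure environment variables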
Open the file with gedit ~/.bashrc and add the following lines:
export SPARK_HOME=/usr/local/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.9-src.zip:$PYTHONPATH
export PYSPARK_PYTHON=python3
export PATH=$PATH:$SPARK_HOME/bin
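The py4j file name in PYTHONPATH has to match the archive that actually ships with the installed Spark, and the version number differs between Spark releases. The sketch below lists whatever py4j archive is present so the export line can be checked against it. The helper name find_py4j_zips is hypothetical, and the fallback path /usr/local/spark is taken from the SPARK_HOME setting above.

```python
import glob
import os


def find_py4j_zips(spark_home):
    """List the py4j source archives found under <spark_home>/python/lib."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Fall back to the install location used in this write-up if SPARK_HOME is unset.
    spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
    matches = find_py4j_zips(spark_home)
    if matches:
        for path in matches:
            print("found:", path)
    else:
        print("no py4j archive found under", spark_home)
```

If the printed file name differs from py4j-0.10.9-src.zip, adjust the PYTHONPATH line accordingly and reload the settings with source ~/.bashrc.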
Start Spark
Run Python code
II. Python programming practice: word frequency count for English text
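path = '/home/hadoop/a/1.txt'
# Read the whole file into memory
with open(path) as f:
    text = f.read()
# Split on whitespace to get the words
words = text.split()
# Count how many times each word occurs
a = {}
for word in words:
    a[word] = a.get(word, 0) + 1
# Sort the (word, count) pairs by count, highest first
alist = list(a.items())
alist.sort(key=lambda x: x[1], reverse=True)
print(alist)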
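The script above splits on whitespace only, so "The" and "the" are counted separately and punctuation stays attached to words ("hat." is not the same key as "hat"). A possible variant, not part of the original exercise, lowercases the text and extracts words with a regular expression before counting with collections.Counter. The function name word_frequencies and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs, most frequent first, ignoring case and punctuation."""
    # Match runs of letters, optionally followed by an apostrophe part such as "don't".
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common()


sample = "The cat and the hat. The end!"
print(word_frequencies(sample))
# [('the', 3), ('cat', 1), ('and', 1), ('hat', 1), ('end', 1)]
```

To use it on the file from the exercise, read the text with open(path) as before and pass it to word_frequencies.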