2.安装Spark与Python练习

一、安装Spark

　　1. 检查基础环境hadoop,jdk

　　2. 配置文件

　　3. 配置环境变量

　　用 gedit ~/.bashrc 打开配置文件

　　插入以下代码

　　然后，用 source ~/.bashrc 使刚配置环境变量生效

　　4. 试运行Python代码

　　先运行spark

　　然后用python命令测试运行

二、Python编程练习：英文文本的词频统计

　　1. 准备文本文件

gedit hy.txt

　　2. 编写python代码：读文件，预处理：大小写，标点符号，停用词，分词，统计每个单词出现的次数，按词频大小排序，结果写文件

path='/home/hadoop/hy.txt'
with open(path) as f:
    text=f.read()
words = text.split()
wc={}
for word in words:
    wc[word]=wc.get(word,0)+1
wclist=list(wc.items())
wclist.sort(key=lambda x:x[1],reverse=True)
print(wclist)

　　3. 结果输出

posted @ 2022-03-02 14:53 不知道我什么名字阅读(59) 评论(0) 编辑收藏举报

不知道我什么名字

2.安装Spark与Python练习

公告