7. Installing Python Spark

7.1 Installing Scala

Download: wget https://scala-lang.org/files/archive/scala-2.13.0.tgz

Extract: tar xvf scala-2.13.0.tgz

Move it to /usr/local: sudo mv scala-2.13.0 /usr/local/scala

Set the Scala environment variables:

sudo gedit ~/.bashrc
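A minimal sketch of the lines to append to ~/.bashrc, assuming Scala was moved to /usr/local/scala as above:

export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin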

Reload it: source ~/.bashrc

Start Scala: scala

Type :q to quit.

7.2 Installing Spark

wget https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-3.4.2/spark-3.4.2-bin-hadoop3.tgz

Extract: tar zxf spark-3.4.2-bin-hadoop3.tgz

Move it under /usr/local: sudo mv spark-3.4.2-bin-hadoop3 /usr/local/spark

Edit ~/.bashrc: sudo gedit ~/.bashrc
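A minimal sketch of the Spark-related lines to append, assuming Spark now lives in /usr/local/spark (PYSPARK_PYTHON is optional and simply pins the interpreter to python3):

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_PYTHON=python3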

source ~/.bashrc

7.3 Starting the pyspark interactive shell

pyspark
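When the shell starts, a SparkContext is already available as sc; a quick sanity check:

sc.version
sc.master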

7.4 Configuring pyspark log output

cd /usr/local/spark/conf

cp log4j2.properties.template log4j2.properties

Edit it: sudo gedit log4j2.properties, and change the highlighted log level to WARN.
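In the Spark 3.4 template this is the rootLogger.level entry (assuming the stock template); changing it from info to warn hides the INFO chatter in the shell:

rootLogger.level = warn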

7.5 Creating a test file

Start the Hadoop Multi-Node Cluster, then run the following in order:
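If the local directory does not exist yet, create it first (an extra step not in the original list, using the same path as below):

mkdir -p ~/wordcount/input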

cp /usr/local/hadoop/LICENSE.txt ~/wordcount/input

ll ~/wordcount/input

hadoop fs -mkdir -p /user/hduser/wordcount/input

cd ~/wordcount/input

hadoop fs -copyFromLocal LICENSE.txt /user/hduser/wordcount/input

hadoop fs -ls /user/hduser/wordcount/input

7.6 Running pyspark programs locally

pyspark --master local[4]

Read a local file:
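A minimal sketch, assuming the file copied in 7.5 sits under hduser's home directory:

textFile = sc.textFile("file:/home/hduser/wordcount/input/LICENSE.txt")
textFile.count()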

Read an HDFS file:
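The same idea against HDFS, using the path created in 7.5 (the same URI appears again in 7.7):

textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
textFile.count()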

7.7 Running pyspark on Hadoop YARN

Leave Hadoop safe mode first:
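The usual command, assuming HDFS is still reporting safe mode:

hdfs dfsadmin -safemode leave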

HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client

Read an HDFS file: textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")

The PySparkShell application can be seen in the Hadoop web UI (the YARN ResourceManager UI, by default http://master:8088/).

7.8 Building a Spark Standalone Cluster environment

Copy the template file:

cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh

Edit spark-env.sh: sudo gedit /usr/local/spark/conf/spark-env.sh
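A minimal sketch of spark-env.sh for this cluster; the host name master matches the URLs used later, while the worker core/memory values are illustrative assumptions sized to match the 512m executors used in 7.9:

export SPARK_MASTER_HOST=master
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=512m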

Connect to data1: ssh data1

Create the spark directory: sudo mkdir /usr/local/spark

Change its owner to hduser: sudo chown hduser:hduser /usr/local/spark

Log out of data1 (exit), then on master copy the Spark installation to data1 with scp: sudo scp -r /usr/local/spark hduser@data1:/usr/local

Repeat the same steps for data2.
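For reference, the same sequence with data2 substituted:

ssh data2
sudo mkdir /usr/local/spark
sudo chown hduser:hduser /usr/local/spark
exit
sudo scp -r /usr/local/spark hduser@data2:/usr/local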

Edit the spark/conf/workers file:

cp /usr/local/spark/conf/workers.template /usr/local/spark/conf/workers

sudo gedit /usr/local/spark/conf/workers
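The workers file lists one worker host per line; for this cluster it would contain:

data1
data2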

7.9 Running pyspark on Spark Standalone

/usr/local/spark/sbin/start-all.sh

Then start pyspark against the standalone master:

pyspark --master spark://master:7077 --num-executors 1 --total-executor-cores 3 --executor-memory 512m

Read a file:
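As in 7.6, a quick check that the cluster can read the test file from HDFS:

textFile = sc.textFile("hdfs://master:9000/user/hduser/wordcount/input/LICENSE.txt")
textFile.count()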

7.10 Spark Web UI

http://master:8080/

Here you can view detailed information about the workers and jobs.

 

 
