Setting up a PySpark environment on macOS
https://blog.csdn.net/wapecheng/article/details/108071538
1. Install the Java JDK
https://www.oracle.com/java/technologies/javase-downloads.html
Download the installer from the page above and click through the installation.
Later it turned out that JDK 8 is required; otherwise the error shown further below appears. JDK 8 can be downloaded from https://www.cr173.com/mac/122803.html
2. Install Scala, Apache Spark, and Hadoop with Homebrew
brew install scala
brew install apache-spark
brew install hadoop
3. Configure environment variables
vim ~/.bash_profile
Add the environment variable configuration below (remember to change the JDK path to your JDK 8 install):
# HomeBrew
export HOMEBREW_BOTTLE_DOMAIN=https://mirrors.tuna.tsinghua.edu.cn/homebrew-bottles
export PATH="/usr/local/bin:$PATH"
export PATH="/usr/local/sbin:$PATH"
# HomeBrew END

# Scala
SCALA_HOME="/usr/local/Cellar/scala/2.13.3"
export PATH="$PATH:$SCALA_HOME/bin"
# Scala END

# Hadoop
HADOOP_HOME="/usr/local/Cellar/hadoop/3.3.0"
export PATH="$PATH:$HADOOP_HOME/bin"
# Hadoop END

# Spark
export SPARK_PATH="/usr/local/Cellar/apache-spark/3.0.1"
export PATH="$SPARK_PATH/bin:$PATH"
# Spark END

# JDK
JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home"
export PATH="$PATH:$JAVA_HOME/bin"
# JDK END
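After saving, open a new terminal or run source ~/.bash_profile so the variables take effect. As a quick sanity check that the Homebrew Spark binaries are now on PATH, here is a minimal Python sketch (the Cellar path in the comment is just what the config above points to):

# Minimal sanity check: is the Homebrew-installed Spark on PATH?
import shutil

spark_submit = shutil.which("spark-submit")
print(spark_submit)  # expected: a path under /usr/local/Cellar/apache-spark/.../bin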
4. Install pyspark
pip install pyspark
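The test script below also uses findspark, which is installed the same way (pip install findspark). A minimal sketch to confirm both packages are importable, assuming the pip installs succeeded:

# Quick import check for the freshly installed packages
import pyspark
import findspark

print(pyspark.__version__)  # e.g. 3.0.1, whatever version pip installed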
Check where the installed JDK lives:
$ /usr/libexec/java_home -V
Matching Java Virtual Machines (1):
    16.0.1, x86_64: "Java SE 16.0.1" /Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home
/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home
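Only JDK 16 shows up here, but JDK 8 is what Spark ends up needing (see the error below). The same macOS tool can locate a JDK 8 home; a small sketch, assuming a JDK 8 is actually installed:

# Ask macOS for the home directory of an installed 1.8 JDK
import subprocess

jdk8_home = subprocess.check_output(
    ["/usr/libexec/java_home", "-v", "1.8"], text=True
).strip()
print(jdk8_home)  # e.g. /Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home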
Test:
import os
os.environ['JAVA_HOME'] = '/Library/Java/JavaVirtualMachines/jdk-16.0.1.jdk/Contents/Home'

import findspark
findspark.init()

from pyspark import SparkContext, SparkConf
sc = SparkContext()

from pyspark.sql import SparkSession
# Initialize the Spark session
spark = SparkSession.builder.getOrCreate()
This fails with:
Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit>(ByteArrayMethods.java:54)
	at org.apache.spark.internal.config.package$.<init>(package.scala:1006)
	at org.apache.spark.internal.config.package$.<clinit>(package.scala)
	at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
	at scala.Option.orElse(Option.scala:447)
	at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:157)
	at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:115)
	at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$3.<init>(SparkSubmit.scala:990)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:990)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:85)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make private java.nio.DirectByteBuffer(long,int) accessible: module java.base does not "opens java.nio" to unnamed module @40f9161a
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:357)
	at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
	at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:188)
	at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:181)
	at org.apache.spark.unsafe.Platform.<clinit>(Platform.java:56)
	... 13 more
Traceback (most recent call last):
  File "delete.py", line 14, in <module>
    sc = SparkContext()
  File "/usr/local/opt/apache-spark/libexec/python/pyspark/context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/usr/local/opt/apache-spark/libexec/python/pyspark/context.py", line 325, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/usr/local/opt/apache-spark/libexec/python/pyspark/java_gateway.py", line 105, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number
Switching to JDK 8 fixes it:
os.environ['JAVA_HOME'] = '/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home'
Output:
21/05/10 11:19:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
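With the session up, a tiny smoke test (not from the original article, just a minimal sketch) confirms that Spark can actually run a job:

# Smoke test: build a small DataFrame on the existing session and run a job
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuses the session created above if still active
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.show()
print(df.count())  # expected: 2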