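Running the Spark example code
Run Eclipse with administrator privileges.
This walkthrough uses JavaSparkHiveExample as the example.
Package: org.apache.spark.examples.sql
Setting up the code environment
Figure 1: Create a new Maven project named spark2.1.1example
Change the JDK version: clear the "Enable project specific settings" checkbox.
Change the JDK library to 1.8: select JRE System Library [J2SE-1.5], click Remove, then click Add Library / JRE System Library.
After the change:
Replace the contents of pom.xml with the following: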
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark2.1.1example</groupId>
  <artifactId>spark2.1.1example</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <properties>
    <java.version>1.7</java.version>
  </properties>
  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 -->
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka_2.10</artifactId>
      <version>0.10.2.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-flume_2.10 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-flume_2.10</artifactId>
      <version>2.1.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka_2.11 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka_2.11</artifactId>
      <version>1.6.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10_2.11 -->
    <!--<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
      <version>2.1.1</version>
    </dependency> -->
    <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.35</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper -->
    <dependency>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
      <version>3.4.8</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10_2.10 -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql-kafka-0-10_2.10</artifactId>
      <version>2.1.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.flume/flume-ng-embedded-agent -->
    <dependency>
      <groupId>org.apache.flume</groupId>
      <artifactId>flume-ng-embedded-agent</artifactId>
      <version>1.6.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.derby/derby -->
    <dependency>
      <groupId>org.apache.derby</groupId>
      <artifactId>derby</artifactId>
      <version>10.13.1.1</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.1</version>
        <configuration>
          <source/>
          <target/>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
Download spark-2.1.1-bin-hadoop2.7.tgz
http://spark.apache.org/downloads.html
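Extract spark-2.1.1-bin-hadoop2.7.tgz
Figure 2: Create a new user library
Click spark2.1.1jars to select it, then click Add External JARs.
Open the directory you just extracted.
Figure 3: Add all files under jars
Figure 4: Add all files under examples/jars
Figure 5: Libraries now contains three entries: JDK, Maven, and spark2.1.1jars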
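Before running the example, it can help to confirm that the new user library is really on the build path. The following is a minimal sketch, not part of the Spark distribution: the class name SparkClasspathCheck is hypothetical, and it assumes that org.apache.spark.sql.SparkSession and org.apache.spark.sql.hive.HiveContext are among the classes supplied by the jars added above.

```java
// Hypothetical helper: reports whether the classes needed by the Hive example
// are visible on the classpath of the Eclipse project.
final class SparkClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, SparkClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed to come from the spark2.1.1jars user library.
        String[] required = {
            "org.apache.spark.sql.SparkSession",
            "org.apache.spark.sql.hive.HiveContext"
        };
        for (String name : required) {
            System.out.println(name + " on classpath: " + isOnClasspath(name));
        }
    }
}
```

If either line reports false, the jars directory was probably not added to spark2.1.1jars, or the user library is not attached to the project.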
Download winutils
https://github.com/steveloughran/winutils
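Only the hadoop-2.7.1 part is needed.
Figure 6: After extraction:
Figure 7: Right-click and choose Run As / Java Application
Ignore the error. This step only creates the run configuration; the error goes away once the run configuration is modified in the next step.
Figure 8: Right-click and choose Run As / Run Configurations
Figure 9: Switch to the Environment tab
Figure 10: Create a new HADOOP_HOME variable pointing to yourdir\winutils-master\hadoop-2.7.1
Figure 11: Select the "Replace native environment" option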
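To check that the run configuration passes HADOOP_HOME through correctly, a small diagnostic can be run with the same configuration. This is a sketch under stated assumptions: the class HadoopHomeCheck and its methods are hypothetical, and the lookup order (the hadoop.home.dir system property first, then the HADOOP_HOME environment variable, with winutils.exe expected under bin) mirrors how Hadoop is understood to locate its Windows helper.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic: shows which Hadoop home would be used and whether
// bin\winutils.exe exists beneath it.
final class HadoopHomeCheck {

    /** The system property takes precedence; otherwise fall back to the environment variable. */
    static String resolveHadoopHome(String property, String env) {
        if (property != null && !property.isEmpty()) {
            return property;
        }
        if (env != null && !env.isEmpty()) {
            return env;
        }
        return null;
    }

    /** Returns true if <hadoopHome>/bin/winutils.exe is a regular file. */
    static boolean hasWinutils(String hadoopHome) {
        if (hadoopHome == null) {
            return false;
        }
        return Files.isRegularFile(Paths.get(hadoopHome, "bin", "winutils.exe"));
    }

    public static void main(String[] args) {
        String home = resolveHadoopHome(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));
        System.out.println("Hadoop home: " + home);
        System.out.println("winutils.exe found: " + hasWinutils(home));
    }
}
```

If the environment variable cannot be set in the run configuration, calling System.setProperty("hadoop.home.dir", ...) at the start of main is a commonly used alternative; the path passed there would be the same hadoop-2.7.1 directory.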
Create the following nested directory under the project:
examples/src/main/resources
Figure 12: Copy the files from this directory into the newly created directory
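The directory creation and copy can also be scripted. The sketch below is an illustration only: the class ResourcesSetup is hypothetical, the source directory is whatever directory Figure 12 shows, and the check for kv1.txt assumes that JavaSparkHiveExample loads examples/src/main/resources/kv1.txt by a path relative to the working directory, which for an Eclipse run configuration is the project root by default.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: creates examples/src/main/resources under the project
// root and copies the regular files from a source directory into it.
final class ResourcesSetup {

    /** Copies the regular files (not subdirectories) of sourceDir; returns the target directory. */
    static Path copyResources(Path sourceDir, Path projectRoot) throws IOException {
        Path target = projectRoot.resolve("examples").resolve("src")
                .resolve("main").resolve("resources");
        Files.createDirectories(target);
        List<Path> files;
        try (Stream<Path> entries = Files.list(sourceDir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.copy(file, target.resolve(file.getFileName().toString()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ResourcesSetup <source-resources-dir> <project-root>");
            return;
        }
        Path target = copyResources(Paths.get(args[0]), Paths.get(args[1]));
        // kv1.txt is the data file the Hive example is assumed to load.
        System.out.println("kv1.txt present: " + Files.isRegularFile(target.resolve("kv1.txt")));
    }
}
```

Keeping the relative layout identical to the Spark source tree means the example's relative paths do not need to be edited.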
Figure 13: To run the example inside Eclipse, the lines marked //HERE were modified
Figure 14: View the run results