|NO.Z.00064|——————————|^^ Deploy ^^|——|Hadoop&MapReduce.V35|——|Hadoop.v35||Hadoop secondary development environment|setup example|
一、Setting up the Hadoop secondary-development environment
### --- System environment
~~~ System: linux122: CentOS-7_x86_64
protobuf: protoc-2.5.0
maven: maven-3.6.3
hadoop: hadoop-2.9.2
java: jdk1.8.0_231
cmake: cmake-2.8.12.2
OpenSSL: OpenSSL 1.0.2k-fips
findbugs: findbugs-1.3.9
### --- Preparation
~~~ # Install the libraries required for compilation
~~~ # Run on host linux122
[root@linux122 ~]# yum install -y lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel gcc gcc-c++
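Before moving on, it can help to confirm the toolchain actually landed on PATH. The helper below is a hypothetical sketch, not part of the original steps:

```shell
# check_tools: report any of the named commands that are not on PATH.
# Returns non-zero if something is missing, so it can gate the build.
check_tools() {
  missing=""
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || missing="$missing $t"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}

# Example: check_tools gcc g++ cmake autoconf automake libtool
```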
二、Installing Maven
### --- Upload and extract the Maven binary package
~~~ # Upload the Maven package and extract it
[root@linux122 software]# tar -zxvf apache-maven-3.6.3-bin.tar.gz -C /usr/local/
### --- Configure the system environment variables
~~~ # Add Maven to the system environment variables
[root@linux122 software]# vim /etc/profile
# MAVEN_HOME
export MAVEN_HOME=/usr/local/apache-maven-3.6.3
export PATH=$PATH:$MAVEN_HOME/bin
~~~ # Reload the profile
[root@linux122 software]# source /etc/profile
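Editing /etc/profile by hand works, but re-running the setup can leave duplicate export lines. A small helper (hypothetical, not from the original guide) that appends a line only if it is not already present:

```shell
# append_once: append a line to a file only if that exact line is absent,
# keeping repeated runs of the setup idempotent.
append_once() {
  line="$1"; file="$2"
  grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Usage (as root):
#   append_once 'export MAVEN_HOME=/usr/local/apache-maven-3.6.3' /etc/profile
#   append_once 'export PATH=$PATH:$MAVEN_HOME/bin' /etc/profile
```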
### --- Verify that Maven installed successfully
[root@linux122 software]# mvn -version
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /usr/local/apache-maven-3.6.3
Java version: 1.8.0_231, vendor: Oracle Corporation, runtime: /opt/yanqi/servers/jdk1.8.0_231/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-957.el7.x86_64", arch: "amd64", family: "unix"
三、Installing protobuf, the serialization framework Hadoop relies on
### --- Install the dependency environment
[root@linux122 ~]# yum groupinstall "Development Tools" -y
### --- Download the package
~~~ # Download
[root@linux122 software]# wget https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
~~~ # Or upload the protobuf package manually
~~~ # Extract
[root@linux122 software]# tar -zxvf protobuf-2.5.0.tar.gz
### --- Compile and install
~~~ # Enter the extracted directory and set the install prefix (--prefix=/usr/local/protobuf-2.5.0)
[root@linux122 software]# cd protobuf-2.5.0
[root@linux122 protobuf-2.5.0]# ./configure --prefix=/usr/local/protobuf-2.5.0
~~~ # Compile
[root@linux122 protobuf-2.5.0]# make
~~~ # Run the build checks
[root@linux122 protobuf-2.5.0]# make check
~~~ # Install
[root@linux122 protobuf-2.5.0]# make install
### --- Configure environment variables
~~~ # Add protobuf to the environment variables
[root@linux122 ~]# vim /etc/profile
## PROTOBUF_HOME
export PROTOBUF_HOME=/usr/local/protobuf-2.5.0
export PATH=$PATH:$PROTOBUF_HOME/bin
~~~ # Reload the profile
[root@linux122 ~]# source /etc/profile
~~~ # Verify the installation
[root@linux122 ~]# protoc --version
libprotoc 2.5.0
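Hadoop 2.9.2 hard-codes protoc 2.5.0; any other version aborts the build with a version-mismatch error. A small pre-flight check (a sketch, not part of the original steps) that validates the `protoc --version` output:

```shell
# check_protoc_version: given the output of `protoc --version`
# (e.g. "libprotoc 2.5.0"), succeed only for the exact 2.5.0 release
# that hadoop-2.9.2's build expects.
check_protoc_version() {
  [ "$(echo "$1" | awk '{print $2}')" = "2.5.0" ]
}

# Usage: check_protoc_version "$(protoc --version)" || echo "wrong protoc" >&2
```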
四、Installing Findbugs
### --- Download the package
~~~ # Download (or upload the package manually)
[root@linux122 software]# wget https://jaist.dl.sourceforge.net/project/findbugs/findbugs/1.3.9/findbugs-1.3.9.tar.gz
### --- Install the package
~~~ # Extract
[root@linux122 software]# tar -zxvf findbugs-1.3.9.tar.gz -C /usr/local/
### --- Configure environment variables
~~~ # Add Findbugs to the system environment variables
[root@linux122 ~]# vim /etc/profile
## FINDBUGS_HOME
export FINDBUGS_HOME=/usr/local/findbugs-1.3.9
export PATH=$PATH:$FINDBUGS_HOME/bin
~~~ # Reload the profile
[root@linux122 ~]# source /etc/profile
### --- Verify the deployment
[root@linux122 ~]# findbugs -version
1.3.9
五、Adding the Aliyun mirror
### --- Open settings.xml under the Maven installation and add the mirror entries
[root@linux122 ~]# vim /usr/local/apache-maven-3.6.3/conf/settings.xml
<mirror>
    <id>nexus</id>
    <mirrorOf>*</mirrorOf>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
</mirror>
<mirror>
    <id>nexus-public-snapshots</id>
    <mirrorOf>public-snapshots</mirrorOf>
    <url>http://maven.aliyun.com/nexus/content/repositories/snapshots/</url>
</mirror>
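The `<mirror>` entries must sit inside the existing `<mirrors>` element of settings.xml; placed anywhere else, Maven ignores them. In context (only the mirrors section shown):

```xml
<settings>
  ...
  <mirrors>
    <mirror>
      <id>nexus</id>
      <mirrorOf>*</mirrorOf>
      <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    </mirror>
    <mirror>
      <id>nexus-public-snapshots</id>
      <mirrorOf>public-snapshots</mirrorOf>
      <url>http://maven.aliyun.com/nexus/content/repositories/snapshots/</url>
    </mirror>
  </mirrors>
  ...
</settings>
```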
六、Uploading the source files
### --- Import the hand-written classes into the source tree so they are packaged into Hadoop
~~~ # From the wordcount project
// MergeInputFormat
// MergeRecordReader
~~~ # Location inside the source tree
// hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input
~~~ # Place the custom code into the source tree so it is packaged with the build

### --- Go to the target path for the code files
~~~ # Enter the directory inside the source tree
[root@linux122 ~]# cd /root/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/
~~~ # Upload MergeInputFormat.java and MergeRecordReader.java into this directory
[root@linux122 input]# pwd
/root/hadoop-2.9.2-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input
[root@linux122 input]# ll
-rw-r--r-- 1 root root 1867 Aug 19 23:01 MergeInputFormat.java
-rw-r--r-- 1 root root 2843 Aug 19 23:01 MergeRecordReader.java
### --- Compile
~~~ # Enter the Hadoop source directory
[root@linux122 input]# cd /root/hadoop-2.9.2-src
~~~ # Run the build: -Pdist,native activates the dist and native profiles, -DskipTests skips unit tests, -Dtar also produces a tar.gz
[root@linux122 hadoop-2.9.2-src]# mvn package -Pdist,native -DskipTests -Dtar
~~~ # Location of the jars generated by the build
[root@linux122 hadoop-2.9.2-src]# ls hadoop-dist/
pom.xml
~~~ # Generated jars
[root@linux122 hadoop-dist]# pwd
/root/hadoop-2.9.2-src/hadoop-dist
[root@linux122 hadoop-dist]# ls target/
antrun dist-tar-stitching.sh hadoop-dist-2.9.2.jar hadoop-dist-2.9.2-test-sources.jar maven-shared-archive-resources
classes hadoop-2.9.2 hadoop-dist-2.9.2-javadoc.jar javadoc-bundle-options test-classes
dist-layout-stitching.sh hadoop-2.9.2.tar.gz hadoop-dist-2.9.2-sources.jar maven-archiver test-dir
~~~ # Location of the jar built from the custom code,
~~~ containing MergeInputFormat.java and MergeRecordReader.java
[root@linux122 ~]# cd hadoop-2.9.2-src/hadoop-dist/target/hadoop-2.9.2/share/hadoop/mapreduce/
hadoop-mapreduce-client-core-2.9.2.jar
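To confirm the two custom classes actually made it into the rebuilt jar, the jar listing can be filtered for them. The helper below is a sketch; the listing itself would come from `jar tf hadoop-mapreduce-client-core-2.9.2.jar` (the `jar` tool ships with the JDK):

```shell
# filter_merge_classes: keep only the Merge* class entries from a jar
# listing piped in on stdin (inner classes like Merge*$1.class match too).
filter_merge_classes() {
  grep -E 'mapreduce/lib/input/Merge(InputFormat|RecordReader)(\$[^/]*)?\.class$'
}

# Usage:
#   jar tf hadoop-mapreduce-client-core-2.9.2.jar | filter_merge_classes
```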
### --- Build success output
[INFO] Reactor Summary for Apache Hadoop Main 2.9.2:
[INFO]
[INFO] Apache Hadoop Main ................................. SUCCESS [ 5.214 s]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [ 4.830 s]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 2.969 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 4.714 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [ 0.466 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 3.761 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [ 8.122 s]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 7.711 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [ 10.418 s]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [ 7.660 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [01:54 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [ 12.256 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [ 15.964 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [ 0.130 s]
[INFO] Apache Hadoop HDFS Client .......................... SUCCESS [ 29.469 s]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [01:21 min]
[INFO] Apache Hadoop HDFS Native Client ................... SUCCESS [ 5.105 s]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [ 25.855 s]
[INFO] Apache Hadoop HDFS BookKeeper Journal .............. SUCCESS [ 9.607 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [ 6.868 s]
[INFO] Apache Hadoop HDFS-RBF ............................. SUCCESS [ 38.402 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [ 0.069 s]
[INFO] Apache Hadoop YARN ................................. SUCCESS [ 0.071 s]
[INFO] Apache Hadoop YARN API ............................. SUCCESS [ 19.898 s]
[INFO] Apache Hadoop YARN Common .......................... SUCCESS [ 48.027 s]
[INFO] Apache Hadoop YARN Registry ........................ SUCCESS [ 8.516 s]
[INFO] Apache Hadoop YARN Server .......................... SUCCESS [ 0.087 s]
[INFO] Apache Hadoop YARN Server Common ................... SUCCESS [ 19.856 s]
[INFO] Apache Hadoop YARN NodeManager ..................... SUCCESS [ 21.764 s]
[INFO] Apache Hadoop YARN Web Proxy ....................... SUCCESS [ 4.517 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ....... SUCCESS [ 10.905 s]
[INFO] Apache Hadoop YARN Timeline Service ................ SUCCESS [ 7.621 s]
[INFO] Apache Hadoop YARN ResourceManager ................. SUCCESS [ 34.623 s]
[INFO] Apache Hadoop YARN Server Tests .................... SUCCESS [ 1.906 s]
[INFO] Apache Hadoop YARN Client .......................... SUCCESS [ 15.152 s]
[INFO] Apache Hadoop YARN SharedCacheManager .............. SUCCESS [ 5.897 s]
[INFO] Apache Hadoop YARN Timeline Plugin Storage ......... SUCCESS [ 4.661 s]
[INFO] Apache Hadoop YARN Router .......................... SUCCESS [ 9.551 s]
[INFO] Apache Hadoop YARN TimelineService HBase Backend ... SUCCESS [ 12.263 s]
[INFO] Apache Hadoop YARN Timeline Service HBase tests .... SUCCESS [ 3.132 s]
[INFO] Apache Hadoop YARN Applications .................... SUCCESS [ 0.090 s]
[INFO] Apache Hadoop YARN DistributedShell ................ SUCCESS [ 4.193 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher ........... SUCCESS [ 3.905 s]
[INFO] Apache Hadoop YARN Site ............................ SUCCESS [ 0.103 s]
[INFO] Apache Hadoop YARN UI .............................. SUCCESS [ 0.061 s]
[INFO] Apache Hadoop YARN Project ......................... SUCCESS [ 7.952 s]
[INFO] Apache Hadoop MapReduce Client ..................... SUCCESS [ 0.384 s]
[INFO] Apache Hadoop MapReduce Core ....................... SUCCESS [ 38.992 s]
[INFO] Apache Hadoop MapReduce Common ..................... SUCCESS [ 27.521 s]
[INFO] Apache Hadoop MapReduce Shuffle .................... SUCCESS [ 6.161 s]
[INFO] Apache Hadoop MapReduce App ........................ SUCCESS [ 21.886 s]
[INFO] Apache Hadoop MapReduce HistoryServer .............. SUCCESS [ 11.528 s]
[INFO] Apache Hadoop MapReduce JobClient .................. SUCCESS [ 15.226 s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ...... SUCCESS [ 2.744 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [ 10.097 s]
[INFO] Apache Hadoop MapReduce ............................ SUCCESS [ 3.668 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [ 11.195 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [ 10.439 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [ 3.504 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [ 3.719 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [ 8.608 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [ 7.772 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [ 4.552 s]
[INFO] Apache Hadoop Ant Tasks ............................ SUCCESS [ 3.369 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [ 5.278 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [ 9.118 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [ 7.151 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 22.322 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [ 12.390 s]
[INFO] Apache Hadoop Aliyun OSS support ................... SUCCESS [ 8.251 s]
[INFO] Apache Hadoop Client ............................... SUCCESS [ 7.108 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 1.612 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 8.219 s]
[INFO] Apache Hadoop Resource Estimator Service ........... SUCCESS [ 7.365 s]
[INFO] Apache Hadoop Azure Data Lake support .............. SUCCESS [ 7.077 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 19.244 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [ 1.384 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [01:33 min]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [ 5.750 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [ 0.080 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 17:32 min
[INFO] Finished at: 2021-08-19T23:55:42+08:00
[INFO] ------------------------------------------------------------------------
七、Calling the jar you built
### --- How to call the jar you built
~~~ Use a decompiler to check whether the hand-written MergeInputFormat.java and MergeRecordReader.java
~~~ are packaged in the jar

八、Using the jar
### --- Using the jar
~~~ Run it via the wordcount program
### --- Create a step4 project and copy the files from the existing step2 project into it,
~~~ namely com.yanqi.mr.comment.step2.MergeInputFormat and com.yanqi.mr.comment.step2.MergeRecordReader
### --- Rename them to:
~~~ MyMergeRecordReader
~~~ MyMergeInputFormat
### --- MergeDriver now reports an error, since it still references the original class names
### --- Import the rebuilt jar,
~~~ which contains the two files MergeInputFormat.java and MergeRecordReader.java
### --- The program then runs normally
~~~ # Back up the original hadoop-mapreduce-client-core-2.9.2.jar as .bak, in
~~~ C:\Users\Administrator\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-core\2.9.2
~~~ hadoop-mapreduce-client-core-2.9.2.jar.bak
### --- Replace it with the new jar
~~~ hadoop-mapreduce-client-core-2.9.2.jar
~~~ This jar contains the two files MergeInputFormat.java and MergeRecordReader.java
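The backup-and-replace step can be sketched as a small helper (hypothetical; on Windows the same two steps are done by hand in Explorer):

```shell
# replace_jar: back up the jar currently in the local Maven repository
# as <jar>.bak, then copy the rebuilt jar into its place.
replace_jar() {
  repo_jar="$1"   # jar inside ~/.m2/repository/...
  new_jar="$2"    # rebuilt jar from hadoop-dist/target
  [ -f "$repo_jar" ] && mv "$repo_jar" "$repo_jar.bak"
  cp "$new_jar" "$repo_jar"
}
```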

附录一:Troubleshooting
### --- Symptom:
[INFO] Apache Hadoop Amazon Web Services support .......... FAILED [ 7.011 s]
### --- Analysis:
~~~ A dependency is missing: the download of DynamoDBLocal:jar failed
### --- Fix:
~~~ Building hadoop-aws:jar requires the DynamoDBLocal:jar dependency
~~~ Download the jar manually and upload it into the local Maven repository
[root@linux122 ~]# ll /root/.m2/repository/com/amazonaws/DynamoDBLocal/1.11.86
total 3628
-rw-r--r-- 1 root root 3713946 Nov 27 2020 DynamoDBLocal-1.11.86.jar
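Manually placing a jar at its Maven coordinates can be sketched as below (a hypothetical helper; the repo root is normally ~/.m2/repository, and `mvn install:install-file` is the more formal alternative):

```shell
# install_local_jar: copy a downloaded jar into the local Maven repository
# at the coordinates com.amazonaws:DynamoDBLocal:1.11.86.
install_local_jar() {
  repo_root="$1"   # e.g. "$HOME/.m2/repository"
  jar_file="$2"    # e.g. DynamoDBLocal-1.11.86.jar, downloaded by hand
  dest="$repo_root/com/amazonaws/DynamoDBLocal/1.11.86"
  mkdir -p "$dest"
  cp "$jar_file" "$dest/DynamoDBLocal-1.11.86.jar"
}
```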