E-commerce Project in Practice: Packaging and Running on the Server (Part 13)
1. Change the input and output paths
(1) Input path: args[0]
(2) Output path: args[1]
The paths are read from the command line instead of being hard-coded.
2. Modify IPParser.java
src/main/java/project/utils/IPParser.java
On the local machine the IP library lives at ip/qqwry.dat; change the path to the server-side location:
// local IP library path
// private static final String ipFilePath = "ip/qqwry.dat";
// server-side IP library path
private static final String ipFilePath = "/home/hadoop/lib/qqwry.dat";
3. Modify pom.xml
Compile with Java 1.8.
Add the following between <project> and </project>:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.3</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
4. Package locally and upload to the server
In a local cmd window:
C:\Users\jieqiong>cd C:\Users\jieqiong\IdeaProjects\hadoop-train-v2
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>mvn clean package -DskipTests
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>cd target
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>dir
 Volume in drive C is Windows-SSD
 Volume Serial Number is F0E4-86A5

 Directory of C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target

2021/07/27  16:19    <DIR>          .
2021/07/27  16:19    <DIR>          ..
2021/07/27  16:19    <DIR>          classes
2021/07/27  16:19    <DIR>          generated-sources
2021/07/27  16:19    <DIR>          generated-test-sources
2021/07/27  16:19            51,390 hadoop-train-v2-1.0.jar
2021/07/27  16:19    <DIR>          maven-archiver
2021/07/27  16:19    <DIR>          maven-status
2021/07/27  16:19    <DIR>          test-classes
               1 File(s)          51,390 bytes
               8 Dir(s) 148,776,701,952 bytes free
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>scp hadoop-train-v2-1.0.jar hadoop@192.168.131.101:~/lib/
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\ip>scp qqwry.dat hadoop@192.168.131.101:~/lib/
5. Files now on the server
[hadoop@hadoop000 lib]$ pwd
/home/hadoop/lib
[hadoop@hadoop000 lib]$ ls
hadoop-train-v2-1.0.jar  qqwry.dat
6. Data in the data directory
[hadoop@hadoop000 data]$ pwd
/home/hadoop/data
[hadoop@hadoop000 data]$ ls
access.log     data.txt  emp.txt      helloworld.txt  part-r-00000
accessOwn.log  dept.txt  emp.txt-bak  h.txt           trackinfo_20130721.data
7. Upload trackinfo_20130721.data to /project/input/raw in HDFS (the file already exists in HDFS; everything that follows uses that previously uploaded copy, not your own version, so mind the path)
[hadoop@hadoop000 data]$ hadoop fs -mkdir -p /project/input/raw
[hadoop@hadoop000 data]$ hadoop fs -put trackinfo_20130721.data /project/input/raw
[hadoop@hadoop000 data]$ hadoop fs -ls /project/input/raw
Found 1 items
-rw-r--r--   1 hadoop supergroup  173555592 2018-12-09 08:50 /project/input/raw/trackinfo_20130721.data
8. Write the run script
The script goes in ~/shell/pv.sh.
pv.sh does not exist yet, so create it and open it directly with vi.
[hadoop@hadoop000 ~]$ clear
[hadoop@hadoop000 ~]$ cd shell/
[hadoop@hadoop000 shell]$ ls
[hadoop@hadoop000 shell]$ vi pv.sh
Write the following into pv.sh. The general form is:
hadoop jar <path to the jar on the server> <fully qualified name of the class to run (Copy Reference in IDEA)> <input path> <output path>
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pvstat/
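All six jobs below follow this same pattern, so instead of editing pv.sh each time you could use one parameterized script. This is a sketch of my own, not part of the original course; the jar path and NameNode address match this environment, and DRY_RUN defaults to only printing the command so it can be checked before submitting:

```shell
#!/bin/bash
# run_job.sh -- reusable submit script (a sketch, not from the course).
# Usage: ./run_job.sh-style call of run_job <main-class> <input> <output>
JAR=/home/hadoop/lib/hadoop-train-v2-1.0.jar   # jar uploaded in step 4
NN=hdfs://hadoop000:8020                       # NameNode address used in this post
DRY_RUN=${DRY_RUN:-1}                          # set DRY_RUN=0 to actually submit

run_job() {
  local main_class=$1 input=$2 output=$3
  local cmd="hadoop jar $JAR $main_class $NN$input $NN$output"
  echo "$cmd"                # always show what would be submitted
  if [ "$DRY_RUN" = "0" ]; then
    $cmd
  fi
}

# Example: the same PVStatApp invocation as pv.sh above.
run_job com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp \
        /project/input/raw/ /project/output/v1/pvstat/
```

On the server, run it with DRY_RUN=0 once the printed command looks right.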
9. Run
Grant execute permission first, then run the script.
[hadoop@hadoop000 shell]$ chmod u+x pv.sh
[hadoop@hadoop000 shell]$ ./pv.sh
(1) Run com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp
Note: copy the fully qualified class name with Copy Reference in IDEA.
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pvstat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/pvstat/part-r-00000
300000
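One caveat when re-running a job (my note, not from the original post): FileOutputFormat refuses to start if the output directory already exists, so re-running pv.sh fails with "Output directory ... already exists" unless the old output is removed first. A small helper that prints the removal command and only executes it when the hadoop CLI is on PATH:

```shell
# Remove a previous run's HDFS output before re-submitting the job.
# Prints the command first; executes it only if `hadoop` is available.
cleanup_output() {
  local dir=$1
  echo "hadoop fs -rm -r -skipTrash $dir"
  if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -rm -r -skipTrash "$dir"
  fi
}

# Example: clear the v1 PV output so ./pv.sh can be run again.
cleanup_output /project/output/v1/pvstat
```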
(2) Run com.imooc.bigdata.hadoop.mr.project.mr.ProvinceStatApp
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.ProvinceStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/provincestat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/provincestat/part-r-00000
(3) Run com.imooc.bigdata.hadoop.mr.project.mr.PageStatApp
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PageStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pagestat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/pagestat/part*
(4) Run com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/input/etl/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/input/etl/part*
(5) Run com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App hdfs://hadoop000:8020/project/input/etl/ hdfs://hadoop000:8020/project/output/v2/provincestatv2/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v2/provincestatv2/part*
(6) Run com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App
hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App hdfs://hadoop000:8020/project/input/etl/ hdfs://hadoop000:8020/project/output/v2/pvstatv2/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v2/pvstatv2/part*
10. Summary
Once big data processing finishes, the results live on HDFS.
At its core, this is most of what a big data pipeline does.
Going one step further, you need a tool or framework to export the processed results into a database.
Sqoop: exports the statistics on HDFS into MySQL.
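As a concrete illustration of that last point (a sketch only: the MySQL database, table name, and user are assumptions, and the pv_stat table would have to exist first), a Sqoop export of the v1 PV result could look like the command this helper prints:

```shell
# Hypothetical Sqoop export of /project/output/v1/pvstat into MySQL.
# Database, table, and user are made up, not from this post; -P prompts
# for the password interactively.
sqoop_export_pv() {
  echo "sqoop export" \
       "--connect jdbc:mysql://hadoop000:3306/project" \
       "--username root -P" \
       "--table pv_stat" \
       "--export-dir /project/output/v1/pvstat" \
       "--input-fields-terminated-by '\t'"
}
sqoop_export_pv   # print the command; run it on the server once verified
```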