E-commerce Project in Practice: Packaging and Running on the Server (Part 13)

1. Change the input and output paths

(1) Input path: args[0]

(2) Output path: args[1]
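The point of switching to args[0]/args[1] is that the two paths appended to the eventual `hadoop jar` invocation arrive as the driver's positional arguments. A tiny sketch of that mapping (the function name and paths here are illustrative only):

```shell
# The arguments after the main-class name on the hadoop jar command line
# become args[0] and args[1] inside the Java driver's main method.
show_args() {
  echo "args[0] (input)  = $1"
  echo "args[1] (output) = $2"
}

show_args /project/input/raw/ /project/output/v1/pvstat/
```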

 

2. Modify IPParser.java

src/main/java/project/utils/IPParser.java

At the moment the local IP database sits at ip/qqwry.dat.

Change it to:

    // local IP database path
    //private static final String ipFilePath = "ip/qqwry.dat";
    // server-side IP database path
    private static final String ipFilePath = "/home/hadoop/lib/qqwry.dat";

 

3. Modify pom.xml

Compile with Java 1.8.

Add the following between <project></project>:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

 

4. Package locally and upload to the server

In the local cmd:

C:\Users\jieqiong>cd C:\Users\jieqiong\IdeaProjects\hadoop-train-v2
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>mvn clean package -DskipTests
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2>cd target/
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>dir
 Volume in drive C is Windows-SSD
 Volume Serial Number is F0E4-86A5
 Directory of C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target

2021/07/27  16:19    <DIR>          .
2021/07/27  16:19    <DIR>          ..
2021/07/27  16:19    <DIR>          classes
2021/07/27  16:19    <DIR>          generated-sources
2021/07/27  16:19    <DIR>          generated-test-sources
2021/07/27  16:19            51,390 hadoop-train-v2-1.0.jar
2021/07/27  16:19    <DIR>          maven-archiver
2021/07/27  16:19    <DIR>          maven-status
2021/07/27  16:19    <DIR>          test-classes
               1 File(s)          51,390 bytes
               8 Dir(s) 148,776,701,952 bytes free
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\target>scp hadoop-train-v2-1.0.jar hadoop@192.168.131.101:~/lib/
C:\Users\jieqiong\IdeaProjects\hadoop-train-v2\ip>scp qqwry.dat hadoop@192.168.131.101:~/lib/

 

5. The uploaded files on the server

[hadoop@hadoop000 lib]$ pwd
/home/hadoop/lib
[hadoop@hadoop000 lib]$ ls
hadoop-train-v2-1.0.jar  qqwry.dat

 

6. The data in the data directory

[hadoop@hadoop000 data]$ pwd
/home/hadoop/data
[hadoop@hadoop000 data]$ ls
access.log     data.txt  emp.txt      helloworld.txt  part-r-00000
accessOwn.log  dept.txt  emp.txt-bak  h.txt           trackinfo_20130721.data

 

7. Upload trackinfo_20130721.data to /project/input/raw in HDFS (the file already exists in HDFS; everything used from here on is the previously uploaded copy rather than your own, so mind the path)

[hadoop@hadoop000 data]$ hadoop fs -mkdir -p /project/input/raw
[hadoop@hadoop000 data]$ hadoop fs -put trackinfo_20130721.data /project/input/raw
[hadoop@hadoop000 data]$ hadoop fs -ls /project/input/raw
Found 1 items
-rw-r--r--   1 hadoop supergroup  173555592 2018-12-09 08:50 /project/input/raw/trackinfo_20130721.data

 

8. Write the script

The script lives at ~/shell/pv.sh.

There is no pv.sh file yet, so use vi to create and open it directly.

[hadoop@hadoop000 ~]$ clear
[hadoop@hadoop000 ~]$ cd shell/
[hadoop@hadoop000 shell]$ ls
[hadoop@hadoop000 shell]$ vi pv.sh

Write the following into pv.sh:

hadoop jar + "local path and file name of the jar on the server" + "fully qualified name (Copy Reference) of the class to run" + "input data path" + "output data path"

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pvstat/
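Every job in the next section differs only in the class name and the two paths, so the one-liner above generalizes naturally. A minimal sketch (the helper name `build_cmd` is made up here; it only prints the command so the invocation can be checked before running it for real):

```shell
#!/usr/bin/env bash
# Compose the hadoop jar command line for one job; jar path as on the server.
JAR=/home/hadoop/lib/hadoop-train-v2-1.0.jar

build_cmd() {   # build_cmd <main-class> <input-path> <output-path>
  echo "hadoop jar $JAR $1 $2 $3"
}

# Reproduces the pv.sh command above:
build_cmd com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp \
  hdfs://hadoop000:8020/project/input/raw/ \
  hdfs://hadoop000:8020/project/output/v1/pvstat/
```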

 

9. Run

Set the execute permission first, then run the script.

[hadoop@hadoop000 shell]$ chmod u+x pv.sh
[hadoop@hadoop000 shell]$ ./pv.sh

(1) Run com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp

Note: copy the fully qualified class name with Copy Reference.

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PVStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pvstat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/pvstat/part-r-00000
300000

(2) Run com.imooc.bigdata.hadoop.mr.project.mr.ProvinceStatApp

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.ProvinceStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/provincestat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/provincestat/part-r-00000

(3) Run com.imooc.bigdata.hadoop.mr.project.mr.PageStatApp

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mr.PageStatApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/output/v1/pagestat/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v1/pagestat/part*

(4) Run com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp hdfs://hadoop000:8020/project/input/raw/ hdfs://hadoop000:8020/project/input/etl/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/input/etl/part*

(5) Run com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App hdfs://hadoop000:8020/project/input/etl/ hdfs://hadoop000:8020/project/output/v2/provincestatv2/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v2/provincestatv2/part*

(6) Run com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App

hadoop jar /home/hadoop/lib/hadoop-train-v2-1.0.jar com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App hdfs://hadoop000:8020/project/input/etl/ hdfs://hadoop000:8020/project/output/v2/pvstatv2/
[hadoop@hadoop000 shell]$ hadoop fs -text /project/output/v2/pvstatv2/part*
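Runs (4) through (6) form a small pipeline: the ETL job writes to /project/input/etl/, which the two v2 jobs then read. They could be driven by one loop. A sketch (the script is hypothetical; echo stands in for the real invocation so the plan can be inspected before running it on the server):

```shell
#!/usr/bin/env bash
# Print the v2 job invocations in dependency order: ETL first, then the two
# jobs that consume the ETL output. Replace echo with the real command to run.
JAR=/home/hadoop/lib/hadoop-train-v2-1.0.jar
HDFS=hdfs://hadoop000:8020

JOBS=(
  "com.imooc.bigdata.hadoop.mr.project.mrv2.ETLApp $HDFS/project/input/raw/ $HDFS/project/input/etl/"
  "com.imooc.bigdata.hadoop.mr.project.mrv2.ProvinceStatV2App $HDFS/project/input/etl/ $HDFS/project/output/v2/provincestatv2/"
  "com.imooc.bigdata.hadoop.mr.project.mrv2.PVStatV2App $HDFS/project/input/etl/ $HDFS/project/output/v2/pvstatv2/"
)

for job in "${JOBS[@]}"; do
  echo "hadoop jar $JAR $job"
done
```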

 

10. Summary

Once big data processing is done, the results live on HDFS.
That is essentially what most big data work boils down to.
One step further: a tool or framework is needed to export the processed results into a database.
Sqoop: exports the statistics on HDFS to MySQL.
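As a taste of that follow-up step, a Sqoop export could look roughly like the command composed below. Every connection detail (database, table, credentials) is a placeholder assumption, not a value from this project; the command is only printed here, not executed:

```shell
# Sketch only: compose a hypothetical sqoop export that would push the pvstat
# result into a MySQL table. Database, table, and credentials are placeholders.
SQOOP_CMD="sqoop export \
  --connect jdbc:mysql://hadoop000:3306/stats_db \
  --username root -P \
  --table pv_stat \
  --export-dir /project/output/v1/pvstat/ \
  --input-fields-terminated-by '\t'"

echo "$SQOOP_CMD"
```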

 


 

posted @ 2021-07-28 15:02 酱汁怪兽