(Building a real-world big data analysis system) Collecting data with a web crawler and a sales data analysis system
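The complete development process of a big data analysis application has four parts: data collection, data storage, data computation, and data analysis and presentation.
- Data collection: WebCollector framework
- Data storage: SQL
- Data computation:
- Data analysis and presentation: Java EE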
The workflow is: on Windows, use the WebCollector data collection module to collect the data and obtain the Windows-side
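I. Data collection
The first step in a big data application is data collection, and it is often the hardest one.
Here the data collection module is built with WebCollector and runs on Windows, which is also a common approach in production.
The data collected this time is sales data from JD.com (京东).
1. To store the collected data in a local database, first create a database on the local machine
(1) Create the database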
mysql> create database jd_db;
Query OK, 1 row affected (0.01 sec)
(2) Create the table
mysql> use jd_db;
Database changed
mysql> create table spider(
    -> id int(11) not null auto_increment,
    -> platform varchar(255) default null,
    -> type varchar(255) default null,
    -> title varchar(255) default null,
    -> content text default null,
    -> memberlevel varchar(255) default null,
    -> fromplatform varchar(255) default null,
    -> area varchar(255) default null,
    -> userimpression varchar(255) default null,
    -> color varchar(255) default null,
    -> price varchar(255) default null,
    -> productSize varchar(255) default null,
    -> creationTime varchar(255) default null,
    -> zhuqutime varchar(255) default null,
    -> lable varchar(255) default null,
    -> primary key(id)
    -> )engine=MyISAM auto_increment=19712 default charset=utf8;
Query OK, 0 rows affected (0.04 sec)
(3) Inspect the table
mysql> desc spider;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(11)      | NO   | PRI | NULL    | auto_increment |
| platform       | varchar(255) | YES  |     | NULL    |                |
| type           | varchar(255) | YES  |     | NULL    |                |
| title          | varchar(255) | YES  |     | NULL    |                |
| content        | text         | YES  |     | NULL    |                |
| memberlevel    | varchar(255) | YES  |     | NULL    |                |
| fromplatform   | varchar(255) | YES  |     | NULL    |                |
| area           | varchar(255) | YES  |     | NULL    |                |
| userimpression | varchar(255) | YES  |     | NULL    |                |
| color          | varchar(255) | YES  |     | NULL    |                |
| price          | varchar(255) | YES  |     | NULL    |                |
| productSize    | varchar(255) | YES  |     | NULL    |                |
| creationTime   | varchar(255) | YES  |     | NULL    |                |
| zhuqutime      | varchar(255) | YES  |     | NULL    |                |
| lable          | varchar(255) | YES  |     | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+
15 rows in set (0.01 sec)
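2. Import the crawler program
II. Preparing the data on the HBase cluster
In the big data field there are three ways to import data from an SQL database into an HBase table: Sqoop, the Java API, and ImportTsv. Only the first is covered below.
1. Import the data obtained in the data collection step into a MySQL table
(1) As on Windows, create the same database and table
(2) Import the collected data, i.e. the .sql file, into the database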
mysql> source /home/jun/Resources/jd_data.sql;
(3) Check that the data was imported successfully
mysql> select * from spider limit 5;
| id | platform | xinhao | title | content | memberlevel | fromplatform | area | userimpression | color | price | productSize | creationTime | zhuaqutime | lable |
| 275301 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 真正的爱情经得起平淡的流年 | 金牌会员 | 京东PC客户端 | null | 比较一般 分辨率高 功能齐全 通话质量好 待机时间长 | 暗夜灰 | 1399 | 移动4G | 2016-04-22 15:36:58 | 20160422164523 | 系统流畅,性价比高,功能齐全,反应快,信号稳定,屏幕大,外观漂亮,国民手机,分辨率高,通话质量好 |
| 275302 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 不错不错,只是正面什么字都没有,没有logo,带上套子有点山寨,呵呵 | 钻石会员 | 京东iPhone客户端 | 北京 | null | 落日金 | 1399 | 全网通 | 2016-04-22 15:33:37 | 20160422164523 | null |
| 275303 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 好。 | 金牌会员 | 京东Android客户端 | 江苏 | null | 落日金 | 1399 | 全网通 | 2016-04-22 15:32:35 | 20160422164523 | null |
| 275304 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 京东品质值得信赖。 | 钻石会员 | 京东iPhone客户端 | 上海 | null | 破晓银 | 1399 | 移动4G | 2016-04-22 15:31:35 | 20160422164523 | null |
| 275305 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 感觉还行,用的比较顺手 | 金牌会员 | 京东Android客户端 | null | null | 破晓银 | 1399 | 移动4G | 2016-04-22 15:27:32 | 20160422164523 | null |
5 rows in set (0.00 sec)
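2. Import the data from the MySQL table into HBase
(1) Run the start-hbase.sh script to start the HBase cluster, then go to the bin directory under the HBase installation directory and run ./hbase shell to enter the shell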
[jun@master bin]$ ./hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.6.1, rUnknown, Sun Jun 3 23:19:26 CDT 2018

hbase(main):001:0> list
TABLE
test1
1 row(s) in 0.1990 seconds

=> ["test1"]
(2) Create an HBase table (note that HBase table names are case-sensitive); f1 is the column family name
hbase(main):002:0> create 'PINGJIA.SPIDER', 'f1'
0 row(s) in 2.4370 seconds

=> Hbase::Table - PINGJIA.SPIDER
(3) Use Sqoop to import the data from the MySQL table spider into the HBase table PINGJIA.SPIDER.
[jun@master bin]$ cd /home/jun/sqoop-1.4.7.bin__hadoop-2.6.0/
[jun@master sqoop-1.4.7.bin__hadoop-2.6.0]$ bin/s
sqoop                    sqoop-eval    sqoop-import-all-tables  sqoop-list-tables  start-metastore.sh
sqoop.cmd                sqoop-export  sqoop-import-mainframe   sqoop-merge        stop-metastore.sh
sqoop-codegen            sqoop-help    sqoop-job                sqoop-metastore
sqoop-create-hive-table  sqoop-import  sqoop-list-databases     sqoop-version
[jun@master sqoop-1.4.7.bin__hadoop-2.6.0]$ bin/sqoop import --connect jdbc:mysql://master:3306/jd_db?useSSL=false --username root -P --table spider --hbase-table PINGJIA.SPIDER --column-family f1 --hbase-row-key id --hbase-create-table -m 1
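An exception is reported. The cause is insufficient MySQL privileges: access has to be granted to root when it connects from the cluster hosts.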
18/07/25 09:45:28 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Access denied for user 'root'@'master' (using password: YES)
Open MySQL and run the following grant commands
mysql> use jd_db;
Database changed
mysql> grant all privileges on *.* to 'root'@'master' identified by 'bjtungirc';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> grant all privileges on *.* to 'root'@'slave0' identified by 'bjtungirc';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> grant all privileges on *.* to 'root'@'slave1' identified by 'bjtungirc';
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Then re-run the bin/sqoop command. This time it succeeds and imports 427658 records in total.
18/07/25 10:06:31 INFO mapreduce.Job: map 0% reduce 0%
18/07/25 10:08:21 INFO mapreduce.Job: map 100% reduce 0%
18/07/25 10:08:22 INFO mapreduce.Job: Job job_1532481435423_0002 completed successfully
18/07/25 10:08:22 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=209541
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=1
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=104793
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=104793
        Total vcore-milliseconds taken by all map tasks=104793
        Total megabyte-milliseconds taken by all map tasks=107308032
    Map-Reduce Framework
        Map input records=427658
        Map output records=427658
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=2100
        CPU time spent (ms)=118740
        Physical memory (bytes) snapshot=230121472
        Virtual memory (bytes) snapshot=2143203328
        Total committed heap usage (bytes)=105381888
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
18/07/25 10:08:22 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 142.1074 seconds (0 bytes/sec)
18/07/25 10:08:22 INFO mapreduce.ImportJobBase: Retrieved 427658 records.
Check the total number of rows imported into the HBase cluster
hbase(main):004:0> count 'PINGJIA.SPIDER'
...
Current count: 425000, row: 700300
Current count: 426000, row: 701300
Current count: 427000, row: 702300
427658 row(s) in 53.9180 seconds

=> 427658
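The Sqoop import command in step (3) packs several options into one line. As a rough sketch of what each option does, the Java class below assembles the same argument list and prints it. The class name SqoopImportCommand and its build method are hypothetical helpers made up for this illustration; only the flags and values come from the command above. The sketch does not launch Sqoop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the Sqoop import arguments used above. */
class SqoopImportCommand {

    static List<String> build(String jdbcUrl, String user, String table,
                              String hbaseTable, String columnFamily, String rowKey) {
        List<String> args = new ArrayList<>();
        args.add("bin/sqoop");
        args.add("import");
        args.add("--connect");        // JDBC URL of the source MySQL database
        args.add(jdbcUrl);
        args.add("--username");       // MySQL user that must be allowed to connect from the cluster hosts
        args.add(user);
        args.add("-P");               // prompt for the password instead of putting it on the command line
        args.add("--table");          // source MySQL table
        args.add(table);
        args.add("--hbase-table");    // target HBase table
        args.add(hbaseTable);
        args.add("--column-family");  // every MySQL column becomes a qualifier in this family
        args.add(columnFamily);
        args.add("--hbase-row-key");  // MySQL column used as the HBase row key
        args.add(rowKey);
        args.add("--hbase-create-table"); // create the HBase table if it does not exist yet
        args.add("-m");               // number of parallel map tasks
        args.add("1");
        return args;
    }

    public static void main(String[] unused) {
        List<String> cmd = build("jdbc:mysql://master:3306/jd_db?useSSL=false",
                "root", "spider", "PINGJIA.SPIDER", "f1", "id");
        System.out.println(String.join(" ", cmd));
    }
}
```

Because the column names of spider become the HBase qualifiers under f1, whatever reads the table later has to use exactly those names.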
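III. Using Phoenix as middleware
Phoenix is a SQL layer built on top of HBase. It lets users create tables, insert data, and query HBase data through standard JDBC APIs instead of the HBase client APIs.
The Phoenix query engine converts a SQL query into one or more HBase scans and orchestrates their execution to produce a standard JDBC result set.
1. Installing Phoenix
(1) Download apache-phoenix-4.9.0-HBase-1.2-bin.tar.gz from http://phoenix.apache.org/download.html into a directory of your choice and extract it
(2) Edit the Linux environment variables and source the file so they take effect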
#phoenix
export PHOENIX_HOME=/home/jun/apache-phoenix-4.9.0-HBase-1.2-bin/
export PATH=$PHOENIX_HOME/bin:$PATH
(3) Copy the three dependency jars into the HBase lib directory on the Master and Slave nodes
cp /home/jun/apache-phoenix-4.9.0-HBase-1.2-bin/phoenix-4.9.0-HBase-1.2-client.jar /home/jun/apache-phoenix-4.9.0-HBase-1.2-bin/phoenix-4.9.0-HBase-1.2-server.jar /home/jun/apache-phoenix-4.9.0-HBase-1.2-bin/phoenix-core-4.9.0-HBase-1.2.jar /home/jun/hbase-1.2.6.1/lib/
[jun@master apache-phoenix-4.9.0-HBase-1.2-bin]$ scp phoenix-4.9.0-HBase-1.2-client.jar phoenix-4.9.0-HBase-1.2-server.jar phoenix-core-4.9.0-HBase-1.2.jar -r jun@slave0:~/hbase-1.2.6.1/lib/
phoenix-4.9.0-HBase-1.2-client.jar    100%   82MB  82.3MB/s   00:01
phoenix-4.9.0-HBase-1.2-server.jar    100%   24MB  49.9MB/s   00:00
phoenix-core-4.9.0-HBase-1.2.jar      100% 3816KB  47.6MB/s   00:00
[jun@master apache-phoenix-4.9.0-HBase-1.2-bin]$ scp phoenix-4.9.0-HBase-1.2-client.jar phoenix-4.9.0-HBase-1.2-server.jar phoenix-core-4.9.0-HBase-1.2.jar -r jun@slave1:~/hbase-1.2.6.1/lib/
phoenix-4.9.0-HBase-1.2-client.jar    100%   82MB  82.2MB/s   00:01
phoenix-4.9.0-HBase-1.2-server.jar    100%   24MB  49.5MB/s   00:00
phoenix-core-4.9.0-HBase-1.2.jar      100% 3816KB  60.7MB/s   00:00
(4) Restart the HBase cluster
stop-hbase.sh
start-hbase.sh
2. Accessing HBase from the Phoenix shell
(1) Enter the Phoenix shell
[jun@master ~]$ cd /home/jun/apache-phoenix-4.9.0-HBase-1.2-bin/
[jun@master apache-phoenix-4.9.0-HBase-1.2-bin]$ bin/sqlline.py 192.168.1.100:2181
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:192.168.1.100:2181 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:192.168.1.100:2181
18/07/25 10:38:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connected to: Phoenix (version 4.9)
Driver: PhoenixEmbeddedDriver (version 4.9)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
87/87 (100%) Done
Done
sqlline version 1.2.0
0: jdbc:phoenix:192.168.1.100:2181> help
(2) Create a table in Phoenix that corresponds to the table in HBase
0: jdbc:phoenix:192.168.1.100:2181> create table PINGJIA.SPIDER(
. . . . . . . . . . . . . . . . . > id varchar primary key,
. . . . . . . . . . . . . . . . . > "f1"."platform" varchar,
. . . . . . . . . . . . . . . . . > "f1"."xinhao" varchar,
. . . . . . . . . . . . . . . . . > "f1"."title" varchar,
. . . . . . . . . . . . . . . . . > "f1"."content" varchar,
. . . . . . . . . . . . . . . . . > "f1"."menberlevel" varchar,
. . . . . . . . . . . . . . . . . > "f1"."fromplatform" varchar,
. . . . . . . . . . . . . . . . . > "f1"."area" varchar,
. . . . . . . . . . . . . . . . . > "f1"."userimpression" varchar,
. . . . . . . . . . . . . . . . . > "f1"."color" varchar,
. . . . . . . . . . . . . . . . . > "f1"."price" varchar,
. . . . . . . . . . . . . . . . . > "f1"."productSize" varchar,
. . . . . . . . . . . . . . . . . > "f1"."creationTime" varchar,
. . . . . . . . . . . . . . . . . > "f1"."zhuqutime" varchar,
. . . . . . . . . . . . . . . . . > "f1"."lable" varchar);
427,658 rows affected (26.362 seconds)
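As you can see, as soon as the table is created, 427658 rows are associated with it. Phoenix is essentially a query tool for HBase: creating a table in Phoenix sets up, through JDBC, a mapping to the HBase table, while the data itself stays in the HBase table and can now be operated on through Phoenix. This requires the table created in Phoenix to correspond exactly to the HBase table the user wants to work with. The correspondence includes the column names: in the statement above, "menberlevel" and "zhuqutime" do not match the memberlevel and zhuaqutime columns that were imported from MySQL, which is most likely why those two columns come back empty or null in the query results below.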
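Phoenix folds unquoted identifiers to upper case, so the lower-case and mixed-case qualifiers written by Sqoop have to be double-quoted in the DDL, and a single misspelled name silently maps to an empty column. One way to reduce that risk is to generate the statement from a single list of qualifier names. The following is a minimal sketch under that assumption; the class PhoenixDdl and its methods are hypothetical names used only for illustration, and every column is mapped as varchar as in the statement above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that builds a Phoenix CREATE TABLE statement for an existing HBase table. */
class PhoenixDdl {

    /** Double-quotes an identifier so that Phoenix keeps its exact case. */
    static String quote(String identifier) {
        if (identifier.contains("\"")) {
            throw new IllegalArgumentException("identifier must not contain a double quote: " + identifier);
        }
        return "\"" + identifier + "\"";
    }

    /** Maps each HBase qualifier in the given column family to a varchar column. */
    static String createTable(String table, String rowKey, String family, List<String> qualifiers) {
        String columns = qualifiers.stream()
                .map(q -> quote(family) + "." + quote(q) + " varchar")
                .collect(Collectors.joining(", "));
        return "create table " + table + " (" + rowKey + " varchar primary key, " + columns + ")";
    }

    public static void main(String[] args) {
        // Qualifier names as they appear in the MySQL query output earlier in this article.
        List<String> qualifiers = Arrays.asList(
                "platform", "xinhao", "title", "content", "memberlevel", "fromplatform",
                "area", "userimpression", "color", "price", "productSize",
                "creationTime", "zhuaqutime", "lable");
        System.out.println(createTable("PINGJIA.SPIDER", "id", "f1", qualifiers));
    }
}
```

The printed statement can be pasted into sqlline.py in place of the hand-typed one.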
(3) List all tables
0: jdbc:phoenix:192.168.1.100:2181> !tables
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+---------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION | INDEX_STATE |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+---------------+
| | SYSTEM | CATALOG | SYSTEM TABLE | | | | | |
| | SYSTEM | FUNCTION | SYSTEM TABLE | | | | | |
| | SYSTEM | SEQUENCE | SYSTEM TABLE | | | | | |
| | SYSTEM | STATS | SYSTEM TABLE | | | | | |
| | PINGJIA | SPIDER | TABLE | | | | | |
+------------+--------------+-------------+---------------+----------+------------+----------------------------+-----------------+---------------+
0: jdbc:phoenix:192.168.1.100:2181>
(4) Inspect the table structure
0: jdbc:phoenix:192.168.1.100:2181> !describe PINGJIA.SPIDER
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+--------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME     | DATA_TYPE  | TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PR |
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+--------+
|            | PINGJIA      | SPIDER      | ID              | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | platform        | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | xinhao          | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | title           | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | content         | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | memberlevel     | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | fromplatform    | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | area            | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | userimpression  | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | color           | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | price           | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | productSize     | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | creationTime    | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | zhuqutime       | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | lable           | 12         | VARCHAR    | null         | null           | null            | null   |
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+--------+
0: jdbc:phoenix:192.168.1.100:2181>
(5) Use Phoenix to view the contents of the PINGJIA.SPIDER table in HBase. Phoenix query statements follow standard SQL
0: jdbc:phoenix:192.168.1.100:2181> select * from PINGJIA.SPIDER limit 5;
+---------+-----------+-----------+--------------------------------------------------+-------------------------------------+--------------+------+
| ID | platform | xinhao | title | content | menberlevel | from |
+---------+-----------+-----------+--------------------------------------------------+-------------------------------------+--------------+------+
| 275301 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 真正的爱情经得起平淡的流年 |              | 京东PC |
| 275302 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 不错不错,只是正面什么字都没有,没有logo,带上套子有点山寨,呵呵 |              | 京东iP |
| 275303 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 好。 |              | 京东An |
| 275304 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 京东品质值得信赖。 |              | 京东iP |
| 275305 | 京东 | 华为荣耀畅玩5X | 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 | 感觉还行,用的比较顺手 |              | 京东An |
+---------+-----------+-----------+--------------------------------------------------+-------------------------------------+--------------+------+
5 rows selected (0.49 seconds)
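3. Accessing HBase through Phoenix with JDBC
Implementation (copy all the jars from the Phoenix installation directory into lib and add them to the build path; having several of these jars on the classpath is what triggers the harmless SLF4J multiple-bindings warning in the output below):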
package com.jun;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Phoenix {
    private static String driver = "org.apache.phoenix.jdbc.PhoenixDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driver);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
        String sql = "select * from PINGJIA.SPIDER limit 5";
        // try-with-resources closes the result set, statement and connection even if the query fails
        try (Connection con = DriverManager.getConnection("jdbc:phoenix:192.168.1.100:2181");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // print the 15 columns of the row, separated by single spaces
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= 15; i++) {
                    if (i > 1) {
                        row.append(" ");
                    }
                    row.append(rs.getString(i));
                }
                System.out.println(row);
            }
        }
    }
}
Output:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/jun/workspace/PhoenixSQL/lib/phoenix-4.9.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/jun/workspace/PhoenixSQL/lib/phoenix-4.9.0-HBase-1.2-hive.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/jun/workspace/PhoenixSQL/lib/phoenix-4.9.0-HBase-1.2-pig.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/jun/workspace/PhoenixSQL/lib/phoenix-4.9.0-HBase-1.2-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
275301 京东 华为荣耀畅玩5X 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 真正的爱情经得起平淡的流年 null 京东PC客户端 null 比较一般 分辨率高 功能齐全 通话质量好 待机时间长 暗夜灰 1399 移动4G 2016-04-22 15:36:58 null 系统流畅,性价比高,功能齐全,反应快,信号稳定,屏幕大,外观漂亮,国民手机,分辨率高,通话质量好
275302 京东 华为荣耀畅玩5X 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 不错不错,只是正面什么字都没有,没有logo,带上套子有点山寨,呵呵 null 京东iPhone客户端 北京 null 落日金 1399 全网通 2016-04-22 15:33:37 null null
275303 京东 华为荣耀畅玩5X 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 好。 null 京东Android客户端 江苏 null 落日金 1399 全网通 2016-04-22 15:32:35 null null
275304 京东 华为荣耀畅玩5X 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 京东品质值得信赖。 null 京东iPhone客户端 上海 null 破晓银 1399 移动4G 2016-04-22 15:31:35 null null
275305 京东 华为荣耀畅玩5X 荣耀 畅玩5X(KIW-AL10)3GB+16GB内存版 灰色 移动联通电信4G手机 双卡双待 感觉还行,用的比较顺手 null 京东Android客户端 null null 破晓银 1399 移动4G 2016-04-22 15:27:32 null null
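IV. Web-based front-end presentation
To summarize, the following must be running before the front end can display anything:
- (Master) Start HDFS by running start-dfs.sh
- (Master) Start YARN by running start-yarn.sh
- (Master, Slave0, Slave1) Start the standalone ZooKeeper ensemble on each node by running /home/jun/zookeeper-3.4.12/bin/zkServer.sh start
- (Master) Start HBase by running start-hbase.sh
1. Start Eclipse and load the web project
(1) Code structure
(2) Database connection configuration code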
public class DataBase {
    private static final String DRIVER = "com.mysql.jdbc.Driver";
    private static final String URL = "jdbc:mysql://localhost:3306/jd_pingjia?useUnicode=true&characterEncoding=utf-8&useSSL=false";
    private static final String USER = "root";
    private static final String PASSWORD = "bjtungirc";
    private static Connection connection;
    private static java.sql.PreparedStatement pstmt;
    private static ResultSet resultSet;
...
}
(3) Phoenix connection code
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class phoenix_Hbase {
    // driver class name
    private static final String DRIVER = "org.apache.phoenix.jdbc.PhoenixDriver";
    // ZooKeeper address used to reach HBase
    private static final String URL = "jdbc:phoenix:192.168.1.100:2181";
    // user name (not needed for this connection)
    private static final String USERNAME = "";
    // password (not needed for this connection)
    private static final String PASSWORD = "";

    private static Connection connection;
    private static ResultSet resultSet;
    private static Statement statement;

    public static Connection getConnection() throws SQLException {
        try {
            Class.forName(DRIVER);
            connection = DriverManager.getConnection(URL);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        }
        return connection;
    }

    public static void closeConnection(Connection c, Statement s, ResultSet r) {
        // close in reverse order of creation and skip anything that was never opened
        try {
            if (r != null) r.close();
            if (s != null) s.close();
            if (c != null) c.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
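2. Install Tomcat and start the application
3. Enter http://master:8080/new_pingjia_hbase/index.jsp in a browser to view the content.
In the browser you can see which component is used to present each of the kinds of data below.
First, let us review what information the table holds and consider which of it should be presented.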
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+--------+
| TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | COLUMN_NAME     | DATA_TYPE  | TYPE_NAME  | COLUMN_SIZE  | BUFFER_LENGTH  | DECIMAL_DIGITS  | NUM_PR |
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+--------+
|            | PINGJIA      | SPIDER      | ID              | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | platform        | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | xinhao          | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | title           | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | content         | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | menberlevel     | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | fromplatform    | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | area            | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | userimpression  | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | color           | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | price           | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | productSize     | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | creationTime    | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | zhuqutime       | 12         | VARCHAR    | null         | null           | null            | null   |
|            | PINGJIA      | SPIDER      | lable           | 12         | VARCHAR    | null         | null           | null            | null   |
+------------+--------------+-------------+-----------------+------------+------------+--------------+----------------+-----------------+--------+
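For these kinds of data, some possible presentations are listed below. Many others are possible; it depends on what the user wants to focus on, and in practice the requirements decide which data to show and how to show it.
- Dashboard: number of e-commerce platforms, total number of phones purchased by users, total number of records
- Pie chart: client source (PC, Android, etc.), carrier, users' color preferences
- Line chart: buyer membership level (Diamond member, Gold member, etc.)
- Histogram: user purchase impressions, phone brand sales ranking, phone prices bought by members of different levels
- Map: where the sales come from
- And so on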
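Each chart in the list above boils down to a group-by over one or two columns. As a rough illustration, the sketch below aggregates a handful of in-memory rows into the inputs for two of them: a count per client source for the pie chart, and an average price per member level for the histogram. The Review class, the method names, and the ASCII labels (PC, iPhone, Android, gold, diamond) are simplified stand-ins invented for this example; in the real system the rows would come from the Phoenix query, and the same aggregation could equally be pushed into SQL. Because price is stored as text, the sketch parses it and skips values that are missing or not numeric.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical in-memory aggregation for the pie chart and histogram inputs. */
class ChartAggregation {

    /** One review row reduced to the three columns used here (illustrative type). */
    static final class Review {
        final String fromplatform;
        final String memberlevel;
        final String price;

        Review(String fromplatform, String memberlevel, String price) {
            this.fromplatform = fromplatform;
            this.memberlevel = memberlevel;
            this.price = price;
        }
    }

    /** Treats null, empty text and the literal text "null" as a missing value (assumption). */
    static boolean present(String value) {
        return value != null && !value.isEmpty() && !"null".equals(value);
    }

    /** Pie chart input: number of reviews per client source. */
    static Map<String, Long> countBySource(List<Review> rows) {
        return rows.stream()
                .filter(r -> present(r.fromplatform))
                .collect(Collectors.groupingBy(r -> r.fromplatform, TreeMap::new, Collectors.counting()));
    }

    /** Histogram input: average price per member level; non-numeric prices are skipped. */
    static Map<String, Double> averagePriceByLevel(List<Review> rows) {
        return rows.stream()
                .filter(r -> present(r.memberlevel) && present(r.price) && r.price.matches("\\d+(\\.\\d+)?"))
                .collect(Collectors.groupingBy(r -> r.memberlevel, TreeMap::new,
                        Collectors.averagingDouble(r -> Double.parseDouble(r.price))));
    }

    public static void main(String[] args) {
        // Five rows shaped like the sample output shown earlier, with simplified ASCII labels.
        List<Review> rows = Arrays.asList(
                new Review("PC", "gold", "1399"),
                new Review("iPhone", "diamond", "1399"),
                new Review("Android", "gold", "1399"),
                new Review("iPhone", "diamond", "1399"),
                new Review("Android", "gold", "1399"));
        System.out.println(countBySource(rows));
        System.out.println(averagePriceByLevel(rows));
    }
}
```

Using a TreeMap keeps the categories in a stable order, which makes the chart legend deterministic between page loads.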