car-travel project

Part of the code can be downloaded from

Only some of the project files are included:

OrderStreamingProcessor.scala is for part 3 (Virtual Station)

KafkaManager.scala is for part 4


Project Scope

This is a data engineering project that supports streaming cab tracking, cab order calculation, Virtual Station calculation, and data warehousing.


Project Structure



1. Flume-kafka-redis-hbase pipeline

Purpose: Real-time order tracking for taxis

Log on to Cloudera and start Kafka.





Flume connected successfully.

Configure one Flume agent for one Kafka topic. There is also a Flume catalog that sends to many topics.


Flume: node02, Redis: node01

GPSConsumer: data is produced, Flume monitors it and sends it to Kafka, and Kafka hands it to Redis.

Attention: Redis and Flume must both be started, and the Redis password must be set correctly.



The tracked cab moves according to the data streamed by the shell script.


2. Flume-kafka-sparkstreaming-redis pipeline

Purpose: Calculate real-time orders to present to users



OrderStreamingProcessor (stores results in Redis):

The order count goes up as real-time orders arrive, counted per hour.
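
As a rough illustration of the per-hour counting described above, the sketch below buckets order timestamps by hour in plain Python. It is not the project's OrderStreamingProcessor (which is Scala on Spark Streaming and writes to Redis). The names `hour_key` and `count_orders`, the epoch-seconds input, and the `order:count:` key prefix are all assumptions made for illustration, and a `Counter` stands in for Redis.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_key(epoch_seconds, prefix="order:count:"):
    """Build a per-hour key such as 'order:count:2019113018' (UTC).

    The key layout is a hypothetical choice, not taken from the project.
    """
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return prefix + ts.strftime("%Y%m%d%H")


def count_orders(order_times, counts=None):
    """Increment one counter per hour bucket; `counts` stands in for Redis."""
    counts = Counter() if counts is None else counts
    for t in order_times:
        counts[hour_key(t)] += 1
    return counts


if __name__ == "__main__":
    # Three hypothetical orders: two in the same hour, one in the next hour.
    batch = [1575136800, 1575138000, 1575140400]
    for key, n in sorted(count_orders(batch).items()):
        print(key, n)
```

Passing an existing `counts` object into `count_orders` lets successive micro-batches accumulate into the same hourly buckets, which is roughly what an increment on a Redis key would do.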


3. Hbase-sparksql-spark-hbase-jdbc

Purpose: Use Spark to calculate Virtual Stations for customers


What is a Virtual Station

A Virtual Station is a virtual pickup spot for cab drivers and customers. It wastes time when a customer says something like 'pick me up near the bridge', so the app suggests potential pickup spots where many other users get into cabs.





Data pre-processing


Convert database query results to JSON

Time management
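
For the JSON conversion step noted above, a minimal sketch in plain Python might look like the following. It assumes the query result is available as a list of column names plus a list of row tuples; the function name `rows_to_json` and the sample column names are hypothetical and not taken from the project.

```python
import json


def rows_to_json(columns, rows):
    """Turn query output (column names plus row tuples) into a JSON array string."""
    records = [dict(zip(columns, row)) for row in rows]
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese addresses) readable.
    return json.dumps(records, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical columns and values, for illustration only.
    cols = ["order_id", "city", "lat", "lng"]
    data = [("o-1", "Chengdu", 30.6575, 104.0655)]
    print(rows_to_json(cols, data))
```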


1. Virtual Station, Uber H3
2. Spark offline task
3. HBase -> Spark load
4. Virtual Station, map
5. Phoenix + HBase -> JDBC service
6. Web -> JDBC service

Install Phoenix


Install Python 2 (to run Python 2 code in a Python 3 environment):

conda create --name python2 python=2.7
source activate python2




Create a Phoenix view for the HBase table





Virtual Stations are calculated for points in the city where 100+ people get on and off cabs nearby in one day.
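
The threshold rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the project uses Uber H3 cells and a Spark offline task, whereas this sketch swaps in a simple square latitude/longitude grid, treats "100+" as "at least 100", and uses the hypothetical names `cell_of` and `virtual_stations`.

```python
from collections import Counter
import math


def cell_of(lat, lon, cell_deg=0.001):
    """Map a coordinate to a square grid cell (a simple stand-in for an H3 cell)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def virtual_stations(points, min_count=100, cell_deg=0.001):
    """Return the cells that have at least `min_count` pickup/drop-off points.

    `points` is an iterable of (lat, lon) pairs for one city and one day.
    The result maps each qualifying cell to its point count.
    """
    counts = Counter(cell_of(lat, lon, cell_deg) for lat, lon in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical data: one busy spot (120 points) and one quiet spot (5 points).
    busy = [(30.6575, 104.0655)] * 120
    quiet = [(30.7005, 104.1005)] * 5
    print(virtual_stations(busy + quiet))
```

A hexagonal index such as H3 gives more uniform neighbor distances than a square grid, which is likely why the project chose it; the counting and thresholding logic would stay the same.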



4. MySQL-maxwell-kafka-hbase

Purpose: Data warehousing from MySQL to HBase



A data warehouse integrates many sources of data, which reduces stress on the production system. It also reduces the total turnaround time for analysis and reporting, and the restructuring and integration make the data easier to use for reporting and analysis.
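
To make the MySQL -> Maxwell -> Kafka -> HBase flow more concrete, here is a hedged sketch in plain Python of the step that turns one Maxwell change message into an HBase-style write. It relies on the fields Maxwell's JSON output carries (`database`, `table`, `type`, `data`). The function name `maxwell_to_put`, the `id` key column, and the `info` column family are assumptions for illustration; the project's own consumer is written in Scala and is not shown in these notes.

```python
import json


def maxwell_to_put(message, key_column="id", family="info"):
    """Convert one Maxwell JSON message into a (table, rowkey, cells) triple.

    Returns None for message types other than insert/update.
    `key_column` and `family` are hypothetical defaults.
    """
    event = json.loads(message)
    if event.get("type") not in ("insert", "update"):
        return None
    data = event["data"]
    rowkey = str(data[key_column])
    # One cell per non-null column, named 'family:column'.
    cells = {
        f"{family}:{col}": str(val)
        for col, val in data.items()
        if val is not None
    }
    return event["table"], rowkey, cells


if __name__ == "__main__":
    # Hypothetical message in Maxwell's JSON layout.
    msg = json.dumps({
        "database": "travel", "table": "order_info", "type": "insert",
        "ts": 1575136800, "data": {"id": 7, "city": "Chengdu", "note": None},
    })
    print(maxwell_to_put(msg))
```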










HBase load balancing:

1. Pre-partitioning

2. Rowkey setting (no more than 64 bit):
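
The two load-balancing points above usually work together: the table is pre-split on fixed prefixes, and each rowkey gets a prefix that spreads writes across those splits. The project's actual rowkey design is not shown in these notes, so the plain-Python sketch below is only one common approach (hash salting); `split_keys`, `salted_rowkey`, and the `order_id` input are hypothetical names.

```python
import hashlib


def split_keys(num_regions):
    """Split points for pre-partitioning: '01', '02', ... (num_regions - 1 keys)."""
    return [f"{i:02d}" for i in range(1, num_regions)]


def salted_rowkey(order_id, num_regions):
    """Prefix the key with a stable two-digit salt so writes spread across regions.

    Assumes num_regions <= 100 so the salt always fits in two digits.
    """
    digest = hashlib.md5(order_id.encode("utf-8")).hexdigest()
    salt = int(digest, 16) % num_regions
    return f"{salt:02d}_{order_id}"


if __name__ == "__main__":
    regions = 4
    print(split_keys(regions))
    for oid in ("order-1", "order-2", "order-3"):
        print(salted_rowkey(oid, regions))
```

Because the salt is derived from the key itself, a reader can recompute it to fetch a single row, at the cost of needing one scan per salt value for range queries.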





Install KafkaOffsetMonitor

Where the Maxwell binlog is located



Roll back binlog data

Bootstrap a table with SQL

Daily environment setting:

        <!-- daily environment-->

        <!-- developing environment-->

        <!-- testing environment-->








IntelliJ IDEA project build fails with "xxx package" or "cannot find symbol"


SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"

Download slf4j-simple-1.6.2.jar and modify the xml.


No appenders could be found for logger (log4j)

Under the project resources directory, modify




VirtualBox start problem:

VirtualBox.xml is empty

Your problem is that you have a corrupt "VirtualBox.xml" file in the location contained in the error message, '/Users/alexanderevans/Library/VirtualBox/VirtualBox.xml'. In that same folder there's a "VirtualBox.xml-prev" file. Delete the "VirtualBox.xml" file and rename the "VirtualBox.xml-prev" to "VirtualBox.xml". Try it again.


Error compiling sbt component 'compiler-interface-2.11.1-52.0'





IDEA package dependency:

1. Check the "Mark Directory as" setting




As the Linux root user: su hdfs, then hadoop dfs -chmod 777 /sparkapp







HTTPS: abbr. Hypertext Transfer Protocol Secure
Android HTTPS: communication security



posted @ 2019-11-30 18:57  cschen588