|NO.Z.00047|——————————|BigDataEnd|——|Hadoop & Real-Time Data Warehouse.V27|——|Project.v27|Requirement 3: Data Processing & Incremental Ad Statistics.V1|——|Requirements Analysis|
1. Requirement 3: every 5 seconds, count the ad clicks of the most recent 1 hour --- incremental

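2. Implementation steps
### --- Implementation steps
~~~ Obtain the data source (input) --- Flume
~~~ Transform
~~~ Source data format: area:uid:product_id:time, mapped to the case class AdClick; filter step: filter product_id != null
~~~ .Watermark, .keyBy(productId), .timeWindow
~~~ .aggregate(MyAggFunc,MyWindowFunc)
~~~ MyAggFunc: holds the computation logic, which accumulates the ad click count
~~~ MyWindowFunc: apply, which passes the result data downstream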
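The steps above can be modelled without any Flink dependency, to make the window arithmetic concrete. The following is a minimal sketch in plain Scala, not Flink code: `SlidingWindowSketch`, `windowStarts` and `countClicks` are hypothetical names, the `AdClick` fields are taken from the format listed above, and timestamps are assumed to be in milliseconds. With a 1-hour window sliding every 5 seconds, each click belongs to 720 overlapping windows.

```scala
object SlidingWindowSketch {
  // Fields follow the source format area:uid:product_id:time
  final case class AdClick(area: String, uid: String, productId: String, timestamp: Long)

  val WindowSizeMs: Long = 60L * 60L * 1000L // 1 hour
  val SlideMs: Long = 5L * 1000L             // 5 seconds

  /** Start times (ms) of every sliding window [start, start + size) that contains tsMs. */
  def windowStarts(tsMs: Long, sizeMs: Long = WindowSizeMs, slideMs: Long = SlideMs): List[Long] = {
    val lastStart = tsMs - (tsMs % slideMs)
    // Walk backwards one slide at a time while the window still covers tsMs
    Iterator.iterate(lastStart)(_ - slideMs).takeWhile(_ > tsMs - sizeMs).toList
  }

  /** Click count per (productId, windowStart). */
  def countClicks(clicks: Seq[AdClick]): Map[(String, Long), Long] =
    clicks
      .filter(_.productId != null)                                        // filter step
      .flatMap(c => windowStarts(c.timestamp).map(s => (c.productId, s))) // keyBy + window assignment
      .groupBy(identity)
      .map { case (key, hits) => key -> hits.size.toLong }                // aggregate: count per key and window
}
```

In a real job the engine assigns windows and keeps the per-window state; this model only shows which windows a click lands in and what is counted.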
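3. Incremental ad click statistics: every 5 seconds, count the ad clicks of the most recent 1 hour (incremental statistics)
### --- MyAggFunc: holds the computation logic, which accumulates the ad click count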
{
"yanqi_event": [
{
"name": "goods_detail_loading",
"json": {
"entry": "2",
"goodsid": "0",
"loading_time": "92",
"action": "3",
"staytime": "10",
"showtype": "0"
},
"time": 1595265099584
},
{
"name": "notification",
"json": {
"action": "1",
"type": "3"
},
"time": 1595341087663
},
{
"name": "ad",
"json": {
"duration": "10",
"ad_action": "0",
"shop_id": "23",
"event_type": "ad",
"ad_type": "1",
"show_style": "0",
"product_id": "36",
"place": "placecampaign2_left",
"sort": "1"
},
"time": 1595276738208
}
],
"attr": {
"area": "东莞",
"uid": "2F10092A0",
"app_v": "1.1.0",
"event_type": "common",
"device_id": "1FB872-9A1000",
"os_type": "1.1",
"channel": "广宣",
"language": "chinese",
"brand": "iphone-0"
}
}
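The section names MyAggFunc but does not show its code. A minimal sketch of such an incremental counter follows. To stay self-contained it uses a local trait, `Aggregator`, which is a hypothetical stand-in declaring the same four methods as Flink's `AggregateFunction[IN, ACC, OUT]`; in an actual job one would presumably extend `org.apache.flink.api.common.functions.AggregateFunction` instead. The `AdClick` fields are assumed from the format described earlier.

```scala
object AdClickAggSketch {
  final case class AdClick(area: String, uid: String, productId: String, timestamp: Long)

  // Hypothetical local stand-in mirroring the method set of Flink's AggregateFunction
  trait Aggregator[IN, ACC, OUT] {
    def createAccumulator(): ACC
    def add(value: IN, accumulator: ACC): ACC
    def getResult(accumulator: ACC): OUT
    def merge(a: ACC, b: ACC): ACC
  }

  // Counts clicks: the accumulator is the running count for one key and window
  object MyAggFunc extends Aggregator[AdClick, Long, Long] {
    override def createAccumulator(): Long = 0L                                  // start at zero
    override def add(value: AdClick, accumulator: Long): Long = accumulator + 1L // one more click
    override def getResult(accumulator: Long): Long = accumulator                // emit the count
    override def merge(a: Long, b: Long): Long = a + b                           // combine partial counts
  }
}
```

Because the accumulator is a single Long, only one number per key and window has to be kept rather than an hour of buffered events, which is what makes the count incremental.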
4. Data type conversion:
### --- ODS: stored in eventlog in Kafka; DIM: none; DWD: note on the event.log file: its JSON is not in a normalized format and needs to be converted
{
"data": [
{
"id": "6",
"payMethod": "meituan",
"payName": "美团支付",
"description": "美团支付",
"payOrder": "0",
"online": "-1"
}
],
"database": "dwshow",
"es": 1604461572000,
"id": 6,
"isDdl": false,
"mysqlType": {
"id": "int(11)",
"payMethod": "varchar(20)",
"payName": "varchar(255)",
"description": "varchar(255)",
"payOrder": "int(11)",
"online": "tinyint(4)"
},
"old": null,
"pkNames": null,
"sql": "",
"sqlType": {
"id": 4,
"payMethod": 12,
"payName": 12,
"description": 12,
"payOrder": 4,
"online": -6
},
"table": "yanqi_payments",
"ts": 1604461572297,
"type": "INSERT"
}
{
"yanqi_event": [
{
"name": "goods_detail_loading",
"json": {
"entry": "2",
"goodsid": "0",
"loading_time": "92",
"action": "3",
"staytime": "10",
"showtype": "0"
},
"time": 1595265099584
},
{
"name": "notification",
"json": {
"action": "1",
"type": "3"
},
"time": 1595341087663
},
{
"name": "ad",
"json": {
"duration": "10",
"ad_action": "0",
"shop_id": "23",
"event_type": "ad",
"ad_type": "1",
"show_style": "0",
"product_id": "36",
"place": "placecampaign2_left",
"sort": "1"
},
"time": 1595276738208
}
],
"attr": {
"area": "东莞",
"uid": "2F10092A0",
"app_v": "1.1.0",
"event_type": "common",
"device_id": "1FB872-9A1000",
"os_type": "1.1",
"channel": "广宣",
"language": "chinese",
"brand": "iphone-0"
}
}
### --- Conversion code
// Imports this fragment relies on (fastjson and the Flink Scala API)
import java.util.concurrent.TimeUnit
import com.alibaba.fastjson.{JSON, JSONArray, JSONObject}
import org.apache.flink.streaming.api.scala._

// Convert the JSON log records read from Kafka into AdClick
val mapEventStream: DataStream[AdClick] = eventLogStream.map(x => {
  val jsonObj: JSONObject = JSON.parseObject(x)
  val attr: String = jsonObj.get("attr").toString
  val attrJson: JSONObject = JSON.parseObject(attr)
  val area: String = attrJson.get("area").toString
  val uid: String = attrJson.get("uid").toString
  // Sample: [{"name":"praise","json":{"id":0,"type":4,"add_time":"1597851188753","userid":0,"target":8},
  //          "time":1595329059805}]
  // The "time" field here is a timestamp in milliseconds
  val eventData: String = jsonObj.get("yanqi_event").toString
  val datas: JSONArray = JSON.parseArray(eventData)
  val list = new java.util.ArrayList[String]()
  datas.forEach(x => list.add(x.toString))
  var productId: String = null
  var timestamp: Long = 0L
  list.forEach(x => {
    // Sample: {"name":"ad","json":{"duration":"10","ad_action":"0","shop_id":"23","event_type":"ad","ad_type":"1",
    //          "show_style":"0","product_id":"36","place":"placecampaign2_left","sort":"1"},"time":1595276738208}
    val xJson: JSONObject = JSON.parseObject(x)
    // Only "ad" events carry a product_id; if there are several, the last one wins
    if (xJson.get("name").toString.equals("ad")) {
      val jsonData: String = xJson.get("json").toString
      val jsonDatas = JSON.parseObject(jsonData)
      productId = jsonDatas.get("product_id").toString
      // Convert the millisecond timestamp to seconds
      timestamp = TimeUnit.MILLISECONDS.toSeconds(xJson.get("time").toString.toLong)
    }
  })
  // productId stays null when the record contains no "ad" event
  AdClick(area, uid, productId, timestamp)
})
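The mapping above leaves productId as null when a record has no "ad" event, which is why the implementation steps filter on product_id != null. As an illustration of the same selection logic without nulls, here is a minimal sketch in plain Scala. It assumes the events have already been parsed into a hypothetical `Event` case class (no fastjson involved); `AdEventSketch` and `toAdClick` are likewise made-up names. Unlike the code above, it returns None instead of failing when an "ad" event lacks product_id.

```scala
import java.util.concurrent.TimeUnit

object AdEventSketch {
  // Hypothetical, already-parsed view of one entry of "yanqi_event"
  final case class Event(name: String, fields: Map[String, String], timeMs: Long)
  final case class AdClick(area: String, uid: String, productId: String, timestamp: Long)

  /** Builds an AdClick from the last "ad" event, or None if there is no usable one. */
  def toAdClick(area: String, uid: String, events: Seq[Event]): Option[AdClick] =
    for {
      ad        <- events.filter(_.name == "ad").lastOption // the last "ad" event wins
      productId <- ad.fields.get("product_id")              // None if the field is absent
    } yield AdClick(area, uid, productId, TimeUnit.MILLISECONDS.toSeconds(ad.timeMs))
}
```

With this shape, the later filter step becomes a matter of dropping the None results rather than comparing against null.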