|NO.Z.00047|----------|BigDataEnd|--|Hadoop & Real-Time Data Warehouse.V27|--|Project.v27|Requirement 3: Data Processing & Incremental Ad Statistics.V1|--|Requirements Analysis|

I. Requirement 3: every 5 seconds, count the ad clicks of the last hour (incremental)
II. Implementation steps
### --- Implementation steps

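~~~     Get the data source (input) --- flume
~~~     Transform
~~~     Source record format: area:uid:product_id:time, mapped to the case class AdClick; filter step: filter product_id != null
~~~     .Watermark, .keyBy(productId), .timeWindow
~~~     .aggregate(MyAggFunc, MyWindowFunc)
~~~     MyAggFunc: holds the calculation logic, accumulating the number of ad clicks
~~~     MyWindowFunc: apply, passes the result data downstream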
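
The steps above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the Flink 1.10/1.11-era Scala DataStream API (where `timeWindow` is available and event time has to be enabled on the environment with `setStreamTimeCharacteristic(TimeCharacteristic.EventTime)`). The `AdCount` result type, the `AdClickCountSketch` object, and the 5-second out-of-orderness bound are hypothetical choices made for illustration; only `AdClick`, `MyAggFunc`, and `MyWindowFunc` are names taken from the steps.

```scala
import org.apache.flink.api.common.functions.AggregateFunction
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.function.WindowFunction
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

// Event type named in the steps; timestamp is assumed to be in seconds.
case class AdClick(area: String, uid: String, productId: String, timestamp: Long)

// Hypothetical result type: click count of one ad in one window.
case class AdCount(productId: String, windowEnd: Long, count: Long)

// Incremental part: the accumulator is a running count, increased once per click.
class MyAggFunc extends AggregateFunction[AdClick, Long, Long] {
  override def createAccumulator(): Long = 0L
  override def add(value: AdClick, acc: Long): Long = acc + 1
  override def getResult(acc: Long): Long = acc
  override def merge(a: Long, b: Long): Long = a + b
}

// Receives the single pre-aggregated count per key and window and passes it downstream.
class MyWindowFunc extends WindowFunction[Long, AdCount, String, TimeWindow] {
  override def apply(key: String, window: TimeWindow,
                     input: Iterable[Long], out: Collector[AdCount]): Unit =
    out.collect(AdCount(key, window.getEnd, input.iterator.next()))
}

object AdClickCountSketch {
  // Wires the steps together; `clicks` is the stream of converted AdClick records.
  def countClicks(clicks: DataStream[AdClick]): DataStream[AdCount] =
    clicks
      .filter(_.productId != null) // drop records without an ad event
      .assignTimestampsAndWatermarks(
        // Assumed bound: tolerate events arriving up to 5 seconds out of order.
        new BoundedOutOfOrdernessTimestampExtractor[AdClick](Time.seconds(5)) {
          override def extractTimestamp(e: AdClick): Long = e.timestamp * 1000L // seconds -> ms
        })
      .keyBy(_.productId)
      .timeWindow(Time.hours(1), Time.seconds(5)) // 1-hour window sliding every 5 seconds
      .aggregate(new MyAggFunc, new MyWindowFunc)
}
```

Passing both functions to `aggregate` means the window only has to store one `Long` per ad rather than every click, and `MyWindowFunc` is called once per window with that single value.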

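III. Incremental ad click statistics: every 5 seconds, count the ad clicks of the last hour (incremental statistics)
### --- MyAggFunc: holds the calculation logic, accumulating the number of ad clicks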

{
  "yanqi_event": [
    {
      "name": "goods_detail_loading",
      "json": {
        "entry": "2",
        "goodsid": "0",
        "loading_time": "92",
        "action": "3",
        "staytime": "10",
        "showtype": "0"
      },
      "time": 1595265099584
    },
    {
      "name": "notification",
      "json": {
        "action": "1",
        "type": "3"
      },
      "time": 1595341087663
    },
    {
      "name": "ad",
      "json": {
        "duration": "10",
        "ad_action": "0",
        "shop_id": "23",
        "event_type": "ad",
        "ad_type": "1",
        "show_style": "0",
        "product_id": "36",
        "place": "placecampaign2_left",
        "sort": "1"
      },
      "time": 1595276738208
    }
  ],
  "attr": {
    "area": "东莞",
    "uid": "2F10092A0",
    "app_v": "1.1.0",
    "event_type": "common",
    "device_id": "1FB872-9A1000",
    "os_type": "1.1",
    "channel": "广宣",
    "language": "chinese",
    "brand": "iphone-0"
  }
}
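
To see why the count is kept incrementally, it helps to work out how many windows a single click belongs to when a 1-hour window slides every 5 seconds. The sketch below is plain Scala with no Flink dependency; it assumes window starts are aligned to multiples of the slide with no offset, and `SlidingWindowSketch` and `windowStarts` are hypothetical names used only for this illustration.

```scala
object SlidingWindowSketch {
  val sizeMs: Long  = 60 * 60 * 1000L // window length: 1 hour
  val slideMs: Long = 5 * 1000L       // a new window starts every 5 seconds

  // Start times of every window [start, start + sizeMs) that contains the timestamp,
  // newest window first.
  def windowStarts(timestampMs: Long): Vector[Long] = {
    val lastStart = timestampMs - (timestampMs % slideMs) // latest window start <= timestamp
    Iterator.iterate(lastStart)(_ - slideMs)
      .takeWhile(_ > timestampMs - sizeMs)
      .toVector
  }

  def main(args: Array[String]): Unit = {
    val starts = windowStarts(1595276738208L) // "time" of the sample ad event above
    println(s"windows containing the event: ${starts.size}") // prints 720
    println(s"latest window start: ${starts.head}")          // prints 1595276735000
  }
}
```

Each click therefore lands in 720 overlapping windows. With an accumulating function such as MyAggFunc, each of those windows only holds a running count instead of a copy of every click.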
IV. Data type conversion:
### --- ODS: the eventlog data in Kafka; DIM: none; DWD: note on the event.log file: its JSON is not in a normalized format and needs to be converted

{
  "data": [
    {
      "id": "6",
      "payMethod": "meituan",
      "payName": "美团支付",
      "description": "美团支付",
      "payOrder": "0",
      "online": "-1"
    }
  ],
  "database": "dwshow",
  "es": 1604461572000,
  "id": 6,
  "isDdl": false,
  "mysqlType": {
    "id": "int(11)",
    "payMethod": "varchar(20)",
    "payName": "varchar(255)",
    "description": "varchar(255)",
    "payOrder": "int(11)",
    "online": "tinyint(4)"
  },
  "old": null,
  "pkNames": null,
  "sql": "",
  "sqlType": {
    "id": 4,
    "payMethod": 12,
    "payName": 12,
    "description": 12,
    "payOrder": 4,
    "online": -6
  },
  "table": "yanqi_payments",
  "ts": 1604461572297,
  "type": "INSERT"
}
{
  "yanqi_event": [
    {
      "name": "goods_detail_loading",
      "json": {
        "entry": "2",
        "goodsid": "0",
        "loading_time": "92",
        "action": "3",
        "staytime": "10",
        "showtype": "0"
      },
      "time": 1595265099584
    },
    {
      "name": "notification",
      "json": {
        "action": "1",
        "type": "3"
      },
      "time": 1595341087663
    },
    {
      "name": "ad",
      "json": {
        "duration": "10",
        "ad_action": "0",
        "shop_id": "23",
        "event_type": "ad",
        "ad_type": "1",
        "show_style": "0",
        "product_id": "36",
        "place": "placecampaign2_left",
        "sort": "1"
      },
      "time": 1595276738208
    }
  ],
  "attr": {
    "area": "东莞",
    "uid": "2F10092A0",
    "app_v": "1.1.0",
    "event_type": "common",
    "device_id": "1FB872-9A1000",
    "os_type": "1.1",
    "channel": "广宣",
    "language": "chinese",
    "brand": "iphone-0"
  }
}
### --- Conversion code

    import java.util.concurrent.TimeUnit
    import com.alibaba.fastjson.{JSON, JSONArray, JSONObject}

    // Convert the JSON logs read from Kafka into AdClick records
    val mapEventStream: DataStream[AdClick] = eventLogStream.map(x => {
        val jsonObj: JSONObject = JSON.parseObject(x)
        val attrJson: JSONObject = jsonObj.getJSONObject("attr")
        val area: String = attrJson.getString("area")
        val uid: String = attrJson.getString("uid")
        // yanqi_event is an array of events, for example:
        // [{"name":"praise","json":{"id":0,"type":4,"add_time":"1597851188753","userid":0,"target":8},"time":1595329059805}]
        // The "time" field is a timestamp in milliseconds
        val datas: JSONArray = jsonObj.getJSONArray("yanqi_event")
        var productId: String = null
        var timestamp: Long = 0L
        for (i <- 0 until datas.size()) {
            // An ad event looks like:
            // {"name":"ad","json":{"duration":"10","ad_action":"0","shop_id":"23","event_type":"ad","ad_type":"1",
            //  "show_style":"0","product_id":"36","place":"placecampaign2_left","sort":"1"},"time":1595276738208}
            val xJson: JSONObject = datas.getJSONObject(i)
            if ("ad".equals(xJson.getString("name"))) {
                productId = xJson.getJSONObject("json").getString("product_id")
                // Convert the millisecond timestamp to seconds
                timestamp = TimeUnit.MILLISECONDS.toSeconds(xJson.getLongValue("time"))
            }
        }
        // productId stays null when the record contains no "ad" event
        AdClick(area, uid, productId, timestamp)
    })
