|NO.Z.00007|——————————|BigDataEnd|——|Hadoop&OLAP_Druid.V07|——|Druid.v07|Getting Started|Loading Data from HDFS.V1|

1. Loading Data from HDFS
### --- Prepare the sample data in HDFS

~~~     # Inspect the sample data file in HDFS
[root@hadoop02 ~]# hdfs dfs -cat /data/druidlog.dat
 
{"ts":"2021-10-01T00:01:35Z","srcip":"6.6.6.6", "dstip":"8.8.8.8", "srcport":6666, "dstPort":8888, "protocol": "tcp", "packets":1, "bytes":1000, "cost": 0.1}
{"ts":"2021-10-01T00:01:36Z","srcip":"6.6.6.6", "dstip":"8.8.8.8", "srcport":6666, "dstPort":8888, "protocol": "tcp", "packets":2, "bytes":2000, "cost": 0.1}
{"ts":"2021-10-01T00:01:37Z","srcip":"6.6.6.6", "dstip":"8.8.8.8", "srcport":6666, "dstPort":8888, "protocol": "tcp", "packets":3, "bytes":3000, "cost": 0.1}
{"ts":"2021-10-01T00:01:38Z","srcip":"6.6.6.6", "dstip":"8.8.8.8", "srcport":6666, "dstPort":8888, "protocol": "tcp", "packets":4, "bytes":4000, "cost": 0.1}
{"ts":"2021-10-01T00:02:08Z","srcip":"1.1.1.1", "dstip":"2.2.2.2", "srcport":6666, "dstPort":8888, "protocol": "udp", "packets":5, "bytes":5000, "cost": 0.2}
{"ts":"2021-10-01T00:02:09Z","srcip":"1.1.1.1", "dstip":"2.2.2.2", "srcport":6666, "dstPort":8888, "protocol": "udp", "packets":6, "bytes":6000, "cost": 0.2}
{"ts":"2021-10-01T00:02:10Z","srcip":"1.1.1.1", "dstip":"2.2.2.2", "srcport":6666, "dstPort":8888, "protocol": "udp", "packets":7, "bytes":7000, "cost": 0.2}
{"ts":"2021-10-01T00:02:11Z","srcip":"1.1.1.1", "dstip":"2.2.2.2", "srcport":6666, "dstPort":8888, "protocol": "udp", "packets":8, "bytes":8000, "cost": 0.2}
{"ts":"2021-10-01T00:02:12Z","srcip":"1.1.1.1", "dstip":"2.2.2.2", "srcport":6666, "dstPort":8888, "protocol": "udp", "packets":9, "bytes":9000, "cost": 0.2}
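Before ingesting, it can help to sanity-check that each line is valid JSON with an ISO-8601 `ts` column, since a malformed timestamp will make Druid drop the row. A minimal sketch (the `validate_line` helper is hypothetical, not part of Druid):

```python
import json
from datetime import datetime

# One line from /data/druidlog.dat, as shown above
SAMPLE = (
    '{"ts":"2021-10-01T00:01:35Z","srcip":"6.6.6.6","dstip":"8.8.8.8",'
    '"srcport":6666,"dstPort":8888,"protocol":"tcp","packets":1,'
    '"bytes":1000,"cost":0.1}'
)

def validate_line(line: str) -> dict:
    """Parse one JSON line and check the fields the ingestion spec relies on."""
    rec = json.loads(line)
    # timestampSpec format "iso" expects ISO-8601; fromisoformat wants +00:00, not Z
    datetime.fromisoformat(rec["ts"].replace("Z", "+00:00"))
    for field in ("srcip", "dstip", "srcport", "dstPort", "protocol",
                  "packets", "bytes", "cost"):
        assert field in rec, f"missing field: {field}"
    return rec
```

Running this over every line of the file before the `hdfs dfs -put` catches format errors early.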
### --- Define the ingestion spec

~~~     Input: HDFS file; data format: JSON; timestamp column: ts
~~~     No rollup: keep every detail row
~~~     Segment granularity: day
~~~     DataSource name: yanqitable2
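With rollup disabled, Druid keeps every detail row; the query granularity (minute, in the spec generated below) only truncates the timestamp, it does not merge rows. A small Python sketch of that truncation (the `truncate_to_minute` helper is hypothetical, for illustration only):

```python
from datetime import datetime

def truncate_to_minute(ts: str) -> str:
    """Floor an ISO-8601 timestamp to the minute, as queryGranularity "minute" does."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt.replace(second=0, microsecond=0).isoformat().replace("+00:00", "Z")
```

So `2021-10-01T00:01:35Z` is stored as `2021-10-01T00:01:00Z`, but because `rollup` is false, the four tcp rows sharing that minute remain four separate rows.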
### --- Query the data

~~~     # Query the data
select * from "yanqitable2";
select protocol, count(*) as rowcount, sum(bytes) as bytes, sum(packets) as packets, max(cost) as maxcost from "yanqitable2" group by protocol;
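The expected result of the GROUP BY can be worked out from the nine sample rows above. A sketch of the same aggregation in plain Python (names like `group_by_protocol` are illustrative, not Druid API):

```python
from collections import defaultdict

# (protocol, packets, bytes, cost) for the nine sample rows
ROWS = [
    ("tcp", 1, 1000, 0.1), ("tcp", 2, 2000, 0.1),
    ("tcp", 3, 3000, 0.1), ("tcp", 4, 4000, 0.1),
    ("udp", 5, 5000, 0.2), ("udp", 6, 6000, 0.2),
    ("udp", 7, 7000, 0.2), ("udp", 8, 8000, 0.2),
    ("udp", 9, 9000, 0.2),
]

def group_by_protocol(rows):
    """Mirror the SQL: count(*), sum(bytes), sum(packets), max(cost) per protocol."""
    out = defaultdict(lambda: {"rowcount": 0, "bytes": 0,
                               "packets": 0, "maxcost": 0.0})
    for proto, packets, nbytes, cost in rows:
        g = out[proto]
        g["rowcount"] += 1
        g["bytes"] += nbytes
        g["packets"] += packets
        g["maxcost"] = max(g["maxcost"], cost)
    return dict(out)
```

This gives tcp: 4 rows, 10000 bytes, 10 packets, maxcost 0.1; udp: 5 rows, 35000 bytes, 35 packets, maxcost 0.2 — the same figures the Druid SQL query should return.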
2. Ingesting the Data from HDFS
3. Viewing the Build Task
### --- View the build task
### --- View the data source
4. Querying the Data
### --- Query the data

~~~     Query the data

5. The Generated JSON Spec
{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "hdfs",
        "paths": "/data/druidlog.dat"
      },
      "inputFormat": {
        "type": "json"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "dynamic"
      }
    },
    "dataSchema": {
      "timestampSpec": {
        "column": "ts",
        "format": "iso"
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "type": "long",
            "name": "bytes"
          },
          {
            "type": "double",
            "name": "cost"
          },
          "dstip",
          {
            "type": "long",
            "name": "dstPort"
          },
          {
            "type": "long",
            "name": "packets"
          },
          "protocol",
          "srcip",
          {
            "type": "long",
            "name": "srcport"
          }
        ]
      },
      "granularitySpec": {
        "queryGranularity": "minute",
        "rollup": false,
        "segmentGranularity": "day"
      },
      "dataSource": "yanqitable2"
    }
  }
}
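Instead of clicking through the console wizard, the same spec can be built and submitted programmatically. A sketch that recreates the spec above field by field (the `build_hdfs_ingest_spec` helper is hypothetical; the submission endpoint `/druid/indexer/v1/task` is Druid's standard Tasks API):

```python
import json

def build_hdfs_ingest_spec(datasource: str, hdfs_path: str) -> dict:
    """Recreate the index_parallel spec the Druid console generated."""
    return {
        "type": "index_parallel",
        "spec": {
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "hdfs", "paths": hdfs_path},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {
                "type": "index_parallel",
                "partitionsSpec": {"type": "dynamic"},
            },
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "ts", "format": "iso"},
                "dimensionsSpec": {"dimensions": [
                    {"type": "long", "name": "bytes"},
                    {"type": "double", "name": "cost"},
                    "dstip",
                    {"type": "long", "name": "dstPort"},
                    {"type": "long", "name": "packets"},
                    "protocol",
                    "srcip",
                    {"type": "long", "name": "srcport"},
                ]},
                "granularitySpec": {
                    "queryGranularity": "minute",
                    "rollup": False,
                    "segmentGranularity": "day",
                },
            },
        },
    }

spec = build_hdfs_ingest_spec("yanqitable2", "/data/druidlog.dat")
payload = json.dumps(spec)
# POST payload to the Overlord/router: POST /druid/indexer/v1/task
```

Druid answers a successful submission with a JSON body containing the new task id.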


 
 
 
 
 
 
 
 
