dremio 的job 分析参考示例

dremio 官方专门提供了一个章节,讲解如何进行分析,一些是一个总结,以及学习

预备

下载profiles,以及上传到dremio 中(个人默认space中)因为profiles 默认是json 格式的,我们直接使用dremio 查询分析数据

需要关注的内容:

sql 查询:sql 查询是否符合预期
query 源: 数据是否在需要的数据源中
计划时间: 时间是否符合预期
非默认参数:查看是否有非模式参数使用了
计划数据: 查看是否使用了rows ,row count 会造成比较高昂的广播,构建,构建会将数据加载到内存中
加速数据:包含了反射的选择,系统活跃情况
我们需要重点关注的是线程状态,通过此了解系统的等待,阻塞情况
错误处理:dremio nodes 情况,oom
性能考虑: 计划时间以及执行实现,线程数,row count 以及row,阻塞以及睡眠,准备以及等待时间

operator_type 数据准备

  • 准备操作类型映射
    operator_type.json 内容如下
 
{"NAME":"SINGLE_SENDER","ID":0}
{"NAME":"BROADCAST_SENDER","ID":1}
{"NAME":"FILTER","ID":2}
{"NAME":"HASH_AGGREGATE","ID":3}
{"NAME":"HASH_JOIN","ID":4}
{"NAME":"MERGE_JOIN","ID":5}
{"NAME":"HASH_PARTITION_SENDER","ID":6}
{"NAME":"LIMIT","ID":7}
{"NAME":"MERGING_RECEIVER","ID":8}
{"NAME":"ORDERED_PARTITION_SENDER","ID":9}
{"NAME":"PROJECT","ID":10}
{"NAME":"UNORDERED_RECEIVER","ID":11}
{"NAME":"RANGE_SENDER","ID":12}
{"NAME":"SCREEN","ID":13}
{"NAME":"SELECTION_VECTOR_REMOVER","ID":14}
{"NAME":"STREAMING_AGGREGATE","ID":15}
{"NAME":"TOP_N_SORT","ID":16}
{"NAME":"EXTERNAL_SORT","ID":17}
{"NAME":"TRACE","ID":18}
{"NAME":"UNION","ID":19}
{"NAME":"OLD_SORT","ID":20}
{"NAME":"PARQUET_ROW_GROUP_SCAN","ID":21}
{"NAME":"HIVE_SUB_SCAN","ID":22}
{"NAME":"SYSTEM_TABLE_SCAN","ID":23}
{"NAME":"MOCK_SUB_SCAN","ID":24}
{"NAME":"PARQUET_WRITER","ID":25}
{"NAME":"DIRECT_SUB_SCAN","ID":26}
{"NAME":"TEXT_WRITER","ID":27}
{"NAME":"TEXT_SUB_SCAN","ID":28}
{"NAME":"JSON_SUB_SCAN","ID":29}
{"NAME":"INFO_SCHEMA_SUB_SCAN","ID":30}
{"NAME":"COMPLEX_TO_JSON","ID":31}
{"NAME":"PRODUCER_CONSUMER","ID":32}
{"NAME":"HBASE_SUB_SCAN","ID":33}
{"NAME":"WINDOW","ID":34}
{"NAME":"NESTED_LOOP_JOIN","ID":35}
{"NAME":"AVRO_SUB_SCAN","ID":36}
{"NAME":"MONGO_SUB_SCAN","ID":37}
{"NAME":"ELASTICSEARCH_SUB_SCAN","ID":38}
{"NAME":"ELASTICSEARCH_AGGREGATOR_SUB_SCAN","ID":39}
{"NAME":"FLATTEN","ID":40}
{"NAME":"EXCEL_SUB_SCAN","ID":41}
{"NAME":"ARROW_SUB_SCAN","ID":42}
{"NAME":"ARROW_WRITER","ID":43}
{"NAME":"JSON_WRITER","ID":44}
{"NAME":"VALUES_READER","ID":45}
{"NAME":"CONVERT_FROM_JSON","ID":46}
{"NAME":"JDBC_SUB_SCAN","ID":47}
{"NAME":"DICTIONARY_LOOKUP","ID":48}
{"NAME":"WRITER_COMMITTER","ID":49}
{"NAME":"ROUND_ROBIN_SENDER","ID":50}

效果

 

 

参考查询

  • 查询行数分析
SELECT majorFragmentId, operatorId, op_type.name, sum(records) records
-- minorFragmentId, nested_4.minorFragmentProfile.maxMemoryUsed AS maxMemoryUsed, nested_4.minorFragmentProfile.memoryUsed AS memoryUsed, nested_4.endpoint.address AS address, nested_4.endpoint.maxDirectMemory AS maxDirectMemory
FROM (
  SELECT minorFragmentProfile, nested_3.minorFragmentProfile.endpoint AS endpoint, minorFragmentId, operatorType, operatorId, nested_3.inputProfile.records AS records, majorFragmentId
  FROM (
    SELECT minorFragmentProfile, nested_2.minorFragmentProfile.minorFragmentId AS minorFragmentId, operatorProfile, nested_2.operatorProfile.operatorType AS operatorType, nested_2.operatorProfile.operatorId AS operatorId, flatten(nested_2.operatorProfile.inputProfile) AS inputProfile, majorFragmentId
    FROM (
      SELECT fragmentProfile, minorFragmentProfile, flatten(nested_1.minorFragmentProfile.operatorProfile) AS operatorProfile, majorFragmentId
      FROM (
        SELECT fragmentProfile, flatten(nested_0.fragmentProfile.minorFragmentProfile) AS minorFragmentProfile, nested_0.fragmentProfile.majorFragmentId AS majorFragmentId
        FROM (
          SELECT flatten(fragmentProfile) AS fragmentProfile
          FROM "@dalong"."profile_attempt_0" AS "profile_attempt_0"
        ) nested_0
      ) nested_1
    ) nested_2
  ) nested_3
) nested_4, "@dalong".operator_type op_type
where nested_4.operatorType = op_type.id
group by majorFragmentId, operatorId, op_type.name
order by majorFragmentId, operatorId

效果

 

 

  • 查询内存情况
SELECT nested_2.endpoint.address AS address, sum(nested_2.minorFragmentProfile.maxMemoryUsed)/1024/1024 AS maxMemoryUsed_MB
-- nested_2.minorFragmentProfile.memoryUsed AS memoryUsed, nested_2.minorFragmentProfile.maxMemoryUsed AS maxMemoryUsed, nested_2.endpoint.address AS address, nested_2.endpoint.maxDirectMemory AS maxDirectMemory, nested_2.endpoint.provisionId AS provisionId, majorFragmentId
FROM (
 SELECT fragmentProfile, minorFragmentProfile, nested_1.minorFragmentProfile.endpoint AS endpoint, majorFragmentId
 FROM (
   SELECT fragmentProfile, flatten(nested_0.fragmentProfile.minorFragmentProfile) AS minorFragmentProfile, nested_0.fragmentProfile.majorFragmentId AS majorFragmentId
   FROM (
     SELECT flatten(fragmentProfile) AS fragmentProfile
     FROM "@dalong"."profile_attempt_0" AS "profile_attempt_0"
   ) nested_0
 ) nested_1
) nested_2
group by nested_2.endpoint.address
order by nested_2.endpoint.address

效果

 

 

说明

对于job 的分析可以结合apache drill 学习,dremio 文档对于这部分介绍还是太简单,推荐结合drill 学习,同时默认32k 数据太小,需要调整下
参考命令

 
alter system set limits.single_field_size_bytes = 99000;

参考资料

https://docs.dremio.com/software/jobs/job-profile/
https://docs.dremio.com/software/jobs/analyzing-profiles/
https://docs.dremio.com/software/jobs/analyzing-profiles-attempt-0/
https://docs.dremio.com/software/jobs/analyzing-profiles-mappings-for-examples/
https://drill.apache.org/docs/query-profiles/
https://drill.apache.org/docs/query-profile-column-descriptions/

posted on 2022-05-30 16:11  荣锋亮  阅读(85)  评论(0编辑  收藏  举报

导航