Fork me on GitHub

Kylin| 常见问题

 

详细使用参考官方文档:

          http://kylin.apache.org/cn/docs/

Kylin的构建过程

sql如下

复制代码
SELECT
`APP_POINT`.`MID` as `APP_POINT_MID`
,`APP_POINT`.`MEMBER_LEVEL` as `APP_POINT_MEMBER_LEVEL`
,`APP_POINT`.`MARKET_ID` as `APP_POINT_MARKET_ID`
,`APP_POINT`.`SUMMARY_TIME` as `APP_POINT_SUMMARY_TIME`
,`APP_USER_BASE_INFO`.`GENDER` as `APP_USER_BASE_INFO_GENDER`
,`APP_USER_BASE_INFO`.`BIRTH_TIME` as `APP_USER_BASE_INFO_BIRTH_TIME`
,`APP_USER_BASE_INFO`.`IS_HAVE_CHILD` as `APP_USER_BASE_INFO_IS_HAVE_CHILD`
,`APP_USER_BASE_INFO`.`MEMBER_TYPE` as `APP_USER_BASE_INFO_MEMBER_TYPE`
,`APP_USER_BASE_INFO`.`IS_NEW_MEMBER` as `APP_USER_BASE_INFO_IS_NEW_MEMBER`
,`APP_USER_BASE_INFO`.`IS_HAVE_CAR` as `APP_USER_BASE_INFO_IS_HAVE_CAR`
,`APP_USER_BASE_INFO`.`VIP_LEVEL` as `APP_USER_BASE_INFO_VIP_LEVEL`
,`APP_USER_BASE_INFO`.`MID` as `APP_USER_BASE_INFO_MID`
,`APP_POINT`.`POINT` as `APP_POINT_POINT`
 FROM `HST_APP`.`APP_POINT` as `APP_POINT`
INNER JOIN `HST_APP`.`APP_USER_BASE_INFO` as `APP_USER_BASE_INFO`
ON `APP_POINT`.`MID` = `APP_USER_BASE_INFO`.`MID`
WHERE 1=1
View Code
复制代码

kylin.log日志,构建所创建的表及索引

复制代码
USE default;
DROP TABLE IF EXISTS kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306
(
\`APP_POINT_MID\` bigint
,\`APP_POINT_MEMBER_LEVEL\` bigint
,\`APP_POINT_MARKET_ID\` bigint
,\`APP_POINT_SUMMARY_TIME\` string
,\`APP_USER_BASE_INFO_GENDER\` string
,\`APP_USER_BASE_INFO_BIRTH_TIME\` string
,\`APP_USER_BASE_INFO_IS_HAVE_CHILD\` string
,\`APP_USER_BASE_INFO_MEMBER_TYPE\` string
,\`APP_USER_BASE_INFO_IS_NEW_MEMBER\` string
,\`APP_USER_BASE_INFO_IS_HAVE_CAR\` string
,\`APP_USER_BASE_INFO_VIP_LEVEL\` string
,\`APP_POINT_POINT\` bigint
)
STORED AS SEQUENCEFILE
LOCATION 'hdfs://nameservice1/kylin/kylin_metadata/kylin-8a15ba0f-54d5-699e-9812-bab76649c511/kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306';
ALTER TABLE kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306 SET TBLPROPERTIES('auto.purge'='true');
INSERT OVERWRITE TABLE \`kylin_intermediate_point_cube_0051b1b9_09d4_be20_1c3b_034038b5b306\` SELECT
\`APP_POINT\`.\`MID\` as \`APP_POINT_MID\`
,\`APP_POINT\`.\`MEMBER_LEVEL\` as \`APP_POINT_MEMBER_LEVEL\`
,\`APP_POINT\`.\`MARKET_ID\` as \`APP_POINT_MARKET_ID\`
,\`APP_POINT\`.\`SUMMARY_TIME\` as \`APP_POINT_SUMMARY_TIME\`
,\`APP_USER_BASE_INFO\`.\`GENDER\` as \`APP_USER_BASE_INFO_GENDER\`
,\`APP_USER_BASE_INFO\`.\`BIRTH_TIME\` as \`APP_USER_BASE_INFO_BIRTH_TIME\`
,\`APP_USER_BASE_INFO\`.\`IS_HAVE_CHILD\` as \`APP_USER_BASE_INFO_IS_HAVE_CHILD\`
,\`APP_USER_BASE_INFO\`.\`MEMBER_TYPE\` as \`APP_USER_BASE_INFO_MEMBER_TYPE\`
,\`APP_USER_BASE_INFO\`.\`IS_NEW_MEMBER\` as \`APP_USER_BASE_INFO_IS_NEW_MEMBER\`
,\`APP_USER_BASE_INFO\`.\`IS_HAVE_CAR\` as \`APP_USER_BASE_INFO_IS_HAVE_CAR\`
,\`APP_USER_BASE_INFO\`.\`VIP_LEVEL\` as \`APP_USER_BASE_INFO_VIP_LEVEL\`
,\`APP_POINT\`.\`POINT\` as \`APP_POINT_POINT\`
 FROM \`HST_APP\`.\`APP_POINT\` as \`APP_POINT\`
INNER JOIN \`HST_APP\`.\`APP_USER_BASE_INFO\` as \`APP_USER_BASE_INFO\`
ON \`APP_POINT\`.\`MID\` = \`APP_USER_BASE_INFO\`.\`MID\`
WHERE 1=1 AND (\`APP_POINT\`.\`SUMMARY_TIME\` >= '2020-12-01' AND \`APP_POINT\`.\`SUMMARY_TIME\` < '2021-02-01')
;
" 
--hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true
View Code
复制代码

 

1. 常见问题

兼容性问题

apache-kylin-3.0.2、 apache-hive-3.1.2

 

 它去读取Hive的原数据,发现没有CATALOG_NAME:

 

 解决方法:1)修改源码 或者

Hive访问元数据必须采用metastore的方式,即保证以下两点:

① 保证hive-site.xml文件有以下参数: 否则它就会去获取myslq中的元数据。

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://hadoop101:9083</value>
</property>

②启动hive metastore服务

hive --service metastore

 

 

 Skip snapshot for this lookup table.

(Some  lookup table is too big eg. > 300MB  for snapshot,thus must be marked as limited.  Limited  lookup table cannot be queried  directly and does not support derived  dimensions. )

如果维度表超过了300M,没有勾选Skip snapshot,在cube构建时就会报错:

 

 

 tail -f  kylin.log  #监控kylin.log日志

 

 以上错误是两个Segments overlap 构建的时间维度有重合了。

解决方法:

进行refresh

 

 或者 将状态 从READY 变成 DISABLE 后进行清除Purge 重新构建 

    

 页面中的Storage记录了 每次Segment的记录  

  

2. BitMap的使用

http://kylin.apache.org/cn/docs/howto/howto_use_hive_mr_dict.html

Count distinct(bitmap) 度量对于许多场景来说都非常重要, 比如统计点击量, kylin从1.5.3版本开始支持精确去重.

Apache Kylin 实现了基于bitmap的精确去重, 并且使用全局字典将字符串类型编码为整数类型。当前的全局字典是单线程构建的,对于高基列可能会占用大量的时间和内存。

Kylin v3.0.0 引入了第一版的 Hive global dictionary(KYLIN-3841). 这个功能使用Hive的分布式SQL引擎来构建全局字典。

为了进一步提升性能, kylin v3.1.0 引入了第二版的Hive global dictionary v2(KYLIN-4342), 这个版本在某些步骤使用MapReduce代替HQL进行全局字典的构建。

收益

1.使用分布式的方式来构建全局字典,节省时间。

2.Kylin集群中的Job Server可以做更少的工作, 因此会更加稳定。

3.OneID, Hive Global Dictionary在kylin之外仍然具有可读性,因此每个人都可以在公司其他场景中重用这个字典。

 

 

 

 cubes的sql如下:

复制代码
SELECT
`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MARKET_ID` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_KEY` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_TYPE` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_VALUE` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`TAG_WEIGHT` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT`
,`APP_POINT_USE_WAY_TEST`.`POINT_COUNT` as `APP_POINT_USE_WAY_TEST_POINT_COUNT`
,`APP_POINT_USE_WAY_TEST`.`CHANNEL` as `APP_POINT_USE_WAY_TEST_CHANNEL`
,`APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MID` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID`
 FROM `HST_APP`.`APP_PROFILE_ACTION_TAGS_PART_CATEGORY` as `APP_PROFILE_ACTION_TAGS_PART_CATEGORY`
LEFT JOIN `HST_APP`.`APP_POINT_USE_WAY_TEST` as `APP_POINT_USE_WAY_TEST`
ON `APP_PROFILE_ACTION_TAGS_PART_CATEGORY`.`MID` = `APP_POINT_USE_WAY_TEST`.`MID`
WHERE 1=1
View Code
复制代码

Monitor的jobs工作过程如下:

复制代码
USE default;

DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0;
CREATE EXTERNAL TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0
(
\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID\` bigint
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY\` string
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE\` string
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE\` string
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT\` decimal(32,10)
,\`APP_POINT_USE_WAY_TEST_POINT_COUNT\` bigint
,\`APP_POINT_USE_WAY_TEST_CHANNEL\` bigint
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID\` string
)
STORED AS SEQUENCEFILE
LOCATION 'hdfs://nameservice1/kylin/kylin_metadata/kylin-0c6c73f4-9871-1f04-ccc2-2d276126b7cd/kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0';

ALTER TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 SET TBLPROPERTIES('auto.purge'='true');
INSERT OVERWRITE TABLE \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\` SELECT
\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MARKET_ID\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_KEY\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_TYPE\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_VALUE\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`TAG_WEIGHT\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT\`
,\`APP_POINT_USE_WAY_TEST\`.\`POINT_COUNT\` as \`APP_POINT_USE_WAY_TEST_POINT_COUNT\`
,\`APP_POINT_USE_WAY_TEST\`.\`CHANNEL\` as \`APP_POINT_USE_WAY_TEST_CHANNEL\`
,\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MID\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID\`
 FROM \`HST_APP\`.\`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\` as \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`
LEFT JOIN \`HST_APP\`.\`APP_POINT_USE_WAY_TEST\` as \`APP_POINT_USE_WAY_TEST\`
ON \`APP_PROFILE_ACTION_TAGS_PART_CATEGORY\`.\`MID\` = \`APP_POINT_USE_WAY_TEST\`.\`MID\`
WHERE 1=1;

" --hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 --
hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true


USE default;
set hive.exec.compress.output=false;
set hive.mapred.mode=unstrict;
CREATE TABLE IF NOT EXISTS default.MidBitMap_global_dict
 ( dict_key STRING COMMENT '', 
   dict_val INT COMMENT '' 
) 
COMMENT 'Hive Global Dictionary' 
PARTITIONED BY (dict_column string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE; 

DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value; 
CREATE TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value 
( 
   dict_key STRING COMMENT '' 
) 
COMMENT '' 
PARTITIONED BY (dict_column string) 
STORED AS TEXTFILE 
;

DROP TABLE IF EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict; 
CREATE TABLE IF NOT EXISTS kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict 
( 
   dict_key STRING COMMENT '' , 
  dict_val STRING COMMENT '' 
) 
COMMENT '' 
PARTITIONED BY (dict_column string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE 
;

INSERT OVERWRITE TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value 
PARTITION (dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID') 
SELECT 
    a.DICT_KEY FROM 
(SELECT 
APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID as DICT_KEY 
  FROM kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0
  GROUP BY APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID) a 
LEFT JOIN 
  (SELECT DICT_KEY FROM default.MidBitMap_global_dict    WHERE DICT_COLUMN = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID' ) b 
ON a.DICT_KEY = b.DICT_KEY 
WHERE b.DICT_KEY IS NULL 
;

INSERT OVERWRITE TABLE  kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value PARTITION (DICT_COLUMN = 'KYLIN_MAX_DISTINCT_COUNT') 
SELECT 
  CONCAT_WS(',', tc.dict_column, 
  cast(tc.total_distinct_val AS String), 
  if(tm.max_dict_val is null, '0', cast(max_dict_val as string))) 
FROM (
    SELECT dict_column, count(1) total_distinct_val
    FROM default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0__distinct_value
    WHERE DICT_COLUMN != 'KYLIN_MAX_DISTINCT_COUNT'
    GROUP BY dict_column) tc 
LEFT JOIN (

    SELECT dict_column, if(max(dict_val) is null, 0, max(dict_val)) as max_dict_val 
    FROM default.MidBitMap_global_dict
    GROUP BY dict_column) tm 
ON tc.dict_column = tm.dict_column;
" 
--hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true



USE default;
set mapreduce.job.reduces=25;
set hive.merge.mapredfiles=false;
INSERT OVERWRITE TABLE \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\` SELECT * FROM \`kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0\` DISTRIBUTE BY APP_PROFILE_ACTION_TAGS_
PART_CATEGORY_MARKET_ID,APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY,APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE;

" --hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true


USE default;
set hive.mapred.mode=unstrict;
ALTER TABLE kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict ADD IF NOT EXISTS PARTITION (dict_column='APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID'); 

INSERT OVERWRITE TABLE default.MidBitMap_global_dict PARTITION (dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID') 
SELECT 
    dict_key, dict_val 
FROM default.MidBitMap_global_dict 
WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID' 
UNION ALL 
SELECT 
   dict_key, dict_val 
FROM kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0_global_dict 
 WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID' ;

" 
--hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true


USE default;
set hive.exec.compress.output=false; set hive.mapred.mode=unstrict;
INSERT OVERWRITE TABLE default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 
SELECT 
a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MARKET_ID 
,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_KEY 
,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_TYPE 
,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_VALUE 
,a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_TAG_WEIGHT 
,a.APP_POINT_USE_WAY_TEST_POINT_COUNT 
,a.APP_POINT_USE_WAY_TEST_CHANNEL 
,b.dict_val 
FROM default.kylin_intermediate_midbitmap_3a2a2ec7_c7d7_b5ea_34cc_19b581afe9c0 a 
LEFT OUTER JOIN 
 (SELECT dict_key, dict_val FROM default.MidBitMap_global_dict WHERE dict_column = 'APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID') b 
ON a.APP_PROFILE_ACTION_TAGS_PART_CATEGORY_MID = b.dict_key;
" 
--hiveconf hive.auto.convert.join=true 
--hiveconf dfs.replication=2 
--hiveconf hive.auto.convert.join.noconditionaltask=true 
--hiveconf hive.merge.mapfiles=false 
--hiveconf mapreduce.reduce.java.opts=-Xms6g 
--hiveconf mapreduce.map.memory.mb=8192 
--hiveconf hive.merge.mapredfiles=false 
--hiveconf mapreduce.reduce.input.buffer.percent=0.5 
--hiveconf hive.exec.compress.output=true 
--hiveconf mapreduce.reduce.memory.mb=16384 
--hiveconf mapreduce.job.split.metainfo.maxsize=-1 
--hiveconf mapreduce.map.java.opts=-Xms3g 
--hiveconf hive.auto.convert.join.noconditionaltask.size=100000000 
--hiveconf hive.stats.autogather=true
View Code
复制代码

 

Kylin常用函数:

intersect_value 和 intersect_count

复制代码
select 
  market_id,
  intersect_value(mid,tag_value,array['女']) as arr_mid_bitmap
from APP_PROFILE_ACTION_TAGS_PART_CATEGORY 
 where tag_type = 'gender' and tag_value = ''
group by market_id 
---
132  [119787,119798,119823]
164  [136555,140549,141655,145728,145729,145796,14580,1639794]
213  [632938,635520,644039]
150  [588915,1328378]

select 
market_id,
intersect_count(mid,tag_value,array['女']) as arr_mid_bitmap
from APP_PROFILE_ACTION_TAGS_PART_CATEGORY 
where tag_type = 'gender' and tag_value = ''
group by market_id

--
132    3
164   8
213   3
150   2
复制代码

 

 
posted @   kris12  阅读(931)  评论(0编辑  收藏  举报
编辑推荐:
· AI与.NET技术实操系列:向量存储与相似性搜索在 .NET 中的实现
· 基于Microsoft.Extensions.AI核心库实现RAG应用
· Linux系列:如何用heaptrack跟踪.NET程序的非托管内存泄露
· 开发者必知的日志记录最佳实践
· SQL Server 2025 AI相关能力初探
阅读排行:
· winform 绘制太阳,地球,月球 运作规律
· 震惊!C++程序真的从main开始吗?99%的程序员都答错了
· 【硬核科普】Trae如何「偷看」你的代码?零基础破解AI编程运行原理
· AI与.NET技术实操系列(五):向量存储与相似性搜索在 .NET 中的实现
· 超详细:普通电脑也行Windows部署deepseek R1训练数据并当服务器共享给他人
levels of contents
点击右上角即可分享
微信分享提示