clickhouse常见的三种空值问题以及解决方案
1.建表时的空值问题
如果我们建表时,不特殊说明空值,比如:
CREATE TABLE test.table1( id String, name String ) ENGINE = MergeTree PARTITION BY id ORDER BY id SETTINGS index_granularity = 8192
这种情况下,如果将包含空值的数据,读入到表中时,会报错。
DB::Exception: Expression returns value NULL, that is out of range of type String, at: null)
因此,为了防止这种情况的发生,我们一般会这样建表:
注意:这里的主键是不可以包含空值的,如果把主键也加Nullable会报错
CREATE TABLE test.table1( id String, name Nullable(String) ) ENGINE = MergeTree PARTITION BY id ORDER BY id SETTINGS index_granularity = 8192
2.查询时的空值问题
上面说了建表的问题,接下来要实例一下,当我们表已经建好,且表数据已经有了,一列数据既包含null,又包含''这类空值,这个时候,如果不注意语法,会报错,如果包含这两类数据,不能使用coalesce,如下:
SELECT COALESCE ( paymentterm, 0 ) AS paymentterm_a, count( DISTINCT orderid ) AS ornum FROM ckdb.test WHERE d = '2020-05-08' GROUP BY paymentterm_a
报错如下:错误原因是paymentterm是string类型,不可以改成int类型
Code: 386, e.displayText() = DB::Exception: There is no supertype for types String, UInt8 because some of them are String/FixedString and some of them are not (version 19.17.6.36 (official build))
这里有一个小的知识点:
group by后面的名称,可以写select中的逻辑,也可以写as为的别名,下面使用case when改写上面的内容:
--方式一 select case when paymentterm is null or paymentterm = '' then 'null' else paymentterm end as paymentterm, count(distinct orderid) as ornum from ckdb.test where d = '2020-05-08' group by paymentterm --方式二 select case when paymentterm is null or paymentterm = '' then 'null' else paymentterm end as paymentterm, count(distinct orderid) as ornum from ckdb.test where d = '2020-05-08' group by case when paymentterm is null or paymentterm = '' then 'null' else paymentterm end --方式三 select coalesce(paymentterm,'null') as paymentterm, count(distinct orderid) as ornum from ckdb.test where d = '2020-05-08' group by coalesce(paymentterm,'null')
3.关联中的空值问题
如下场景,需要使用a表关联b表,把a和b都有的id剔除,在hive中我们一般这样实现:
select a.* from a left join b on a.id = b.id where b.id is null
不过这种方式,在CK中是有问题的,未连接的行使用默认值填充的。String类型就填充空字符串,数值类型就填充 0,要借用其他方式解决
1)使用coalesce来完成
select a.* from a left join b on a.id = b.id where coalesce(b.id,0) = 0
2)使用 settings join_use_nulls 来完成
修改参数,在 SQL 最后加入 settings join_use_nulls = 1
select * from st_center.test_join_1 as t1 all left join st_center.test_join_2 as t2 on t1.id = t2.id settings join_use_nulls = 1
注意:关于jdbc相关调用方式,导致settings无法应用,可以设置users.xml,
<join_use_nulls>1</join_use_nulls>
设置完成后,相关系统表内容也会更新,后续使用不用再单独指定。(可指定profiles组,注意使用账号所属profiles组)
放置位置参考:
SELECT * FROM system.settings WHERE name = 'join_use_nulls' Query id: b5f9d42e-6e02-47ed-b261-fb04562ba18b ┌─name───────────┬─value─┬─changed─┬─description─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─min──┬─max──┬─readonly─┬─type─┐ │ join_use_nulls │ 1 │ 1 │ Use NULLs for non-joined rows of outer JOINs for types that can be inside Nullable. If false, use default value of corresponding columns data type. │ ???? │ ???? │ 0 │ Bool │ └────────────────┴───────┴─────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────┴──────┴──────────┴──────┘ 1 rows in set. Elapsed: 0.003 sec.