Configuring multiple schemas for Presto access to Kudu
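Presto needs to access a Kudu data source. Impala can store Kudu tables under multiple databases directly, but Presto does not support this natively. I configured it as described on the official Presto site, but the settings had no effect.
Official documentation:
I asked on the official GitHub and got no reply. No combination of settings worked, and I did not know where to start.
While reading the official Kudu documentation, I came across a feature called Using the Hive Metastore with Kudu. The documentation says roughly: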
With the Hive Metastore integration disabled, Kudu presents tables as a single flat namespace, with no hierarchy or concept of a database. Additionally, Kudu’s only restriction on table names is that they be a valid UTF-8 encoded string. When the HMS integration is enabled in Kudu, both of these properties change in order to match the HMS model: the table name must indicate the table’s membership of a Hive database, and table name identifiers (i.e. the table name and database name) are subject to the Hive table name identifier constraints.
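Roughly, this means that with the Hive Metastore integration disabled, Kudu presents tables as a single flat namespace, with no database hierarchy or concept of a database; once the integration is enabled, each table name must indicate which Hive database it belongs to. My environment is CDH, and according to the official CDH documentation this feature is only supported from version 6.3.x. My local environment was CDH 6.2, so I decided to give it a try and did the following:
- First, upgrade CDH 6.2 to 6.3. For the procedure, see CDH version upgrade (5.14 -> 6.2).
- After the upgrade, configure the integration according to the official documentation.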
Setup using Cloudera Manager
- When the Hive Metastore is configured with fine-grained authorization using Apache Sentry, and the Sentry HDFS Sync feature is enabled, the Kudu admin need to be able to access and modify directories that are created for Kudu by the HMS. This can be done by adding the Kudu admin user to the group of the Hive service users, e.g. by running the usermod -aG hive kudu command on the HMS nodes.
- Go to the Hive service.
- Click the Configuration tab.
- Select the Kudu Service with which the Hive Metastore will synchronize the Kudu tables.
To my surprise, it worked. But that is not the end of the story.
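Enabling HMS requires upgrading the existing tables; for details, see the HMS feature for upgrading existing tables. There is one detail to watch: before enabling HMS, Presto must already be configured, as described in the official documentation, with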
kudu.schema-emulation.enabled=true
kudu.schema-emulation.prefix=
Enable the HMS feature only after the $schemas table has been created in Kudu. If the steps are done in the wrong order, it still will not work.
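To make the effect of the empty prefix more concrete, here is a minimal Python sketch of how a Kudu table name is mapped to a Presto schema and table when schema emulation is enabled with an empty kudu.schema-emulation.prefix. It follows the mapping table as I understand it from the Presto Kudu connector documentation. The function name to_presto_name is hypothetical and this is not the connector's actual code. The sketch also does not cover non-empty prefixes or names that begin or end with a dot.

```python
from typing import Tuple

# Assumption: with an empty prefix, names without a dot land in "default".
DEFAULT_SCHEMA = "default"


def to_presto_name(kudu_table: str) -> Tuple[str, str]:
    """Map a raw Kudu table name to (schema, table) for an empty prefix.

    Illustrative only: the text before the first dot is treated as the
    schema, and everything after it as the table name.
    """
    schema, dot, table = kudu_table.partition(".")
    if not dot:
        # No dot in the name: the table is placed in the default schema.
        return (DEFAULT_SCHEMA, kudu_table)
    return (schema, table)


if __name__ == "__main__":
    for name in ["orders", "part1.part2", "x.y.z"]:
        schema, table = to_presto_name(name)
        print(f"{name} -> kudu.{schema}.{table}")
```

The dotted form is also the shape that table names take once the HMS integration is enabled (database.table), which matches the Kudu documentation quoted earlier.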