Presto是什么
Presto 是一个分布式 SQL 查询引擎,旨在查询大型数据集 分布在一个或多个异构数据源上。笔者所参与的项目主要使用Presto做数据探查和数据分析。
Presto架构
Presto查询引擎是一个Master-Slave的架构,由一个Coordinator节点,一个Discovery Server节点,多个Worker节点组成,Discovery Server通常内嵌于Coordinator节点中。
Coordinator负责解析SQL语句,生成执行计划,分发执行任务给Worker节点执行。
Worker节点负责实际执行查询任务。Worker节点启动后向Discovery Server服务注册,Coordinator从Discovery Server获得可以正常工作的Worker节点。
Presto是功能如何实现
Presto工程中使用Connector负责Presto与数据源进行交互,不同的数据库对应于不同的Connector。Connector是使用 SPI 作为服务提供/发现机制的。Java中SPI机制主要思想是将装配的控制权移到程序之外。
在Presto中的应用就是基于 com.facebook.presto.spi.Plugin 这个接口实现一个对应的xxxPlugin。并且在plugin.dir配置的路径中添加实现com.facebook.presto.spi.Plugin接口代码对应的产物,即可在Presto中实现与数据源的交互。
高斯数据库GuassPlugin插件开发
软件版本信息
· presto源码版本0.253
· jdk1.8-151(和presto源码版本对应)
具体实现步骤
Presto原始项目中创建presto-gauss项目
presto-gauss pom中需要添加com.facebook.presto作为父项目,同时需要添加Presto官方实现的JdbcPlugin依赖包,以及高斯数据库JDBC包。
将打包项目打包方式设置成presto-plugin。
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>presto-root</artifactId>
<groupId>com.facebook.presto</groupId>
<version>0.253</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>presto-gauss</artifactId>
<description>Presto - gauss Connector</description>
<packaging>presto-plugin</packaging>
<dependencies>
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-base-jdbc</artifactId>
</dependency>
<dependency>
<groupId>com.facebook.airlift</groupId>
<artifactId>configuration</artifactId>
</dependency>
<dependency>
<groupId>com.facebook.airlift</groupId>
<artifactId>log-manager</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>com.google.inject</groupId>
<artifactId>guice</artifactId>
</dependency>
<dependency>
<groupId>javax.inject</groupId>
<artifactId>javax.inject</artifactId>
</dependency>
<dependency>
<groupId>javax.validation</groupId>
<artifactId>validation-api</artifactId>
</dependency>
<!-- Presto SPI -->
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-spi</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-common</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.facebook.drift</groupId>
<artifactId>drift-api</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>slice</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.airlift</groupId>
<artifactId>units</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.openjdk.jol</groupId>
<artifactId>jol-core</artifactId>
<scope>provided</scope>
</dependency>
<!-- 华为高斯依赖 -->
<dependency>
<groupId>com.huawei</groupId>
<artifactId>gsjdbc200</artifactId>
<version>8.1.3</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<!-- TODO: Remove this once fixed -->
<ignoredDependencies>
</ignoredDependencies>
</configuration>
</plugin>
</plugins>
</build>
<profiles>
<profile>
<id>ci</id>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<excludes combine.self="override"/>
</configuration>
</plugin>
</plugins>
</build>
</profile>
<profile>
<id>default</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
在presto-root项目的pom中增加module
<modules>
……
<module>presto-gauss</module>
……
</modules>
在presto-server项目中增加presto-gauss依赖
<dependency>
<groupId>com.facebook.presto</groupId>
<artifactId>presto-gauss</artifactId>
<version>${project.version}</version>
<type>zip</type>
<scope>provided</scope>
</dependency>
基于高斯数据库驱动实现ConnectionFactory实例的创建
因为高斯数据库是JDBC 数据源官方的JdbcPlugin中已经实现com.facebook.presto.spi.Plugin中的接口,因此我们仅需高斯Jdbc驱动实例构建ConnectionFactory即可。
高斯数据库驱动对于JDBC的支持程度、数据源本身的特性,所以需要对原始JDBC进行一些适配。
创建GaussClient.java实现ConnectionFactory,同时完成表重命名、数据类型映射、查询表和模式的等方法进行适配。
package com.facebook.presto.plugin.gauss;
import com.facebook.presto.common.type.Decimals;
import com.facebook.presto.common.type.Type;
import com.facebook.presto.common.type.VarcharType;
import com.facebook.presto.plugin.jdbc.*;
import com.facebook.presto.spi.ConnectorSession;
import com.facebook.presto.spi.PrestoException;
import com.facebook.presto.spi.SchemaTableName;
import com.huawei.gauss200.jdbc.Driver;
import javax.inject.Inject;
import java.sql.*;
import java.util.Optional;
import java.util.Properties;
import static com.facebook.presto.common.type.DecimalType.createDecimalType;
import static com.facebook.presto.common.type.VarbinaryType.VARBINARY;
import static com.facebook.presto.common.type.VarcharType.createUnboundedVarcharType;
import static com.facebook.presto.common.type.VarcharType.createVarcharType;
import static com.facebook.presto.plugin.jdbc.DriverConnectionFactory.basicConnectionProperties;
import static com.facebook.presto.plugin.jdbc.JdbcErrorCode.JDBC_ERROR;
import static com.facebook.presto.plugin.jdbc.StandardReadMappings.*;
import static com.facebook.presto.spi.StandardErrorCode.NOT_SUPPORTED;
import static java.lang.String.format;
import static java.util.Locale.ENGLISH;
import static java.util.Objects.requireNonNull;
public class GaussClient extends BaseJdbcClient
{
private static final int FETCH_SIZE = 1000;
private final boolean synonymsEnabled;
private final int numberDefaultScale;
@Inject
public GaussClient(JdbcConnectorId connectorId, BaseJdbcConfig config, GaussConfig gaussConfig) throws SQLException {
super(connectorId, config, "", connectionFactory(config, gaussConfig));
requireNonNull(gaussConfig, "gauss config is null");
this.synonymsEnabled = gaussConfig.isSynonymsEnabled();
this.numberDefaultScale = gaussConfig.getNumberDefaultScale();
}
private static ConnectionFactory connectionFactory(BaseJdbcConfig config, GaussConfig gaussConfig) {
Properties connectionProperties = basicConnectionProperties(config);
connectionProperties.setProperty("useUnicode", "true");
connectionProperties.setProperty("characterEncoding", "utf8");
connectionProperties.setProperty("ssl", "false");
return new DriverConnectionFactory(new Driver(), config.getConnectionUrl(),
Optional.ofNullable(config.getUserCredentialName()),
Optional.ofNullable(config.getPasswordCredentialName()), connectionProperties);
}
private String[] getTableTypes()
{
if (synonymsEnabled) {
return new String[] {"TABLE", "VIEW", "SYNONYM"};
}
return new String[] {"TABLE", "VIEW"};
}
@Override
protected ResultSet getTables(Connection connection, Optional<String> schemaName, Optional<String> tableName)
throws SQLException
{
DatabaseMetaData metadata = connection.getMetaData();
String escape = metadata.getSearchStringEscape();
return metadata.getTables(
connection.getCatalog(),
escapeNamePattern(schemaName, Optional.of(escape)).orElse(null),
escapeNamePattern(tableName, Optional.of(escape)).orElse(null),
getTableTypes());
}
@Override
public PreparedStatement getPreparedStatement(Connection connection, String sql)
throws SQLException
{
PreparedStatement statement = connection.prepareStatement(sql);
statement.setFetchSize(FETCH_SIZE);
return statement;
}
@Override
protected String generateTemporaryTableName()
{
return "presto_tmp_" + System.nanoTime();
}
@Override
protected void renameTable(JdbcIdentity identity, String catalogName, SchemaTableName oldTable, SchemaTableName newTable)
{
if (!oldTable.getSchemaName().equalsIgnoreCase(newTable.getSchemaName())) {
throw new PrestoException(NOT_SUPPORTED, "Table rename across schemas is not supported in gauss");
}
String newTableName = newTable.getTableName().toUpperCase(ENGLISH);
String oldTableName = oldTable.getTableName().toUpperCase(ENGLISH);
String sql = format(
"ALTER TABLE %s RENAME TO %s",
quoted(catalogName, oldTable.getSchemaName(), oldTableName),
quoted(newTableName));
try (Connection connection = connectionFactory.openConnection(identity)) {
execute(connection, sql);
}
catch (SQLException e) {
throw new PrestoException(JDBC_ERROR, e);
}
}
@Override
public Optional<ReadMapping> toPrestoType(ConnectorSession session, JdbcTypeHandle typeHandle)
{
int columnSize = typeHandle.getColumnSize();
switch (typeHandle.getJdbcType()) {
case Types.CLOB:
case Types.NCLOB:
return Optional.of(varcharReadMapping(createUnboundedVarcharType()));
case Types.SMALLINT:
return Optional.of(smallintReadMapping());
case Types.FLOAT:
case Types.DOUBLE:
return Optional.of(doubleReadMapping());
case Types.REAL:
return Optional.of(realReadMapping());
case Types.NUMERIC:
int precision = columnSize == 0 ? Decimals.MAX_PRECISION : columnSize;
int scale = typeHandle.getDecimalDigits();
if (scale == 0) {
return Optional.of(bigintReadMapping());
}
if (scale < 0 || scale > precision) {
return Optional.of(decimalReadMapping(createDecimalType(precision, numberDefaultScale)));
}
return Optional.of(decimalReadMapping(createDecimalType(precision, scale)));
case Types.VARCHAR:
case Types.NVARCHAR:
case Types.LONGVARCHAR:
case Types.LONGNVARCHAR:
if (columnSize > VarcharType.MAX_LENGTH) {
return Optional.of(varcharReadMapping(createUnboundedVarcharType()));
}
return Optional.of(varcharReadMapping(createVarcharType(columnSize)));
case Types.BLOB:
return Optional.of(varbinaryReadMapping());
}
return super.toPrestoType(session, typeHandle);
}
@Override
protected String toSqlType(Type type) {
if (VARBINARY.equals(type)) {
return "blob";
}
return super.toSqlType(type);
}
}
基于presto使用的Guice依赖注入框架,创建GuassPlugin.java、GaussClientModule.java、GaussConfig.java 类注册实现的插件
GuassPlugin.java:
package com.facebook.presto.plugin.gauss;
import com.facebook.presto.plugin.jdbc.JdbcPlugin;
/**
* Initialize GuassPlugin class for prestoDB
*/
public class GuassPlugin
extends JdbcPlugin
{
/**
* gauss Plugin Constructor
*/
public GuassPlugin()
{
super("gauss", new GaussClientModule());
}
}
GaussClientModule.java:
package com.facebook.presto.plugin.gauss;
import com.facebook.presto.plugin.jdbc.BaseJdbcConfig;
import com.facebook.presto.plugin.jdbc.JdbcClient;
import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.Scopes;
import static com.facebook.airlift.configuration.ConfigBinder.configBinder;
public class GaussClientModule
implements Module
{
@Override
public void configure(Binder binder)
{
binder.bind(JdbcClient.class).to(GaussClient.class)
.in(Scopes.SINGLETON);
configBinder(binder).bindConfig(BaseJdbcConfig.class);
configBinder(binder).bindConfig(GaussConfig.class);
}
}
GaussClientModule.java:
package com.facebook.presto.plugin.gauss;
import com.facebook.airlift.configuration.Config;
import javax.validation.constraints.Max;
import javax.validation.constraints.Min;
import javax.validation.constraints.NotNull;
import java.math.RoundingMode;
public class GaussConfig
{
private boolean synonymsEnabled;
private int varcharMaxSize = 4000;
private int timestampDefaultPrecision = 6;
private int numberDefaultScale = 10;
private RoundingMode numberRoundingMode = RoundingMode.HALF_UP;
@NotNull
public boolean isSynonymsEnabled()
{
return synonymsEnabled;
}
@Config("gauss.synonyms.enabled")
public GaussConfig setSynonymsEnabled(boolean enabled)
{
this.synonymsEnabled = enabled;
return this;
}
@Min(0)
@Max(38)
public int getNumberDefaultScale()
{
return numberDefaultScale;
}
@Config("gauss.number.default-scale")
public GaussConfig setNumberDefaultScale(int numberDefaultScale)
{
this.numberDefaultScale = numberDefaultScale;
return this;
}
@NotNull
public RoundingMode getNumberRoundingMode()
{
return numberRoundingMode;
}
@Config("gauss.number.rounding-mode")
public GaussConfig setNumberRoundingMode(RoundingMode numberRoundingMode)
{
this.numberRoundingMode = numberRoundingMode;
return this;
}
@Min(4000)
public int getVarcharMaxSize()
{
return varcharMaxSize;
}
@Config("gauss.varchar.max-size")
public GaussConfig setVarcharMaxSize(int varcharMaxSize)
{
this.varcharMaxSize = varcharMaxSize;
return this;
}
@Min(0)
@Max(9)
public int getTimestampDefaultPrecision()
{
return timestampDefaultPrecision;
}
@Config("gauss.timestamp.precision")
public GaussConfig setTimestampDefaultPrecision(int timestampDefaultPrecision)
{
this.timestampDefaultPrecision = timestampDefaultPrecision;
return this;
}
}
测试数据源注册
入参:
{
"catalogName": "gauss-test1",
"connectorName": "gauss",
"properties": {
"connection-url":"jdbc: gaussdb://172.30.***.***:***/yth_shuguan",
"connection-user":"***",
"connection-password":"***"
}
}
返回数据源信息:
2024-03-06T17:27:08.532+0800 INFO Bootstrap PROPERTY DEFAULT RUNTIME DESCRIPTION
2024-03-06T17:27:08.532+0800 INFO Bootstrap gauss.number.default-scale 10 10
2024-03-06T17:27:08.533+0800 INFO Bootstrap gauss.number.rounding-mode HALF_UP HALF_UP
2024-03-06T17:27:08.533+0800 INFO Bootstrap gauss.synonyms.enabled false false
2024-03-06T17:27:08.533+0800 INFO Bootstrap gauss.timestamp.precision 6 6
2024-03-06T17:27:08.533+0800 INFO Bootstrap gauss.varchar.max-size 4000 4000
2024-03-06T17:27:08.533+0800 INFO Bootstrap case-insensitive-name-matching false false
2024-03-06T17:27:08.533+0800 INFO Bootstrap case-insensitive-name-matching.cache-ttl 1.00m 1.00m
2024-03-06T17:27:08.533+0800 INFO Bootstrap connection-password [REDACTED] [REDACTED]
2024-03-06T17:27:08.533+0800 INFO Bootstrap connection-url ---- jdbc:gaussdb://172.30.***.***:***/yth_shuguan
2024-03-06T17:27:08.533+0800 INFO Bootstrap connection-user ---- yth
2024-03-06T17:27:08.533+0800 INFO Bootstrap password-credential-name ---- ----
2024-03-06T17:27:08.533+0800 INFO Bootstrap user-credential-name ---- ----
2024-03-06T17:27:08.533+0800 INFO Bootstrap allow-drop-table false false Allow connector to drop tables
2024-03-06T17:27:08.550+0800 INFO com.facebook.airlift.bootstrap.LifeCycleManager Life cycle startup complete. System ready.
2024-03-06T17:27:08.549+0800 INFO com.facebook.airlift.bootstrap.LifeCycleManager Life cycle starting...
其他说明
适配非JDBC数据源,如HuaweiHD651-V310(基于Hive开发),需要自己实现如下接口
接口名称 | 说明 |
---|---|
ConnectorFactory | Connector工厂 |
ConnectorMetadata | 获取数据源元数据 |
ConnectorHandleResolver | 获取各种Handler |
ConnectorSplitManager | 处理任务分片 |
ConnectorRecordSetProviderConnectorPageSourceProvider | 读取数据 |
ConnectorPageSinkProvider | 写入数据 |
参考
Presto官方文档:
https://prestodb.io/docs/0.253/
美团技术团队分享: