[Apache Doris] Apache Doris Metadata Design and a Source-Code Walkthrough of DDL Operations
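Metadata design
As shown in the figure above, Doris metadata mainly stores four categories of data:
- User data information, including databases, table schemas, and tablet (shard) information.
- Job information of all kinds, such as load jobs, Clone jobs, and SchemaChange jobs.
- User and privilege information.
- Cluster and node information.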
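Metadata directory
The metadata directory is specified by the FE configuration item meta_dir.
The bdb/ directory holds the bdbje data.
The image/ directory holds the image files.
image.[logid] is the latest image file. The suffix logid is the id of the last log entry contained in the image.
image.ckpt is the image file currently being written. If the write succeeds, it is renamed to image.[logid] and replaces the old image file.
The VERSION file records the cluster_id, which uniquely identifies a Doris cluster. It is a 32-bit integer generated randomly the first time the leader starts. It can also be set explicitly through the FE configuration item cluster_id.
The ROLE file records the FE's own role, which is either FOLLOWER or OBSERVER. FOLLOWER means the FE is an electable node. (Note: even the leader node has the role FOLLOWER.)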
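The image.[logid] naming rule can be illustrated with a small sketch. The class ImageFiles and the method latestLogId are hypothetical helpers written for this note, not Doris code. They only assume what is stated above: finished images are named image.<digits>, and image.ckpt is an unfinished checkpoint that should be ignored.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper (not part of Doris): finds the newest finished image by its logid suffix.
final class ImageFiles {
    private static final String PREFIX = "image.";

    // Returns the largest logid among names of the form image.<digits>.
    // image.ckpt and unrelated files such as VERSION or ROLE are skipped.
    static OptionalLong latestLogId(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> name.startsWith(PREFIX))
                .map(name -> name.substring(PREFIX.length()))
                .filter(suffix -> !suffix.isEmpty() && suffix.chars().allMatch(Character::isDigit))
                .mapToLong(Long::parseLong)
                .max();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("image.1024", "image.ckpt", "image.2048", "VERSION", "ROLE");
        System.out.println(latestLogId(names));
    }
}
```

Because the logid is the id of the last log entry contained in the image, a node loading this image would only need to replay log entries with a larger id.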
Reading the DDL-related source code
Starting the MySQL service
org.apache.doris.qe.QeService
    if (nioEnabled) {
        mysqlServer = new NMysqlServer(port, scheduler);
    } else {
        mysqlServer = new MysqlServer(port, scheduler);
    }
DDL call path
org.apache.doris.qe.ConnectProcessor#dispatch: command recognition
    switch (command) {
        case COM_INIT_DB:
            handleInitDb();
            break;
        case COM_QUIT:
            handleQuit();
            break;
        case COM_QUERY:
            handleQuery();
            break;
        case COM_FIELD_LIST:
            handleFieldList();
            break;
        case COM_PING:
            handlePing();
            break;
        default:
            ctx.getState().setError("Unsupported command(" + command + ")");
            LOG.warn("Unsupported command(" + command + ")");
            break;
    }
org.apache.doris.qe.ConnectProcessor#analyze: lexical and syntax parsing
    // Parse statement with parser generated by CUP&FLEX
    SqlScanner input = new SqlScanner(new StringReader(originStmt), ctx.getSessionVariable().getSqlMode());
    SqlParser parser = new SqlParser(input);
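Reads the raw statement string from the connection.
Lexer definition file:
• fe/fe-core/src/main/jflex/sql_scanner.flex
Grammar (parser) definition file:
• fe/fe-core/src/main/cup/sql_parser.cup
All statement implementation classes: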
StatementBase [vim org/apache/doris/analysis/StatementBase.java +33]
├── ExportStmt [vim org/apache/doris/analysis/ExportStmt.java +48]
├── ImportColumnsStmt [vim org/apache/doris/analysis/ImportColumnsStmt.java +21]
├── ImportDeleteOnStmt [vim org/apache/doris/analysis/ImportDeleteOnStmt.java +19]
├── ImportSequenceStmt [vim org/apache/doris/analysis/ImportSequenceStmt.java +19]
├── ImportWhereStmt [vim org/apache/doris/analysis/ImportWhereStmt.java +19]
├── KillStmt [vim org/apache/doris/analysis/KillStmt.java +19]
├── SetStmt [vim org/apache/doris/analysis/SetStmt.java +24]
├── UseStmt [vim org/apache/doris/analysis/UseStmt.java +33]
├── QueryStmt [vim org/apache/doris/analysis/QueryStmt.java +38]
│   ├── SelectStmt [vim org/apache/doris/analysis/SelectStmt.java +65]
│   └── SetOperationStmt [vim org/apache/doris/analysis/SetOperationStmt.java +36]
├── ShowStmt [vim org/apache/doris/analysis/ShowStmt.java +22]
│   ├── AdminShowConfigStmt [vim org/apache/doris/analysis/AdminShowConfigStmt.java +33]
│   ├── AdminShowDataSkewStmt [vim org/apache/doris/analysis/AdminShowDataSkewStmt.java +32]
│   ├── AdminShowReplicaDistributionStmt [vim org/apache/doris/analysis/AdminShowReplicaDistributionStmt.java +34]
│   ├── AdminShowReplicaStatusStmt [vim org/apache/doris/analysis/AdminShowReplicaStatusStmt.java +39]
│   ├── DescribeStmt [vim org/apache/doris/analysis/DescribeStmt.java +54]
│   ├── HelpStmt [vim org/apache/doris/analysis/HelpStmt.java +26]
│   ├── ShowAlterStmt [vim org/apache/doris/analysis/ShowAlterStmt.java +46]
│   ├── ShowAuthorStmt [vim org/apache/doris/analysis/ShowAuthorStmt.java +23]
│   ├── ShowBackendsStmt [vim org/apache/doris/analysis/ShowBackendsStmt.java +30]
│   ├── ShowBackupStmt [vim org/apache/doris/analysis/ShowBackupStmt.java +38]
│   ├── ShowBrokerStmt [vim org/apache/doris/analysis/ShowBrokerStmt.java +30]
│   ├── ShowCharsetStmt [vim org/apache/doris/analysis/ShowCharsetStmt.java +23]
│   ├── ShowClusterStmt [vim org/apache/doris/analysis/ShowClusterStmt.java +34]
│   ├── ShowCollationStmt [vim org/apache/doris/analysis/ShowCollationStmt.java +24]
│   ├── ShowColumnStatsStmt [vim org/apache/doris/analysis/ShowColumnStatsStmt.java +28]
│   ├── ShowColumnStmt [vim org/apache/doris/analysis/ShowColumnStmt.java +28]
│   ├── ShowCreateDbStmt [vim org/apache/doris/analysis/ShowCreateDbStmt.java +36]
│   ├── ShowCreateFunctionStmt [vim org/apache/doris/analysis/ShowCreateFunctionStmt.java +32]
│   ├── ShowCreateRoutineLoadStmt [vim org/apache/doris/analysis/ShowCreateRoutineLoadStmt.java +24]
│   ├── ShowCreateTableStmt [vim org/apache/doris/analysis/ShowCreateTableStmt.java +29]
│   ├── ShowDataStmt [vim org/apache/doris/analysis/ShowDataStmt.java +56]
│   ├── ShowDbIdStmt [vim org/apache/doris/analysis/ShowDbIdStmt.java +29]
│   ├── ShowDbStmt [vim org/apache/doris/analysis/ShowDbStmt.java +27]
│   ├── ShowDeleteStmt [vim org/apache/doris/analysis/ShowDeleteStmt.java +31]
│   ├── ShowDynamicPartitionStmt [vim org/apache/doris/analysis/ShowDynamicPartitionStmt.java +29]
│   ├── ShowEncryptKeysStmt [vim org/apache/doris/analysis/ShowEncryptKeysStmt.java +32]
│   ├── ShowEnginesStmt [vim org/apache/doris/analysis/ShowEnginesStmt.java +23]
│   ├── ShowEventsStmt [vim org/apache/doris/analysis/ShowEventsStmt.java +23]
│   ├── ShowExportStmt [vim org/apache/doris/analysis/ShowExportStmt.java +40]
│   ├── ShowFrontendsStmt [vim org/apache/doris/analysis/ShowFrontendsStmt.java +30]
│   ├── ShowFunctionsStmt [vim org/apache/doris/analysis/ShowFunctionsStmt.java +32]
│   ├── ShowGrantsStmt [vim org/apache/doris/analysis/ShowGrantsStmt.java +32]
│   ├── ShowIndexStmt [vim org/apache/doris/analysis/ShowIndexStmt.java +32]
│   ├── ShowLoadProfileStmt [vim org/apache/doris/analysis/ShowLoadProfileStmt.java +27]
│   ├── ShowLoadStmt [vim org/apache/doris/analysis/ShowLoadStmt.java +42]
│   ├── ShowLoadWarningsStmt [vim org/apache/doris/analysis/ShowLoadWarningsStmt.java +36]
│   ├── ShowMigrationsStmt [vim org/apache/doris/analysis/ShowMigrationsStmt.java +31]
│   ├── ShowOpenTableStmt [vim org/apache/doris/analysis/ShowOpenTableStmt.java +23]
│   ├── ShowPartitionIdStmt [vim org/apache/doris/analysis/ShowPartitionIdStmt.java +29]
│   ├── ShowPartitionsStmt [vim org/apache/doris/analysis/ShowPartitionsStmt.java +49]
│   ├── ShowPluginsStmt [vim org/apache/doris/analysis/ShowPluginsStmt.java +23]
│   ├── ShowProcStmt [vim org/apache/doris/analysis/ShowProcStmt.java +32]
│   ├── ShowProcedureStmt [vim org/apache/doris/analysis/ShowProcedureStmt.java +23]
│   ├── ShowProcesslistStmt [vim org/apache/doris/analysis/ShowProcesslistStmt.java +24]
│   ├── ShowQueryProfileStmt [vim org/apache/doris/analysis/ShowQueryProfileStmt.java +27]
│   ├── ShowRepositoriesStmt [vim org/apache/doris/analysis/ShowRepositoriesStmt.java +25]
│   ├── ShowResourcesStmt [vim org/apache/doris/analysis/ShowResourcesStmt.java +37]
│   ├── ShowRestoreStmt [vim org/apache/doris/analysis/ShowRestoreStmt.java +38]
│   ├── ShowRolesStmt [vim org/apache/doris/analysis/ShowRolesStmt.java +29]
│   ├── ShowRollupStmt [vim org/apache/doris/analysis/ShowRollupStmt.java +28]
│   ├── ShowRoutineLoadStmt [vim org/apache/doris/analysis/ShowRoutineLoadStmt.java +34]
│   ├── ShowRoutineLoadTaskStmt [vim org/apache/doris/analysis/ShowRoutineLoadTaskStmt.java +32]
│   ├── ShowSmallFilesStmt [vim org/apache/doris/analysis/ShowSmallFilesStmt.java +32]
│   ├── ShowSnapshotStmt [vim org/apache/doris/analysis/ShowSnapshotStmt.java +29]
│   ├── ShowSqlBlockRuleStmt [vim org/apache/doris/analysis/ShowSqlBlockRuleStmt.java +31]
│   ├── ShowStatusStmt [vim org/apache/doris/analysis/ShowStatusStmt.java +23]
│   ├── ShowStreamLoadStmt [vim org/apache/doris/analysis/ShowStreamLoadStmt.java +39]
│   ├── ShowSyncJobStmt [vim org/apache/doris/analysis/ShowSyncJobStmt.java +33]
│   ├── ShowTableIdStmt [vim org/apache/doris/analysis/ShowTableIdStmt.java +30]
│   ├── ShowTableStatsStmt [vim org/apache/doris/analysis/ShowTableStatsStmt.java +32]
│   ├── ShowTableStatusStmt [vim org/apache/doris/analysis/ShowTableStatusStmt.java +35]
│   ├── ShowTableStmt [vim org/apache/doris/analysis/ShowTableStmt.java +34]
│   ├── ShowTabletStmt [vim org/apache/doris/analysis/ShowTabletStmt.java +39]
│   ├── ShowTransactionStmt [vim org/apache/doris/analysis/ShowTransactionStmt.java +35]
│   ├── ShowTrashDiskStmt [vim org/apache/doris/analysis/ShowTrashDiskStmt.java +33]
│   ├── ShowTrashStmt [vim org/apache/doris/analysis/ShowTrashStmt.java +36]
│   ├── ShowTriggersStmt [vim org/apache/doris/analysis/ShowTriggersStmt.java +23]
│   ├── ShowUserPropertyStmt [vim org/apache/doris/analysis/ShowUserPropertyStmt.java +42]
│   ├── ShowUserStmt [vim org/apache/doris/analysis/ShowUserStmt.java +25]
│   ├── ShowVariablesStmt [vim org/apache/doris/analysis/ShowVariablesStmt.java +29]
│   ├── ShowViewStmt [vim org/apache/doris/analysis/ShowViewStmt.java +39]
│   ├── ShowWarningStmt [vim org/apache/doris/analysis/ShowWarningStmt.java +23]
│   └── ShowWhiteListStmt [vim org/apache/doris/analysis/ShowWhiteListStmt.java +23]
├── TransactionStmt [vim org/apache/doris/analysis/TransactionStmt.java +22]
│   ├── TransactionBeginStmt [vim org/apache/doris/analysis/TransactionBeginStmt.java +24]
│   ├── TransactionCommitStmt [vim org/apache/doris/analysis/TransactionCommitStmt.java +19]
│   └── TransactionRollbackStmt [vim org/apache/doris/analysis/TransactionRollbackStmt.java +19]
├── UnsupportedStmt [vim org/apache/doris/analysis/UnsupportedStmt.java +22]
│   └── EmptyStmt [vim org/apache/doris/analysis/EmptyStmt.java +19]
└── DdlStmt [vim org/apache/doris/analysis/DdlStmt.java +19]
    ├── AdminCancelRepairTableStmt [vim org/apache/doris/analysis/AdminCancelRepairTableStmt.java +33]
    ├── AdminCheckTabletsStmt [vim org/apache/doris/analysis/AdminCheckTabletsStmt.java +33]
    ├── AdminCleanTrashStmt [vim org/apache/doris/analysis/AdminCleanTrashStmt.java +34]
    ├── AdminRepairTableStmt [vim org/apache/doris/analysis/AdminRepairTableStmt.java +33]
    ├── AdminSetConfigStmt [vim org/apache/doris/analysis/AdminSetConfigStmt.java +32]
    ├── AdminSetReplicaStatusStmt [vim org/apache/doris/analysis/AdminSetReplicaStatusStmt.java +30]
    ├── AlterClusterStmt [vim org/apache/doris/analysis/AlterClusterStmt.java +29]
    ├── AlterColumnStatsStmt [vim org/apache/doris/analysis/AlterColumnStatsStmt.java +33]
    ├── AlterDatabasePropertyStmt [vim org/apache/doris/analysis/AlterDatabasePropertyStmt.java +24]
    ├── AlterDatabaseQuotaStmt [vim org/apache/doris/analysis/AlterDatabaseQuotaStmt.java +30]
    ├── AlterDatabaseRename [vim org/apache/doris/analysis/AlterDatabaseRename.java +34]
    ├── AlterRoutineLoadStmt [vim org/apache/doris/analysis/AlterRoutineLoadStmt.java +34]
    ├── AlterSqlBlockRuleStmt [vim org/apache/doris/analysis/AlterSqlBlockRuleStmt.java +31]
    ├── AlterSystemStmt [vim org/apache/doris/analysis/AlterSystemStmt.java +28]
    ├── AlterTableStatsStmt [vim org/apache/doris/analysis/AlterTableStatsStmt.java +33]
    ├── AlterTableStmt [vim org/apache/doris/analysis/AlterTableStmt.java +37]
    ├── CancelLoadStmt [vim org/apache/doris/analysis/CancelLoadStmt.java +26]
    ├── CreateClusterStmt [vim org/apache/doris/analysis/CreateClusterStmt.java +33]
    ├── CreateDataSyncJobStmt [vim org/apache/doris/analysis/CreateDataSyncJobStmt.java +36]
    ├── CreateDbStmt [vim org/apache/doris/analysis/CreateDbStmt.java +31]
    ├── CreateEncryptKeyStmt [vim org/apache/doris/analysis/CreateEncryptKeyStmt.java +30]
    ├── CreateFileStmt [vim org/apache/doris/analysis/CreateFileStmt.java +35]
    ├── CreateFunctionStmt [vim org/apache/doris/analysis/CreateFunctionStmt.java +47]
    ├── CreateMaterializedViewStmt [vim org/apache/doris/analysis/CreateMaterializedViewStmt.java +43]
    ├── CreateRepositoryStmt [vim org/apache/doris/analysis/CreateRepositoryStmt.java +28]
    ├── CreateResourceStmt [vim org/apache/doris/analysis/CreateResourceStmt.java +32]
    ├── CreateRoleStmt [vim org/apache/doris/analysis/CreateRoleStmt.java +28]
    ├── CreateRoutineLoadStmt [vim org/apache/doris/analysis/CreateRoutineLoadStmt.java +48]
    ├── CreateSqlBlockRuleStmt [vim org/apache/doris/analysis/CreateSqlBlockRuleStmt.java +37]
    ├── CreateTableAsSelectStmt [vim org/apache/doris/analysis/CreateTableAsSelectStmt.java +26]
    ├── CreateTableLikeStmt [vim org/apache/doris/analysis/CreateTableLikeStmt.java +31]
    ├── CreateTableStmt [vim org/apache/doris/analysis/CreateTableStmt.java +56]
    ├── CreateUserStmt [vim org/apache/doris/analysis/CreateUserStmt.java +36]
    ├── DeleteStmt [vim org/apache/doris/analysis/DeleteStmt.java +35]
    ├── DropClusterStmt [vim org/apache/doris/analysis/DropClusterStmt.java +31]
    ├── DropDbStmt [vim org/apache/doris/analysis/DropDbStmt.java +30]
    ├── DropEncryptKeyStmt [vim org/apache/doris/analysis/DropEncryptKeyStmt.java +28]
    ├── DropFileStmt [vim org/apache/doris/analysis/DropFileStmt.java +34]
    ├── DropFunctionStmt [vim org/apache/doris/analysis/DropFunctionStmt.java +27]
    ├── DropMaterializedViewStmt [vim org/apache/doris/analysis/DropMaterializedViewStmt.java +29]
    ├── DropRepositoryStmt [vim org/apache/doris/analysis/DropRepositoryStmt.java +27]
    ├── DropResourceStmt [vim org/apache/doris/analysis/DropResourceStmt.java +27]
    ├── DropRoleStmt [vim org/apache/doris/analysis/DropRoleStmt.java +28]
    ├── DropSqlBlockRuleStmt [vim org/apache/doris/analysis/DropSqlBlockRuleStmt.java +30]
    ├── DropTableStmt [vim org/apache/doris/analysis/DropTableStmt.java +28]
    ├── DropUserStmt [vim org/apache/doris/analysis/DropUserStmt.java +27]
    ├── EnterStmt [vim org/apache/doris/analysis/EnterStmt.java +25]
    ├── GrantStmt [vim org/apache/doris/analysis/GrantStmt.java +39]
    ├── InsertStmt [vim org/apache/doris/analysis/InsertStmt.java +66]
    ├── InstallPluginStmt [vim org/apache/doris/analysis/InstallPluginStmt.java +31]
    ├── LinkDbStmt [vim org/apache/doris/analysis/LinkDbStmt.java +31]
    ├── LoadStmt [vim org/apache/doris/analysis/LoadStmt.java +45]
    ├── MigrateDbStmt [vim org/apache/doris/analysis/MigrateDbStmt.java +29]
    ├── PauseRoutineLoadStmt [vim org/apache/doris/analysis/PauseRoutineLoadStmt.java +26]
    ├── PauseSyncJobStmt [vim org/apache/doris/analysis/PauseSyncJobStmt.java +22]
    ├── RecoverDbStmt [vim org/apache/doris/analysis/RecoverDbStmt.java +33]
    ├── RecoverPartitionStmt [vim org/apache/doris/analysis/RecoverPartitionStmt.java +32]
    ├── RecoverTableStmt [vim org/apache/doris/analysis/RecoverTableStmt.java +32]
    ├── ResumeRoutineLoadStmt [vim org/apache/doris/analysis/ResumeRoutineLoadStmt.java +26]
    ├── ResumeSyncJobStmt [vim org/apache/doris/analysis/ResumeSyncJobStmt.java +22]
    ├── RevokeStmt [vim org/apache/doris/analysis/RevokeStmt.java +32]
    ├── SetUserPropertyStmt [vim org/apache/doris/analysis/SetUserPropertyStmt.java +31]
    ├── StopRoutineLoadStmt [vim org/apache/doris/analysis/StopRoutineLoadStmt.java +23]
    ├── StopSyncJobStmt [vim org/apache/doris/analysis/StopSyncJobStmt.java +22]
    ├── SyncStmt [vim org/apache/doris/analysis/SyncStmt.java +22]
    ├── TruncateTableStmt [vim org/apache/doris/analysis/TruncateTableStmt.java +27]
    ├── UninstallPluginStmt [vim org/apache/doris/analysis/UninstallPluginStmt.java +28]
    ├── UpdateStmt [vim org/apache/doris/analysis/UpdateStmt.java +35]
    ├── AbstractBackupStmt [vim org/apache/doris/analysis/AbstractBackupStmt.java +36]
    │   ├── BackupStmt [vim org/apache/doris/analysis/BackupStmt.java +29]
    │   └── RestoreStmt [vim org/apache/doris/analysis/RestoreStmt.java +33]
    ├── BaseViewStmt [vim org/apache/doris/analysis/BaseViewStmt.java +39]
    │   ├── AlterViewStmt [vim org/apache/doris/analysis/AlterViewStmt.java +31]
    │   └── CreateViewStmt [vim org/apache/doris/analysis/CreateViewStmt.java +33]
    └── CancelStmt [vim org/apache/doris/analysis/CancelStmt.java +19]
        ├── CancelAlterSystemStmt [vim org/apache/doris/analysis/CancelAlterSystemStmt.java +28]
        ├── CancelAlterTableStmt [vim org/apache/doris/analysis/CancelAlterTableStmt.java +31]
        └── CancelBackupStmt [vim org/apache/doris/analysis/CancelBackupStmt.java +30]
org.apache.doris.qe.StmtExecutor#execute(TUniqueId)
    analyze(context.getSessionVariable().toThrift()); // semantic analysis
    if (isForwardToMaster()) {
        forwardToMaster(); // forward to the Master
        if (masterOpExecutor != null && masterOpExecutor.getQueryId() != null) {
            context.setQueryId(masterOpExecutor.getQueryId());
        }
        return;
    } else {
        LOG.debug("no need to transfer to Master. stmt: {}", context.getStmtId());
    }
    // command execution (the leading branches of this if/else chain are not shown in the excerpt)
    ...
    else if (parsedStmt instanceof DdlStmt) {
        handleDdlStmt();
    } else if (parsedStmt instanceof ShowStmt) {
        handleShow();
    } else if (parsedStmt instanceof KillStmt) {
        handleKill();
    }
Semantic analysis
Checks that the statement is semantically valid.
    @Override
    public void analyze(Analyzer analyzer) throws UserException {
        super.analyze(analyzer);
        tableName.analyze(analyzer);
        checkTblPriv(ConnectContext.get(), tableName.getDb(), tableName.getTbl(), PrivPredicate.CREATE);
        analyzeEngineName();
        keysDesc.analyze(columnDefs);
        for (ColumnDef columnDef : columnDefs) {
            columnDef.analyze(engineName.equals("olap"));
        }
        partitionDesc.analyze(columnDefs, properties);
        distributionDesc.analyze(columnSet);
    }
Whether the names are legal
Whether the privileges are sufficient
Whether the partitions are legal
Whether the column types are legal
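The four kinds of checks listed above can be sketched in isolation. Everything in this example is hypothetical: the class CreateTableChecks, the naming pattern, and the set of accepted column types are assumptions chosen for illustration and are not taken from Doris. The sketch collects all problems in a list, whereas the analyze method shown above signals failure by throwing UserException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, simplified analyzer (not Doris code).
final class CreateTableChecks {
    // Assumed naming rule: a letter followed by up to 63 letters, digits, or underscores.
    private static final Pattern NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9_]{0,63}");
    // Assumed set of supported column types, for illustration only.
    private static final Set<String> TYPES =
            new HashSet<>(Arrays.asList("INT", "BIGINT", "DOUBLE", "DATE", "VARCHAR"));

    // Returns one message per failed check; an empty list means the statement passed.
    static List<String> analyze(String tableName, boolean hasCreatePriv, Map<String, String> columnToType) {
        List<String> errors = new ArrayList<>();
        if (!NAME.matcher(tableName).matches()) {
            errors.add("illegal table name: " + tableName);
        }
        if (!hasCreatePriv) {
            errors.add("CREATE privilege required on table: " + tableName);
        }
        if (columnToType.isEmpty()) {
            errors.add("table must have at least one column");
        }
        for (Map.Entry<String, String> column : columnToType.entrySet()) {
            if (!NAME.matcher(column.getKey()).matches()) {
                errors.add("illegal column name: " + column.getKey());
            }
            if (!TYPES.contains(column.getValue().toUpperCase(Locale.ROOT))) {
                errors.add("unsupported column type: " + column.getValue());
            }
        }
        return errors;
    }
}
```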
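Forwarding
FE roles: Master, Follower, Observer
Only the Master is able to modify metadata.
Every operation that modifies metadata must be forwarded to the Master for execution.
Forward types:
FORWARD_NO_SYNC
FORWARD_WITH_SYNC
NO_FORWARD
DDL statements use FORWARD_WITH_SYNC.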
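The forwarding decision can be sketched as follows. The enum constants come from the list above; the class ForwardSketch and its two methods are hypothetical. The sketch assumes that "sync" means the forwarding FE waits until it has caught up with the Master's metadata log before answering the client, which the text above does not spell out.

```java
// Hypothetical sketch of the forwarding rule (not Doris code).
final class ForwardSketch {
    enum RedirectStatus { FORWARD_NO_SYNC, FORWARD_WITH_SYNC, NO_FORWARD }

    // A statement is forwarded only when this FE is not the Master
    // and the statement's type asks for forwarding.
    static boolean shouldForward(boolean thisFeIsMaster, RedirectStatus status) {
        return !thisFeIsMaster && status != RedirectStatus.NO_FORWARD;
    }

    // Assumption: with FORWARD_WITH_SYNC the forwarding FE waits for its local
    // metadata to catch up with the Master before returning to the client.
    static boolean mustWaitForSync(RedirectStatus status) {
        return status == RedirectStatus.FORWARD_WITH_SYNC;
    }

    public static void main(String[] args) {
        // A DDL statement arriving at a non-Master FE.
        RedirectStatus ddl = RedirectStatus.FORWARD_WITH_SYNC;
        System.out.println(shouldForward(false, ddl) + " " + mustWaitForSync(ddl));
    }
}
```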
Command execution
org.apache.doris.qe.DdlExecutor#execute()
    // call the function that matches the statement type
    if (ddlStmt instanceof CreateTableStmt) {
        catalog.createTable((CreateTableStmt) ddlStmt);
    }
Several table types are supported. Apart from OLAP tables, all of them are mapping tables onto external data sources.
    if (engineName.equals("olap")) {
        createOlapTable(db, stmt);
        return;
    } else if (engineName.equals("odbc")) {
        createOdbcTable(db, stmt);
        return;
    } else if (engineName.equals("mysql")) {
        createMysqlTable(db, stmt);
        return;
    } else if (engineName.equals("broker")) {
        createBrokerTable(db, stmt);
        return;
    } else if (engineName.equalsIgnoreCase("elasticsearch") || engineName.equalsIgnoreCase("es")) {
        createEsTable(db, stmt);
        return;
    } else if (engineName.equalsIgnoreCase("hive")) {
        createHiveTable(db, stmt);
        return;
    }
org.apache.doris.catalog.Catalog#createOlapTable
    // convert the parsed statement object into metadata objects
    String tableName = stmt.getTableName();
    LOG.debug("begin create olap table: {}", tableName);
    // create columns
    List<Column> baseSchema = stmt.getColumns();
    validateColumns(baseSchema);
    // create partition info
    PartitionDesc partitionDesc = stmt.getPartitionDesc();
    PartitionInfo partitionInfo = null;
    ...
    // create the table object
    long tableId = Catalog.getCurrentCatalog().getNextId();
    OlapTable olapTable = new OlapTable(tableId, tableName, baseSchema, keysType, partitionInfo, distributionInfo, indexes);
    // create the Partition object
    if (partitionInfo.getType() == PartitionType.UNPARTITIONED) {
        // this is a 1-level partitioned table
        // use table name as partition name
        String partitionName = tableName;
        long partitionId = partitionNameToId.get(partitionName);
        // create partition
        Partition partition = createPartitionWithIndices(...);
        olapTable.addPartition(partition);
    }
    // add the metadata and persist it
    Pair<Boolean, Boolean> result = db.createTableWithLock(olapTable, false, stmt.isSetIfNotExists());
org.apache.doris.catalog.Catalog#createPartitionWithIndices
A partition is the concrete entity behind a table.
    // create base index first.
    Preconditions.checkArgument(baseIndexId != -1);
    MaterializedIndex baseIndex = new MaterializedIndex(baseIndexId, IndexState.NORMAL);
    // create partition with base index
    Partition partition = new Partition(partitionId, partitionName, baseIndex, distributionInfo);
    // add to index map
    Map<Long, MaterializedIndex> indexMap = new HashMap<>();
    indexMap.put(baseIndexId, baseIndex);
    // create rollup index if has
    for (long indexId : indexIdToMeta.keySet()) {
        if (indexId == baseIndexId) {
            continue;
        }
        MaterializedIndex rollup = new MaterializedIndex(indexId, IndexState.NORMAL);
        indexMap.put(indexId, rollup);
    }
    for (Map.Entry<Long, MaterializedIndex> entry : indexMap.entrySet()) {
        // (the excerpt does not show how indexId, index and indexMeta are obtained from entry)
        // create tablets
        int schemaHash = indexMeta.getSchemaHash();
        TabletMeta tabletMeta = new TabletMeta(dbId, tableId, partitionId, indexId, schemaHash, storageMedium);
        createTablets(clusterName, index, ReplicaState.NORMAL, distributionInfo, version, versionHash,
                replicaAlloc, tabletMeta, tabletIdSet);
        // add create replica task for olap
        short shortKeyColumnCount = indexMeta.getShortKeyColumnCount();
        TStorageType storageType = indexMeta.getStorageType();
        List<Column> schema = indexMeta.getSchema();
        KeysType keysType = indexMeta.getKeysType();
        int totalTaskNum = index.getTablets().size() * totalReplicaNum;
        MarkedCountDownLatch<Long, Long> countDownLatch = new MarkedCountDownLatch<Long, Long>(totalTaskNum);
        AgentBatchTask batchTask = new AgentBatchTask();
        for (Tablet tablet : index.getTablets()) {
            long tabletId = tablet.getId();
            for (Replica replica : tablet.getReplicas()) {
                long backendId = replica.getBackendId();
                countDownLatch.addMark(backendId, tabletId);
                CreateReplicaTask task = new CreateReplicaTask(backendId, dbId, tableId, partitionId, indexId,
                        tabletId, shortKeyColumnCount, schemaHash, version, versionHash, keysType, storageType,
                        storageMedium, schema, bfColumns, bfFpp, countDownLatch, indexes, isInMemory, tabletType);
                task.setStorageFormat(storageFormat);
                batchTask.addTask(task);
                // add to AgentTaskQueue for handling finish report.
                // not for resending task
                AgentTaskQueue.addTask(task);
            }
        }
        AgentTaskExecutor.submit(batchTask);
    }
Overall flow:
- Create the Partition object
- Create the MaterializedIndex objects
- For each MaterializedIndex object, create its Tablets
- Create the replicas and dispatch the tasks to the BEs
    // estimate timeout
    long timeout = Config.tablet_create_timeout_second * 1000L * totalTaskNum;
    timeout = Math.min(timeout, Config.max_create_table_timeout_second * 1000);
    try {
        ok = countDownLatch.await(timeout, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
        LOG.warn("InterruptedException: ", e);
        ok = false;
    }
Wait for the BEs to finish executing the tasks.
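The timeout estimate is plain arithmetic: a per-tablet budget multiplied by the number of tasks, capped by an upper bound. The sketch below reproduces that formula with a standard java.util.concurrent.CountDownLatch in place of Doris's MarkedCountDownLatch. The class CreateTimeoutSketch and its parameter names are hypothetical, and the numbers in the usage note are illustrative rather than Doris defaults.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the timeout estimate and the wait (not Doris code).
final class CreateTimeoutSketch {
    // timeout = perTabletTimeoutSec * 1000 * totalTaskNum, capped at maxTimeoutSec * 1000 (milliseconds).
    static long estimateTimeoutMs(long perTabletTimeoutSec, long maxTimeoutSec, int totalTaskNum) {
        long timeout = perTabletTimeoutSec * 1000L * totalTaskNum;
        return Math.min(timeout, maxTimeoutSec * 1000L);
    }

    // Returns true if every task counted down before the timeout, false on timeout or interruption.
    static boolean awaitAll(CountDownLatch latch, long timeoutMs) {
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            return false;
        }
    }
}
```

With an assumed budget of 1 second per tablet and a 60 second cap, 30 replica tasks give min(30000, 60000) = 30000 ms, while 600 tasks hit the cap at 60000 ms.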
org.apache.doris.catalog.Database#createTableWithLock
    idToTable.put(table.getId(), table);
    nameToTable.put(table.getName(), table);
    lowerCaseToTableName.put(tableName.toLowerCase(), tableName);
    if (!isReplay) {
        // Write edit log
        CreateTableInfo info = new CreateTableInfo(fullQualifiedName, table);
        Catalog.getCurrentCatalog().getEditLog().logCreateTable(info);
    }
    if (table.getType() == TableType.ELASTICSEARCH) {
        Catalog.getCurrentCatalog().getEsRepository().registerTable((EsTable) table);
    }
Add the table to the Database object
Check whether this call is a replay
Write the metadata log
Flow summary:
FE-BE interaction
The FE sends the tasks
The BE executes them
The BE reports the execution results
The FE aggregates the results
    AgentBatchTask batchTask = new AgentBatchTask();
    for (Tablet tablet : index.getTablets()) {
        long tabletId = tablet.getId();
        for (Replica replica : tablet.getReplicas()) {
            long backendId = replica.getBackendId();
            countDownLatch.addMark(backendId, tabletId);
            CreateReplicaTask task = new CreateReplicaTask(backendId, dbId, tableId, partitionId, indexId,
                    tabletId, shortKeyColumnCount, schemaHash, version, versionHash, keysType, storageType,
                    storageMedium, schema, bfColumns, bfFpp, countDownLatch, indexes, isInMemory, tabletType);
            task.setStorageFormat(storageFormat);
            batchTask.addTask(task);
            // add to AgentTaskQueue for handling finish report.
            // not for resending task
            AgentTaskQueue.addTask(task);
        }
    }
    AgentTaskExecutor.submit(batchTask);
AgentBatchTask:
Collects tasks and groups them by BE
AgentTaskExecutor:
Sends the AgentBatchTask
AgentTaskQueue:
Handles the reports of finished tasks
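Grouping tasks by BE, as described for AgentBatchTask, amounts to keeping a map from backend id to a list of tasks so that each BE can be contacted once per batch. The classes BatchTaskSketch and ReplicaTask below are hypothetical stand-ins and do not reproduce the real AgentBatchTask internals.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a create-replica task: only the fields needed for grouping.
final class ReplicaTask {
    final long backendId;
    final long tabletId;

    ReplicaTask(long backendId, long tabletId) {
        this.backendId = backendId;
        this.tabletId = tabletId;
    }
}

// Hypothetical sketch of a batch that groups tasks by backend id (not Doris code).
final class BatchTaskSketch {
    // LinkedHashMap keeps backends in first-seen order, which makes the output deterministic.
    private final Map<Long, List<ReplicaTask>> backendIdToTasks = new LinkedHashMap<>();

    void addTask(ReplicaTask task) {
        backendIdToTasks.computeIfAbsent(task.backendId, id -> new ArrayList<>()).add(task);
    }

    // One entry per BE; a sender could issue a single request per entry.
    Map<Long, List<ReplicaTask>> tasksByBackend() {
        return Collections.unmodifiableMap(backendIdToTasks);
    }
}
```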
Receiving tasks on the BE
be/src/agent/agent_server.cpp
Receiving a task
    // resend request when something is wrong (BE may need some logic to guarantee idempotence).
    void AgentServer::submit_tasks(TAgentResult& agent_result, const std::vector<TAgentTaskRequest>& tasks) {
        Status ret_st;
        // TODO check master_info here if it is the same with that of heartbeat rpc
        if (_master_info.network_address.hostname == "" || _master_info.network_address.port == 0) {
            Status ret_st = Status::Cancelled("Have not get FE Master heartbeat yet");
            ret_st.to_thrift(&agent_result.status);
            return;
        }
        for (auto task : tasks) {
            VLOG_RPC << "submit one task: " << apache::thrift::ThriftDebugString(task).c_str();
            TTaskType::type task_type = task.task_type;
            int64_t signature = task.signature;
    #define HANDLE_TYPE(t_task_type, work_pool, req_member)                         \
        case t_task_type:                                                           \
            if (task.__isset.req_member) {                                          \
                work_pool->submit_task(task);                                       \
            } else {                                                                \
                ret_st = Status::InvalidArgument(strings::Substitute(               \
                        "task(signature=$0) has wrong request member", signature)); \
            }                                                                       \
            break;
        ...
        ret_st.to_thrift(&agent_result.status);
    }
Worker thread
    while (_is_work) {
        TAgentTaskRequest agent_task_req;
        TCreateTabletReq create_tablet_req;
        {
            lock_guard<Mutex> worker_thread_lock(_worker_thread_lock);
            while (_is_work && _tasks.empty()) {
                _worker_thread_condition_variable.wait();
            }
            if (!_is_work) {
                return;
            }
            // take a task from the queue
            agent_task_req = _tasks.front();
            create_tablet_req = agent_task_req.create_tablet_req;
            _tasks.pop_front();
        }
        // execute
        OLAPStatus create_status = _env->storage_engine()->create_tablet(create_tablet_req);
        TFinishTaskRequest finish_task_request;
        finish_task_request.__set_finish_tablet_infos(finish_tablet_infos);
        finish_task_request.__set_backend(_backend);
        finish_task_request.__set_report_version(_s_report_version);
        finish_task_request.__set_task_type(agent_task_req.task_type);
        finish_task_request.__set_signature(agent_task_req.signature);
        finish_task_request.__set_task_status(task_status);
        // report the result
        _finish_task(finish_task_request);
    }
Handling task reports
org.apache.doris.service.FrontendServiceImpl#finishTask
org.apache.doris.master.MasterImpl#finishTask
FE and BE communicate over the Thrift protocol.
Error handling
org.apache.doris.task.AgentTaskQueue stores the tasks that are currently executing.
org.apache.doris.master.ReportHandler#handleReport
org.apache.doris.master.ReportHandler#taskReport
BE: reports task, OLAP tablet, and disk state to the master server.
The FE master processes the report and retries tasks that have timed out.
    private static void taskReport(long backendId, Map<TTaskType, Set<Long>> runningTasks) {
        ...
        // to escape sending duplicate agent task to be
        if (task.shouldResend(taskReportTime)) {
            batchTask.addTask(task);
        }
        ...
    }
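One way to picture the resend decision is sketched below. This is an assumption-laden illustration, not the logic of AgentTask.shouldResend, whose body is not shown in this article: it assumes the FE compares the tasks it still tracks against the set the BE reports as running, and resends only those that are missing from the report and have waited at least a configurable interval. The names ResendSketch, selectForResend, and resendWaitMs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a resend filter (not Doris code).
final class ResendSketch {
    // pending: task signature -> time in ms at which the FE last sent the task.
    // running: signatures the BE reported as currently running.
    // Returns, in ascending order, the signatures that should be sent again.
    static List<Long> selectForResend(Map<Long, Long> pending, Set<Long> running,
                                      long nowMs, long resendWaitMs) {
        List<Long> resend = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : new TreeMap<>(pending).entrySet()) {
            boolean reportedRunning = running.contains(entry.getKey());
            boolean waitedLongEnough = nowMs - entry.getValue() >= resendWaitMs;
            // Skipping tasks the BE is already running avoids sending duplicates.
            if (!reportedRunning && waitedLongEnough) {
                resend.add(entry.getKey());
            }
        }
        return resend;
    }
}
```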
Metadata persistence
The edit log is similar to a WAL (write-ahead log).
BDBJE: a distributed KV store
Metadata persistence: org.apache.doris.catalog.Database#createTableWithLock
    public Pair<Boolean, Boolean> createTableWithLock(Table table, boolean isReplay, boolean setIfNotExist) {
        ...
        // update the in-memory state
        nameToTable.put(table.getName(), table);
        // Write edit log
        // build the metadata log record
        CreateTableInfo info = new CreateTableInfo(fullQualifiedName, table);
        // write the metadata log record
        Catalog.getCurrentCatalog().getEditLog().logCreateTable(info);
        ...
    }
Metadata replay
Metadata replay happens when the FE leader synchronizes metadata to the other FE nodes.
Log entries are replayed one by one.
The metadata is reconstructed in memory.
org.apache.doris.catalog.Catalog#replayCreateTable
    public void replayCreateTable(String dbName, Table table) {
        Database db = this.fullNameToDb.get(dbName);
        db.createTableWithLock(table, true, false);
        ...
    }
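The write path and the replay path share one method and differ only in the isReplay flag. The following sketch shows that pattern in miniature. MiniCatalog and CreateTableLogEntry are hypothetical classes, and an in-memory list stands in for the BDBJE-backed edit log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical log record, loosely modeled on the role CreateTableInfo plays above.
final class CreateTableLogEntry {
    final String dbName;
    final String tableName;

    CreateTableLogEntry(String dbName, String tableName) {
        this.dbName = dbName;
        this.tableName = tableName;
    }
}

// Hypothetical miniature catalog (not Doris code).
final class MiniCatalog {
    private final Map<String, Set<String>> dbToTables = new HashMap<>();
    // An in-memory list stands in for the persistent, replicated edit log.
    private final List<CreateTableLogEntry> editLog = new ArrayList<>();

    // Updates memory first; appends to the edit log only when this is not a replay.
    boolean createTable(String dbName, String tableName, boolean isReplay) {
        boolean added = dbToTables.computeIfAbsent(dbName, k -> new TreeSet<>()).add(tableName);
        if (added && !isReplay) {
            editLog.add(new CreateTableLogEntry(dbName, tableName));
        }
        return added;
    }

    // Applies log entries one by one to rebuild the in-memory state.
    void replay(List<CreateTableLogEntry> entries) {
        for (CreateTableLogEntry entry : entries) {
            createTable(entry.dbName, entry.tableName, true);
        }
    }

    List<CreateTableLogEntry> editLog() {
        return Collections.unmodifiableList(editLog);
    }

    Set<String> tables(String dbName) {
        return dbToTables.getOrDefault(dbName, Collections.emptySet());
    }
}
```

Replaying the leader's log on a fresh MiniCatalog yields the same table set while leaving the follower's own log empty, which is why the replay path must skip the log write.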
How to implement a new statement
fe/fe-core/src/main/cup/sql_parser.cup: grammar file
    KW_CREATE opt_external:isExternal KW_TABLE opt_if_not_exists:ifNotExists table_name:name
            LPAREN column_definition_list:columns COMMA index_definition_list:indexes RPAREN
            opt_engine:engineName opt_keys:keys opt_comment:tableComment opt_partition:partition
            opt_distribution:distribution opt_rollup:index opt_properties:tblProperties
            opt_ext_properties:extProperties
    {:
        RESULT = new CreateTableStmt(ifNotExists, isExternal, name, columns, indexes, engineName, keys,
                partition, distribution, tblProperties, extProperties, tableComment, index);
    :}
fe/fe-core/src/main/jflex/sql_scanner.flex: lexer file
    keywordMap.put("create", new Integer(SqlParserSymbols.KW_CREATE));
    keywordMap.put("cross", new Integer(SqlParserSymbols.KW_CROSS));
    keywordMap.put("cube", new Integer(SqlParserSymbols.KW_CUBE));
    keywordMap.put("current", new Integer(SqlParserSymbols.KW_CURRENT));
    keywordMap.put("current_user", new Integer(SqlParserSymbols.KW_CURRENT_USER));
    keywordMap.put("data", new Integer(SqlParserSymbols.KW_DATA));
    keywordMap.put("database", new Integer(SqlParserSymbols.KW_DATABASE));
Generating the lexer and parser code:
    cd fe/ && mvn clean install -DskipTests
Generated files:
• SqlScanner.java
• SqlParser.java
• SqlParserSymbols.java
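Summary of the steps for implementing a new statement:
- Define the lexer and grammar rules (sql_scanner.flex and sql_parser.cup)
- Implement the corresponding statement class, e.g. CreateTableStmt
- Implement the method that modifies the metadata, e.g. Catalog.createTable()
- Define the metadata log class for the operation, e.g. CreateTableInfo
- Implement writing the metadata log
- Implement the corresponding replay method, e.g. Catalog.replayCreateTable()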