Castled Source Code Walkthrough - The container Module

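The container module is Castled's API backend service. It covers task scheduling and DB migration, and a few of its services matter most:
PipelineService, ExternalAppService, and WarehouseService. Castled also ships an event-based processing mechanism,
mainly PipelineEvent and CastledEvent. The rest is mostly REST APIs built on Dropwizard, and the code as a whole is not hard to follow.
PipelineService is the core piece: it ties apps to connectors, and it relies on event and task processing.
PipelineExecutor is the task in which the data handling actually runs (the actual data polling and sending both happen there).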
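To make the "task" idea concrete, here is a minimal sketch of how a task carrying a pipeline id could be dispatched to an executor. All names (SketchTask, SketchTaskExecutor, TaskDispatchSketch, the "pipelineId" key, the "PIPELINE_RUN" type) are hypothetical and are not Castled's actual classes. The one detail borrowed from the real executeTask method is reading the id through Number, which works whether the generic params map holds an Integer or a Long.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatcher: looks up an executor by task type and runs it.
final class TaskDispatchSketch {
    private final Map<String, SketchTaskExecutor> executors = new HashMap<>();

    void register(String taskType, SketchTaskExecutor executor) {
        executors.put(taskType, executor);
    }

    String dispatch(SketchTask task) {
        SketchTaskExecutor executor = executors.get(task.getType());
        if (executor == null) {
            throw new IllegalArgumentException("No executor for task type " + task.getType());
        }
        return executor.executeTask(task);
    }

    // The params map is untyped, so the id may be boxed as Integer or Long;
    // casting to Number handles both.
    static long pipelineId(SketchTask task) {
        return ((Number) task.getParams().get("pipelineId")).longValue();
    }

    public static void main(String[] args) {
        TaskDispatchSketch dispatcher = new TaskDispatchSketch();
        dispatcher.register("PIPELINE_RUN", task -> "ran pipeline " + pipelineId(task));

        Map<String, Object> params = new HashMap<>();
        params.put("pipelineId", 42); // boxed as Integer, not Long
        System.out.println(dispatcher.dispatch(new SketchTask("PIPELINE_RUN", params)));
    }
}

// Hypothetical task: a type plus an untyped parameter map.
final class SketchTask {
    private final String type;
    private final Map<String, Object> params;

    SketchTask(String type, Map<String, Object> params) {
        this.type = type;
        this.params = params;
    }

    String getType() { return type; }

    Map<String, Object> getParams() { return params; }
}

// Hypothetical executor contract, shaped like the executeTask method discussed in this post.
interface SketchTaskExecutor {
    String executeTask(SketchTask task);
}
```

A direct cast to Long would throw a ClassCastException for the Integer in this example, which is presumably why the real code goes through Number.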

Core methods of PipelineService

See the figure below.

 

 

PipelineExecutor processing

 
 public String executeTask(Task task) {
        Long pipelineId = ((Number) task.getParams().get(CommonConstants.PIPELINE_ID)).longValue();
        Pipeline pipeline = this.pipelineService.getActivePipeline(pipelineId);
        if (pipeline == null) {
            return null;
        }
        // Important: this listener is used for tracking status statistics
        WarehouseSyncFailureListener warehouseSyncFailureListener = null;
        Warehouse warehouse = this.warehouseService.getWarehouse(pipeline.getWarehouseId());
        PipelineRun pipelineRun = getOrCreatePipelineRun(pipelineId);
        WarehousePollContext warehousePollContext = WarehousePollContext.builder()
                .primaryKeys(PipelineUtils.getWarehousePrimaryKeys(pipeline)).pipelineUUID(pipeline.getUuid())
                .pipelineRunId(pipelineRun.getId()).warehouseConfig(warehouse.getConfig())
                .dataEncryptionKey(encryptionManager.getEncryptionKey(warehouse.getTeamId()))
                .queryMode(pipeline.getQueryMode())
                .query(pipeline.getSourceQuery()).pipelineId(pipeline.getId()).build();
        try {
            // Call the warehouse connector to fetch the data
            WarehouseExecutionContext warehouseExecutionContext = pollRecords(warehouse, pipelineRun, warehousePollContext);
 
            log.info("Poll records completed for pipeline {}", pipeline.getName());
            this.pipelineService.updatePipelineRunstage(pipelineRun.getId(), PipelineRunStage.RECORDS_POLLED);
 
            ExternalApp externalApp = externalAppService.getExternalApp(pipeline.getAppId());
            ExternalAppConnector externalAppConnector = this.externalAppConnectors.get(externalApp.getType());
            RecordSchema appSchema = externalAppConnector.getSchema(externalApp.getConfig(), pipeline.getAppSyncConfig())
                    .getAppSchema();
 
            log.info("App schema fetch completed for pipeline {}", pipeline.getName());
 
            warehousePollContext.setWarehouseSchema(warehouseExecutionContext.getWarehouseSchema());
            warehouseSyncFailureListener = warehouseConnectors.get(warehouse.getType())
                    .syncFailureListener(warehousePollContext);
 
            MysqlErrorTracker mysqlErrorTracker = new MysqlErrorTracker(warehousePollContext);
 
            ErrorOutputStream schemaMappingErrorOutputStream = new ErrorOutputStream(warehouseSyncFailureListener, mysqlErrorTracker);
 
            SchemaMappedMessageInputStream schemaMappedMessageInputStream = new SchemaMappedMessageInputStream(
                    appSchema, warehouseExecutionContext.getMessageInputStreamImpl(), pipeline.getDataMapping().appWarehouseMapping(),
                    pipeline.getDataMapping().warehouseAppMapping(), schemaMappingErrorOutputStream);
 
            SchemaMappedRecordOutputStream schemaMappedRecordOutputStream =
                    new SchemaMappedRecordOutputStream(SchemaUtils.filterSchema(warehousePollContext.getWarehouseSchema(),
                            PipelineUtils.getWarehousePrimaryKeys(pipeline)), warehouseSyncFailureListener,
                            pipeline.getDataMapping().warehouseAppMapping());
 
            ErrorOutputStream sinkErrorOutputStream = new ErrorOutputStream(schemaMappedRecordOutputStream,
                    new SchemaMappedErrorTracker(mysqlErrorTracker, warehouseExecutionContext.getWarehouseSchema(), pipeline.getDataMapping().warehouseAppMapping()));
 
            log.info("App Sync started for pipeline {}", pipeline.getName());
 
            List<String> mappedAppFields = pipeline.getDataMapping().getFieldMappings().stream().filter(mapping -> !mapping.isSkipped())
                    .map(FieldMapping::getAppField).collect(Collectors.toList());
 
            DataSinkRequest dataSinkRequest = DataSinkRequest.builder().externalApp(externalApp).errorOutputStream(sinkErrorOutputStream)
                    .appSyncConfig(pipeline.getAppSyncConfig()).mappedFields(mappedAppFields)
                    .objectSchema(appSchema).primaryKeys(pipeline.getDataMapping().getPrimaryKeys())
                    .messageInputStream(schemaMappedMessageInputStream)
                    .build();
 
            // The sync itself goes through a MonitoredDataSink, which collects some statistics
            PipelineSyncStats pipelineSyncStats = monitoredDataSink.syncRecords(externalAppConnector.getDataSink(),
                    pipelineRun.getPipelineSyncStats(), pipelineRun.getId(), dataSinkRequest);
 
            schemaMappedMessageInputStream.close();
 
            log.info("App Sync completed for pipeline {}", pipeline.getName());
            //flush output streams
            schemaMappingErrorOutputStream.flushFailedRecords();
            sinkErrorOutputStream.flushFailedRecords();
 
            warehouseConnectors.get(warehouse.getType()).getDataPoller().cleanupPipelineRunResources(warehousePollContext);
            // Also add the records that failed schema mapping phase to the final stats
            pipelineSyncStats.setRecordsFailed(schemaMappedMessageInputStream.getFailedRecords() + pipelineSyncStats.getRecordsFailed());
            this.pipelineService.markPipelineRunProcessed(pipelineRun.getId(), pipelineSyncStats);
 
        } catch (Exception e) {
            if (ObjectRegistry.getInstance(AppShutdownHandler.class).isShutdownTriggered()) {
                throw new PipelineInterruptedException();
            }
            this.pipelineService.markPipelineRunFailed(pipelineRun.getId(), Optional.ofNullable(e.getMessage()).orElse("Unknown Error"));
            log.error("Pipeline run failed for pipeline {} ", pipeline.getId(), e);
            this.warehouseConnectors.get(warehouse.getType()).getDataPoller().cleanupPipelineRunResources(warehousePollContext);
            Optional.ofNullable(warehouseSyncFailureListener).ifPresent(syncFailureListener ->
                    syncFailureListener.cleanupResources(pipeline.getUuid(), pipelineRun.getId(), warehouse.getConfig()));
 
            if (e instanceof PipelineExecutionException) {
                handlePipelineExecutionException(pipeline, (PipelineExecutionException) e);
            } else {
                log.error("Pipeline run failed for pipeline {} ", pipeline.getId(), e);
            }
        }
        return null;
    }
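
The method is easier to follow as three stages: poll records from the warehouse, map them onto the app schema, then hand them to the app's data sink, with each of the last two stages writing its failures to its own error stream. The sketch below is a heavily simplified illustration of that flow and of the final step where schema-mapping failures are added to the sink's failure count. Every type here (SyncFlowSketch, ErrorSink, SyncStats) and the "email" rejection rule are hypothetical, not Castled's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; none of these are Castled's actual classes.
final class SyncFlowSketch {

    // Buffers failure reasons until flushed, loosely mirroring an error output stream.
    static final class ErrorSink {
        private final List<String> buffered = new ArrayList<>();
        private final List<String> flushed = new ArrayList<>();

        void write(String reason) { buffered.add(reason); }

        void flushFailedRecords() {
            flushed.addAll(buffered);
            buffered.clear();
        }

        int flushedCount() { return flushed.size(); }
    }

    static final class SyncStats {
        long recordsSynced;
        long recordsFailed;
    }

    // Stage 2: rename warehouse columns to app fields; a record missing a mapped column fails here.
    static List<Map<String, Object>> mapRecords(List<Map<String, Object>> polled,
                                                Map<String, String> warehouseToApp,
                                                ErrorSink mappingErrors) {
        List<Map<String, Object>> mapped = new ArrayList<>();
        for (Map<String, Object> record : polled) {
            Map<String, Object> out = new LinkedHashMap<>();
            boolean complete = true;
            for (Map.Entry<String, String> mapping : warehouseToApp.entrySet()) {
                if (!record.containsKey(mapping.getKey())) {
                    mappingErrors.write("missing column " + mapping.getKey());
                    complete = false;
                    break;
                }
                out.put(mapping.getValue(), record.get(mapping.getKey()));
            }
            if (complete) {
                mapped.add(out);
            }
        }
        return mapped;
    }

    // Stage 3: a pretend app sink that rejects records whose "email" field is null.
    static SyncStats sink(List<Map<String, Object>> mapped, ErrorSink sinkErrors) {
        SyncStats stats = new SyncStats();
        for (Map<String, Object> record : mapped) {
            if (record.get("email") == null) {
                sinkErrors.write("rejected by app: email is null");
                stats.recordsFailed++;
            } else {
                stats.recordsSynced++;
            }
        }
        return stats;
    }

    // Stage 1 (polling) is represented by the already-fetched 'polled' list.
    static SyncStats run(List<Map<String, Object>> polled, Map<String, String> warehouseToApp) {
        ErrorSink mappingErrors = new ErrorSink();
        ErrorSink sinkErrors = new ErrorSink();
        SyncStats stats = sink(mapRecords(polled, warehouseToApp, mappingErrors), sinkErrors);
        mappingErrors.flushFailedRecords();
        sinkErrors.flushFailedRecords();
        // Records that failed during schema mapping never reached the sink, so add them here.
        stats.recordsFailed += mappingErrors.flushedCount();
        return stats;
    }

    public static void main(String[] args) {
        Map<String, Object> ok = new HashMap<>();
        ok.put("mail", "a@example.com");
        Map<String, Object> nullMail = new HashMap<>();
        nullMail.put("mail", null);
        Map<String, Object> noColumn = new HashMap<>();
        noColumn.put("name", "b");

        SyncStats stats = run(List.of(ok, nullMail, noColumn), Map.of("mail", "email"));
        System.out.println("synced=" + stats.recordsSynced + " failed=" + stats.recordsFailed);
    }
}
```

With the three sample records, one syncs, one is rejected by the sink, and one fails mapping, so the program prints synced=1 failed=2. The real method does the same bookkeeping when it adds schemaMappedMessageInputStream.getFailedRecords() to the sink's failure count before marking the run as processed.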

Notes

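The code currently shows that every task that gets created sends a message to Castled's statistics service. If you do not want that, it is best to patch it out yourself; judging from the configuration definitions, there is no switch to disable it for now.
Although the system uses Kafka, Kafka's role does not seem prominent: it mostly serves as a queue for tasks, and the data sending itself is not built on Kafka messaging.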

References

https://github.com/castledio/castled

Posted by 荣锋亮
