A brief look at Dremio's CatalogMaintenanceService

This service has shipped since Dremio 25.0 and is mentioned in the release notes. This post focuses on how it is implemented internally.

Release notes

The relevant note is quoted below. In short, it adds a daily task that keeps at most 50 history records per view.

Added daily catalog maintenance tasks to trim history of views to a maximum of 50 records per view. This limits the storage needed for datasetVersions records in the KV store.
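
To make the release note concrete, here is a minimal sketch of what capping one view's history at a fixed number of records could look like. It is not Dremio's code: `VersionRecord`, `HistoryCapSketch.trim`, and the newest-first list are hypothetical stand-ins for the datasetVersions records in the KV store. The step that clears the previous-version link on the last kept record mirrors a comment in `DatasetVersionTrimmer`, quoted further down.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one history record of a view (not a Dremio class). */
final class VersionRecord {
  final String version;
  String previousVersion; // null when this is the oldest record that is kept

  VersionRecord(String version, String previousVersion) {
    this.version = version;
    this.previousVersion = previousVersion;
  }
}

final class HistoryCapSketch {
  /**
   * Caps a mutable, newest-first history at maxToKeep records.
   * Returns the removed (oldest) records and clears the link on the last kept record.
   */
  static List<VersionRecord> trim(List<VersionRecord> newestFirst, int maxToKeep) {
    if (maxToKeep <= 0) {
      throw new IllegalArgumentException("maxToKeep must be positive");
    }
    List<VersionRecord> removed = new ArrayList<>();
    if (newestFirst.size() <= maxToKeep) {
      return removed; // nothing to trim
    }
    // Copy the tail (records past the limit) before dropping it from the history.
    removed.addAll(newestFirst.subList(maxToKeep, newestFirst.size()));
    newestFirst.subList(maxToKeep, newestFirst.size()).clear();
    // The last kept record must no longer point at a deleted record.
    newestFirst.get(maxToKeep - 1).previousVersion = null;
    return removed;
  }
}
```

With a limit of 50, a view holding 60 records would lose its 10 oldest ones, and record number 50 would become the new end of the chain.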

Internals

The work is done by the CatalogMaintenanceRunnableProvider tasks, which are started by the CatalogMaintenanceService service.

  • Service registration

As the code shows, the service is only registered on coordinator nodes, which matches Dremio's usual pattern.

private void registerCatalogMaintenanceService(
      SingletonRegistry registry, boolean isCoordinator) {
    if (isCoordinator) {
      registry.bindSelf(
          new CatalogMaintenanceService(
              registry.provider(SchedulerService.class),
              registry.provider(java.util.concurrent.ExecutorService.class),
              new CatalogMaintenanceRunnableProvider(
                      registry.provider(OptionManager.class),
                      registry.provider(KVStoreProvider.class).get())
                  .get(0)));
    }
  }
  • Inside CatalogMaintenanceRunnableProvider
CatalogMaintenanceRunnable.builder()
    .setName("TrimVersions")
    .setSchedule(makeDailySchedule(trimVersionsTime))
    .setRunnable(
        () ->
            DatasetVersionTrimmer.trimHistory(
                Clock.systemUTC(),
                storeProvider.getStore(DatasetVersionMutator.VersionStoreCreator.class),
                // This value is 50
                (int) optionManager.getOption(NamespaceOptions.DATASET_VERSIONS_LIMIT),
                minAgeInDays))
    .build());
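
The `makeDailySchedule(trimVersionsTime)` call suggests the task fires once a day at a fixed time. Dremio's scheduler API is not shown in the excerpt, so the following is only a sketch of the same idea using the JDK's `ScheduledExecutorService`. `DailyScheduleSketch`, `initialDelay`, and `scheduleDaily` are hypothetical names, not part of Dremio.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.LocalTime;
import java.time.ZonedDateTime;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class DailyScheduleSketch {
  /** Time to wait from "now" until the next occurrence of timeOfDay in the clock's zone. */
  static Duration initialDelay(Clock clock, LocalTime timeOfDay) {
    ZonedDateTime now = ZonedDateTime.now(clock);
    ZonedDateTime next = now.with(timeOfDay);
    if (!next.isAfter(now)) {
      next = next.plusDays(1); // today's slot has already passed, so run tomorrow
    }
    return Duration.between(now, next);
  }

  /**
   * Runs the task every 24 hours, starting at the next occurrence of timeOfDay.
   * A fixed 24-hour period is exact for UTC; zones with DST would drift by an hour.
   */
  static ScheduledFuture<?> scheduleDaily(
      ScheduledExecutorService executor, Clock clock, LocalTime timeOfDay, Runnable task) {
    long delayMillis = initialDelay(clock, timeOfDay).toMillis();
    return executor.scheduleAtFixedRate(
        task, delayMillis, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
  }
}
```

Passing the `Clock` in, as the excerpt does with `Clock.systemUTC()`, keeps the delay computation testable with a fixed clock.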
  • DatasetVersionTrimmer.trimHistory

This is where the actual trimming happens. The code comments explain most of the logic, so read them alongside the code.

private void trimHistory(int maxVersionsToKeep, int minAgeInDays) {
    Preconditions.checkArgument(maxVersionsToKeep > 0, "maxVersionsToKeep must be positive");
    Preconditions.checkArgument(minAgeInDays > 0, "minAgeInDays must be positive");
 
    // Assume number of datasets is somewhat small compared to number of versions.
    // First pass: count versions per dataset.
    Map<DatasetPath, Integer> counts = Maps.newHashMap();
    for (Document<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> entry :
        datasetVersionsStore.find()) {
      counts.compute(entry.getKey().getPath(), (key, count) -> count != null ? count + 1 : 1);
    }
 
    // Collect and order paths with more than requested number of versions.
    ImmutableList<Map.Entry<DatasetPath, Integer>> pathsWithCounts =
        counts.entrySet().stream()
            .sorted(Comparator.comparing(e -> e.getKey().toPathString()))
            .collect(ImmutableList.toImmutableList());
    ImmutableSet<DatasetPath> pathsSet =
        pathsWithCounts.stream()
            .filter(e -> e.getValue() > maxVersionsToKeep)
            .map(Map.Entry::getKey)
            .collect(ImmutableSet.toImmutableSet());
 
    if (!pathsSet.isEmpty()) {
      // Second pass: get versions to delete (past the maxVersionsToKeep) and update (set previous
      // version to null in the last element of the kept history).
      ArrayList<DatasetVersionMutator.VersionDatasetKey> keysToDelete = new ArrayList<>();
      Map<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> versionsToUpdate =
          Maps.newHashMap();
      DatasetPath startPath = pathsWithCounts.get(0).getKey();
      int versionsInRange = 0;
      for (int index = 0; index < pathsWithCounts.size(); index++) {
        Map.Entry<DatasetPath, Integer> pathAndCount = pathsWithCounts.get(index);
        DatasetPath endPath = pathAndCount.getKey();
        versionsInRange += pathAndCount.getValue();
        if (versionsInRange < MAX_VERSIONS_IN_RANGE && index + 1 < pathsWithCounts.size()) {
          continue;
        }
 
        // Collect versions to trim/update in the range.
        logger.info("Collecting records to trim, batch: s: {} e: {}", startPath, endPath);
        keysToDelete.clear();
        versionsToUpdate.clear();
        findVersionKeysToTrim(
            startPath,
            endPath,
            pathsSet,
            maxVersionsToKeep,
            minAgeInDays,
            keysToDelete,
            versionsToUpdate);
 
        // Update versions first, for any partial updates due to errors/conflicts etc, next run will
        // fix it.
        logger.info("Updating batch of {} older dataset versions", versionsToUpdate.size());
        for (Map.Entry<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> entry :
            versionsToUpdate.entrySet()) {
          datasetVersionsStore.put(entry.getKey(), entry.getValue());
        }
        for (List<DatasetVersionMutator.VersionDatasetKey> keysRange :
            Lists.partition(keysToDelete, MAX_VERSIONS_TO_DELETE)) {
          logger.info("Deleting batch of {} older dataset versions", keysRange.size());
          datasetVersionsStore.bulkDelete(keysRange);
        }
 
        // Reset range.
        startPath = endPath;
        versionsInRange = 0;
      }
    }
  }
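
The method makes two passes: it first counts versions per dataset path, then walks the sorted paths and groups them into ranges so that each scan of the store covers a bounded number of versions. The sketch below reproduces only that counting and batching logic on plain strings. `RangeBatchingSketch`, `countPerPath`, `ranges`, and the `maxVersionsInRange` parameter are hypothetical; the parameter stands in for the `MAX_VERSIONS_IN_RANGE` constant, whose real value is not shown in the excerpt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RangeBatchingSketch {
  /** First pass: count versions per path. A TreeMap keeps the paths sorted. */
  static TreeMap<String, Integer> countPerPath(List<String> versionPaths) {
    TreeMap<String, Integer> counts = new TreeMap<>();
    for (String path : versionPaths) {
      counts.merge(path, 1, Integer::sum);
    }
    return counts;
  }

  /**
   * Second pass: walk the sorted paths and close a range once it holds at least
   * maxVersionsInRange versions, or when the last path is reached.
   * Each range is returned as {startPath, endPath}.
   */
  static List<String[]> ranges(TreeMap<String, Integer> counts, int maxVersionsInRange) {
    List<String[]> result = new ArrayList<>();
    if (counts.isEmpty()) {
      return result;
    }
    String startPath = counts.firstKey();
    int versionsInRange = 0;
    int seen = 0;
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      seen++;
      versionsInRange += entry.getValue();
      if (versionsInRange < maxVersionsInRange && seen < counts.size()) {
        continue; // keep growing the current range
      }
      result.add(new String[] {startPath, entry.getKey()});
      // As in the excerpt, the next range starts at the end path of this one.
      startPath = entry.getKey();
      versionsInRange = 0;
    }
    return result;
  }
}
```

Because the next range starts at the previous end path, the boundary path falls inside two consecutive ranges. In the excerpt, the per-range work is handed to `findVersionKeysToTrim` together with `pathsSet`, so only paths over the limit are actually trimmed.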

Notes

CatalogMaintenanceService is a newly added service module. The release note becomes much clearer when read together with the source code.

References

dac/backend/src/main/java/com/dremio/dac/service/catalog/CatalogMaintenanceRunnableProvider.java
services/catalog/src/main/java/com/dremio/catalog/CatalogMaintenanceService.java
dac/backend/src/main/java/com/dremio/dac/service/datasets/DatasetVersionMutator.java
dac/backend/src/main/java/com/dremio/dac/service/datasets/DatasetVersionTrimmer.java
dac/backend/src/main/java/com/dremio/dac/daemon/DACDaemonModule.java

Posted by 荣锋亮
