A brief note on Dremio's CatalogMaintenanceService

This service has been included since 25.0 and is also mentioned in the release notes. The following mainly covers its internal implementation.

Release information

The release note is quoted below. In short, a daily task was added that keeps at most 50 history records per view.

Added daily catalog maintenance tasks to trim history of views to a maximum of 50 records per view. This limits the storage needed for datasetVersions records in the KV store.

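Internal handling

The work is done by the tasks from CatalogMaintenanceRunnableProvider, which the CatalogMaintenanceService service starts.

  • Service registration

As the code shows, this service runs only on the coordinator node, which fits Dremio's usual pattern.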

private void registerCatalogMaintenanceService(
      SingletonRegistry registry, boolean isCoordinator) {
    if (isCoordinator) {
      registry.bindSelf(
          new CatalogMaintenanceService(
              registry.provider(SchedulerService.class),
              registry.provider(java.util.concurrent.ExecutorService.class),
              new CatalogMaintenanceRunnableProvider(
                      registry.provider(OptionManager.class),
                      registry.provider(KVStoreProvider.class).get())
                  .get(0)));
    }
  }

  • Inside CatalogMaintenanceRunnableProvider

CatalogMaintenanceRunnable.builder()
    .setName("TrimVersions")
    .setSchedule(makeDailySchedule(trimVersionsTime))
    .setRunnable(
        () ->
            DatasetVersionTrimmer.trimHistory(
                Clock.systemUTC(),
                storeProvider.getStore(DatasetVersionMutator.VersionStoreCreator.class),
                // This value is 50
                (int) optionManager.getOption(NamespaceOptions.DATASET_VERSIONS_LIMIT),
                minAgeInDays))
    .build());
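
The excerpt does not show makeDailySchedule, so the following is only a minimal sketch of the idea behind a daily schedule: working out how long to wait until the next occurrence of a fixed time of day. The class and method names (DailyDelaySketch, delayUntilNextRun) are hypothetical, and the sketch uses plain java.time rather than Dremio's SchedulerService API.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical helper, not Dremio's makeDailySchedule.
final class DailyDelaySketch {

  // Returns the wait time until the next occurrence of runAt (UTC).
  static Duration delayUntilNextRun(Clock clock, LocalTime runAt) {
    ZonedDateTime now = ZonedDateTime.now(clock.withZone(ZoneOffset.UTC));
    ZonedDateTime next = now.with(runAt);
    if (!next.isAfter(now)) {
      // Today's slot has already passed (or is exactly now): run tomorrow.
      next = next.plusDays(1);
    }
    return Duration.between(now, next);
  }

  public static void main(String[] args) {
    // A fixed clock keeps the example deterministic.
    Clock fixed = Clock.fixed(Instant.parse("2024-05-10T08:00:00Z"), ZoneOffset.UTC);
    System.out.println(delayUntilNextRun(fixed, LocalTime.of(9, 30)));
    System.out.println(delayUntilNextRun(fixed, LocalTime.of(7, 0)));
  }
}
```

Passing a Clock in, as the excerpt does with Clock.systemUTC(), makes this kind of time-dependent logic easy to test with a fixed clock.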

  • How DatasetVersionTrimmer.trimHistory works

This is where the actual work happens. The code comments explain most of it, so read them alongside the code.

private void trimHistory(int maxVersionsToKeep, int minAgeInDays) {
    Preconditions.checkArgument(maxVersionsToKeep > 0, "maxVersionsToKeep must be positive");
    Preconditions.checkArgument(minAgeInDays > 0, "minAgeInDays must be positive");
 
    // Assume number of datasets is somewhat small compared to number of versions.
    // First pass: count versions per dataset.
    Map<DatasetPath, Integer> counts = Maps.newHashMap();
    for (Document<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> entry :
        datasetVersionsStore.find()) {
      counts.compute(entry.getKey().getPath(), (key, count) -> count != null ? count + 1 : 1);
    }
 
    // Collect and order paths with more than requested number of versions.
    ImmutableList<Map.Entry<DatasetPath, Integer>> pathsWithCounts =
        counts.entrySet().stream()
            .sorted(Comparator.comparing(e -> e.getKey().toPathString()))
            .collect(ImmutableList.toImmutableList());
    ImmutableSet<DatasetPath> pathsSet =
        pathsWithCounts.stream()
            .filter(e -> e.getValue() > maxVersionsToKeep)
            .map(Map.Entry::getKey)
            .collect(ImmutableSet.toImmutableSet());
 
    if (!pathsSet.isEmpty()) {
      // Second pass: get versions to delete (past the maxVersionsToKeep) and update (set previous
      // version to null in the last element of the kept history).
      ArrayList<DatasetVersionMutator.VersionDatasetKey> keysToDelete = new ArrayList<>();
      Map<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> versionsToUpdate =
          Maps.newHashMap();
      DatasetPath startPath = pathsWithCounts.get(0).getKey();
      int versionsInRange = 0;
      for (int index = 0; index < pathsWithCounts.size(); index++) {
        Map.Entry<DatasetPath, Integer> pathAndCount = pathsWithCounts.get(index);
        DatasetPath endPath = pathAndCount.getKey();
        versionsInRange += pathAndCount.getValue();
        if (versionsInRange < MAX_VERSIONS_IN_RANGE && index + 1 < pathsWithCounts.size()) {
          continue;
        }
 
        // Collect versions to trim/update in the range.
        logger.info("Collecting records to trim, batch: s: {} e: {}", startPath, endPath);
        keysToDelete.clear();
        versionsToUpdate.clear();
        findVersionKeysToTrim(
            startPath,
            endPath,
            pathsSet,
            maxVersionsToKeep,
            minAgeInDays,
            keysToDelete,
            versionsToUpdate);
 
        // Update versions first, for any partial updates due to errors/conflicts etc, next run will
        // fix it.
        logger.info("Updating batch of {} older dataset versions", versionsToUpdate.size());
        for (Map.Entry<DatasetVersionMutator.VersionDatasetKey, VirtualDatasetVersion> entry :
            versionsToUpdate.entrySet()) {
          datasetVersionsStore.put(entry.getKey(), entry.getValue());
        }
        for (List<DatasetVersionMutator.VersionDatasetKey> keysRange :
            Lists.partition(keysToDelete, MAX_VERSIONS_TO_DELETE)) {
          logger.info("Deleting batch of {} older dataset versions", keysRange.size());
          datasetVersionsStore.bulkDelete(keysRange);
        }
 
        // Reset range.
        startPath = endPath;
        versionsInRange = 0;
      }
    }
  }
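
The batching in the loop above is easier to follow on a small in-memory model. The following is a simplified sketch, not Dremio's code: paths are plain strings, the KV store is replaced by a list, and the names RangeBatchSketch, countPerPath, ranges and partition are hypothetical. It reproduces three steps from the excerpt: counting versions per path, grouping the sorted paths into ranges until the accumulated count reaches a threshold, and splitting a delete list into fixed-size batches (which the source does with Guava's Lists.partition).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the batching steps in trimHistory.
final class RangeBatchSketch {

  // First pass: count versions per path. A TreeMap keeps the paths sorted.
  static TreeMap<String, Integer> countPerPath(List<String> versionPaths) {
    TreeMap<String, Integer> counts = new TreeMap<>();
    for (String path : versionPaths) {
      counts.merge(path, 1, Integer::sum);
    }
    return counts;
  }

  // Groups sorted paths into {start, end} ranges. A range is closed once it
  // holds at least maxVersionsInRange versions or the paths run out.
  static List<String[]> ranges(TreeMap<String, Integer> counts, int maxVersionsInRange) {
    List<String[]> result = new ArrayList<>();
    if (counts.isEmpty()) {
      return result;
    }
    String start = counts.firstKey();
    int versionsInRange = 0;
    int seen = 0;
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      seen++;
      String end = entry.getKey();
      versionsInRange += entry.getValue();
      if (versionsInRange < maxVersionsInRange && seen < counts.size()) {
        continue; // keep growing the current range
      }
      result.add(new String[] {start, end});
      // As in the excerpt, the next range starts at the previous end path.
      start = end;
      versionsInRange = 0;
    }
    return result;
  }

  // Splits a list into consecutive batches of at most size elements.
  static <T> List<List<T>> partition(List<T> items, int size) {
    List<List<T>> batches = new ArrayList<>();
    for (int i = 0; i < items.size(); i += size) {
      batches.add(items.subList(i, Math.min(i + size, items.size())));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> paths = List.of("a", "a", "a", "b", "b", "c", "c", "c", "c", "d");
    for (String[] range : ranges(countPerPath(paths), 5)) {
      System.out.println(Arrays.toString(range));
    }
    System.out.println(partition(List.of(1, 2, 3, 4, 5), 2));
  }
}
```

With the sample input in main (paths a to d with 3, 2, 4 and 1 versions, and a threshold of 5), the sketch produces the ranges [a, b] and [b, d]. The shared boundary path b mirrors the `startPath = endPath` reset in the excerpt. Which versions inside a range actually get deleted is decided by findVersionKeysToTrim, which the excerpt does not show, so it is left out here.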

Notes

CatalogMaintenanceService is a newly added service module. Reading the release note together with the source code makes its behavior much clearer.

References

dac/backend/src/main/java/com/dremio/dac/service/catalog/CatalogMaintenanceRunnableProvider.java
services/catalog/src/main/java/com/dremio/catalog/CatalogMaintenanceService.java
dac/backend/src/main/java/com/dremio/dac/service/datasets/DatasetVersionMutator.java
dac/backend/src/main/java/com/dremio/dac/service/datasets/DatasetVersionTrimmer.java
dac/backend/src/main/java/com/dremio/dac/daemon/DACDaemonModule.java

posted on 2024-05-10 08:00  荣锋亮
