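A brief look at how Dremio decides the queue type
Anyone who has used Dremio will know that it has a concept of queues. The notes below briefly explain how Dremio decides which queue a query is placed in.
Currently defined queue types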
public enum QueueType {
  // TODO figure out split between capacities for below queues
  SMALL(30D),
  LARGE(30D),
  REFLECTION_SMALL(25D),
  REFLECTION_LARGE(15D);

  private double capacity;

  QueueType(double capacity) {
    this.capacity = capacity;
  }

  public double getCapacity() {
    return capacity;
  }
}
Internal decision logic
- How the resource allocator handles queues
The method in BasicResourceAllocator:
public ResourceSchedulingResult allocate(
    final ResourceSchedulingContext queryContext,
    final ResourceSchedulingProperties resourceSchedulingProperties,
    final ResourceSchedulingObserver resourceSchedulingObserver,
    final Consumer<ResourceSchedulingDecisionInfo> schedulingDecisionInfoConsumer) {
  final ResourceSchedulingDecisionInfo resourceSchedulingDecisionInfo =
      new ResourceSchedulingDecisionInfo();
  // The queue type is decided here
  final QueueType queueType =
      getQueueNameFromSchedulingProperties(queryContext, resourceSchedulingProperties);
  resourceSchedulingDecisionInfo.setQueueName(queueType.name());
  resourceSchedulingDecisionInfo.setQueueId(queueType.name());
  resourceSchedulingDecisionInfo.setWorkloadClass(
      queryContext.getQueryContextInfo().getPriority().getWorkloadClass());
  schedulingDecisionInfoConsumer.accept(resourceSchedulingDecisionInfo);
  resourceSchedulingObserver.beginQueueWait();
  final Pointer<DistributedSemaphore.DistributedLease> lease = new Pointer();
  ListenableFuture<ResourceSet> futureAllocation =
      executorService.submit(
          () -> {
            lease.value = acquireQuerySemaphoreIfNecessary(queryContext, queueType);
            // update query limit based on the queueType
            final OptionManager options = queryContext.getOptions();
            final boolean memoryControlEnabled =
                options.getOption(BasicResourceConstants.ENABLE_QUEUE_MEMORY_LIMIT);
            // TODO REFLECTION_SMALL, REFLECTION_LARGE was not there before - was it a bug???
            final long memoryLimit =
                (queueType == QueueType.SMALL || queueType == QueueType.REFLECTION_SMALL)
                    ? options.getOption(BasicResourceConstants.SMALL_QUEUE_MEMORY_LIMIT)
                    : options.getOption(BasicResourceConstants.LARGE_QUEUE_MEMORY_LIMIT);
            long queryMaxAllocation = queryContext.getQueryContextInfo().getQueryMaxAllocation();
            if (memoryControlEnabled && memoryLimit > 0) {
              queryMaxAllocation = Math.min(memoryLimit, queryMaxAllocation);
            }
            final UserBitShared.QueryId queryId = queryContext.getQueryId();
            final long queryMaxAllocationFinal = queryMaxAllocation;
            final ResourceSet resourceSet =
                new BasicResourceSet(
                    queryId, lease.value, queryMaxAllocationFinal, queueType.name());
            return resourceSet;
          });
  Futures.addCallback(
      futureAllocation,
      new FutureCallback<ResourceSet>() {
        @Override
        public void onSuccess(@Nullable ResourceSet resourceSet) {
          // don't need to do anything additional
        }

        @Override
        public void onFailure(Throwable throwable) {
          // need to close lease
          releaseLease(lease.value);
        }
      },
      executorService);
  final ResourceSchedulingResult resourceSchedulingResult =
      new ResourceSchedulingResult(resourceSchedulingDecisionInfo, futureAllocation);
  return resourceSchedulingResult;
}
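To make the memory-limit part of allocate easier to follow, here is a minimal standalone sketch of just that rule. The class QueueMemoryLimitSketch, the method effectiveLimit and the numeric values are hypothetical and exist only for illustration; the real code reads the limits from OptionManager rather than taking them as parameters.

```java
// Illustrative sketch only: QueueMemoryLimitSketch and effectiveLimit are
// hypothetical names, not part of Dremio.
final class QueueMemoryLimitSketch {
  enum QueueType { SMALL, LARGE, REFLECTION_SMALL, REFLECTION_LARGE }

  // Mirrors the rule in allocate: small queues use the small limit, everything
  // else uses the large limit, and the cap only applies when memory control is
  // enabled and the chosen limit is positive.
  static long effectiveLimit(
      QueueType queueType,
      boolean memoryControlEnabled,
      long smallLimit,
      long largeLimit,
      long queryMaxAllocation) {
    final long memoryLimit =
        (queueType == QueueType.SMALL || queueType == QueueType.REFLECTION_SMALL)
            ? smallLimit
            : largeLimit;
    if (memoryControlEnabled && memoryLimit > 0) {
      return Math.min(memoryLimit, queryMaxAllocation);
    }
    return queryMaxAllocation;
  }

  public static void main(String[] args) {
    // Assumed values: small limit 4096, large limit 0 (no limit), query max 8192.
    System.out.println(effectiveLimit(QueueType.SMALL, true, 4096L, 0L, 8192L));
    System.out.println(effectiveLimit(QueueType.LARGE, true, 4096L, 0L, 8192L));
    System.out.println(effectiveLimit(QueueType.REFLECTION_SMALL, false, 4096L, 0L, 8192L));
  }
}
```

With these assumed inputs the small queue is capped at the small limit, while a large-queue limit of 0 or a disabled memory control leaves the query's own maximum allocation unchanged.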
- How getQueueNameFromSchedulingProperties works
protected QueueType getQueueNameFromSchedulingProperties(
    final ResourceSchedulingContext queryContext,
    final ResourceSchedulingProperties resourceSchedulingProperties) {
  final Double cost = resourceSchedulingProperties.getQueryCost();
  Preconditions.checkNotNull(cost, "Queue Cost is not provided, Unable to determine " + "queue.");
  // As you can see, the decision is fundamentally based on the query cost
  final long queueThreshold =
      queryContext.getOptions().getOption(BasicResourceConstants.QUEUE_THRESHOLD_SIZE);
  final QueueType queueType;
  if (queryContext
      .getQueryContextInfo()
      .getPriority()
      .getWorkloadClass()
      .equals(UserBitShared.WorkloadClass.BACKGROUND)) {
    // BACKGROUND workloads are reflections; everything else is a regular request.
    // In both cases small vs. large is decided by the configured threshold.
    queueType = (cost > queueThreshold) ? QueueType.REFLECTION_LARGE : QueueType.REFLECTION_SMALL;
  } else {
    queueType = (cost > queueThreshold) ? QueueType.LARGE : QueueType.SMALL;
  }
  return queueType;
}
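The selection rule above can be tried out in isolation with a small sketch. QueuePickSketch, Workload and pick are hypothetical names introduced here for illustration, and the threshold value is an assumed number rather than something read from Dremio's options.

```java
// Illustrative sketch only: QueuePickSketch, Workload and pick are hypothetical
// names, not Dremio classes.
final class QueuePickSketch {
  enum QueueType { SMALL, LARGE, REFLECTION_SMALL, REFLECTION_LARGE }

  enum Workload { BACKGROUND, GENERAL }

  // Same shape as getQueueNameFromSchedulingProperties: the workload class picks
  // the reflection or regular pair, and a strict "cost > threshold" picks large.
  static QueueType pick(double cost, long threshold, Workload workload) {
    final boolean large = cost > threshold;
    if (workload == Workload.BACKGROUND) {
      return large ? QueueType.REFLECTION_LARGE : QueueType.REFLECTION_SMALL;
    }
    return large ? QueueType.LARGE : QueueType.SMALL;
  }

  public static void main(String[] args) {
    final long threshold = 30_000_000L; // assumed value for illustration
    System.out.println(pick(1_000d, threshold, Workload.GENERAL));
    System.out.println(pick(50_000_000d, threshold, Workload.GENERAL));
    System.out.println(pick(50_000_000d, threshold, Workload.BACKGROUND));
    // The comparison is strict, so a cost equal to the threshold stays small.
    System.out.println(pick(30_000_000d, threshold, Workload.BACKGROUND));
  }
}
```

One detail the sketch makes visible is the boundary case: because the comparison is a strict greater-than, a query whose cost equals the threshold still lands in the small queue.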
- How resourceSchedulingProperties.getQueryCost() gets its value
This method simply reads a field from a POJO; the value is assigned from the physical plan (PhysicalPlan).
The method in ResourceTracker.java:
void allocate(PhysicalPlan physicalPlan, MaestroObserver observer)
    throws ExecutionSetupException, ResourceAllocationException {
  // The cost obtained from the physical plan
  final double planCost = physicalPlan.getCost();
  ResourceSchedulingProperties resourceSchedulingProperties = new ResourceSchedulingProperties();
  resourceSchedulingProperties.setQueryCost(planCost);
  resourceSchedulingProperties.setRoutingQueue(context.getSession().getRoutingQueue());
  resourceSchedulingProperties.setRoutingTag(context.getSession().getRoutingTag());
  resourceSchedulingProperties.setQueryType(
      Utilities.getHumanReadableWorkloadType(context.getWorkloadType()));
  resourceSchedulingProperties.setRoutingEngine(context.getSession().getRoutingEngine());
  resourceSchedulingProperties.setQueryLabel(context.getSession().getQueryLabel());
  // ... (the rest of the method is omitted from this excerpt)
}
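- How physicalPlan.getCost works
This part is essentially the step where Dremio turns the logical plan into a physical plan and estimates a cost for each physical operator it generates. The plan is later converted into a JSON-format plan, so the nodes can read the relevant information from it directly.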
public double getCost() {
  double totalCost = 0;
  for (final PhysicalOperator ops : getSortedOperators()) {
    totalCost += ops.getProps().getCost();
  }
  return totalCost;
}
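Since getCost is just a sum over the operators, a tiny sketch shows how per-operator estimates add up to the number that is later compared against the queue threshold. PlanCostSketch, Op and the operator names and costs below are made up for illustration and are not taken from a real Dremio plan.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: PlanCostSketch and Op are hypothetical stand-ins for
// PhysicalPlan and PhysicalOperator, and the costs are invented numbers.
final class PlanCostSketch {
  static final class Op {
    final String name;
    final double cost;

    Op(String name, double cost) {
      this.name = name;
      this.cost = cost;
    }
  }

  // Same idea as PhysicalPlan.getCost: the plan cost is the sum of operator costs.
  static double totalCost(List<Op> operators) {
    double total = 0;
    for (final Op op : operators) {
      total += op.cost;
    }
    return total;
  }

  public static void main(String[] args) {
    final List<Op> plan =
        Arrays.asList(new Op("scan", 4000.0), new Op("filter", 1000.0), new Op("project", 250.5));
    System.out.println(totalCost(plan)); // prints 5250.5
  }
}
```

Because every operator contributes to the total, a plan with many cheap operators can cross the threshold just as a plan with one expensive scan can.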
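Notes
The above is only a brief walkthrough. How the cost of each physical operator is computed is not covered here; I will fill that in later with an analysis of real cases.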
References
sabot/kernel/src/main/java/com/dremio/exec/maestro/ResourceTracker.java
sabot/kernel/src/main/java/com/dremio/exec/physical/PhysicalPlan.java
services/resourcescheduler/src/main/java/com/dremio/resource/basic/BasicResourceAllocator.java