A brief note on the Dremio node statistics display

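Some readers may have noticed that the statistics in the Dremio admin UI show N/A. The web front end also applies some extra display processing, which adds to the confusion.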

UI view


Data returned by the API

 

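Judging from the official code, this display is expected behavior. The CPU and memory values are computed as percentages. When system load is low they are essentially 0, and the UI renders 0 as N/A.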

Web front-end handling

NodeActivityView.js

  • Reference code
        [port]: {
          // a port of -1 is shown as N/A
          node: () => (node.get("port") !== -1 ? node.get("port") : "N/A"),
        },
        [cpu]: {
          // only an exact 0 is shown as N/A; other values are rounded and suffixed with %
          node: () =>
            node.get("cpu") !== 0
              ? `${NumberFormatUtils.roundNumberField(node.get("cpu"))}%`
              : "N/A",
        },
        [memory]: {
          // same rule as cpu
          node: () =>
            node.get("memory") !== 0
              ? `${NumberFormatUtils.roundNumberField(node.get("memory"))}%`
              : "N/A", // todo: check comps for digits. and fix so no need for parseFloat
        },
        [version]: {
          node: () => node.get("version") || "-",
        },
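The display rules above can be reproduced in a small Java sketch. Java is used rather than JavaScript to match the back-end code in this post. A port of -1, or a cpu/memory value of exactly 0, both become "N/A".

The class and method names are hypothetical. `NumberFormatUtils.roundNumberField` is only approximated here as HALF_UP rounding to two decimal places; the real helper may format differently.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class NodeCellFormat {
  // Port cell: -1 is rendered as "N/A", any other value as-is.
  static String formatPort(int port) {
    return port != -1 ? Integer.toString(port) : "N/A";
  }

  // cpu / memory cells: exactly 0 is rendered as "N/A", otherwise "<rounded>%".
  // ASSUMPTION: roundNumberField is modeled as HALF_UP rounding to 2 decimals.
  static String formatPercent(double value) {
    if (value == 0) {
      return "N/A";
    }
    BigDecimal rounded = BigDecimal.valueOf(value).setScale(2, RoundingMode.HALF_UP);
    return rounded.toPlainString() + "%";
  }

  public static void main(String[] args) {
    System.out.println(formatPort(-1));        // N/A
    System.out.println(formatPort(45678));     // 45678
    System.out.println(formatPercent(0.0));    // N/A
    System.out.println(formatPercent(0.004));  // 0.00%
    System.out.println(formatPercent(12.345)); // 12.35%
  }
}
```

Note the consequence of the `!== 0` check: only an exact 0 is shown as N/A. Under the rounding assumption made here, a tiny non-zero load is still printed as a (rounded) percentage.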

Back-end handling code

  • API endpoint (SystemResource)
  @GET
  @Path("/nodes")
  @Produces(MediaType.APPLICATION_JSON)
  public List<NodeInfo> getNodes(){
    final List<NodeInfo> result = new ArrayList<>();
    final Map<String, NodeEndpoint> execMap = new HashMap<>();
    final Map<String, NodeEndpoint> coordMap = new HashMap<>();
 
    // first get the coordinator nodes (in case there are no executors running)
    for(NodeEndpoint ep : context.get().getCoordinators()){
      coordMap.put(ep.getAddress() + ":" + ep.getFabricPort(), ep);
    }
 
    // try to get any executor nodes, but don't throw a UserException if we can't find any
    try {
      NodeStatsListener nodeStatsListener = new NodeStatsListener(context.get().getExecutors().size());
      context.get().getExecutors().forEach(
        ep -> {
          // call executorServiceClient to issue the getNodeStats RPC
          executorServiceClientFactoryProvider.get().getClientForEndpoint(ep).getNodeStats(Empty.newBuilder().build(),
                  nodeStatsListener);
        }
      );
 
      try {
        nodeStatsListener.waitForFinish();
      } catch (Exception ex) {
        logger.warn("Error while collecting node statistics: {}", ex.getMessage());
      }
 
      ConcurrentHashMap<String, NodeInstance> nodeStats = nodeStatsListener.getResult();
 
      for (NodeEndpoint ep : context.get().getExecutors()) {
        execMap.put(ep.getAddress() + ":" + ep.getFabricPort(), ep);
      }
 
      for (Map.Entry<String, NodeInstance> statsEntry : nodeStats.entrySet()) {
        NodeInstance stat = statsEntry.getValue();
        NodeEndpoint ep = execMap.remove(statsEntry.getKey());
        coordMap.remove(statsEntry.getKey());
        if (ep == null) {
          logger.warn("Unable to find node with identity: {}", statsEntry.getKey());
          continue;
        }
        result.add(NodeInfo.fromNodeInstance(stat));
      }
    } catch (UserException e) {
      logger.warn(e.getMessage());
    }
 
    final List<NodeInfo> finalList = new ArrayList<>();
    final List<NodeInfo> coord = new ArrayList<>();
    for (NodeEndpoint ep : coordMap.values()){
      // convert the endpoint into response data
      final NodeInfo nodeInfo = NodeInfo.fromEndpoint(ep);
      if (nodeInfo.getIsMaster()) {
        finalList.add(nodeInfo);
      } else {
        coord.add(nodeInfo);
      }
    }
 
    final List<NodeInfo> failedNodes = new ArrayList<>();
    for (NodeEndpoint ep : execMap.values()){
      final NodeInfo nodeInfo = NodeInfo.fromUnresponsiveEndpoint(ep);
      failedNodes.add(nodeInfo);
    }
 
    // put coordinators first.
    finalList.addAll(coord);
    finalList.addAll(result);
    finalList.addAll(failedNodes);
 
    return finalList;
  }
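The classification and ordering logic of `getNodes()` can be summarized with a simplified model. It shows three kinds of rows:

- Coordinator-only nodes always come from `fromEndpoint`.
- Executors that answered the RPC carry real statistics.
- Executors that did not answer are appended last as unresponsive.

The sketch below is hypothetical, not Dremio code. Node identities are plain `address:fabricPort` strings, and `Row`/`Source` stand in for `NodeInfo` and its factory methods. `LinkedHashMap` is used only to make the output order deterministic; the original uses `HashMap`, so order within each group is not guaranteed there.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NodeListMerge {
  // Hypothetical stand-in for which NodeInfo factory produced a row.
  enum Source { ENDPOINT, STATS, UNRESPONSIVE }

  record Row(String key, Source source) {}

  static List<Row> merge(List<String> coordinators, String masterKey,
                         List<String> executors, List<String> respondedKeys) {
    Map<String, String> coordMap = new LinkedHashMap<>();
    coordinators.forEach(k -> coordMap.put(k, k));
    Map<String, String> execMap = new LinkedHashMap<>();
    executors.forEach(k -> execMap.put(k, k));

    // Executors that returned stats are removed from both maps, as in the original loop.
    List<Row> withStats = new ArrayList<>();
    for (String key : respondedKeys) {
      String ep = execMap.remove(key);
      coordMap.remove(key);
      if (ep == null) {
        continue; // the original logs "Unable to find node with identity" here
      }
      withStats.add(new Row(key, Source.STATS)); // fromNodeInstance in the original
    }

    // Remaining coordinators (fromEndpoint): master first, then the others.
    List<Row> finalList = new ArrayList<>();
    List<Row> otherCoords = new ArrayList<>();
    for (String key : coordMap.keySet()) {
      Row row = new Row(key, Source.ENDPOINT);
      if (key.equals(masterKey)) {
        finalList.add(row);
      } else {
        otherCoords.add(row);
      }
    }

    // Executors that never answered (fromUnresponsiveEndpoint in the original).
    List<Row> failed = new ArrayList<>();
    for (String key : execMap.keySet()) {
      failed.add(new Row(key, Source.UNRESPONSIVE));
    }

    finalList.addAll(otherCoords);
    finalList.addAll(withStats);
    finalList.addAll(failed);
    return finalList;
  }

  public static void main(String[] args) {
    List<Row> rows = merge(
        List.of("c1:45678", "c2:45678"), "c1:45678",
        List.of("e1:45678", "e2:45678"), List.of("e1:45678"));
    rows.forEach(System.out::println);
  }
}
```

With the sample input, the rows come out in this order: master coordinator c1, coordinator c2, executor e1 with stats, and finally e2 as unresponsive. A node that is both coordinator and executor and does answer appears only once, as a row with stats.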

ExecutorServiceImpl class (server-side implementation of executorService)

  • Reference code
  public static CoordExecRPC.NodeStats getNodeStatsFromContext(SabotContext context) {
    final ThreadsIterator threads = new ThreadsIterator(context, null);
    final MemoryIterator memoryIterator = new MemoryIterator(context, null);
    final WorkStats stats = context.getWorkStatsProvider().get();
    final CoordinationProtos.NodeEndpoint ep = context.getEndpoint();
    final double load = stats.getClusterLoad();
    final int configuredMaxWidth = (int) context.getClusterResourceInformation().getAverageExecutorCores(context.getOptionManager());
    final int actualMaxWidth = (int) Math.max(1, configuredMaxWidth * stats.getMaxWidthFactor());
    // both default to 0; the values computed below are percentages
    double memory = 0;
    double cpu = 0;
 
    // get cpu
    while(threads.hasNext()) {
      ThreadsIterator.ThreadSummary summary = (ThreadsIterator.ThreadSummary) threads.next();
      double cpuTime = summary.cpu_time == null ? 0 : summary.cpu_time;
      double numCores = summary.cores;
      cpu += (cpuTime / numCores);
    }
 
    // get memory
    if(memoryIterator.hasNext()) {
      MemoryIterator.MemoryInfo memoryInfo = ((MemoryIterator.MemoryInfo) memoryIterator.next());
      memory = memoryInfo.direct_current * 100.0 / memoryInfo.direct_max;
    }
 
    String ip =  null;
    try {
      ip = InetAddress.getLocalHost().getHostAddress();
    } catch (UnknownHostException e) {
      // no op
    }
    return CoordExecRPC.NodeStats.newBuilder()
            .setCpu(cpu)
            .setMemory(memory)
            .setVersion(DremioVersionInfo.getVersion())
            .setPort(ep.getFabricPort())
            .setName(ep.getAddress())
            .setIp(ip)
            .setStatus("green")
            .setLoad(load)
            .setConfiguredMaxWidth(configuredMaxWidth)
            .setActualMaxWith(actualMaxWidth)
            .setCurrent(false)
            .build();
  }
  • Why the API reports port -1
    The reported port is userPort, which is -1 (a random port is used), whereas fabricPort is a fixed value.
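The two percentage formulas in `getNodeStatsFromContext` can be checked in isolation with the hypothetical helper below (not Dremio code). It applies the same arithmetic as the quoted method:

- CPU: each per-thread CPU value (with `null` counted as 0) is divided by the core count, and the results are summed.
- Memory: `direct_current * 100.0 / direct_max`.

It assumes, as this post does, that the per-thread CPU values from `ThreadsIterator` are already percentages. On an idle node every term is 0, so the result is exactly 0 and the UI shows N/A.

```java
import java.util.Arrays;
import java.util.List;

final class NodeStatsMath {
  // Same arithmetic as the cpu loop: sum of (cpuTime / cores), null treated as 0.
  static double cpu(List<Double> perThreadCpu, double cores) {
    double cpu = 0;
    for (Double cpuTime : perThreadCpu) {
      double t = cpuTime == null ? 0 : cpuTime;
      cpu += t / cores;
    }
    return cpu;
  }

  // Same arithmetic as the memory branch: direct memory in use as a percentage of the maximum.
  // Like the quoted code, there is no guard against directMax == 0.
  static double memory(long directCurrent, long directMax) {
    return directCurrent * 100.0 / directMax;
  }

  public static void main(String[] args) {
    // Idle node: every thread reports 0 (or null), so cpu is exactly 0.
    System.out.println(cpu(Arrays.asList(null, 0.0, 0.0), 8));  // 0.0
    // One thread at an assumed 4% on an 8-core node contributes 0.5.
    System.out.println(cpu(Arrays.asList(null, 0.0, 4.0), 8));  // 0.5
    // 256 MiB of 8 GiB direct memory in use.
    System.out.println(memory(256L << 20, 8L << 30));           // 3.125
  }
}
```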

 


Processing of the API response data
Nodes class

    // The port read here is userPort, which is -1, so the UI shows N/A
    public static NodeInfo fromEndpoint(CoordinationProtos.NodeEndpoint endpoint) {
      final boolean master = endpoint.getRoles().getMaster();
      final boolean coord = endpoint.getRoles().getSqlQuery();
      final boolean exec = endpoint.getRoles().getJavaExecutor();
      boolean isCompatible = isCompatibleVersion(endpoint.getDremioVersion());
      return new NodeInfo(
        endpoint.getAddress(),
        endpoint.getAddress(),
        endpoint.getAddress(),
        endpoint.getUserPort(),
        0d,
        0d,
        "green",
        master,
        coord,
        exec,
        isCompatible,
        endpoint.getNodeTag(),
        endpoint.getDremioVersion(),
        endpoint.getStartTime(),
        isCompatible ? NodeDetails.NONE.toMessage(null) : NodeDetails.INVALID_VERSION.toMessage(endpoint.getDremioVersion())
      );
    }

Notes

Dremio lacks documentation for some of this information, so it is best to read it together with the source code.

References

dac/ui/src/pages/AdminPage/subpages/NodeActivity/NodeActivityView.js
dac/backend/src/main/java/com/dremio/dac/resource/SystemResource.java
services/executorservice/src/main/java/com/dremio/service/executor/ExecutorServiceClient.java
sabot/kernel/src/main/java/com/dremio/exec/service/executor/ExecutorServiceProductClient.java
sabot/kernel/src/main/java/com/dremio/sabot/rpc/CoordExecService.java
sabot/kernel/src/main/java/com/dremio/exec/service/executor/ExecutorServiceImpl.java
sabot/kernel/src/main/java/com/dremio/exec/store/sys/ThreadsIterator.java
sabot/kernel/src/main/java/com/dremio/exec/store/sys/MemoryIterator.java
common/legacy/src/main/java/com/dremio/common/VM.java
sabot/kernel/src/main/java/com/dremio/exec/server/ContextService.java
dac/backend/src/main/java/com/dremio/dac/model/system/Nodes.java

posted on 2024-02-19 18:05  荣锋亮
