Uncle's Experience Sharing (21): Viewing the real-time memory and CPU usage of each application in YARN

The application detail page in the YARN web UI

http://resourcemanager/cluster/app/$applicationId

and the application command

yarn application -status $applicationId

only show resource*time totals accumulated since the application started, for example:

Aggregate Resource Allocation : 3962853 MB-seconds, 1466 vcore-seconds
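The aggregate is, roughly, each container's resource multiplied by how long the container held it, summed over the application's containers. That is why it cannot tell you what the application holds right now. A minimal Java sketch of the arithmetic, assuming a hypothetical list of finished container runs (`AggregateUsage` and `ContainerRun` are illustrative names, not YARN classes):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of how "MB-seconds" / "vcore-seconds" totals build up.
final class AggregateUsage {

    // One container: memory held (MB), vcores held, and how long it ran (seconds).
    static final class ContainerRun {
        final long memoryMb;
        final int vcores;
        final long seconds;

        ContainerRun(long memoryMb, int vcores, long seconds) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
            this.seconds = seconds;
        }
    }

    // Sum of (memory held x time held) over all containers.
    static long memorySeconds(List<ContainerRun> runs) {
        long total = 0;
        for (ContainerRun r : runs) {
            total += r.memoryMb * r.seconds;
        }
        return total;
    }

    // Sum of (vcores held x time held) over all containers.
    static long vcoreSeconds(List<ContainerRun> runs) {
        long total = 0;
        for (ContainerRun r : runs) {
            total += (long) r.vcores * r.seconds;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up app: a 2048 MB / 1 vcore container for 100 s,
        // plus a 4096 MB / 2 vcore container for 50 s.
        List<ContainerRun> runs = Arrays.asList(
                new ContainerRun(2048, 1, 100),
                new ContainerRun(4096, 2, 50));
        System.out.println(memorySeconds(runs) + " MB-seconds, "
                + vcoreSeconds(runs) + " vcore-seconds");
    }
}
```

With these made-up inputs it prints `409600 MB-seconds, 200 vcore-seconds`. A single 4096 MB container that ran for 100 s would produce the same 409600 MB-seconds, so the total alone cannot be turned back into current usage.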

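I could not find the application's current, real-time resource usage anywhere (for example, how much memory and how many cores it is holding right now). Tracing through the YARN source code shows that this statistic does exist: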

org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport

  public static ApplicationResourceUsageReport newInstance(
      int numUsedContainers, int numReservedContainers, Resource usedResources,
      Resource reservedResources, Resource neededResources, long memorySeconds,
      long vcoreSeconds) {
    ApplicationResourceUsageReport report =
        Records.newRecord(ApplicationResourceUsageReport.class);
    report.setNumUsedContainers(numUsedContainers);
    report.setNumReservedContainers(numReservedContainers);
    report.setUsedResources(usedResources);
    report.setReservedResources(reservedResources);
    report.setNeededResources(neededResources);
    report.setMemorySeconds(memorySeconds);
    report.setVcoreSeconds(vcoreSeconds);
    return report;
  }

Here usedResources is the real-time resource usage, covering both memory and CPU. This report is returned by the YarnScheduler interface:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler

  /**
   * Get a resource usage report from a given app attempt ID.
   * @param appAttemptId the id of the application attempt
   * @return resource usage report for this given attempt
   */
  @LimitedPrivate("yarn")
  @Evolving
  ApplicationResourceUsageReport getAppResourceUsageReport(
      ApplicationAttemptId appAttemptId);

getAppResourceUsageReport is called by RMAppAttemptImpl.getApplicationResourceUsageReport:

org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl

  @Override
  public ApplicationResourceUsageReport getApplicationResourceUsageReport() {
    this.readLock.lock();
    try {
      ApplicationResourceUsageReport report =
          scheduler.getAppResourceUsageReport(this.getAppAttemptId());
      if (report == null) {
        report = RMServerUtils.DUMMY_APPLICATION_RESOURCE_USAGE_REPORT;
      }
      AggregateAppResourceUsage resUsage =
          this.attemptMetrics.getAggregateAppResourceUsage();
      report.setMemorySeconds(resUsage.getMemorySeconds());
      report.setVcoreSeconds(resUsage.getVcoreSeconds());
      return report;
    } finally {
      this.readLock.unlock();
    }
  }
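In other words, the report returned here combines two sources: the live part (usedResources) comes from the scheduler, while memory-seconds and vcore-seconds are always overwritten from the attempt's own metrics. A simplified, hypothetical Java model of that merge follows. The class, field, and method names are invented, the placeholder values (0) are the model's own choice, and the read lock of the original, which guards the attempt's state, is left out because the model has no shared state:

```java
// Simplified, hypothetical model of RMAppAttemptImpl.getApplicationResourceUsageReport():
// the live part (usedResources) comes from the scheduler, the aggregate part
// (memory-seconds, vcore-seconds) comes from the attempt's own metrics.
final class UsageReportMerge {

    static final class Report {
        final long usedMemoryMb;  // live snapshot, like usedResources.getMemory()
        final int usedVcores;     // live snapshot, like usedResources.getVirtualCores()
        long memorySeconds;       // accumulated since the attempt started
        long vcoreSeconds;        // accumulated since the attempt started

        Report(long usedMemoryMb, int usedVcores) {
            this.usedMemoryMb = usedMemoryMb;
            this.usedVcores = usedVcores;
        }
    }

    // fromScheduler may be null, e.g. when the scheduler no longer tracks the attempt.
    static Report merge(Report fromScheduler, long memorySeconds, long vcoreSeconds) {
        // Placeholder for the live part: a fresh object with 0 values,
        // not a shared constant.
        Report report = (fromScheduler != null) ? fromScheduler : new Report(0, 0);
        // The aggregate figures are always taken from the attempt metrics.
        report.memorySeconds = memorySeconds;
        report.vcoreSeconds = vcoreSeconds;
        return report;
    }

    public static void main(String[] args) {
        // Made-up numbers for an attempt the scheduler still tracks and one it has dropped.
        Report running = merge(new Report(8192, 4), 100000, 50);
        Report dropped = merge(null, 100000, 50);
        System.out.println(running.usedMemoryMb + " MB / " + running.memorySeconds + " MB-seconds");
        System.out.println(dropped.usedMemoryMb + " MB / " + dropped.memorySeconds + " MB-seconds");
    }
}
```

This prints `8192 MB / 100000 MB-seconds` and then `0 MB / 100000 MB-seconds`. One design choice differs from the excerpt above: the model builds a fresh placeholder instead of calling setters on a shared constant such as RMServerUtils.DUMMY_APPLICATION_RESOURCE_USAGE_REPORT.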

RMAppAttemptImpl.getApplicationResourceUsageReport is called from two places:

First caller:

org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl

  public ApplicationReport createAndGetApplicationReport(String clientUserName,
      boolean allowAccess) {
...
          appUsageReport = currentAttempt.getApplicationResourceUsageReport();
...

RMAppImpl.createAndGetApplicationReport is called by ClientRMService.getApplications and ClientRMService.getApplicationReport, which back the following commands respectively:

yarn application -list
yarn application -status $applicationId

Neither command shows usedResources in its output; perhaps the authors did not consider the real-time usage figure important enough to display.

For details, see:
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService

Second caller:

org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo

  public AppInfo(RMApp app, Boolean hasAccess, String schemePrefix) {
...
          ApplicationResourceUsageReport resourceReport = attempt
              .getApplicationResourceUsageReport();
          if (resourceReport != null) {
            Resource usedResources = resourceReport.getUsedResources();
            allocatedMB = usedResources.getMemory();
            allocatedVCores = usedResources.getVirtualCores();
            runningContainers = resourceReport.getNumUsedContainers();
          }
...

This constructor is called from RMWebServices.getApp and RMWebServices.getApps. These are REST service endpoints, at the following URLs respectively:

http://resourcemanager/ws/v1/cluster/apps/$applicationId
http://resourcemanager/ws/v1/cluster/apps?state=RUNNING
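To query these endpoints from code, the URLs can be built from the ResourceManager's web address and fetched over plain HTTP. A minimal Java sketch using only the JDK, assuming an unsecured cluster (no Kerberos/SPNEGO authentication) and a ResourceManager web UI at `http://resourcemanager:8088` (8088 is the default port; the host name is a placeholder). `RmRestClient` is a hypothetical helper, not part of Hadoop:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for the ResourceManager REST endpoints listed above.
final class RmRestClient {

    private final String rmBase;  // e.g. "http://resourcemanager:8088" (placeholder)

    RmRestClient(String rmBase) {
        // Strip one trailing slash so paths can be appended uniformly.
        this.rmBase = rmBase.endsWith("/") ? rmBase.substring(0, rmBase.length() - 1) : rmBase;
    }

    // Single application: /ws/v1/cluster/apps/{applicationId}
    String appUrl(String applicationId) {
        return rmBase + "/ws/v1/cluster/apps/" + applicationId;
    }

    // All applications currently in the RUNNING state.
    String runningAppsUrl() {
        return rmBase + "/ws/v1/cluster/apps?state=RUNNING";
    }

    // Plain GET without authentication, asking for XML; returns the body as UTF-8 text.
    String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept", "application/xml");
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        RmRestClient client = new RmRestClient("http://resourcemanager:8088/");
        System.out.println(client.runningAppsUrl());
        if (args.length > 0) {
            // Only contacts the cluster when an application id is passed in.
            System.out.println(client.get(client.appUrl(args[0])));
        }
    }
}
```

The `Accept: application/xml` header asks for the XML form used in this post; the ResourceManager REST API can also return JSON.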

The responses from both endpoints include the real-time resource usage, for example:

<allocatedMB>56320</allocatedMB>
<allocatedVCores>21</allocatedVCores>

These are the current memory usage (in MB) and the current CPU usage (in vcores), respectively.
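Pulling these two fields out of a response takes only the JDK's DOM parser. A minimal sketch, assuming the XML layout above, where each application is an `<app>` element with direct `<id>`, `<allocatedMB>` and `<allocatedVCores>` children (wrapped in `<apps>` for the list endpoint). `AllocatedResources` is a hypothetical helper, and the sample response in `main` is abbreviated and made up:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts allocatedMB / allocatedVCores per application
// from an RM REST response in XML form.
final class AllocatedResources {

    // applicationId -> {allocatedMB, allocatedVCores}, in document order.
    static Map<String, long[]> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable XML document", e);
        }
        // Matches a lone <app> root (single-app endpoint) as well as the
        // <app> children of an <apps> wrapper (list endpoint).
        NodeList apps = doc.getElementsByTagName("app");
        Map<String, long[]> result = new LinkedHashMap<>();
        for (int i = 0; i < apps.getLength(); i++) {
            Element app = (Element) apps.item(i);
            long mb = Long.parseLong(childText(app, "allocatedMB"));
            long vcores = Long.parseLong(childText(app, "allocatedVCores"));
            result.put(childText(app, "id"), new long[] {mb, vcores});
        }
        return result;
    }

    // Text of the first direct child element named `tag`.
    private static String childText(Element parent, String tag) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        throw new IllegalArgumentException("missing <" + tag + "> in <" + parent.getNodeName() + ">");
    }

    public static void main(String[] args) {
        // Abbreviated, made-up response; real ones carry many more fields per <app>.
        String xml = "<apps>"
                + "<app><id>application_1_0001</id><allocatedMB>56320</allocatedMB>"
                + "<allocatedVCores>21</allocatedVCores></app>"
                + "<app><id>application_1_0002</id><allocatedMB>2048</allocatedMB>"
                + "<allocatedVCores>1</allocatedVCores></app>"
                + "</apps>";
        long totalMb = 0;
        for (Map.Entry<String, long[]> e : parse(xml).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue()[0] + " MB, " + e.getValue()[1] + " vcores");
            totalMb += e.getValue()[0];
        }
        System.out.println("total: " + totalMb + " MB");
    }
}
```

Summing allocatedMB over the `?state=RUNNING` list gives the memory currently held by all running applications; with the sample input the total is 58368 MB.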

For details, see:
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices

If you find that a Spark application is using more memory than you allocated to it, see: https://www.cnblogs.com/barneywill/p/10102353.html

posted @ 匠人先生