Uncle's Experience Notes (21): Viewing the real-time memory and CPU usage of each application in YARN
On the application detail page in YARN
http://resourcemanager/cluster/app/$applicationId
or via the application command
yarn application -status $applicationId
you can only see the resource × time totals accumulated since the application started, for example:
Aggregate Resource Allocation : 3962853 MB-seconds, 1466 vcore-seconds
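To make the unit concrete: MB-seconds is allocated memory multiplied by the time it was held, so the figure above can at best give an average over the application's lifetime, never the current allocation. The following is a minimal sketch of that arithmetic; the class name `AggregateAllocation` and the elapsed time of 600 seconds are assumptions for illustration and do not come from YARN or from the output above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of YARN): parses the "Aggregate Resource
// Allocation" line and derives lifetime averages from it.
class AggregateAllocation {
    final long mbSeconds;
    final long vcoreSeconds;

    AggregateAllocation(long mbSeconds, long vcoreSeconds) {
        this.mbSeconds = mbSeconds;
        this.vcoreSeconds = vcoreSeconds;
    }

    // Extracts the two counters from a line such as
    // "Aggregate Resource Allocation : 3962853 MB-seconds, 1466 vcore-seconds".
    static AggregateAllocation parse(String line) {
        Pattern p = Pattern.compile("(\\d+) MB-seconds, (\\d+) vcore-seconds");
        Matcher m = p.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("unexpected format: " + line);
        }
        return new AggregateAllocation(
            Long.parseLong(m.group(1)), Long.parseLong(m.group(2)));
    }

    // Average memory held over the given elapsed time, in MB.
    double averageMb(long elapsedSeconds) {
        return (double) mbSeconds / elapsedSeconds;
    }

    // Average number of vcores held over the given elapsed time.
    double averageVcores(long elapsedSeconds) {
        return (double) vcoreSeconds / elapsedSeconds;
    }

    public static void main(String[] args) {
        AggregateAllocation a = parse(
            "Aggregate Resource Allocation : 3962853 MB-seconds, 1466 vcore-seconds");
        long elapsedSeconds = 600; // assumed elapsed time, for illustration only
        System.out.println("average MB: " + a.averageMb(elapsedSeconds));
        System.out.println("average vcores: " + a.averageVcores(elapsedSeconds));
    }
}
```

An application that held many containers early on and few now would show the same average as one with steady usage, which is why this figure cannot stand in for the real-time value.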
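The application's current, real-time usage (how much memory and how many cores it holds right now) is not shown anywhere. Digging into the YARN source shows that this statistic does exist: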
org.apache.hadoop.yarn.api.records.ApplicationResourceUsageReport
public static ApplicationResourceUsageReport newInstance(
    int numUsedContainers, int numReservedContainers, Resource usedResources,
    Resource reservedResources, Resource neededResources, long memorySeconds,
    long vcoreSeconds) {
  ApplicationResourceUsageReport report =
      Records.newRecord(ApplicationResourceUsageReport.class);
  report.setNumUsedContainers(numUsedContainers);
  report.setNumReservedContainers(numReservedContainers);
  report.setUsedResources(usedResources);
  report.setReservedResources(reservedResources);
  report.setNeededResources(neededResources);
  report.setMemorySeconds(memorySeconds);
  report.setVcoreSeconds(vcoreSeconds);
  return report;
}
Here usedResources is the current real-time resource usage, covering both memory and CPU. The statistic is returned by a method on the YarnScheduler interface:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.YarnScheduler
/**
 * Get a resource usage report from a given app attempt ID.
 * @param appAttemptId the id of the application attempt
 * @return resource usage report for this given attempt
 */
@LimitedPrivate("yarn")
@Evolving
ApplicationResourceUsageReport getAppResourceUsageReport(
    ApplicationAttemptId appAttemptId);
getAppResourceUsageReport is called by RMAppAttemptImpl.getApplicationResourceUsageReport:
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl
@Override
public ApplicationResourceUsageReport getApplicationResourceUsageReport() {
  this.readLock.lock();
  try {
    ApplicationResourceUsageReport report =
        scheduler.getAppResourceUsageReport(this.getAppAttemptId());
    if (report == null) {
      report = RMServerUtils.DUMMY_APPLICATION_RESOURCE_USAGE_REPORT;
    }
    AggregateAppResourceUsage resUsage =
        this.attemptMetrics.getAggregateAppResourceUsage();
    report.setMemorySeconds(resUsage.getMemorySeconds());
    report.setVcoreSeconds(resUsage.getVcoreSeconds());
    return report;
  } finally {
    this.readLock.unlock();
  }
}
RMAppAttemptImpl.getApplicationResourceUsageReport is called from two places:
First caller
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl
public ApplicationReport createAndGetApplicationReport(String clientUserName,
    boolean allowAccess) {
  ...
  appUsageReport = currentAttempt.getApplicationResourceUsageReport();
  ...
RMAppImpl.createAndGetApplicationReport is called by ClientRMService.getApplications and ClientRMService.getApplicationReport, which correspond to the following commands, respectively:
yarn application -list
yarn application -status $applicationId
Neither command displays usedResources in its output; perhaps the authors considered the real-time usage statistic not that important.
For details, see:
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService
Second caller
org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.AppInfo
public AppInfo(RMApp app, Boolean hasAccess, String schemePrefix) {
  ...
  ApplicationResourceUsageReport resourceReport = attempt
      .getApplicationResourceUsageReport();
  if (resourceReport != null) {
    Resource usedResources = resourceReport.getUsedResources();
    allocatedMB = usedResources.getMemory();
    allocatedVCores = usedResources.getVirtualCores();
    runningContainers = resourceReport.getNumUsedContainers();
  }
  ...
This constructor is invoked from RMWebServices.getApp and RMWebServices.getApps. These are web service endpoints, and their URLs are, respectively:
http://resourcemanager/ws/v1/cluster/apps/$applicationId
http://resourcemanager/ws/v1/cluster/apps?state=RUNNING
The responses of both endpoints include the real-time resource usage, for example:
<allocatedMB>56320</allocatedMB>
<allocatedVCores>21</allocatedVCores>
These correspond to the real-time memory usage and the real-time CPU usage, respectively.
For details, see:
org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices
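To read these fields programmatically, the XML returned by either endpoint can be parsed with the JDK's built-in DOM parser. The following is a minimal sketch rather than an official client: the class name `AppUsage` is hypothetical, the sample document is trimmed down to three elements, and the application id in it is made up. It assumes each application is wrapped in an `<app>` element, which covers both the single-application and the list response.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical holder for the real-time usage of one application.
class AppUsage {
    final String id;
    final long allocatedMB;
    final long allocatedVCores;

    AppUsage(String id, long allocatedMB, long allocatedVCores) {
        this.id = id;
        this.allocatedMB = allocatedMB;
        this.allocatedVCores = allocatedVCores;
    }

    // Text of the first descendant element with the given tag name.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing element: " + tag);
        }
        return nodes.item(0).getTextContent().trim();
    }

    // Parses every <app> element found in the response body.
    static List<AppUsage> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
            NodeList apps = doc.getElementsByTagName("app");
            List<AppUsage> result = new ArrayList<>();
            for (int i = 0; i < apps.getLength(); i++) {
                Element app = (Element) apps.item(i);
                result.add(new AppUsage(
                    text(app, "id"),
                    Long.parseLong(text(app, "allocatedMB")),
                    Long.parseLong(text(app, "allocatedVCores"))));
            }
            return result;
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse response", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed sample; the application id is made up for illustration.
        String sample = "<apps><app><id>application_1_0001</id>"
            + "<allocatedMB>56320</allocatedMB>"
            + "<allocatedVCores>21</allocatedVCores></app></apps>";
        for (AppUsage u : parse(sample)) {
            System.out.println(u.id + " memoryMB=" + u.allocatedMB
                + " vcores=" + u.allocatedVCores);
        }
    }
}
```

In practice the XML string would come from an HTTP GET against one of the two URLs above; the fetch is left out here so the sketch stays runnable without a cluster.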
If you find that a Spark application occupies more memory than you allocated to it, see: https://www.cnblogs.com/barneywill/p/10102353.html