Hadoop Exporter Open-Source Project Development Guide
The Hadoop Exporter Open-Source Project
The project was last updated in 2018. Its job is to monitor the JMX ports of the components in a cluster; most open-source cluster components, such as HDFS and YARN, expose their key monitoring data over JMX.
Overall the project is solid: implementing Prometheus support for every component ourselves would cost a great deal of time. After wiring it up to our Hadoop cluster monitoring, and with future upgrades and extensions in mind, I forked the project. I will keep maintaining it and will publish independent version numbers going forward.
Project Overview
Hadoop Exporter is essentially an ETL project: it converts the JSON data exposed over JMX into a dimensional model. My understanding of the current architecture is as follows:
In the current v1.0, the ETL program has to be installed on every node; there is no centralized collection, which makes maintenance harder when the cluster is expanded or changed. The upside is that the ETL programs are fully independent and do not interfere with one another.
Language and Dependencies
Item | Description |
---|---|
Language | Python |
Language version | 2.7.x |
Dependencies | requests, prometheus_client, python-consul, pyyaml |
Project Architecture
Basic Processing Flow
- Parse the command-line arguments
- Start the HTTP server through which Prometheus scrapes the metrics
- Send an HTTP request to the cluster's JMX configuration center and fetch the JMX configuration for the local hostname
- For each process configured in the configuration center, create a Collector instance for its JMX URL
- Register the Collectors with the Prometheus client
- When the Prometheus server scrapes the endpoint, each Collector's collect method is invoked automatically
- Fetch the MBeans from the JMX port, then process labels and metrics according to the process type
- Merge the common metrics into the instance-specific metrics
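The configuration-center lookup in step 3 can be sketched roughly as follows. The function name `resolve_jmx_config` is hypothetical; the JSON layout matches the cluster_config.json shown later in this guide:

```python
import json
import socket

def resolve_jmx_config(config_text, hostname=None):
    """Return a {SERVICE: jmx_url} map for this host.

    config_text follows the cluster_config.json layout used in this guide:
    {cluster: [{hostname: {SERVICE: {"jmx": url}}}, ...]}
    """
    hostname = hostname or socket.gethostname()
    services = {}
    for cluster, nodes in json.loads(config_text).items():
        for node in nodes:
            for service, meta in node.get(hostname, {}).items():
                services[service] = meta["jmx"]
    return services

sample = '''{"hadoop-ha": [{"ha-node1": {
    "NAMENODE": {"jmx": "http://ha-node1:9870/jmx"},
    "DATANODE": {"jmx": "http://ha-node1:9864/jmx"}}}]}'''
print(resolve_jmx_config(sample, "ha-node1"))
```

In the real exporter the JSON is fetched over HTTP from the configuration center rather than passed in as a string.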
Source Tree Layout
Item | Description |
---|---|
Entry point | hadoop_exporter.py |
cmd | Core code that processes each kind of JMX metric |
config | Configuration files |
test | Test data |
Coding Conventions [work in progress]
Class names
- Use UpperCamelCase
- One class per file
Methods
- Public method names start with a lowercase letter, with words separated by underscores
- Public methods must carry a docstring
- Private method names start with an underscore
- Initialization logic belongs in __init__
Application development
- Each new component type requires a new Collector implementation
- Each new component type gets its own directory, with metrics grouped by MBean
- Every step on the critical path must emit logger.info; add generous logger.debug output to ease later troubleshooting
Development Guide
JMX Metric Structure
All JMX metrics are wrapped in an array named beans.
- name is the MBean's name
- every other field is a piece of monitoring data
{
"beans" : [ {
"name" : "Hadoop:service=NameNode,name=JvmMetrics",
"modelerType" : "JvmMetrics",
"tag.Context" : "jvm",
"tag.ProcessName" : "NameNode",
"tag.SessionId" : null,
"tag.Hostname" : "ha-node1",
"MemNonHeapUsedM" : 69.99187,
"MemNonHeapCommittedM" : 71.84375,
"MemNonHeapMaxM" : -1.0,
"MemHeapUsedM" : 189.16359,
"MemHeapCommittedM" : 323.5,
"MemHeapMaxM" : 1287.5,
"MemMaxM" : 1287.5,
"GcCount" : 22,
"GcTimeMillis" : 3708,
"GcNumWarnThresholdExceeded" : 0,
"GcNumInfoThresholdExceeded" : 1,
"GcTotalExtraSleepTime" : 10874,
"ThreadsNew" : 0,
"ThreadsRunnable" : 6,
"ThreadsBlocked" : 0,
"ThreadsWaiting" : 11,
"ThreadsTimedWaiting" : 39,
"ThreadsTerminated" : 0,
"LogFatal" : 0,
"LogError" : 0,
"LogWarn" : 11,
"LogInfo" : 3922
},
...
]
}
Prometheus uses a dimensional model, so the JMX data must be converted: roughly, string-typed fields become dimensions (labels) and numeric fields become metric values. Which metrics to export, and which labels each metric carries, is something we have to define and implement ourselves.
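As a rough sketch of that rule (the helper name is hypothetical), a flat bean can be partitioned into label candidates and value candidates like this:

```python
def split_bean(bean):
    """Partition a flat JMX bean: strings become label (dimension)
    candidates, numbers become metric value candidates."""
    labels, values = {}, {}
    for key, val in bean.items():
        if isinstance(val, bool):
            continue  # JSON booleans are neither a label nor a gauge here
        if isinstance(val, (int, float)):
            values[key] = val
        elif isinstance(val, str):
            labels[key] = val
    return labels, values

bean = {
    "name": "Hadoop:service=NameNode,name=JvmMetrics",
    "tag.Hostname": "ha-node1",
    "MemHeapUsedM": 189.16359,
    "GcCount": 22,
}
labels, values = split_bean(bean)
```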
Target Dimensional Metric Structure
The data above is flat fact data and has to be converted into a dimensional structure. The first step is deciding which fields are metric values and which are dimensions. This differs for every MBean, so the conversion has to be implemented per component in the application, e.g. separately for the NameNode and the DataNode. Metrics that cannot be classified any further can be exported directly, carrying only the predefined labels such as cluster and instance.
The target dimensional structure is the metric data Prometheus can ingest. In the project it is stored as a two-level dictionary:
_hadoop_namenode_metrics[metric category (usually the MBean name)][metric name]
The MetricCol Class
Using hdfs_namenode.py as the example.
Creating and initializing the MetricsCollector class
class MetricCol(object):
    '''
    MetricCol is the superclass of every MetricsCollector. It sets up the
    common parameters such as cluster, url, component and service.
    '''
    def __init__(self, cluster, url, component, service):
        '''
        @param cluster: cluster name, set in the config file or on the command line.
        @param url: the URL where the component exposes its metrics. For example,
                    http://ip:9870/jmx serves the HDFS metrics and
                    http://ip:8088/jmx serves the ResourceManager metrics.
        @param component: component name, e.g. "hdfs", "resourcemanager", "mapreduce", "hive", "hbase".
        @param service: service name, e.g. "namenode", "resourcemanager", "mapreduce".
        '''
        self._cluster = cluster
        # Strip any trailing /
        self._url = url.rstrip('/')
        self._component = component
        # Metric prefix, named hadoop_<component>_<service>
        self._prefix = 'hadoop_{0}_{1}'.format(component, service)
        # List of JSON file names under the directory named after the service;
        # e.g. for "namenode", every JSON file in the namenode folders is loaded.
        self._file_list = utils.get_file_list(service)
        # All JSON files under the common directory
        self._common_file = utils.get_file_list("common")
        # Merge both file lists
        self._merge_list = self._file_list + self._common_file
        # Holds the metric objects
        self._metrics = {}
        for i in range(len(self._file_list)):
            # For each file name, read the corresponding metric config (JSON) file
            self._metrics.setdefault(self._file_list[i], utils.read_json_file(service, self._file_list[i]))
The JSON metric configuration files use the following syntax:
{
    "<metric name>": "<metric description>."
}
For example:
{
"MemNonHeapUsedM": "Current non-heap memory used in MB.",
"MemNonHeapCommittedM": "Current non-heap memory committed in MB.",
"MemNonHeapMaxM": "Max non-heap memory size in MB.",
"MemHeapUsedM": "Current heap memory used in MB.",
"MemHeapCommittedM": "Current heap memory committed in MB.",
"MemHeapMaxM": "Max heap memory size in MB.",
"MemMaxM": "Max memory size in MB.",
"ThreadsNew": "Current number of NEW threads.",
"ThreadsRunnable": "Current number of RUNNABLE threads.",
"ThreadsBlocked": "Current number of BLOCKED threads.",
"ThreadsWaiting": "Current number of WAITING threads.",
"ThreadsTimedWaiting": "Current number of TIMED_WAITING threads.",
"ThreadsTerminated": "Current number of TERMINATED threads.",
"GcCount": "Total number of Gc count",
"GcTimeMillis": "Total GC time in msec.",
"GcCountParNew": "ParNew GC count.",
"GcTimeMillisParNew": "ParNew GC time in msec.",
"GcCountConcurrentMarkSweep": "ConcurrentMarkSweep GC count.",
"GcTimeMillisConcurrentMarkSweep": "ConcurrentMarkSweep GC time in msec.",
"GcNumWarnThresholdExceeded": "Number of times that the GC warn threshold is exceeded.",
"GcNumInfoThresholdExceeded": "Number of times that the GC info threshold is exceeded.",
"GcTotalExtraSleepTime": "Total GC extra sleep time in msec.",
"LogFatal": "Total number of FATAL logs.",
"LogError": "Total number of ERROR logs.",
"LogWarn": "Total number of WARN logs.",
"LogInfo": "Total number of INFO logs."
}
Creating and initializing the NameNodeMetricsCollector class
class NameNodeMetricCollector(MetricCol):
    def __init__(self, cluster, url):
        # Explicitly call the superclass initializer with the cluster name,
        # JMX URL, component name and service name.
        # Note: the service name must match the folder name of the JSON configs.
        MetricCol.__init__(self, cluster, url, "hdfs", "namenode")
        self._hadoop_namenode_metrics = {}
        for i in range(len(self._file_list)):
            # One dict per JSON config file, holding the exported metric objects
            self._hadoop_namenode_metrics.setdefault(self._file_list[i], {})
Notes:
- Every metric you want to export should live in a JSON file under the directory named after the service
- The component name becomes the metric prefix: hadoop_hdfs_
Implementing collect
The collect method is the core of the metric conversion for a component. It works in three steps:
- Request the JMX endpoint
- Set up the labels
- Set the metric values
def collect(self):
    # Send an HTTP request to the JMX URL and fetch the metric data,
    # i.e. the JSON array of beans.
    try:
        beans = utils.get_metrics(self._url)
    except:
        logger.info("Can't scrape metrics from url: {0}".format(self._url))
        pass
    else:
        # For every MBean we care about, create the metric families
        # with their labels and descriptions
        self._setup_metrics_labels(beans)
        # Fill in each metric value
        self._get_metrics(beans)
        # Merge the common metrics into the NameNode metrics
        common_metrics = common_metrics_info(self._cluster, beans, "hdfs", "namenode")
        self._hadoop_namenode_metrics.update(common_metrics())
        # Walk every metric category (both NameNode-specific and common)
        # and yield each metric with its labels
        for i in range(len(self._merge_list)):
            service = self._merge_list[i]
            for metric in self._hadoop_namenode_metrics[service]:
                yield self._hadoop_namenode_metrics[service][metric]
Implementing _setup_metrics_labels
_setup_metrics_labels loads the labels from the MBeans.
def _setup_metrics_labels(self, beans):
    # The metrics we want to export.
    # Walk every MBean and set up the labels we care about
    for i in range(len(beans)):
        # Only handle the MBeans we recognize
        if 'NameNodeActivity' in beans[i]['name']:
            self._setup_nnactivity_labels()
        if 'StartupProgress' in beans[i]['name']:
            self._setup_startupprogress_labels()
        if 'FSNamesystem' in beans[i]['name']:
            self._setup_fsnamesystem_labels()
        if 'FSNamesystemState' in beans[i]['name']:
            self._setup_fsnamesystem_state_labels()
        if 'RetryCache' in beans[i]['name']:
            self._setup_retrycache_labels()
Label Handling
The code below iterates over the metrics defined in the JSON config files and processes every match. The logic is:
- Iterate over every metric defined in the NameNode's NameNodeActivity.json
- Define the labels (here only cluster and method)
- Group the NameNodeActivity metrics into three classes:
  - names ending in NumOps
  - names ending in AvgTime
  - everything else (collected under Operations)
  The program abstracts these classes into dimensions.
- Create the metric families with their dimension labels
def _setup_nnactivity_labels(self):
    # Flags tracking whether each metric family still needs to be created
    # (1 = create it, 0 = already created)
    num_namenode_flag, avg_namenode_flag, ops_namenode_flag = 1, 1, 1
    # Walk the metrics of the NameNodeActivity MBean
    for metric in self._metrics['NameNodeActivity']:
        # Convert the metric name from CamelCase to snake_case
        snake_case = re.sub('([a-z0-9])([A-Z])', r'\1_\2', metric).lower()
        # Predefined labels
        label = ["cluster", "method"]
        # Group by the suffix of the MBean attribute and build the metric
        # name plus its labels, e.g.:
        # hadoop_hdfs_namenode_nnactivity_method_avg_time_milliseconds{cluster="hadoop-ha",method="BlockReport"}
        if "NumOps" in metric:
            if num_namenode_flag:
                key = "MethodNumOps"
                # Build a Gauge metric family:
                # first argument is the metric name,
                # second is the description,
                # third is the label list
                self._hadoop_namenode_metrics['NameNodeActivity'][key] = GaugeMetricFamily("_".join([self._prefix, "nnactivity_method_ops_total"]),
                                                                                           "Total number of the times the method is called.",
                                                                                           labels=label)
                # Set to 0 so later metrics of the same class are skipped
                num_namenode_flag = 0
            else:
                continue
        elif "AvgTime" in metric:
            if avg_namenode_flag:
                key = "MethodAvgTime"
                self._hadoop_namenode_metrics['NameNodeActivity'][key] = GaugeMetricFamily("_".join([self._prefix, "nnactivity_method_avg_time_milliseconds"]),
                                                                                           "Average turn around time of the method in milliseconds.",
                                                                                           labels=label)
                avg_namenode_flag = 0
            else:
                continue
        else:
            # Everything without a dedicated dimension goes into nnactivity_operations_total
            if ops_namenode_flag:
                ops_namenode_flag = 0
                key = "Operations"
                self._hadoop_namenode_metrics['NameNodeActivity'][key] = GaugeMetricFamily("_".join([self._prefix, "nnactivity_operations_total"]),
                                                                                           "Total number of each operation.",
                                                                                           labels=label)
            else:
                continue
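The CamelCase-to-snake_case regex used above can be checked in isolation:

```python
import re

def to_snake_case(metric):
    # Insert "_" between a lowercase letter or digit and the
    # following uppercase letter, then lowercase everything.
    return re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', metric).lower()

print(to_snake_case("MemNonHeapUsedM"))  # mem_non_heap_used_m
print(to_snake_case("GcTimeMillis"))     # gc_time_millis
```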
Implementing _get_metrics
This method fills in the values of the metric families created above.
def _get_metrics(self, beans):
    # Walk every MBean
    for i in range(len(beans)):
        # Dispatch on the MBean name
        if 'NameNodeActivity' in beans[i]['name']:
            self._get_nnactivity_metrics(beans[i])
        if 'StartupProgress' in beans[i]['name']:
            self._get_startupprogress_metrics(beans[i])
        if 'FSNamesystem' in beans[i]['name'] and 'FSNamesystemState' not in beans[i]['name']:
            self._get_fsnamesystem_metrics(beans[i])
        if 'FSNamesystemState' in beans[i]['name']:
            self._get_fsnamesystem_state_metrics(beans[i])
        if 'RetryCache' in beans[i]['name']:
            self._get_retrycache_metrics(beans[i])
Populating Metric Values
Populating the values simply means reading each value out of the MBean and setting it on the metric family. Note: values must be loaded after the labels have been set up.
def _get_nnactivity_metrics(self, bean):
    # Walk every metric in this category
    for metric in self._metrics['NameNodeActivity']:
        # Handle each metric class differently
        if "NumOps" in metric:
            # Extract the method name for the label
            method = metric.split('NumOps')[0]
            # Label values
            label = [self._cluster, method]
            key = "MethodNumOps"
        elif "AvgTime" in metric:
            method = metric.split('AvgTime')[0]
            label = [self._cluster, method]
            key = "MethodAvgTime"
        else:
            if "Ops" in metric:
                method = metric.split('Ops')[0]
            else:
                method = metric
            label = [self._cluster, method]
            key = "Operations"
        # Call prometheus_client's add_metric with the label values and the metric value
        self._hadoop_namenode_metrics['NameNodeActivity'][key].add_metric(label,
                                                                          bean[metric] if metric in bean else 0)
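The suffix-based splitting above can be isolated into a small helper; the function is hypothetical, since the real logic is inline in _get_nnactivity_metrics:

```python
def classify_nnactivity(metric):
    """Map a NameNodeActivity attribute name to (metric key, method label)."""
    if "NumOps" in metric:
        return "MethodNumOps", metric.split("NumOps")[0]
    if "AvgTime" in metric:
        return "MethodAvgTime", metric.split("AvgTime")[0]
    method = metric.split("Ops")[0] if "Ops" in metric else metric
    return "Operations", method

print(classify_nnactivity("SyncsNumOps"))   # ('MethodNumOps', 'Syncs')
print(classify_nnactivity("SyncsAvgTime"))  # ('MethodAvgTime', 'Syncs')
print(classify_nnactivity("FilesCreated"))  # ('Operations', 'FilesCreated')
```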
Handling Common Metrics
Many metrics are shared across components: JVM, operating system, RPC, UGI, runtime, and so on. They are defined in the JSON files under the common directory.
Extracting this handling into shared code lets us reuse it; after all, many big-data components run on the JVM.
def common_metrics_info(cluster, beans, component, service):
    '''
    A closure that handles the metrics shared by every service.
    @return a closure named get_metrics that yields all the common
            metrics extracted from the given beans.
    '''
    tmp_metrics = {}
    common_metrics = {}
    _cluster = cluster
    _prefix = 'hadoop_{0}_{1}'.format(component, service)
    # Read every JSON metric config under common
    _metrics_type = utils.get_file_list("common")
    for i in range(len(_metrics_type)):
        common_metrics.setdefault(_metrics_type[i], {})
        # Load all metrics into a dict.
        # Named tmp because it is always merged into a concrete component.
        tmp_metrics.setdefault(_metrics_type[i], utils.read_json_file("common", _metrics_type[i]))
    ...
    def get_metrics():
        '''
        Fill in the metric families produced by setup_labels, fetching
        the values from the beans one by one.
        '''
        common_metrics = setup_labels(beans)
        for i in range(len(beans)):
            if 'name=JvmMetrics' in beans[i]['name']:
                get_jvm_metrics(beans[i])
            if 'OperatingSystem' in beans[i]['name']:
                get_os_metrics(beans[i])
            if 'RpcActivity' in beans[i]['name']:
                get_rpc_metrics(beans[i])
            if 'RpcDetailedActivity' in beans[i]['name']:
                get_rpc_detailed_metrics(beans[i])
            if 'UgiMetrics' in beans[i]['name']:
                get_ugi_metrics(beans[i])
            if 'MetricsSystem' in beans[i]['name'] and "sub=Stats" in beans[i]['name']:
                get_metric_system_metrics(beans[i])
            if 'Runtime' in beans[i]['name']:
                get_runtime_metrics(beans[i])
        return common_metrics
    return get_metrics
As you can see, the common-metric implementation follows the same pattern as the component-specific code: first set up the labels, then fill in the values.
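Note the closure pattern: common_metrics_info is called once with the shared arguments and returns get_metrics, which is why collect invokes the result as common_metrics(). A stripped-down sketch, with the bean handling simplified and hypothetical:

```python
def common_metrics_info(cluster, beans, component, service):
    # State captured once by the closure
    prefix = 'hadoop_{0}_{1}'.format(component, service)

    def get_metrics():
        out = {}
        for bean in beans:
            if 'JvmMetrics' in bean.get('name', ''):
                # The real code builds GaugeMetricFamily objects here
                out.setdefault('JvmMetrics', {})['GcCount'] = (prefix, bean.get('GcCount', 0))
        return out

    return get_metrics

beans = [{"name": "Hadoop:service=NameNode,name=JvmMetrics", "GcCount": 22}]
common_metrics = common_metrics_info("hadoop-ha", beans, "hdfs", "namenode")
result = common_metrics()  # note the call, matching the usage in collect()
```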
Common Metric Labels
Depending on the MBean, the matching label-setup function is called.
def setup_labels(beans):
    '''
    Pre-processing: inspect each MBean, classify it, and attach labels.
    '''
    for i in range(len(beans)):
        if 'name=JvmMetrics' in beans[i]['name']:
            setup_jvm_labels()
        if 'OperatingSystem' in beans[i]['name']:
            setup_os_labels()
        if 'RpcActivity' in beans[i]['name']:
            setup_rpc_labels()
        if 'RpcDetailedActivity' in beans[i]['name']:
            setup_rpc_detailed_labels()
        if 'UgiMetrics' in beans[i]['name']:
            setup_ugi_labels()
        if 'MetricsSystem' in beans[i]['name'] and "sub=Stats" in beans[i]['name']:
            setup_metric_system_labels()
        if 'Runtime' in beans[i]['name']:
            setup_runtime_labels()
    return common_metrics
JVM metrics serve as the example below. As before, the metrics are grouped into dimensions by type, and then labeled.
def setup_jvm_labels():
    for metric in tmp_metrics["JvmMetrics"]:
        '''
        Processing module JvmMetrics
        '''
        snake_case = "_".join(["jvm", re.sub('([a-z0-9])([A-Z])', r'\1_\2', metric).lower()])
        if 'Mem' in metric:
            name = "".join([snake_case, "ebibytes"])
            label = ["cluster", "mode"]
            if "Used" in metric:
                key = "jvm_mem_used_mebibytes"
                descriptions = "Current memory used in mebibytes."
            elif "Committed" in metric:
                key = "jvm_mem_committed_mebibytes"
                descriptions = "Current memory committed in mebibytes."
            elif "Max" in metric:
                key = "jvm_mem_max_mebibytes"
                descriptions = "Current max memory in mebibytes."
            else:
                key = name
                label = ["cluster"]
                descriptions = tmp_metrics['JvmMetrics'][metric]
        elif 'Gc' in metric:
            label = ["cluster", "type"]
            if "GcCount" in metric:
                key = "jvm_gc_count"
                descriptions = "GC count of each type GC."
            elif "GcTimeMillis" in metric:
                key = "jvm_gc_time_milliseconds"
                descriptions = "Each type GC time in milliseconds."
            elif "ThresholdExceeded" in metric:
                key = "jvm_gc_exceeded_threshold_total"
                descriptions = "Number of times that the GC threshold is exceeded."
            else:
                key = snake_case
                label = ["cluster"]
                descriptions = tmp_metrics['JvmMetrics'][metric]
        elif 'Threads' in metric:
            label = ["cluster", "state"]
            key = "jvm_threads_state_total"
            descriptions = "Current number of different threads."
        elif 'Log' in metric:
            label = ["cluster", "level"]
            key = "jvm_log_level_total"
            descriptions = "Total number of each level logs."
        else:
            label = ["cluster"]
            key = snake_case
            descriptions = tmp_metrics['JvmMetrics'][metric]
        common_metrics['JvmMetrics'][key] = GaugeMetricFamily("_".join([_prefix, key]),
                                                              descriptions,
                                                              labels=label)
    return common_metrics
Populating Common Metric Values
The values are populated the same way as before.
def get_jvm_metrics(bean):
    for metric in tmp_metrics['JvmMetrics']:
        name = "_".join(["jvm", re.sub('([a-z0-9])([A-Z])', r'\1_\2', metric).lower()])
        if 'Mem' in metric:
            if "Used" in metric:
                key = "jvm_mem_used_mebibytes"
                mode = metric.split("Used")[0].split("Mem")[1]
                label = [_cluster, mode]
            elif "Committed" in metric:
                key = "jvm_mem_committed_mebibytes"
                mode = metric.split("Committed")[0].split("Mem")[1]
                label = [_cluster, mode]
            elif "Max" in metric:
                key = "jvm_mem_max_mebibytes"
                if "Heap" in metric:
                    mode = metric.split("Max")[0].split("Mem")[1]
                else:
                    mode = "max"
                label = [_cluster, mode]
            else:
                key = "".join([name, 'ebibytes'])
                label = [_cluster]
        elif 'Gc' in metric:
            if "GcCount" in metric:
                key = "jvm_gc_count"
                if "GcCount" == metric:
                    typo = "total"
                else:
                    typo = metric.split("GcCount")[1]
                label = [_cluster, typo]
            elif "GcTimeMillis" in metric:
                key = "jvm_gc_time_milliseconds"
                if "GcTimeMillis" == metric:
                    typo = "total"
                else:
                    typo = metric.split("GcTimeMillis")[1]
                label = [_cluster, typo]
            elif "ThresholdExceeded" in metric:
                key = "jvm_gc_exceeded_threshold_total"
                typo = metric.split("ThresholdExceeded")[0].split("GcNum")[1]
                label = [_cluster, typo]
            else:
                key = name
                label = [_cluster]
        elif 'Threads' in metric:
            key = "jvm_threads_state_total"
            state = metric.split("Threads")[1]
            label = [_cluster, state]
        elif 'Log' in metric:
            key = "jvm_log_level_total"
            level = metric.split("Log")[1]
            label = [_cluster, level]
        else:
            key = name
            label = [_cluster]
        common_metrics['JvmMetrics'][key].add_metric(label,
                                                     bean[metric] if metric in bean else 0)
    return common_metrics
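The mode label above is carved out of the metric name with chained split calls. A hypothetical helper showing the idea (it deliberately ignores the MemMaxM special case, which the real code maps to "max"):

```python
def jvm_mem_mode(metric):
    """Extract the mode label from a JVM memory metric name,
    e.g. MemHeapUsedM -> Heap, MemNonHeapCommittedM -> NonHeap."""
    for suffix in ("Used", "Committed", "Max"):
        if suffix in metric:
            return metric.split(suffix)[0].split("Mem")[1]
    return None

print(jvm_mem_mode("MemHeapUsedM"))          # Heap
print(jvm_mem_mode("MemNonHeapCommittedM"))  # NonHeap
```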
Registering Components with Prometheus
Once a Collector is implemented, it must be wired into register_prometheus in hadoop_exporter.py. For example:
if 'NAMENODE' in k:
    if namenode_flag:
        namenode_url = v['jmx']
        logger.info("namenode_url = {0}, start to register".format(namenode_url))
        # Register the collector
        REGISTRY.register(NameNodeMetricCollector(cluster, namenode_url))
        namenode_flag = 0
    continue
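End to end, registration and exposition work like this minimal sketch; DemoCollector is a stand-in that yields a fixed value instead of scraping JMX:

```python
from prometheus_client import generate_latest
from prometheus_client.core import CollectorRegistry, GaugeMetricFamily

class DemoCollector(object):
    """Stand-in for NameNodeMetricCollector: no JMX request is made."""
    def __init__(self, cluster, url):
        self._cluster = cluster
        self._url = url

    def collect(self):
        g = GaugeMetricFamily(
            "hadoop_hdfs_namenode_jvm_mem_heap_used_mebibytes",
            "Current heap memory used in mebibytes.",
            labels=["cluster"])
        g.add_metric([self._cluster], 189.16)
        yield g

registry = CollectorRegistry()
registry.register(DemoCollector("hadoop-ha", "http://ha-node1:9870/jmx"))
exposition = generate_latest(registry).decode()
print(exposition)
```

The exporter itself uses the default REGISTRY plus start_http_server, so Prometheus can scrape the same exposition over HTTP.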
Development Environment Setup
Hadoop runtime metrics use the Hadoop exporter from GitHub. Project URL: https://github.com/IloveZiHan/hadoop_exporter
Installing pip
# Create the directories
mkdir /opt/prometheus/exporters/hadoop_exporter/modules /opt/prometheus/exporters/hadoop_exporter/scripts
# Download get-pip
curl https://bootstrap.pypa.io/pip/2.7/get-pip.py -o /opt/prometheus/exporters/hadoop_exporter/scripts/get-pip.py
cd /opt/prometheus/exporters/hadoop_exporter/scripts
# Download setuptools
wget https://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz
# Unpack into the current directory
tar -zxvf setuptools-0.6c11.tar.gz
cd setuptools-0.6c11
# Install setuptools
sudo python setup.py install
cd /opt/prometheus/exporters/hadoop_exporter/scripts
# Install the dependencies into the modules directory (the same one added to PYTHONPATH below)
python get-pip.py install requests --target=/opt/prometheus/exporters/hadoop_exporter/modules
python get-pip.py install prometheus_client --target=/opt/prometheus/exporters/hadoop_exporter/modules
python get-pip.py install python-consul --target=/opt/prometheus/exporters/hadoop_exporter/modules
python get-pip.py install pyyaml --target=/opt/prometheus/exporters/hadoop_exporter/modules
Deploying Hadoop Cluster Service Discovery
1. Deploy Tomcat
# Upload the Tomcat tarball
[prometheus@ha-node1 exporters]$ ll | grep tomcat
-rw-r--r-- 1 root root 10515248 Mar 9 14:38 apache-tomcat-8.5.63.tar.gz
cd /opt/prometheus/exporters
# Unpack Tomcat
[prometheus@ha-node1 exporters]$ tar -xvzf apache-tomcat-8.5.63.tar.gz
# Change the Tomcat port
cd /opt/prometheus/exporters/apache-tomcat-8.5.63/conf
[prometheus@ha-node1 conf]$ vim server.xml
<!-- change around line 72 -->
<Connector port="9035" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
2. Add the service-discovery JSON
# Create the config file
vim /opt/prometheus/exporters/apache-tomcat-8.5.63/webapps/ROOT/cluster_config.json
{
"hadoop-ha": [
{
"ha-node1": {
"NAMENODE": {
"jmx": "http://ha-node1:9870/jmx"
},
"DATANODE": {
"jmx": "http://ha-node1:9864/jmx"
},
"HISTORYSERVER": {
"jmx": "http://ha-node1:19888/jmx"
},
"JOURNALNODE": {
"jmx": "http://ha-node1:8480/jmx"
},
"RESOURCEMANAGER": {
"jmx": "http://ha-node1:8088/jmx"
},
"NODEMANAGER": {
"jmx": "http://ha-node1:8042/jmx"
}
}
},
{
"ha-node2": {
"NAMENODE": {
"jmx": "http://ha-node2:9870/jmx"
},
"DATANODE": {
"jmx": "http://ha-node2:9864/jmx"
},
"JOURNALNODE": {
"jmx": "http://ha-node2:8480/jmx"
},
"RESOURCEMANAGER": {
"jmx": "http://ha-node2:8088/jmx"
},
"NODEMANAGER": {
"jmx": "http://ha-node2:8042/jmx"
}
}
},
{
"ha-node3": {
"NAMENODE": {
"jmx": "http://ha-node3:9870/jmx"
},
"DATANODE": {
"jmx": "http://ha-node3:9864/jmx"
},
"JOURNALNODE": {
"jmx": "http://ha-node3:8480/jmx"
},
"RESOURCEMANAGER": {
"jmx": "http://ha-node3:8088/jmx"
},
"NODEMANAGER": {
"jmx": "http://ha-node3:8042/jmx"
}
}
},
{
"ha-node4": {
"DATANODE": {
"jmx": "http://ha-node4:9864/jmx"
},
"JOURNALNODE": {
"jmx": "http://ha-node4:8480/jmx"
},
"NODEMANAGER": {
"jmx": "http://ha-node4:8042/jmx"
}
}
},
{
"ha-node5": {
"NAMENODE": {
"jmx": "http://ha-node5:9870/jmx"
},
"DATANODE": {
"jmx": "http://ha-node5:9864/jmx"
},
"HISTORYSERVER": {
"jmx": "http://ha-node5:19888/jmx"
},
"JOURNALNODE": {
"jmx": "http://ha-node5:8480/jmx"
},
"NODEMANAGER": {
"jmx": "http://ha-node5:8042/jmx"
}
}
}
]
}
3. Start Tomcat
After starting Tomcat, verify that the service-discovery config is reachable at http://ha-node1:9035/cluster_config.json
Modifying hadoop_exporter.py
# Around line 40, change the URL to:
url = 'http://{0}/cluster_config.json'.format(rest_url)
Starting NameNode Monitoring
export PYTHONPATH=${PYTHONPATH}:/opt/prometheus/exporters/hadoop_exporter/modules
python /opt/prometheus/exporters/hadoop_exporter/cmd/hdfs_namenode.py -c "hadoop-ha" -hdfs "http://ha-node1:9870/jmx" -host "ha-node1" -P 9131 -s "ha-node1:9035"
Starting the Exporter
export PYTHONPATH=${PYTHONPATH}:/opt/prometheus/exporters/hadoop_exporter/modules
python /opt/prometheus/exporters/hadoop_exporter/hadoop_exporter.py -host "ha-node1" -P 9131 -s "ha-node1:9035"