Ganglia troubleshooting summary
1. Relationship between gmetad and rrdtool
gmetad is responsible for storing the data it pulls by polling gmond into rrdtool files, rrdtool
2. gmetad.conf
① Command: /usr/sbin/gmetad -d 1
Setting the debug_level parameter helps us find out why gmetad is failing.
#-------------------------------------------------------------------------------
# Setting the debug_level to 1 will keep daemon in the foreground and
# show only error messages. Setting this value higher than 1 will make
# gmetad output debugging information and stay in the foreground.
# default: 0
# debug_level 10
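② data_source "my cluster" [polling interval] address1:port address2:port ...
When polling, gmetad automatically detects whether a data_source is a cluster or a grid. For a cluster it collects the detailed metric data; for a grid it collects only summary data (the grid is treated as a remote grid; to see its detailed metrics, the gweb service must be enabled on that grid's gmetad, see the introduction to multi-level gmetad). (When collecting from a grid, i.e. from another gmetad, the port should be set to 8651; however, if that gmetad host is a gmetad-only node with no gmond running, the port does not need to be set explicitly and 8651 is used by default.)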
# The data_source tag specifies either a cluster or a grid to
# monitor. If we detect the source is a cluster, we will maintain a complete
# set of RRD databases for it, which can be used to create historical
# graphs of the metrics. If the source is a grid (it comes from another gmetad),
# we will only maintain summary RRDs for it.
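The cluster/grid distinction described above can be illustrated with a small Python sketch. This is not gmetad's actual code (gmetad is written in C); it only assumes that a report coming from another gmetad wraps its clusters in a <GRID> element while a gmond report has <CLUSTER> directly under the root. The function name classify_source and the two sample XML strings are hypothetical and heavily simplified, not real daemon output.

```python
import xml.etree.ElementTree as ET


def classify_source(xml_text: str) -> str:
    """Return 'grid' if the report has a top-level <GRID>, else 'cluster'."""
    root = ET.fromstring(xml_text)
    # Compare against None: an element without children is falsy in ElementTree.
    if root.find("GRID") is not None:
        return "grid"
    if root.find("CLUSTER") is not None:
        return "cluster"
    raise ValueError("no <GRID> or <CLUSTER> element found")


# Hand-written, simplified samples for illustration only.
GMOND_SAMPLE = (
    '<GANGLIA_XML SOURCE="gmond">'
    '<CLUSTER NAME="my cluster"/>'
    '</GANGLIA_XML>'
)
GMETAD_SAMPLE = (
    '<GANGLIA_XML SOURCE="gmetad">'
    '<GRID NAME="remote grid"><CLUSTER NAME="c1"/></GRID>'
    '</GANGLIA_XML>'
)

if __name__ == "__main__":
    print(classify_source(GMOND_SAMPLE))
    print(classify_source(GMETAD_SAMPLE))
```

In this sketch a source classified as 'cluster' would get the full set of RRDs, and one classified as 'grid' only summary RRDs.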
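③ data_source "my cluster" [polling interval] address1:port address2:port ...
The polling interval must come directly after the data_source name; it specifies how often gmetad polls that data_source and defaults to 15 seconds.
If you customize this interval, note that the gweb frontend decides whether a host is down by checking whether its response time is within the maximum allowed response time.
That maximum is 4 * TMAX (20sec by default), i.e. 80 seconds; if the interval is set to more than 80 seconds, gweb will consider the host down.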
# The keyword 'data_source' must immediately be followed by a unique
# string which identifies the source, then an optional polling interval in
# seconds. The source will be polled at this interval on average.
# If the polling interval is omitted, 15sec is assumed.
# If you choose to set the polling interval to something other than the default,
# note that the web frontend determines a host as down if its TN value is greater
# than 4 * TMAX (20sec by default). Therefore, if you set the polling interval
# to something around or greater than 80sec, this will cause the frontend to
# incorrectly display hosts as down even though they are not.
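The 80-second threshold can be worked through with a short Python sketch. It assumes a host is shown as down once TN (seconds since the last report) exceeds 4 * TMAX, with TMAX = 20 seconds as stated above. The helper names host_appears_down and interval_is_risky are hypothetical and are not part of gweb.

```python
DEFAULT_TMAX = 20  # seconds, the default mentioned in the gmetad.conf comment


def host_appears_down(tn: float, tmax: float = DEFAULT_TMAX) -> bool:
    """Assumed frontend rule: down once TN is larger than 4 * TMAX."""
    return tn > 4 * tmax


def interval_is_risky(polling_interval: float, tmax: float = DEFAULT_TMAX) -> bool:
    """True if the polling interval reaches the 4 * TMAX limit (80s by default)."""
    return polling_interval >= 4 * tmax


if __name__ == "__main__":
    for interval in (15, 60, 80, 120):
        print(interval, interval_is_risky(interval))
```

With the default TMAX, an interval of 15 or 60 seconds stays under the limit, while 80 or 120 seconds can make healthy hosts look down.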
④ data_source "my cluster" [polling interval] address1:port address2:port ...
If no port is specified, the default port is 8649.
# A list of machines which service the data source follows, in the
# format ip:port, or name:port. If a port is not specified then 8649
# (the default gmond port) is assumed.
# default: There is no default value
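To make the defaults concrete, here is a minimal Python sketch that splits a data_source line into its name, polling interval and host list, filling in 15 seconds and port 8649 when they are omitted. It is an illustration only, not gmetad's parser: the function name parse_data_source is hypothetical, it treats an all-digit first token after the name as the interval, and it does not handle IPv6 addresses.

```python
import shlex

DEFAULT_INTERVAL = 15  # seconds, used when no polling interval is given
DEFAULT_PORT = 8649    # default gmond port, used when a host has no :port


def parse_data_source(line: str) -> dict:
    """Parse 'data_source "name" [interval] host[:port] ...' into a dict."""
    tokens = shlex.split(line)  # keeps the quoted name as a single token
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("expected: data_source \"name\" [interval] host[:port] ...")
    name, rest = tokens[1], tokens[2:]

    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():  # assumption: an all-digit token is the interval
        interval = int(rest[0])
        rest = rest[1:]
    if not rest:
        raise ValueError("at least one host is required")

    hosts = []
    for item in rest:
        host, sep, port = item.rpartition(":")
        if sep:
            hosts.append((host, int(port)))
        else:
            hosts.append((item, DEFAULT_PORT))
    return {"name": name, "interval": interval, "hosts": hosts}


if __name__ == "__main__":
    print(parse_data_source('data_source "my cluster" 30 node1:8650 node2'))
```

The host names node1 and node2 in the usage line are placeholders.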
3.
4.
#-------------------------------------------------------------------------------
# Scalability mode. If on, we summarize over downstream grids, and respect
# authority tags. If off, we take on 2.5.0-era behavior: we do not wrap our output
# in <GRID></GRID> tags, we ignore all <GRID> tags we see, and always assume
# we are the "authority" on data source feeds. This approach does not scale to
# large groups of clusters, but is provided for backwards compatibility.
# default: on
# scalable off