




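Metrics Collector. A metrics collector gathers metrics, and each collector provides a different set of them. The cluster metrics extension currently ships two built-in implementations: akka.cluster.metrics.SigarMetricsCollector and akka.cluster.metrics.JmxMetricsCollector. The first collects richer and more precise metrics but consumes more resources; the second is the opposite. JMX came up in the earlier Cluster source analysis but was not covered in depth there. Collectors are loaded in a fixed order of priority: a user-defined collector comes first, SigarMetricsCollector second, and JmxMetricsCollector last. Because the first collector that loads successfully wins, only one collector is active at a time.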
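The priority rule can be pictured as a fallback chain. The sketch below is a simplification, not Akka's loader: `Collector`, `loadCollector` and the three factories are hypothetical, and Akka actually instantiates collectors reflectively from class names (SigarMetricsCollector fails to load when the Sigar native library is not provisioned). It shows why only one collector ends up active: candidates are tried in order and the first one that constructs successfully is kept.

```scala
import scala.util.{Success, Try}

// Hypothetical, simplified stand-in for akka.cluster.metrics.MetricsCollector.
trait Collector { def name: String }

// Tries each candidate factory in priority order and keeps the first that
// constructs successfully; later candidates are never instantiated.
def loadCollector(candidates: List[() => Collector]): Collector =
  candidates.iterator
    .map(factory => Try(factory()))
    .collectFirst { case Success(collector) => collector }
    .getOrElse(throw new IllegalStateException("no metrics collector could be loaded"))

// No user-defined collector configured; "Sigar" fails as if its native library were missing.
val userDefined: List[() => Collector] = Nil
val sigar: () => Collector = () => throw new IllegalStateException("Sigar native library not provisioned")
val jmx: () => Collector = () => new Collector { val name = "jmx" }

val chosen = loadCollector(userDefined ++ List(sigar, jmx))
println(chosen.name) // prints "jmx"
```

Using a lazy iterator keeps the chain short-circuiting: once a collector is built, lower-priority factories are not even called.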

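Metrics Events. The core concepts in Akka are actors and messages, so it is natural that what a collector gathers is turned into metrics events, which are published at a fixed interval.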
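To make "published at a fixed interval" concrete, here is a minimal sketch built on the JDK's `ScheduledExecutorService`. `MetricsSample` and `PeriodicPublisher` are hypothetical; in Akka the sampling is driven by an actor on a timer, and listeners are actors receiving messages rather than callbacks.

```scala
import java.lang.management.ManagementFactory
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Hypothetical, simplified event; Akka's real event type is ClusterMetricsChanged.
final case class MetricsSample(timestampMillis: Long, heapUsedBytes: Long)

// Samples on a fixed interval and hands each sample to every listener.
// Listeners are plain callbacks here, standing in for subscribed actors.
final class PeriodicPublisher(intervalMillis: Long, listeners: Seq[MetricsSample => Unit]) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  private def sampleAndPublish(): Unit = {
    val heapUsed = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage.getUsed
    val sample = MetricsSample(System.currentTimeMillis(), heapUsed)
    listeners.foreach(listener => listener(sample))
  }

  def start(): Unit = {
    val task = new Runnable { def run(): Unit = sampleAndPublish() }
    scheduler.scheduleAtFixedRate(task, 0L, intervalMillis, TimeUnit.MILLISECONDS)
    ()
  }

  def stop(): Unit = {
    scheduler.shutdownNow()
    ()
  }
}

// Usage: wait for three published samples, then stop the publisher.
val received = new CountDownLatch(3)
val publisher = new PeriodicPublisher(50L, Seq((s: MetricsSample) => received.countDown()))
publisher.start()
val gotThree = received.await(2, TimeUnit.SECONDS)
publisher.stop()
println(s"received three samples: $gotThree")
```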



import akka.actor.{ Actor, ActorLogging }
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.CurrentClusterState
import akka.cluster.metrics.{ ClusterMetricsChanged, ClusterMetricsExtension, NodeMetrics }
import akka.cluster.metrics.StandardMetrics.{ Cpu, HeapMemory }

// Logs heap and CPU metrics of the local node whenever new cluster metrics are published.
class MetricsListener extends Actor with ActorLogging {
  val selfAddress = Cluster(context.system).selfAddress
  val extension = ClusterMetricsExtension(context.system)

  // Subscribe to ClusterMetricsEvent events.
  override def preStart(): Unit = extension.subscribe(self)

  // Unsubscribe from ClusterMetricsEvent events.
  override def postStop(): Unit = extension.unsubscribe(self)

  def receive = {
    case ClusterMetricsChanged(clusterMetrics) ⇒
      // Only look at the metrics of this node.
      clusterMetrics.filter(_.address == selfAddress) foreach { nodeMetrics ⇒
        logHeap(nodeMetrics)
        logCpu(nodeMetrics)
      }
    case state: CurrentClusterState ⇒ // Ignore.
  }

  def logHeap(nodeMetrics: NodeMetrics): Unit = nodeMetrics match {
    case HeapMemory(address, timestamp, used, committed, max) ⇒
      log.info("Used heap: {} MB", used.doubleValue / 1024 / 1024)
    case _ ⇒ // No heap info.
  }

  def logCpu(nodeMetrics: NodeMetrics): Unit = nodeMetrics match {
    case Cpu(address, timestamp, Some(systemLoadAverage), cpuCombined, cpuStolen, processors) ⇒
      log.info("Load: {} ({} processors)", systemLoadAverage, processors)
    case _ ⇒ // No cpu info.
  }
}

/**
 * Cluster metrics extension.
 * Cluster metrics is primarily for load-balancing of nodes. It controls metrics sampling
 * at a regular frequency, prepares highly variable data for further analysis by other entities,
 * and publishes the latest cluster metrics data around the node ring and local eventStream
 * to assist in determining the need to redirect traffic to the least-loaded nodes.
 * Metrics sampling is delegated to the [[MetricsCollector]].
 * Smoothing of the data for each monitored process is delegated to the
 * [[EWMA]] for exponential weighted moving average.
 */
class ClusterMetricsExtension(system: ExtendedActorSystem) extends Extension {

  /**
   * Metrics extension configuration.
   */
  val settings = ClusterMetricsSettings(system.settings.config)
  import settings._

  /**
   * Supervision strategy.
   */
  private[metrics] val strategy = system.dynamicAccess.createInstanceFor[SupervisorStrategy](
    SupervisorStrategyProvider, immutable.Seq(classOf[Config] → SupervisorStrategyConfiguration))
    .getOrElse {
      val log: LoggingAdapter = Logging(system, getClass.getName)
      log.error(s"Configured strategy provider ${SupervisorStrategyProvider} failed to load, using default ${classOf[ClusterMetricsStrategy].getName}.")
      new ClusterMetricsStrategy(SupervisorStrategyConfiguration)
    }

  /**
   * Supervisor actor.
   * Accepts subtypes of [[CollectionControlMessage]]s to manage metrics collection at runtime.
   */
  val supervisor = system.systemActorOf(...) // arguments elided in this excerpt

  /**
   * Subscribe user metrics listener actor to [[ClusterMetricsEvent]]
   * events published by extension on the system event bus.
   */
  def subscribe(metricsListener: ActorRef): Unit = {
    system.eventStream.subscribe(metricsListener, classOf[ClusterMetricsEvent])
  }

  /**
   * Unsubscribe user metrics listener actor from [[ClusterMetricsEvent]]
   * events published by extension on the system event bus.
   */
  def unsubscribe(metricsListener: ActorRef): Unit = {
    system.eventStream.unsubscribe(metricsListener, classOf[ClusterMetricsEvent])
  }
}
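The `strategy` field above is created reflectively from a configured class name, with the `Config` passed as the constructor argument, and falls back to `ClusterMetricsStrategy` when that fails. A rough, hypothetical equivalent of this pattern with plain Java reflection follows; `createInstanceFor` here is a stand-in for Akka's `DynamicAccess`, and `java.lang.StringBuilder` merely plays the role of a configured class.

```scala
import scala.util.Try

// Hypothetical stand-in for a reflective "createInstanceFor": look up the class
// by name, find the constructor matching the argument types, and invoke it.
// Any failure (missing class, missing constructor, constructor exception) becomes a Failure.
def createInstanceFor[T](fqcn: String, args: Seq[(Class[_], AnyRef)]): Try[T] = Try {
  val clazz = Class.forName(fqcn)
  val constructor = clazz.getDeclaredConstructor(args.map(_._1): _*)
  constructor.newInstance(args.map(_._2): _*).asInstanceOf[T]
}

// The configured class loads; an unknown class name falls back to the default.
val loaded = createInstanceFor[CharSequence]("java.lang.StringBuilder", Seq(classOf[String] -> "abc"))
  .getOrElse(new java.lang.StringBuilder("default"))
val fallback = createInstanceFor[CharSequence]("com.example.NoSuchStrategy", Seq(classOf[String] -> "abc"))
  .getOrElse(new java.lang.StringBuilder("default"))
println(s"$loaded / $fallback") // prints "abc / default"
```

Returning a `Try` keeps the fallback decision with the caller, which is exactly where the extension logs an error and substitutes its default strategy.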







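`subscribe` and `unsubscribe` simply delegate to the system `eventStream`, keyed by `classOf[ClusterMetricsEvent]`, so a listener receives every event whose class is `ClusterMetricsEvent` or one of its subtypes. The following `MiniEventStream` is a hypothetical, much-reduced illustration of that class-based classification; Akka's real `EventStream` is built on subchannel classification and delivers events as actor messages.

```scala
import scala.collection.mutable

// Hypothetical event types mirroring ClusterMetricsEvent / ClusterMetricsChanged.
sealed trait MetricsEvent
final case class MetricsChanged(nodes: Set[String]) extends MetricsEvent
final case class UnrelatedEvent(note: String)

// A minimal class-based event stream: a subscriber registered for a class
// receives every published event that is an instance of that class or a subclass.
final class MiniEventStream {
  private val subscriptions = mutable.ListBuffer.empty[(Any => Unit, Class[_])]

  def subscribe(subscriber: Any => Unit, channel: Class[_]): Unit =
    subscriptions += (subscriber -> channel)

  def unsubscribe(subscriber: Any => Unit, channel: Class[_]): Unit = {
    val matching = subscriptions.filter { case (s, c) => s == subscriber && c == channel }
    subscriptions --= matching
  }

  def publish(event: Any): Unit =
    subscriptions.foreach { case (subscriber, channel) =>
      if (channel.isInstance(event)) subscriber(event)
    }
}

// Usage: only MetricsEvent subtypes reach the listener, and only while subscribed.
val stream = new MiniEventStream
val received = mutable.ListBuffer.empty[Any]
val listener: Any => Unit = event => received += event
stream.subscribe(listener, classOf[MetricsEvent])
stream.publish(MetricsChanged(Set("node-a")))          // delivered
stream.publish(UnrelatedEvent("not a metrics event")) // ignored: wrong class
stream.unsubscribe(listener, classOf[MetricsEvent])
stream.publish(MetricsChanged(Set("node-b")))          // ignored: no longer subscribed
println(received.toList) // List(MetricsChanged(Set(node-a)))
```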
/**
 * Metrics sampler.
 * Implementations of cluster system metrics collectors extend this trait.
 */
trait MetricsCollector extends Closeable {
  /**
   * Samples and collects new data points.
   * This method is invoked periodically and should return
   * current metrics for this node.
   */
  def sample(): NodeMetrics
}
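Since a user-defined collector takes priority, it helps to see what implementing this contract involves. The sketch uses hypothetical simplified types (`Snapshot` instead of `NodeMetrics`, `SimpleCollector` instead of `MetricsCollector`) so that it runs without Akka; a real custom collector would return `NodeMetrics` and be configured in the extension's settings.

```scala
import java.io.Closeable
import java.lang.management.ManagementFactory

// Hypothetical simplified stand-ins for NodeMetrics and MetricsCollector.
final case class Snapshot(address: String, timestamp: Long, metrics: Map[String, Number])

trait SimpleCollector extends Closeable {
  // Invoked periodically; returns the current metrics of this node.
  def sample(): Snapshot
}

// Example collector reporting the JVM's live and daemon thread counts.
final class ThreadCountCollector(address: String) extends SimpleCollector {
  private val threads = ManagementFactory.getThreadMXBean
  @volatile private var closed = false

  def sample(): Snapshot = {
    require(!closed, "collector has been closed")
    Snapshot(address, System.currentTimeMillis(), Map(
      "thread-count" -> Int.box(threads.getThreadCount),
      "daemon-thread-count" -> Int.box(threads.getDaemonThreadCount)))
  }

  // Closeable lets the owner release resources when the collector is stopped.
  override def close(): Unit = closed = true
}

val collector = new ThreadCountCollector("127.0.0.1:2551")
println(collector.sample().metrics)
collector.close()
```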


/**
 * The snapshot of current sampled health metrics for any monitored process.
 * Collected and gossiped at regular intervals for dynamic cluster management strategies.
 * Equality of NodeMetrics is based on its address.
 * @param address [[akka.actor.Address]] of the node the metrics are gathered at
 * @param timestamp the time of sampling, in milliseconds since midnight, January 1, 1970 UTC
 * @param metrics the set of sampled [[akka.cluster.metrics.Metric]]
 */
final case class NodeMetrics(address: Address, timestamp: Long, metrics: Set[Metric] = Set.empty[Metric])


/**
 * Metrics key/value.
 * Equality of Metric is based on its name.
 * @param name the metric name
 * @param value the metric value, which must be a valid numerical value,
 *   a valid value is neither negative nor NaN/Infinite.
 * @param average the data stream of the metric value, for trending over time. Metrics that are already
 *   averages (e.g. system load average) or finite (e.g. number of processors), are not trended.
 */
final case class Metric private[metrics] (name: String, value: Number, average: Option[EWMA])
  extends MetricNumericConverter
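The equality rules (NodeMetrics by address, Metric by name) and the validity rule for metric values can be illustrated with hypothetical simplified classes. `SimpleMetric` and `SimpleNodeMetrics` are not Akka types, and the metric names are only illustrative.

```scala
// Hypothetical simplified metric: equality and hashing use the name only,
// so a Set keeps at most one entry per metric name.
final case class SimpleMetric(name: String, value: Double) {
  override def equals(other: Any): Boolean = other match {
    case m: SimpleMetric => m.name == name
    case _               => false
  }
  override def hashCode: Int = name.hashCode
}

object SimpleMetric {
  // A value is valid only if it is neither negative nor NaN/Infinite.
  def create(name: String, value: Double): Option[SimpleMetric] =
    if (value >= 0.0 && !value.isNaN && !value.isInfinite) Some(SimpleMetric(name, value)) else None
}

// Hypothetical simplified node snapshot: equality uses the address only.
final case class SimpleNodeMetrics(address: String, timestamp: Long, metrics: Set[SimpleMetric]) {
  override def equals(other: Any): Boolean = other match {
    case n: SimpleNodeMetrics => n.address == address
    case _                    => false
  }
  override def hashCode: Int = address.hashCode
}

val metrics = Set(SimpleMetric("heap-memory-used", 1.0), SimpleMetric("heap-memory-used", 2.0))
println(metrics.size) // 1: same name, so the second entry is treated as a duplicate
println(SimpleMetric.create("system-load-average", -1.0)) // None: negative values are rejected
```

Rejecting negative values is what turns the `-1` that JMX reports for an unavailable load average into "no metric" rather than a misleading data point.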



/**
 * Loads JVM and system metrics through JMX monitoring beans.
 * @param address The [[akka.actor.Address]] of the node being sampled
 * @param decayFactor how quickly the exponential weighting of past data is decayed
 */
class JmxMetricsCollector(address: Address, decayFactor: Double) extends MetricsCollector

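JmxMetricsCollector loads JVM and system metrics through JMX monitoring beans (MXBeans); we won't go into what JMX is here. The class takes two parameters. The first, address, needs no further explanation; the second, decayFactor, matters more. It is not an expiry limit: as its Scaladoc says, it controls how quickly the exponential weighting of past data decays, that is, how fast older samples lose influence in the exponentially weighted moving average (EWMA) used to smooth each metric.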
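In EWMA smoothing, the decay factor acts as the weight alpha given to each new sample, while the previous average keeps weight (1 - alpha). The sketch below is a hypothetical simplification (`Ewma` and `Ewma.alpha` are illustrative names, assumed to follow the shape of `akka.cluster.metrics.EWMA`); the `alpha` helper derives the weight from a half-life and a sampling interval, which is the usual way such a factor is chosen.

```scala
// Simplified EWMA, assumed to follow the shape of akka.cluster.metrics.EWMA:
// `alpha` is the weight of the newest sample, (1 - alpha) the weight of the old average.
final case class Ewma(value: Double, alpha: Double) {
  require(alpha >= 0.0 && alpha <= 1.0, "alpha must be between 0.0 and 1.0")

  // Folds one new observation into the moving average.
  def :+(xn: Double): Ewma = copy(value = alpha * xn + (1 - alpha) * value)
}

object Ewma {
  // Chooses alpha so that a sample's weight halves every `halfLifeMillis`
  // when a new sample arrives every `intervalMillis`.
  def alpha(halfLifeMillis: Double, intervalMillis: Double): Double =
    1 - math.exp(-math.log(2) / halfLifeMillis * intervalMillis)
}

val smoothed = Seq(10.0, 10.0, 40.0).foldLeft(Ewma(10.0, alpha = 0.5))(_ :+ _)
println(smoothed.value) // 25.0: the 40.0 spike is only half absorbed
val a = Ewma.alpha(halfLifeMillis = 12000, intervalMillis = 3000) // about 0.159
```

A larger alpha tracks spikes faster; a smaller one smooths more, which is why the factor reads as "how quickly past data is decayed" rather than as a cut-off.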

  /**
   * Samples and collects new data points.
   * Creates a new instance each time.
   */
  def sample(): NodeMetrics = NodeMetrics(address, newTimestamp, metrics)

  /**
   * Generate metrics set.
   * Creates a new instance each time.
   */
  def metrics(): Set[Metric] = {
    val heap = heapMemoryUsage
    Set(systemLoadAverage, heapUsed(heap), heapCommitted(heap), heapMax(heap), processors).flatten
  }

The metrics method in turn calls five methods: systemLoadAverage, heapUsed(heap), heapCommitted(heap), heapMax(heap) and processors. We only analyze the first one here.

  /**
   * (JMX) Returns the OS-specific average load on the CPUs in the system, for the past 1 minute.
   * On some systems the JMX OS system load average may not be available, in which case a -1 is
   * returned from JMX, and None is returned from this method.
   * Creates a new instance each time.
   */
  def systemLoadAverage: Option[Metric] = Metric.create(
    name = SystemLoadAverage,
    value = osMBean.getSystemLoadAverage,
    decayFactor = None)
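The JMX calls behind systemLoadAverage and the heap metrics can be tried directly with the JDK's `java.lang.management` API. This is a standalone sketch that mirrors the idea, not JmxMetricsCollector itself; the helper names are hypothetical.

```scala
import java.lang.management.{ManagementFactory, MemoryUsage}

// JMX returns a negative value when the load average is unavailable;
// like JmxMetricsCollector, map that case to None.
def systemLoadAverage: Option[Double] = {
  val load = ManagementFactory.getOperatingSystemMXBean.getSystemLoadAverage
  if (load >= 0) Some(load) else None
}

// Heap figures in bytes: (used, committed, max); getMax is -1 when undefined.
def heapSnapshot: (Long, Long, Option[Long]) = {
  val heap: MemoryUsage = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
  (heap.getUsed, heap.getCommitted, if (heap.getMax >= 0) Some(heap.getMax) else None)
}

def processors: Int = ManagementFactory.getOperatingSystemMXBean.getAvailableProcessors

println(s"load=$systemLoadAverage heap=$heapSnapshot processors=$processors")
```

On Windows, for example, the load average is typically unavailable, so `systemLoadAverage` yields `None` there while the heap figures are still reported.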


private val memoryMBean: MemoryMXBean = ManagementFactory.getMemoryMXBean

private val osMBean: OperatingSystemMXBean = ManagementFactory.getOperatingSystemMXBean



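Both fields are obtained from java.lang.management.ManagementFactory. The java.lang.management package includes the following MXBean interfaces:

ClassLoadingMXBean: management interface for the class loading system of the Java virtual machine.
CompilationMXBean: management interface for the compilation system of the Java virtual machine.
GarbageCollectorMXBean: management interface for the garbage collection of the Java virtual machine.
MemoryManagerMXBean: management interface for a memory manager.
MemoryMXBean: management interface for the memory system of the Java virtual machine.
MemoryPoolMXBean: management interface for a memory pool.
OperatingSystemMXBean: management interface for the operating system on which the Java virtual machine is running.
RuntimeMXBean: management interface for the runtime system of the Java virtual machine.
ThreadMXBean: management interface for the thread system of the Java virtual machine.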
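A quick tour of a few of these interfaces through `ManagementFactory`, as a standalone sketch; JmxMetricsCollector itself only relies on MemoryMXBean and OperatingSystemMXBean.

```scala
import java.lang.management.ManagementFactory
import scala.jdk.CollectionConverters._

val classLoading = ManagementFactory.getClassLoadingMXBean
val runtime = ManagementFactory.getRuntimeMXBean
val threads = ManagementFactory.getThreadMXBean

println(s"VM ${runtime.getVmName}, up for ${runtime.getUptime} ms")
println(s"loaded classes: ${classLoading.getLoadedClassCount}, live threads: ${threads.getThreadCount}")

// One GarbageCollectorMXBean per collector; getCollectionCount is -1 if undefined.
val gcCounts: List[(String, Long)] =
  ManagementFactory.getGarbageCollectorMXBeans.asScala.toList.map(gc => gc.getName -> gc.getCollectionCount)
gcCounts.foreach { case (name, count) => println(s"GC $name: $count collections") }

// Memory pools; getUsage returns null for a pool that is no longer valid.
for (pool <- ManagementFactory.getMemoryPoolMXBeans.asScala; usage <- Option(pool.getUsage))
  println(s"pool ${pool.getName}: ${usage.getUsed} bytes used")
```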

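That is essentially all the source we need to read, because it matches what we expected: the extension creates an actor that obtains JVM information through java.lang.management and distributes the data through the eventStream, and any actor that needs metrics events simply subscribes to them. Akka goes one step further, though. Since the stated purpose of cluster metrics is load balancing, it actually provides two adaptive load-balancing routers, AdaptiveLoadBalancingPool and AdaptiveLoadBalancingGroup. These routers use the collected metrics to spread messages across the corresponding actors in the cluster and thereby balance the load. We won't dig into the source of these two classes today; interested readers can explore it on their own.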
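According to the Akka documentation, the adaptive routers pick routees at random with probabilities derived from the remaining capacity of each node, where capacity comes from a metrics selector (heap, load, CPU, or a mix). The sketch below illustrates that idea for a heap-based capacity; `heapCapacity`, `weights` and `pick` are hypothetical helpers, and Akka's exact formulas may differ.

```scala
import scala.util.Random

// Hypothetical heap-based capacity: fraction of the maximum heap still free (0.0 to 1.0).
def heapCapacity(usedBytes: Double, maxBytes: Double): Double = (maxBytes - usedBytes) / maxBytes

// Converts capacities to integer weights: the least-capable node gets a weight of
// about 1 and the others get weights proportional to their capacity.
def weights(capacity: Map[String, Double]): Map[String, Int] = {
  val divisor = math.max(0.01, capacity.values.min)
  capacity.map { case (node, c) => node -> math.round(c / divisor).toInt }
}

// Weighted random choice: a node is picked with probability weight / totalWeight.
def pick(weighted: Map[String, Int], random: Random): String = {
  val nodes = weighted.toVector
  val cumulative = nodes.map(_._2).scanLeft(0)(_ + _).tail // running totals
  require(cumulative.nonEmpty && cumulative.last > 0, "need at least one positive weight")
  val r = random.nextInt(cumulative.last)
  nodes(cumulative.indexWhere(r < _))._1
}

val capacity = Map(
  "node-a" -> heapCapacity(usedBytes = 900, maxBytes = 1000), // 10% free
  "node-b" -> heapCapacity(usedBytes = 500, maxBytes = 1000)) // 50% free
val w = weights(capacity) // Map(node-a -> 1, node-b -> 5)
val random = new Random(42)
val counts = Seq.fill(6000)(pick(w, random)).groupBy(identity).map { case (n, picks) => n -> picks.size }
println(counts) // node-b receives roughly five times as many messages as node-a
```

Random selection, rather than always choosing the least-loaded node, avoids herding every sender onto the same node between two metrics updates.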

