hystrix熔断器之metrics

Metric概述

  HystrixCommands和HystrixObservableCommands执行过程中,会产生执行的数据,这些数据对于观察调用的性能表现非常有用。

  命令产生数据后,Metrics会根据不同纬度进行统计,主要有一下三个纬度:一段时间内(窗口期)的累计统计数据、持续的累计统计数据、一段时间内(窗口期)的数据分布。

Metric实现

  Metrics实现主要的流程如下:

  1.命令在开始执行前会向开始消息流(HystrixCommandStartStream)发送开始消息(HystrixCommandExecutionStarted)。  

  2.如果是线程池执行,执行前会向线程池开始消息流(HystrixThreadPoolStartStream)发送开始消息(HystrixCommandExecutionStarted)。

  3.如果是线程池执行,执行后会向线程池结束消息流(HystrixThreadPoolCompletionStream)发送完成消息(HystrixCommandCompletion)。

  4.命令在结束执行前会向完成消息流(HystrixCommandCompletionStream)发送完成消息(HystrixCommandCompletion)。

  5.不同类型的统计流,会监听开始消息流或完成消息流,根据接受到的消息内容,进行统计。

Hystrix消息类型

  HystrixCommandCompletion有一下消息类型

 

 一段时间内统计

   统计流首先监听一个消息流(开始消息流或者完成消息流),统计一段时间内各个类型消息的累计数据(时间为:metrics.rollingStats.timeInMilliseconds/metrics.rollingStats.numBuckets)。然后再对累计的数据进行累加(个数为:metrics.rollingStats.numBuckets),即为最终累计数据。

  RollingCommandEventCounterStream消息流监听了HystrixCommandCompletionStream消息流,并统计各种消息类型次数。      

  RollingCollapserEventCounterStream消息流监听了HystrixCollapserEventStream消息流,并统计各种消息类型次数。 

  RollingThreadPoolEventCounterStream消息流监听了HystrixThreadPoolCompletionStream消息流,并统计各种消息类型次数。 

  HealthCountsStream消息流监听了HystrixThreadPoolCompletionStream消息流,并统计成功次数,失败次数,失败率。 

持续统计

   统计流首先监听一个消息流(开始消息流或者完成消息流),统计一段时间内各个类型消息的累计数据(时间为:metrics.rollingStats.timeInMilliseconds/metrics.rollingStats.numBuckets)。然后不断的累加累计数据。

  CumulativeCommandEventCounterStream监听了HystrixCommandCompletionStream消息流,并统计各种消息类型次数。    

  CumulativeCollapserEventCounterStream监听了HystrixCollapserEventStream消息流,并统计各种消息类型次数。    

  CumulativeThreadPoolEventCounterStream监听了HystrixThreadPoolCompletionStream消息流,并统计各种消息类型次数。    

一段时间内分布统计

  RollingDistributionStream监听一个消息流,例如HystrixCommandStartStream,然后通过RX java对一段时间内的数值进行运算操作,生成统计值放在Histogram对象中,然后重新发射,对窗口期内的Histogram对象进行运算操作,并生成统计值重新发射。

    子类RollingCommandLatencyDistributionStream监听了HystrixCommandCompletionStream消息流,并且通过RX java监听窗口期内的executelatency,通过Histogram计算窗口期内延时的分布。

    子类RollingCommandUserLatencyDistributionStream监听了HystrixCommandCompletionStream消息流,并且通过RX java监听窗口期内的totalLatency,通过Histogram计算窗口期内延时的分布。

    子类RollingCollapserBatchSizeDistributionStream监听了HystrixCollapserEventStream消息流,并且通过RX java监听窗口期内的ADDED_TO_BATCH消息类型次数,通过Histogram计算窗口期内延时的分布。

  RollingConcurrencyStream监听一个消息流,例如HystrixCommandStartStream,然后通过RX java对一段时间内的执行并发量取最大值,重新发射,对窗口期内的执行并发量取最大值,重新发射。

    子类RollingCommandMaxConcurrencyStream监听了HystrixCommandStartStream,然后通过RX java对窗口期内的执行并发量取最大值。

    子类RollingThreadPoolMaxConcurrencyStream监听了HystrixThreadPoolStartStream,然后通过RX java对窗口期内的执行并发量取最大值。

其他数据流

  还有一些独立于消息流的数据流,对于理解系统信息也非常有帮助。

  配置流HystrixConfigurationStream,通过该数据流可以定时获取hystrix最新的properties配置信息,com.netflix.hystrix.contrib.sample.stream.HystrixConfigSseServlet就是用该流来获取配置信息。

  数据格式:

data: {"type":"HystrixConfig","commands":{"CreditCardCommand":{"threadPoolKey":"CreditCard","groupKey":"CreditCard","execution":{"isolationStrategy":"THREAD","threadPoolKeyOverride":null,"requestCacheEnabled":true,"requestLogEnabled":true,"timeoutEnabled":true,"fallbackEnabled":true,"timeoutInMilliseconds":3000,"semaphoreSize":10,"fallbackSemaphoreSize":10,"threadInterruptOnTimeout":true},"metrics":{"healthBucketSizeInMs":500,"percentileBucketSizeInMilliseconds":60000,"percentileBucketCount":10,"percentileEnabled":true,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"circuitBreaker":{"enabled":true,"isForcedOpen":false,"isForcedClosed":false,"requestVolumeThreshold":20,"errorPercentageThreshold":50,"sleepInMilliseconds":5000}},"GetUserAccountCommand":{"threadPoolKey":"User","groupKey":"User","execution":{"isolationStrategy":"THREAD","threadPoolKeyOverride":null,"requestCacheEnabled":true,"requestLogEnabled":true,"timeoutEnabled":true,"fallbackEnabled":true,"timeoutInMilliseconds":50,"semaphoreSize":10,"fallbackSemaphoreSize":10,"threadInterruptOnTimeout":true},"metrics":{"healthBucketSizeInMs":500,"percentileBucketSizeInMilliseconds":60000,"percentileBucketCount":10,"percentileEnabled":true,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"circuitBreaker":{"enabled":true,"isForcedOpen":false,"isForcedClosed":false,"requestVolumeThreshold":20,"errorPercentageThreshold":50,"sleepInMilliseconds":5000}},"GetOrderCommand":{"threadPoolKey":"Order","groupKey":"Order","execution":{"isolationStrategy":"THREAD","threadPoolKeyOverride":null,"requestCacheEnabled":true,"requestLogEnabled":true,"timeoutEnabled":true,"fallbackEnabled":true,"timeoutInMilliseconds":1000,"semaphoreSize":10,"fallbackSemaphoreSize":10,"threadInterruptOnTimeout":true},"metrics":{"healthBucketSizeInMs":500,"percentileBucketSizeInMilliseconds":60000,"percentileBucketCount":10,"percentileEnabled":true,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"circuitBreaker":{"enabled":true,"isForcedOpen":false,"isForcedClosed":false,"requestVolumeThreshold":20,"errorPercentageThreshold":50,"sleepInMilliseconds":5000}},"GetPaymentInformationCommand":{"threadPoolKey":"PaymentInformation","groupKey":"PaymentInformation","execution":{"isolationStrategy":"THREAD","threadPoolKeyOverride":null,"requestCacheEnabled":true,"requestLogEnabled":true,"timeoutEnabled":true,"fallbackEnabled":true,"timeoutInMilliseconds":1000,"semaphoreSize":10,"fallbackSemaphoreSize":10,"threadInterruptOnTimeout":true},"metrics":{"healthBucketSizeInMs":500,"percentileBucketSizeInMilliseconds":60000,"percentileBucketCount":10,"percentileEnabled":true,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"circuitBreaker":{"enabled":true,"isForcedOpen":false,"isForcedClosed":false,"requestVolumeThreshold":20,"errorPercentageThreshold":50,"sleepInMilliseconds":5000}}},"threadpools":{"User":{"coreSize":8,"maxQueueSize":-1,"queueRejectionThreshold":5,"keepAliveTimeInMinutes":1,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"CreditCard":{"coreSize":8,"maxQueueSize":-1,"queueRejectionThreshold":5,"keepAliveTimeInMinutes":1,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"Order":{"coreSize":8,"maxQueueSize":-1,"queueRejectionThreshold":5,"keepAliveTimeInMinutes":1,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10},"PaymentInformation":{"coreSize":8,"maxQueueSize":-1,"queueRejectionThreshold":5,"keepAliveTimeInMinutes":1,"counterBucketSizeInMilliseconds":10000,"counterBucketCount":10}},"collapsers":{}}

  功能流HystrixUtilizationStream,通过该数据流可以获得并发量,线程池状况等信息。com.netflix.hystrix.contrib.sample.stream.HystrixUtilizationSseServlet就是用该流来获取配置信息。

  数据格式:  

data: {"type":"HystrixUtilization","commands":{"CreditCardCommand":{"activeCount":0},"GetUserAccountCommand":{"activeCount":0},"GetOrderCommand":{"activeCount":1},"GetPaymentInformationCommand":{"activeCount":0}},"threadpools":{"User":{"activeCount":0,"queueSize":0,"corePoolSize":8,"poolSize":2},"CreditCard":{"activeCount":0,"queueSize":0,"corePoolSize":8,"poolSize":1},"Order":{"activeCount":1,"queueSize":0,"corePoolSize":8,"poolSize":2},"PaymentInformation":{"activeCount":0,"queueSize":0,"corePoolSize":8,"poolSize":2}}}

  请求数据流HystrixRequestEventsStream,通过该数据流可以获得http请求相关的信息,com.netflix.hystrix.contrib.requests.stream.HystrixRequestEventsSseServlet就是用该流来获取配置信息。

  数据格式:

data: {"name":"GetOrderCommand","events":["SUCCESS"],"latencies":[111]},{"name":"GetPaymentInformationCommand","events":["SUCCESS"],"latencies":[15]},{"name":"GetUserAccountCommand","events":["TIMEOUT","FALLBACK_SUCCESS"],"latencies":[59],"cached":2},{"name":"CreditCardCommand","events":["SUCCESS"],"latencies":[957]}],[{"name":"GetUserAccountCommand","events":["SUCCESS"],"latencies":[3],"cached":2},{"name":"GetOrderCommand","events":["SUCCESS"],"latencies":[77]},{"name":"GetPaymentInformationCommand","events":["SUCCESS"],"latencies":[21]},{"name":"CreditCardCommand","events":["SUCCESS"],"latencies":[1199]}

MetricPublisher

  有时,我们需要发布Hystrix中的metrics到其他地方,Hystrix提供了相应的接口(HystrixMetricsPublisherCollapser,HystrixMetricsPublisherCommand,HystrixMetricsPublisherThreadPool),实现这些接口,并在initial方法中实现发送hystrix的metrics。并且实现HystrixMetricsPublisher,来创建这些实现类。

  Hystrix原理:

  Hystrix运行时,HystrixMetricsPublisherFactory通过HystrixPlugins获取HystrixMetricsPublisher的实现类。并且通过该实现类来创建(HystrixMetricsPublisherCollapser,HystrixMetricsPublisherCommand,HystrixMetricsPublisherThreadPool)的实现类,并在初次创建时调用其initial方法。

  coda hale 实现了将hystrix 的metrics信息输出到指定metrics监控系统中。

  引入jar包:

  <dependency>

            <groupId>com.netflix.hystrix</groupId>

            <artifactId>hystrix-codahale-metrics-publisher</artifactId>

            <version>1.5.9</version>

        </dependency>

  创建HystrixMetricsPublisher对象并注册到HystrixPlugins:

  @Bean

    HystrixMetricsPublisher hystrixMetricsPublisher() {

        HystrixCodaHaleMetricsPublisher publisher = new HystrixCodaHaleMetricsPublisher(metricRegistry);

        HystrixPlugins.getInstance().registerMetricsPublisher(publisher);

        return publisher;

    }

  coda hale实现源码如下:

  public class HystrixCodaHaleMetricsPublisher extends HystrixMetricsPublisher {

      private final String metricsRootNode;

      private final MetricRegistry metricRegistry;

      public HystrixCodaHaleMetricsPublisher(MetricRegistry metricRegistry) {

          this(null, metricRegistry);

      }

      public HystrixCodaHaleMetricsPublisher(String metricsRootNode, MetricRegistry metricRegistry) {

          this.metricsRootNode = metricsRootNode;

          this.metricRegistry = metricRegistry;

      }

      @Override

      public HystrixMetricsPublisherCommand getMetricsPublisherForCommand(HystrixCommandKey commandKey, HystrixCommandGroupKey commandGroupKey, HystrixCommandMetrics metrics, HystrixCircuitBreaker circuitBreaker, HystrixCommandProperties properties) {

          return new HystrixCodaHaleMetricsPublisherCommand(metricsRootNode, commandKey, commandGroupKey, metrics, circuitBreaker, properties, metricRegistry);

      }

      @Override

      public HystrixMetricsPublisherThreadPool getMetricsPublisherForThreadPool(HystrixThreadPoolKey threadPoolKey, HystrixThreadPoolMetrics metrics, HystrixThreadPoolProperties properties) {

          return new HystrixCodaHaleMetricsPublisherThreadPool(metricsRootNode, threadPoolKey, metrics, properties, metricRegistry);

      }

      @Override

      public HystrixMetricsPublisherCollapser getMetricsPublisherForCollapser(HystrixCollapserKey collapserKey, HystrixCollapserMetrics metrics, HystrixCollapserProperties properties) {

          return new HystrixCodaHaleMetricsPublisherCollapser(collapserKey, metrics, properties, metricRegistry);

      }

  }

  HystrixCodaHaleMetricsPublisher负责创建HystrixCodaHaleMetricsPublisherCommand,HystrixCodaHaleMetricsPublisherThreadPool,HystrixCodaHaleMetricsPublisherCollapser。这三个对象实现基本逻辑是在initialize方法中向metricRegistry中设置相应信息。

   public void initialize() {

      metricRegistry.register(createMetricName("isCircuitBreakerOpen"), new Gauge<Boolean>() {

      @Override

      public Boolean getValue() {

           return circuitBreaker.isOpen();

      }

    .....

 }

  HystrixCodaHaleMetricsPublisherCommand向metricRegistry设置了一下metrics信息:

  isCircuitBreakerOpen  是否熔断,通过熔断器获得。

  currentTime  当前系统时间

  countBadRequests  通过HystrixCommandMetrics获得。

 

统计流类实现类

Hystrix的Metrics功能模块中存储了与Hystrix运行相关的度量信息,主要有三类类型:

  1)HystrixCommandMetrics:

    保存hystrix命令执行的度量信息。

    markCommandStart 当命令开始执行,调用该方法。

    markCommandDone 命令执行完成,调用该方法。

    getRollingCount 获取某一事件类型窗口期内的统计数值

    getCumulativeCount 获取某一事件类型持续的统计数值

        getExecutionTimePercentile  获取某一百分比的请求执行时间

        getExecutionTimeMean  获取平均请求执行时间

    getTotalTimePercentile,获取某一百分比的请求执行总时间

    getTotalTimeMean,获取平均请求执行总时间

    getRollingMaxConcurrentExecutions 获取上一个窗口期内最大的并发数

        getHealthCountsStream 获取窗口期内的失败次数,总次数,失败比率

    记录以下事件类型:

    HystrixRollingNumberEvent.SUCCESS 命令执行成功的

    FAILURE

    TIMEOUT(1), SHORT_CIRCUITED(1), THREAD_POOL_REJECTED(1), SEMAPHORE_REJECTED(1), BAD_REQUEST(1),     FALLBACK_SUCCESS(1), FALLBACK_FAILURE(1), FALLBACK_REJECT

 

  2)HystrixThreadPoolMetrics 保存hystrix线程池执行的度量信息。

  markThreadCompletion 当线程吃执行一个任务时调用。

  markThreadExecution  当线程池完成一个任务时调用。

  getRollingCount 获取某一事件类型窗口期内的统计数值。

  getCumulativeCount  获取某一事件类型持续的统计数值。

 

 

 

HystrixCommandMetrics实现:

  当调用markCommandStart方法时,实际向消息流对象HystrixCommandStartStream 写入HystrixCommandExecutionStarted消息。消息流是消息传输的中间件,其内部是一个RX java Subject。消息监听者通过订阅这些消息流来监听这些消息。如果是线程模式执行,还需要向消息流对象HystrixThreadPoolStartStream 写入HystrixCommandExecutionStarted消息。

  当调用markCommandDone方法时, 实际向消息流对象HystrixCommandCompletionStream 写入HystrixCommandCompletion消息。如果是线程模式执行,还需要向消息流对象HystrixThreadPoolCompletionStream 写入HystrixCommandCompletion消息。

  当调用collapserResponseFromCache方法时,实际向消息流对象HystrixCollapserEventStream写入HystrixCollapserEvent消息。消息流是消息传输的中间件,其内部是一个RX java Subject。消息监听者通过订阅这些消息流来监听这些消息。

  当调用collapserBatchExecuted方法时,实际向消息流对象HystrixCollapserEventStream写入HystrixCollapserEvent消息。消息流是消息传输的中间件,其内部是一个RX java Subject。消息监听者通过订阅这些消息流来监听这些消息。

  当调用getRollingCount方法时,实际从消息流对象RollingCommandEventCounterStream获取相应的信息。RollingCommandEventCounterStream消息流监听了HystrixCommandCompletionStream消息流,并且通过RX java 对各个消息类型进行一段时间内数据的统计。      

  当调用getCumulativeCount方法时,实际从消息流对象CumulativeCommandEventCounterStream获取相应的信息。CumulativeCommandEventCounterStream消息流监听了HystrixCommandCompletionStream消息流,并且通过RX java 对各个消息类型进行持续的数据的统计。   

       当调用getExecutionTimePercentile,getExecutionTimeMean方法时,实际从消息流对象RollingCommandLatencyDistributionStream获取相应的信息。RollingCommandLatencyDistributionStream消息流监听了HystrixCommandCompletionStream消息流。并且通过RX java对窗口期内的请求的executionLatency的分布进行计算。

       当调用getTotalTimePercentile,getTotalTimeMean方法时,实际从消息流对象RollingCommandUserLatencyDistributionStream获取相应的信息。RollingCommandUserLatencyDistributionStream消息流监听了HystrixCommandCompletionStream消息流。并且通过RX java对窗口期内的请求的totalLatency的分布进行计算。

       当调用getHealthCountsStream,实际从消息流对象HealthCountsStream获取想要信息,HealthCountsStream消息流监听了HystrixCommandCompletionStream消息流,并且通过RX java对窗口期内的请求成功,失败,超时,进行统计得出失败次数,总次数,失败比率。

   当调用getRollingMaxConcurrentExecutions,实际从消息流对象RollingCommandMaxConcurrencyStream获取相应的信息。RollingCommandMaxConcurrencyStream消息流监听了HystrixCommandStartStream消息流,并且通过RX java 获取窗口期内最大的并发数。

HystrixThreadPoolMetrics实现:

  当调用getRollingCount方法时,实际从消息流对象RollingThreadPoolEventCounterStream获取相应的信息。RollingThreadPoolEventCounterStream消息流监听了HystrixCommandExecutionStarted消息流,并且进行一段时间内数据的统计。

 

 

 

当前服务的健康状况, 包括服务调用总次数和服务调用失败次数等. 根据Metrics的计数, 熔断器从而能计算出当前服务的调用失败率, 用来和设定的阈值比较从而决定熔断器的状态切换逻辑. 因此Metrics的实现非常重要。

HystrixRollingNumber统计一定时间内的统计数值,基本思想就是分段统计,比如说要统计qps,即1秒内的请求总数。如下图所示,我们可以将1s的时间分成10段,每段100ms。在第一个100ms内,写入第一个段中进行计数,在第二个100ms内,写入第二个段中进行计数,这样如果要统计当前时间的qps,我们总是可以通过统计当前时间前1s(共10段)的计数总和值。

 

posted @ 2017-09-02 19:20  zwh1988  阅读(4059)  评论(0编辑  收藏  举报