circuit

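When Hystrix calls a service, failures are inevitable, for example when the remote service is unavailable. Continuing to call it in that situation is pointless. Hystrix can be configured to use a circuit breaker (circuit): once errors reach a certain level, it automatically invokes the fallback method instead of the run method.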

Configuration

The following parameters can be set through CommandPropertiesDefaults in the Command constructor:

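    circuitBreakerRequestVolumeThreshold; // minimum number of requests within the time window (10s by default) before the circuit can trip; default 20
    circuitBreakerSleepWindowInMilliseconds; // how long to wait after the circuit trips before retrying; default 5s
    circuitBreakerEnabled; // whether the circuit is enabled; enabled by default
    circuitBreakerErrorThresholdPercentage; // error percentage within the time window that trips the circuit; default 50%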
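
As a rough illustration of where these parameters go, the sketch below sets them through `HystrixCommandProperties.Setter()` in a command constructor. The class name `RemoteCallCommand`, the group key `RemoteService`, and the bodies of `run` and `getFallback` are made up for the example; the values shown are simply the defaults listed above. It needs hystrix-core on the classpath.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;

// Hypothetical command used only to show where the circuit settings are passed in.
public class RemoteCallCommand extends HystrixCommand<String> {

    public RemoteCallCommand() {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("RemoteService"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withCircuitBreakerEnabled(true)                   // default: enabled
                        .withCircuitBreakerRequestVolumeThreshold(20)      // default: 20 requests
                        .withCircuitBreakerSleepWindowInMilliseconds(5000) // default: 5s
                        .withCircuitBreakerErrorThresholdPercentage(50))); // default: 50%
    }

    @Override
    protected String run() throws Exception {
        // Stand-in for a remote call that is currently failing.
        throw new IllegalStateException("remote service unavailable");
    }

    @Override
    protected String getFallback() {
        // Returned when run() fails or when the circuit rejects the request.
        return "fallback";
    }
}
```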

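The configuration gives a rough picture of how the circuit works: within a time window, once the number of requests reaches a certain volume and the error rate reaches the threshold, calls go straight to fallback. After the sleep window has passed, one trial request is allowed. If it succeeds, the time window starts over; if it fails, calls keep going to fallback, and the cycle repeats.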
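
The cycle described above can be sketched as a small state machine. This is not Hystrix code: `SimpleCircuitBreaker` and its method names are invented for illustration, the clock is passed in explicitly so the behavior is easy to follow, and the counters never expire (Hystrix uses a rolling time window instead).

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Simplified illustration of the trip / sleep / single-test cycle; not the Hystrix implementation.
class SimpleCircuitBreaker {
    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMillis;

    private final AtomicBoolean open = new AtomicBoolean(false);
    private final AtomicLong openedOrLastTestedAt = new AtomicLong();
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong errors = new AtomicLong();

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage, long sleepWindowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    // true: the caller should invoke run(); false: go straight to fallback.
    boolean allowRequest(long nowMillis) {
        if (!open.get()) {
            long seen = total.get();
            if (seen < requestVolumeThreshold) {
                return true; // not enough traffic yet to judge
            }
            if (errors.get() * 100 / seen < errorThresholdPercentage) {
                return true; // error rate is still acceptable
            }
            // Error rate too high: trip the circuit and remember when it happened.
            if (open.compareAndSet(false, true)) {
                openedOrLastTestedAt.set(nowMillis);
            }
            return false;
        }
        // Circuit is open: allow one trial request per sleep window.
        long last = openedOrLastTestedAt.get();
        return nowMillis > last + sleepWindowMillis
                && openedOrLastTestedAt.compareAndSet(last, nowMillis);
    }

    void recordSuccess() {
        // A successful trial closes the circuit and restarts the statistics.
        if (open.compareAndSet(true, false)) {
            total.set(0);
            errors.set(0);
        }
        total.incrementAndGet();
    }

    void recordFailure() {
        total.incrementAndGet();
        errors.incrementAndGet();
    }
}
```

With a threshold of 20 requests and 50% errors, twenty consecutive failures trip the circuit; requests are then rejected until the sleep window has elapsed, at which point exactly one trial request is let through.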

Implementation

Hystrix's default circuit implementation is HystrixCircuitBreakerImpl.

// Main logic: decides whether the circuit lets the request through. Returning false goes straight to fallback; returning true calls run
@Override
public boolean allowRequest() {
    // Check whether the circuit is forced open, via the circuitBreakerForceOpen property
    if (properties.circuitBreakerForceOpen().get()) {
        // properties have asked us to force the circuit open so we will allow NO requests
        return false;
    }
    // Check whether the circuit is forced closed, via the circuitBreakerForceClosed property
    if (properties.circuitBreakerForceClosed().get()) {
        // we still want to allow isOpen() to perform it's calculations so we simulate normal behavior
        isOpen();
        // properties have asked us to ignore errors so we will ignore the results of isOpen and just allow all traffic through
        return true;
    }
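    // Call isOpen and allowSingleTest: the former checks request volume and error rate, the latter lets a single trial request through once the sleep window has elapsed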
    return !isOpen() || allowSingleTest();
}
// Determines whether the circuit has been tripped
@Override
public boolean isOpen() {
    if (circuitOpen.get()) {
        // if we're open we immediately return true and don't bother attempting to 'close' ourself as that is left to allowSingleTest and a subsequent successful test to close
        return true;
    }
    // HealthCounts records the request statistics; under the hood it is a subject registered with RxJava
    // we're closed, so let's see if errors have made us so we should trip the circuit open
    HealthCounts health = metrics.getHealthCounts();

    // Check whether the request volume threshold has been reached
    // check if we are past the statisticalWindowVolumeThreshold
    if (health.getTotalRequests() < properties.circuitBreakerRequestVolumeThreshold().get()) {
        // we are not past the minimum volume threshold for the statisticalWindow so we'll return false immediately and not calculate anything
        return false;
    }
    // Check whether the error rate threshold has been reached
    if (health.getErrorPercentage() < properties.circuitBreakerErrorThresholdPercentage().get()) {
        return false;
    } else {
        // Trip the circuit; CAS handles contention between threads
        // our failure rate is too high, trip the circuit
        if (circuitOpen.compareAndSet(false, true)) {
            // if the previousValue was false then we want to set the currentTime
            circuitOpenedOrLastTestedTime.set(System.currentTimeMillis());
            return true;
        } else {
            // How could previousValue be true? If another thread was going through this code at the same time a race-condition could have
            // caused another thread to set it to true already even though we were in the process of doing the same
            // In this case, we know the circuit is open, so let the other thread set the currentTime and report back that the circuit is open
            return true;
        }
    }
}

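// Once the circuit is open, a retry is needed after a while; allowRequest relies on the short-circuiting || operator, so this time check only runs when isOpen() returns true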
public boolean allowSingleTest() {
    // Get the time at which the circuit was opened or last tested
    long timeCircuitOpenedOrWasLastTested = circuitOpenedOrLastTestedTime.get();
    // 1) if the circuit is open
    // 2) and it's been longer than 'sleepWindow' since we opened the circuit
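    // If the circuit is open and more than the sleep window has elapsed, allow a test; to avoid multithreading problems, the timestamp is also updated with CAS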
    if (circuitOpen.get() && System.currentTimeMillis() > timeCircuitOpenedOrWasLastTested + properties.circuitBreakerSleepWindowInMilliseconds().get()) {
        // We push the 'circuitOpenedTime' ahead by 'sleepWindow' since we have allowed one request to try.
        // If it succeeds the circuit will be closed, otherwise another singleTest will be allowed at the end of the 'sleepWindow'.
        if (circuitOpenedOrLastTestedTime.compareAndSet(timeCircuitOpenedOrWasLastTested, System.currentTimeMillis())) {
            // if this returns true that means we set the time so we'll return true to allow the singleTest
            // if it returned false it means another thread raced us and allowed the singleTest before we did
            return true;
        }
    }
    return false;
}
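
To see why the time check combined with `compareAndSet` lets only one caller through, here is a small stand-alone experiment. It is an illustration under simplifying assumptions, not Hystrix code: `SingleTestRace` and `countWinners` are invented names, the circuit is assumed to have opened at time 0, and every thread is handed the same fixed "now" value instead of reading the system clock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: many threads race for the single test slot.
class SingleTestRace {

    // Returns how many of the concurrent callers were allowed to run the single test.
    static int countWinners(int threads, long sleepWindowMillis, long nowMillis) {
        AtomicLong openedOrLastTestedTime = new AtomicLong(0L); // circuit "opened" at t = 0
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // release all threads at roughly the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                long observed = openedOrLastTestedTime.get();
                // Same shape as allowSingleTest: time check first, then CAS on the timestamp.
                if (nowMillis > observed + sleepWindowMillis
                        && openedOrLastTestedTime.compareAndSet(observed, nowMillis)) {
                    winners.incrementAndGet();
                }
            });
        }

        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return winners.get();
    }
}
```

A thread that read the old timestamp loses the CAS once another thread has updated it, and a thread that reads the new timestamp fails the time check, so at most one caller wins per sleep window.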

Summary

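Hystrix's circuit cuts down on useless calls when errors occur. The key piece is HealthCounts, which records the error rate and request volume. These statistics are not updated in real time but at a fixed interval, which is implemented with RxJava; a later post will look into RxJava.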