Sentinel Circuit Breaking and Degradation Source Code Analysis (Part 20)

How Flow Control Works

In Sentinel, every resource corresponds to a resource name and an Entry, and each Entry represents one request. For each incoming request, Sentinel evaluates the configured rules to decide whether the request may pass.

When an external request arrives, an Entry is created, and along with it a series of slots that form a chain of responsibility, each slot with its own job:
  • NodeSelectorSlot collects resource call paths and stores them as a tree, which is later used for flow control and degradation based on the invocation path;
  • ClusterBuilderSlot stores the statistics of a resource and of its callers, such as RT, QPS, and thread count; these feed multi-dimensional flow control and degradation;
  • StatisticSlot records and aggregates runtime metrics across different dimensions;
  • FlowSlot performs flow control based on the preset flow rules and the state collected by the preceding slots;
  • AuthoritySlot applies black/white-list control based on the configured lists and the caller origin;
  • DegradeSlot performs circuit breaking and degradation based on the statistics and the preset rules;
  • SystemSlot controls the total inbound traffic according to system status, e.g. load1;
  • LogSlot writes logs when flow control, circuit breaking, or system protection is triggered.
Sentinel exposes ProcessorSlot as an SPI extension point (before version 1.7.2 the SPI was SlotChainBuilder), which makes the slot chain extensible: you can add your own slots and arrange their order to give Sentinel custom behavior, as in the sketch below.
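
As an illustration, a minimal custom slot might look like the following sketch. This assumes Sentinel 1.8.x, where slots carry the @Spi annotation and are discovered via a META-INF/services/com.alibaba.csp.sentinel.slotchain.ProcessorSlot file; the class name and order value are hypothetical.

    import com.alibaba.csp.sentinel.context.Context;
    import com.alibaba.csp.sentinel.node.DefaultNode;
    import com.alibaba.csp.sentinel.slotchain.AbstractLinkedProcessorSlot;
    import com.alibaba.csp.sentinel.slotchain.ResourceWrapper;
    import com.alibaba.csp.sentinel.spi.Spi;

    // Hypothetical custom slot; a large order value places it near the end of the chain.
    @Spi(order = 20000)
    public class AuditLogSlot extends AbstractLinkedProcessorSlot<DefaultNode> {

        @Override
        public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node,
                          int count, boolean prioritized, Object... args) throws Throwable {
            // Custom pre-processing, e.g. auditing every resource entry.
            System.out.println("entering resource: " + resourceWrapper.getName());
            // Always pass control on to the next slot.
            fireEntry(context, resourceWrapper, node, count, prioritized, args);
        }

        @Override
        public void exit(Context context, ResourceWrapper resourceWrapper, int count, Object... args) {
            fireExit(context, resourceWrapper, count, args);
        }
    }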

How Spring Cloud Integrates Sentinel

Spring Cloud integrates Sentinel flow control through a Spring MVC handler interceptor (despite the "filter" naming in the configuration properties). The call path is:
  • SentinelWebAutoConfiguration
    • addInterceptors
      • SentinelWebInterceptor -> AbstractSentinelInterceptor

The auto-configuration class

    public void addInterceptors(InterceptorRegistry registry) {
        if (this.sentinelWebInterceptorOptional.isPresent()) {
            Filter filterConfig = this.properties.getFilter();
            registry.addInterceptor((HandlerInterceptor) this.sentinelWebInterceptorOptional.get())
                .order(filterConfig.getOrder())
                .addPathPatterns(filterConfig.getUrlPatterns());
            log.info("[Sentinel Starter] register SentinelWebInterceptor with urlPatterns: {}.",
                filterConfig.getUrlPatterns());
        }
    }

SentinelWebAutoConfiguration registers a SentinelWebInterceptor with the interceptor registry. The interceptor exposes a getResourceName method, which derives the resource name from the request; beyond that, the class contains little core logic.

    protected String getResourceName(HttpServletRequest request) {
        // Use the best-matching URL pattern resolved by Spring MVC as the resource name.
        Object resourceNameObject = request.getAttribute(HandlerMapping.BEST_MATCHING_PATTERN_ATTRIBUTE);
        if (resourceNameObject != null && resourceNameObject instanceof String) {
            String resourceName = (String) resourceNameObject;
            UrlCleaner urlCleaner = this.config.getUrlCleaner();
            if (urlCleaner != null) {
                resourceName = urlCleaner.clean(resourceName);
            }

            // Optionally prefix the resource name with the HTTP method, e.g. "GET:/orders".
            if (StringUtil.isNotEmpty(resourceName) && this.config.isHttpMethodSpecify()) {
                resourceName = request.getMethod().toUpperCase() + ":" + resourceName;
            }

            return resourceName;
        } else {
            return null;
        }
    }

Looking at the parent class, AbstractSentinelInterceptor.preHandle first records the request origin via ContextUtil.enter(contextName, origin) and then calls SphU.entry:

    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        try {
            String resourceName = this.getResourceName(request);
            if (StringUtil.isEmpty(resourceName)) {
                return true;
            } else if (this.increaseReferece(request, this.baseWebMvcConfig.getRequestRefName(), 1) != 1) {
                return true;
            } else {
                String origin = this.parseOrigin(request);
                String contextName = this.getContextName(request);
                ContextUtil.enter(contextName, origin);
                Entry entry = SphU.entry(resourceName, 1, EntryType.IN);
                request.setAttribute(this.baseWebMvcConfig.getRequestAttributeName(), entry);
                return true;
            }
        } catch (BlockException e) {
            try {
                this.handleBlockException(request, response, e);
            } finally {
                ContextUtil.exit();
            }
            return false;
        }
    }

How Dubbo Integrates Sentinel

The Dubbo integration relies on an adapter dependency that provides the Sentinel support:
<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-apache-dubbo-adapter</artifactId>
</dependency>
The adapter provides two filters:
  • SentinelDubboConsumerFilter
  • SentinelDubboProviderFilter
one for the consumer side and one for the provider side.
When flow control or degradation is triggered, a global DubboFallback callback can supply fallback behavior for the consumer and provider side respectively. The fallbacks are registered by calling DubboFallbackRegistry.setConsumerFallback and DubboFallbackRegistry.setProviderFallback, typically once at application startup, as in the sketch below.
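
A registration sketch, assuming the DubboFallbackRegistry API of the 1.7/1.8 adapter (the fallback value is hypothetical; newer adapter versions moved registration to DubboAdapterGlobalConfig):

    import com.alibaba.csp.sentinel.adapter.dubbo.fallback.DubboFallbackRegistry;
    import org.apache.dubbo.rpc.AsyncRpcResult;

    public class SentinelDubboConfig {
        public static void init() {
            // The lambda implements DubboFallback#handle(Invoker, Invocation, BlockException).
            DubboFallbackRegistry.setConsumerFallback((invoker, invocation, ex) ->
                // Return a default value instead of propagating the BlockException.
                AsyncRpcResult.newDefaultAsyncResult("fallback-result", invocation));
        }
    }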

SphU.entry

Whether Sentinel is integrated through Dubbo or through Spring Cloud, the flow-control decision ultimately goes through SphU.entry, so let's use this method to understand the implementation. Its key parameters:
  • ResourceWrapper: the wrapper type for a Sentinel resource
  • count: the number of permits this request occupies, 1 by default
  • prioritized: whether the request is prioritized
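
Before stepping into the implementation, here is the canonical usage pattern of this API (the resource name is hypothetical):

    import com.alibaba.csp.sentinel.Entry;
    import com.alibaba.csp.sentinel.SphU;
    import com.alibaba.csp.sentinel.slots.block.BlockException;

    Entry entry = null;
    try {
        entry = SphU.entry("queryOrder"); // rule checking happens here
        // ... protected business logic ...
    } catch (BlockException e) {
        // The request was blocked by a flow/degrade/system rule.
    } finally {
        if (entry != null) {
            entry.exit();
        }
    }

Internally, SphU.entry delegates to entryWithPriority (in CtSph), which is where the real work happens: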
 
   private Entry entryWithPriority(ResourceWrapper resourceWrapper, int count, boolean prioritized, Object... args)
        throws BlockException {
        // Get the context stored in a ThreadLocal; it holds the whole invocation chain.
        Context context = ContextUtil.getContext();
        if (context instanceof NullContext) {
            // The {@link NullContext} indicates that the amount of context has exceeded the threshold,
            // so here init the entry only. No rule checking will be done.
            return new CtEntry(resourceWrapper, null, context);
        }

        if (context == null) {
            // Using default context.
            context = InternalContextUtil.internalEnter(Constants.CONTEXT_DEFAULT_NAME);
        }

        // Global switch is close, no rule checking will do.
        if (!Constants.ON) {
            return new CtEntry(resourceWrapper, null, context);
        }
        // Build (or look up) the slot chain for this resource.
        ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);

        /*
         * Means amount of resources (slot chain) exceeds {@link Constants.MAX_SLOT_CHAIN_SIZE},
         * so no rule checking will be done.
         */
        // lookProcessChain returns null when chainMap.size() exceeds the limit;
        // in that case rule checking is skipped as well.
        if (chain == null) {
            return new CtEntry(resourceWrapper, null, context);
        }
        // The real work starts here: create the entry and run the chain.
        Entry e = new CtEntry(resourceWrapper, chain, context);
        try {
            // Start rule checking along the slot chain.
            chain.entry(context, resourceWrapper, null, count, prioritized, args);
        } catch (BlockException e1) {
            e.exit(count, args); // blocked: clean up, then rethrow
            throw e1;
        } catch (Throwable e1) {
            // This should not happen, unless there are errors existing in Sentinel internal.
            RecordLog.info("Sentinel unexpected exception", e1);
        }
        return e; // normal result
    }

lookProcessChain

Builds the slot chain. The chain is composed of DefaultProcessorSlotChain -> NodeSelectorSlot -> ClusterBuilderSlot -> LogSlot -> StatisticSlot -> AuthoritySlot -> SystemSlot -> ParamFlowSlot -> FlowSlot -> DegradeSlot.
 
    ProcessorSlot<Object> lookProcessChain(ResourceWrapper resourceWrapper) {
        ProcessorSlotChain chain = chainMap.get(resourceWrapper);
        if (chain == null) {
            synchronized (LOCK) {
                chain = chainMap.get(resourceWrapper);
                if (chain == null) {
                    // Entry size limit: one chain per resource, capped at MAX_SLOT_CHAIN_SIZE.
                    if (chainMap.size() >= Constants.MAX_SLOT_CHAIN_SIZE) {
                        return null;
                    }
                    // Build a new slot chain.
                    chain = SlotChainProvider.newSlotChain();
                    // Copy-on-write: build a new map sized oldMap + 1 ...
                    Map<ResourceWrapper, ProcessorSlotChain> newMap = new HashMap<ResourceWrapper, ProcessorSlotChain>(
                        chainMap.size() + 1);
                    // ... copy the old entries, add the new chain, then swap the reference,
                    // so readers never lock and the map never resizes mid-put.
                    newMap.putAll(chainMap);
                    newMap.put(resourceWrapper, chain);
                    chainMap = newMap;
                }
            }
        }
        return chain;
    }

 

Overall Flow

The overall execution flow is as follows.
  • NodeSelectorSlot: builds the invocation tree.
  • ClusterBuilderSlot: builds the ClusterNode used for cluster-wide flow control and degradation statistics.
  • LogSlot: writes logs.
  • StatisticSlot: collects real-time runtime statistics.
  • AuthoritySlot: performs authority (origin) checks.
  • SystemSlot: validates system-level rules.
  • FlowSlot: implements flow control.
  • DegradeSlot: implements circuit breaking.

NodeSelectorSlot

Stepping back into the point where rule checking starts (chain.entry), the first slot is NodeSelectorSlot. It builds the invocation tree, which matters for everything that follows. The code:

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        /*
         * It's interesting that we use context name rather resource name as the map key.
         *
         * Remember that same resource({@link ResourceWrapper#equals(Object)}) will share
         * the same {@link ProcessorSlotChain} globally, no matter in which context. So if
         * code goes into {@link #entry(Context, ResourceWrapper, DefaultNode, int, Object...)},
         * the resource name must be same but context name may not.
         *
         * If we use {@link com.alibaba.csp.sentinel.SphU#entry(String resource)} to
         * enter same resource in different context, using context name as map key can
         * distinguish the same resource. In this case, multiple {@link DefaultNode}s will be created
         * of the same resource name, for every distinct context (different context name) each.
         *
         * Consider another question. One resource may have multiple {@link DefaultNode},
         * so what is the fastest way to get total statistics of the same resource?
         * The answer is all {@link DefaultNode}s with same resource name share one
         * {@link ClusterNode}. See {@link ClusterBuilderSlot} for detail.
         */
        DefaultNode node = map.get(context.getName());
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    node = new DefaultNode(resourceWrapper, null);
                    HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                    // Build invocation tree
                    ((DefaultNode) context.getLastNode()).addChild(node);
                }
            }
        }

        context.setCurNode(node);
        // Fire the next slot in the chain.
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }
A few objects here deserve explanation:
  • context: the invocation context; one thread maps to one context, with properties such as
    • name: the context name
    • entranceNode: the entrance of the invocation tree
    • curEntry: the current entry
    • origin: the caller origin
    • async: whether the entry is asynchronous
  • Node: a node holding the real-time statistics of a resource; by reading a node you obtain the resource's live state, which drives flow control and degradation. There are several node types:
    • StatisticNode: the statistics node
    • DefaultNode: the default node; this is what NodeSelectorSlot creates
    • ClusterNode: the cluster-wide node of a resource
    • EntranceNode: the entrance node of an invocation tree, from which all child nodes can be reached
After NodeSelectorSlot finishes, context.curNode and the newly attached tree node refer to the same DefaultNode object (the original figure is omitted here); the javadoc above about context names is illustrated by the sketch below.
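
Entering the same resource under two different context names yields two DefaultNodes that share a single ClusterNode (all names below are hypothetical):

    import com.alibaba.csp.sentinel.Entry;
    import com.alibaba.csp.sentinel.SphU;
    import com.alibaba.csp.sentinel.context.ContextUtil;
    import com.alibaba.csp.sentinel.slots.block.BlockException;

    try {
        ContextUtil.enter("context-A", "upstream-a");
        Entry e1 = SphU.entry("queryOrder"); // creates DefaultNode #1 for "queryOrder"
        e1.exit();
        ContextUtil.exit();

        ContextUtil.enter("context-B", "upstream-b");
        Entry e2 = SphU.entry("queryOrder"); // creates DefaultNode #2, same ClusterNode
        e2.exit();
        ContextUtil.exit();
    } catch (BlockException ex) {
        // blocked by a rule
    }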
fireEntry simply hands control to the next slot in the chain:

    @Override
    public void fireEntry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        if (next != null) {
            next.transformEntry(context, resourceWrapper, obj, count, prioritized, args);
        }
    }

    @SuppressWarnings("unchecked")
    void transformEntry(Context context, ResourceWrapper resourceWrapper, Object o, int count, boolean prioritized, Object... args)
        throws Throwable {
        T t = (T)o;
        entry(context, resourceWrapper, t, count, prioritized, args);
    }

StatisticSlot

Within the slot chain, the two most important slots are the one that records traffic statistics and the one that checks flow rules. Let's start with StatisticSlot, one of Sentinel's core slots, which records real-time invocation data at several levels:
  • clusterNode: runtime statistics aggregated per unique resource (the ClusterNode)
  • origin: statistics broken down by caller origin
  • defaultNode: runtime statistics per context (entrance) name and resource
  • global inbound entry statistics

 

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args) throws Throwable {
        try {
            // Let the downstream slots (flow control, degradation, ...) run first, then
            // record statistics according to the outcome. This is the essence of Sentinel's
            // chain of responsibility (and why the slots are not simply iterated in a for loop).
            fireEntry(context, resourceWrapper, node, count, prioritized, args);

            // Request passed, add thread count and pass count.
            node.increaseThreadNum();   // increment the current node's thread count
            node.addPassRequest(count); // sliding-window pass counting

            // Record thread count and pass count on the other node types as well.
            if (context.getCurEntry().getOriginNode() != null) {
                // Add count for origin node.
                context.getCurEntry().getOriginNode().increaseThreadNum();
                context.getCurEntry().getOriginNode().addPassRequest(count);
            }

            if (resourceWrapper.getEntryType() == EntryType.IN) {
                // Add count for global inbound entry node for global statistics.
                Constants.ENTRY_NODE.increaseThreadNum();
                Constants.ENTRY_NODE.addPassRequest(count);
            }
            // ProcessorSlotEntryCallback handlers can be registered via the static
            // StatisticSlotCallbackRegistry#addEntryCallback method.
            // Handle pass event with registered entry callback handlers.
            for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
                handler.onPass(context, resourceWrapper, node, count, args);
            }
        } catch (PriorityWaitException ex) {
            // Priority-wait: the request borrowed a future window (covered in the FlowRule section).
            node.increaseThreadNum();
            if (context.getCurEntry().getOriginNode() != null) {
                // Add count for origin node.
                context.getCurEntry().getOriginNode().increaseThreadNum();
            }

            if (resourceWrapper.getEntryType() == EntryType.IN) {
                // Add count for global inbound entry node for global statistics.
                Constants.ENTRY_NODE.increaseThreadNum();
            }
            // Handle pass event with registered entry callback handlers.
            for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
                handler.onPass(context, resourceWrapper, node, count, args);
            }
        } catch (BlockException e) {
            // Blocked, set block exception to current entry.
            context.getCurEntry().setBlockError(e);

            // Add block count on each node type.
            node.increaseBlockQps(count);
            if (context.getCurEntry().getOriginNode() != null) {
                context.getCurEntry().getOriginNode().increaseBlockQps(count);
            }

            if (resourceWrapper.getEntryType() == EntryType.IN) {
                // Add count for global inbound entry node for global statistics.
                Constants.ENTRY_NODE.increaseBlockQps(count);
            }

            // Handle block event with registered entry callback handlers.
            for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
                handler.onBlocked(e, context, resourceWrapper, node, count, args);
            }

            throw e;
        } catch (Throwable e) {
            // Unexpected internal error, set error to current entry.
            context.getCurEntry().setError(e);

            throw e;
        }
    }
Looking closely, StatisticSlot increments both the thread count and the QPS counters, and these increments feed request statistics across several dimensions.

DefaultNode.addPassRequest

    @Override
    public void addPassRequest(int count) {
        // Count on this node via the parent class (StatisticNode).
        super.addPassRequest(count);
        // Aggregate on the shared clusterNode (also backed by StatisticNode).
        this.clusterNode.addPassRequest(count);
    }

StatisticNode.addPassRequest

Both time windows are incremented; the actual counting is done by ArrayMetric.
    // Per-second statistics: 2 buckets of 500 ms each, used to compute QPS.
    private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
        IntervalProperty.INTERVAL);

    /**
     * Holds statistics of the recent 60 seconds. The windowLengthInMs is deliberately set to 1000 milliseconds,
     * meaning each bucket per second, in this way we can get accurate statistics of each second.
     */
    // Per-minute statistics: 60 buckets of 1000 ms each.
    private transient Metric rollingCounterInMinute = new ArrayMetric(60, 60 * 1000, false);

    @Override
    public void addPassRequest(int count) {
        rollingCounterInSecond.addPass(count);
        rollingCounterInMinute.addPass(count);
    }
Request counts are recorded with a sliding window, sketched below.
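
To make the idea concrete, here is a self-contained sketch of a bucketed sliding-window counter in the spirit of LeapArray. It is deliberately simplified (coarse synchronization instead of CAS, no bucket borrowing) and is not Sentinel's actual implementation:

    import java.util.concurrent.atomic.AtomicLongArray;

    public class SlidingWindowCounter {
        private final int sampleCount;      // number of buckets
        private final int intervalInMs;     // total window length
        private final int windowLengthInMs; // length of one bucket
        private final long[] bucketStart;   // start timestamp of each bucket
        private final AtomicLongArray counts;

        public SlidingWindowCounter(int sampleCount, int intervalInMs) {
            this.sampleCount = sampleCount;
            this.intervalInMs = intervalInMs;
            this.windowLengthInMs = intervalInMs / sampleCount;
            this.bucketStart = new long[sampleCount];
            this.counts = new AtomicLongArray(sampleCount);
        }

        public synchronized void add(int n) {
            long now = System.currentTimeMillis();
            int idx = (int) ((now / windowLengthInMs) % sampleCount);
            long start = now - now % windowLengthInMs;
            if (bucketStart[idx] != start) { // bucket is from an old round: reset it
                bucketStart[idx] = start;
                counts.set(idx, 0);
            }
            counts.addAndGet(idx, n);
        }

        public synchronized long sum() {
            long now = System.currentTimeMillis();
            long total = 0;
            for (int i = 0; i < sampleCount; i++) {
                if (now - bucketStart[i] < intervalInMs) { // only count live buckets
                    total += counts.get(i);
                }
            }
            return total;
        }
    }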

Core classes behind the sliding window

The class relationships are fairly clear: ArrayMetric is a wrapper class whose actual counting is implemented by a LeapArray; the LeapArray maintains multiple WindowWrap buckets (the sliding windows), and each WindowWrap stores its metrics in a MetricBucket.
  • Metric: the metric-collection interface; defines the sliding-window counters for successes, exceptions, blocks, TPS, response time, and so on
  • ArrayMetric: the core sliding-window implementation class
  • LeapArray: the ring of time windows
  • WindowWrap: the wrapper for a single window; its payload is a MetricBucket
  • MetricBucket: the metric bucket, holding block count, exception count, success count, response time, etc.
  • MetricEvent: the metric types, e.g. pass, block, exception, success

ArrayMetric.addPass

Following the code into ArrayMetric.addPass:
  • obtain the window for the current timestamp from the LeapArray
  • call MetricBucket.addPass to increment the counter of that window
    @Override
    public void addPass(int count) {
        WindowWrap<MetricBucket> wrap = data.currentWindow();
        wrap.value().addPass(count);
    }
Here, data is constructed with sampleCount=2 and intervalInMs=1000, i.e. the sliding window consists of 2 buckets of 500 ms each.
    public ArrayMetric(int sampleCount, int intervalInMs) {
        this.data = new OccupiableBucketLeapArray(sampleCount, intervalInMs);
    }
OccupiableBucketLeapArray: the idea is that once the "tokens" of the current sampling interval are exhausted, i.e. the configured threshold has been reached, a request may borrow from the next time window (a future sampling interval). We will come back to this later.

State of the sliding window after initialization

In the per-second time window, an array of length two is initialized: AtomicReferenceArray<WindowWrap<T>> array. The array length is the number of sliding windows; each window covers 500 ms.

LeapArray.currentWindow

Given the current time, currentWindow locates the corresponding WindowWrap in the sliding window and updates its metrics; the same lookup is used later when flow rules are evaluated.
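A quick worked example with sampleCount=2 and windowLengthInMs=500 (the timestamp is made up):

    long timeMillis = 888;
    int idx = (int) ((timeMillis / 500) % 2);         // 888 / 500 = 1, 1 % 2 = 1
    long windowStart = timeMillis - timeMillis % 500; // 888 - 388 = 500

The full lookup logic is as follows.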
    public WindowWrap<T> currentWindow(long timeMillis) {
        if (timeMillis < 0) {
            return null;
        }
        // Compute the bucket index for the current time: divide the time by the length
        // of a single window, then take the modulo of the array length.
        int idx = calculateTimeIdx(timeMillis);
        // Calculate current bucket start time.
        long windowStart = calculateWindowStart(timeMillis);

        /*
         * Get bucket item at given time from the array.
         *
         * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
         * (2) Bucket is up-to-date, then just return the bucket.
         * (3) Bucket is deprecated, then reset current bucket and clean all deprecated buckets.
         */
        while (true) {
            WindowWrap<T> old = array.get(idx); // look up the bucket at this index
            if (old == null) { // not initialized yet
                /*
                 *     B0       B1      B2    NULL      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            bucket is empty, so create new and update
                 *
                 * If the old bucket is absent, then we create a new bucket at {@code windowStart},
                 * then try to update circular array via a CAS operation. Only one thread can
                 * succeed to update, while other threads yield its time slice.
                 */
                // Build a new WindowWrap and install it into the array with CAS.
                WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
                if (array.compareAndSet(idx, null, window)) {
                    // Successfully updated, return the created bucket.
                    return window;
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart == old.windowStart()) { // the bucket at this index is current: return it
                /*
                 *     B0       B1      B2     B3      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            startTime of Bucket 3: 800, so it's up-to-date
                 *
                 * If current {@code windowStart} is equal to the start timestamp of old bucket,
                 * that means the time is within the bucket, so directly return the bucket.
                 */
                return old;
            } else if (windowStart > old.windowStart()) { // the bucket is stale: time has moved past it
                /*
                 *   (old)
                 *             B0       B1      B2    NULL      B4
                 * |_______||_______|_______|_______|_______|_______||___
                 * ...    1200     1400    1600    1800    2000    2200  timestamp
                 *                              ^
                 *                           time=1676
                 *          startTime of Bucket 2: 400, deprecated, should be reset
                 *
                 * If the start timestamp of old bucket is behind provided time, that means
                 * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
                 * Note that the reset and clean-up operations are hard to be atomic,
                 * so we need a update lock to guarantee the correctness of bucket update.
                 *
                 * The update lock is conditional (tiny scope) and will take effect only when
                 * bucket is deprecated, so in most cases it won't lead to performance loss.
                 */
                if (updateLock.tryLock()) { // lock and reset the MetricBucket
                    try {
                        // Successfully get the update lock, now we reset the bucket.
                        return resetWindowTo(old, windowStart);
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart < old.windowStart()) { // abnormal case: the provided time is already behind
                // Should not go through here, as the provided time is already behind.
                return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            }
        }
    }
 
Stepping back to the add path, wrap.value() is a MetricBucket, whose addPass delegates to add:
    @Override
    public void addPass(int count) {
        WindowWrap<MetricBucket> wrap = data.currentWindow();
        wrap.value().addPass(count);
    }
    public void addPass(int n) {
        add(MetricEvent.PASS, n);
    }

MetricBucket.add

    public long get(MetricEvent event) {
        return counters[event.ordinal()].sum();
    }

    // counters has one slot per MetricEvent (6 in total); event.ordinal() selects the
    // counter, so each MetricEvent type is counted independently.
    public MetricBucket add(MetricEvent event, long n) {
        counters[event.ordinal()].add(n);
        return this;
    }

LongAdder.add

Each counter is a LongAdder, the striped-counter design used inside the JDK (for example by ConcurrentHashMap for its size counting): updates are spread across cells, segment-lock style, to improve throughput under contention.
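The JDK ships the same abstraction as java.util.concurrent.atomic.LongAdder; a minimal usage sketch:

    import java.util.concurrent.atomic.LongAdder;

    LongAdder passCount = new LongAdder();
    passCount.add(1);             // contended threads land on different cells
    long total = passCount.sum(); // base plus all cells (not an atomic snapshot)

Sentinel bundles its own copy of this class; its add method looks like this: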
    public void add(long x) {
        Cell[] as;
        long b, v;
        HashCode hc;
        Cell a;
        int n;
        // Fast path: CAS on the shared base; on contention, fall back to a per-thread cell.
        if ((as = cells) != null || !casBase(b = base, b + x)) {
            boolean uncontended = true;
            int h = (hc = threadHashCode.get()).code;
            // Pick a cell by thread hash; if it is missing or its CAS fails, retry with rehash.
            if (as == null || (n = as.length) < 1 ||
                (a = as[(n - 1) & h]) == null ||
                !(uncontended = a.cas(v = a.value, v + x))) {
                retryUpdate(x, hc, uncontended);
            }
        }
    }

FlowSlot

This slot checks the preset flow rules of a resource against the statistics collected upstream. If a resource has two or more flow rules, they are evaluated in a fixed order until all pass or one of them triggers:
  • rules targeting a specific caller application;
  • rules whose caller is other;
  • rules whose caller is default.
The rules are typically registered as shown below.
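
For reference, rules are usually loaded through FlowRuleManager; a minimal sketch (resource name and threshold are hypothetical):

    import com.alibaba.csp.sentinel.slots.block.RuleConstant;
    import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
    import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;
    import java.util.Collections;

    // Limit resource "queryOrder" to 20 QPS for all callers.
    FlowRule rule = new FlowRule();
    rule.setResource("queryOrder");
    rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
    rule.setCount(20);
    rule.setLimitApp("default");
    FlowRuleManager.loadRules(Collections.singletonList(rule));

FlowSlot.entry itself just runs checkFlow and fires the next slot: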
    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args) throws Throwable {
        checkFlow(resourceWrapper, context, node, count, prioritized);

        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

checkFlow

Moving into FlowRuleChecker.checkFlow:
  • look up the list of flow rules by resource name
  • if the list is not empty, iterate over the rules and validate each one with canPassCheck
    public void checkFlow(Function<String, Collection<FlowRule>> ruleProvider, ResourceWrapper resource,
                          Context context, DefaultNode node, int count, boolean prioritized) throws BlockException {
        if (ruleProvider == null || resource == null) {
            return;
        }
        Collection<FlowRule> rules = ruleProvider.apply(resource.getName());
        if (rules != null) {
            for (FlowRule rule : rules) {
                if (!canPassCheck(rule, context, node, count, prioritized)) {
                    throw new FlowException(rule.getLimitApp(), rule);
                }
            }
        }
    }

canPassCheck

If the rule is in cluster mode, take passClusterCheck; otherwise call passLocalCheck.
    public boolean canPassCheck(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                                    boolean prioritized) {
        String limitApp = rule.getLimitApp();
        if (limitApp == null) {
            return true;
        }

        if (rule.isClusterMode()) {
            return passClusterCheck(rule, context, node, acquireCount, prioritized);
        }

        return passLocalCheck(rule, context, node, acquireCount, prioritized);
    }

passLocalCheck

  • selectNodeByRequesterAndStrategy picks the Node according to the caller and the rule's strategy
  • rule.getRater() returns the configured traffic-shaping behavior, whose canPass performs the check
    private static boolean passLocalCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                          boolean prioritized) {
        Node selectedNode = selectNodeByRequesterAndStrategy(rule, context, node);
        if (selectedNode == null) {
            return true;
        }

        return rule.getRater().canPass(selectedNode, acquireCount, prioritized);
    }

DefaultController.canPass

The default flow-control behavior (fast fail, i.e. direct rejection) decides as follows.
   public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        // Read the resource's current usage from the node; depending on the rule grade,
        // this is either the QPS or the concurrent thread count.
        int curCount = avgUsedTokens(node);
        // Would the current usage plus this request exceed the threshold?
        if (curCount + acquireCount > count) {
            // Yes: the request should be limited. But if it is a high-priority request and
            // the grade is QPS, do not fail immediately; instead try to occupy a future
            // time window and let the request pass in that window.
            if (prioritized && grade == RuleConstant.FLOW_GRADE_QPS) {
                long currentTime;
                long waitInMs;
                currentTime = TimeUtil.currentTimeMillis();
                waitInMs = node.tryOccupyNext(currentTime, acquireCount, count);
                if (waitInMs < OccupyTimeoutProperty.getOccupyTimeout()) {
                    node.addWaitingRequest(currentTime + waitInMs, acquireCount);
                    node.addOccupiedPass(acquireCount);
                    sleep(waitInMs);

                    // PriorityWaitException indicates that the request will pass after waiting for {@link @waitInMs}.
                    throw new PriorityWaitException(waitInMs);
                }
            }
            return false;
        }
        return true;
    }
Once the check fails, FlowRuleChecker throws a FlowException (a subclass of BlockException).

PriorityWait

In DefaultController.canPass, the following two calls book the borrowed capacity in a future window:

    node.addWaitingRequest(currentTime + waitInMs, acquireCount);
    node.addOccupiedPass(acquireCount);

The call chain is addWaitingRequest -> ArrayMetric.addWaiting -> OccupiableBucketLeapArray.addWaiting. borrowArray is a FutureBucketLeapArray, which holds future time windows: the window for the future timestamp is fetched and its pass counter incremented.
    @Override
    public void addWaiting(long time, int acquireCount) {
        WindowWrap<MetricBucket> window = borrowArray.currentWindow(time);
        window.value().add(MetricEvent.PASS, acquireCount);
    }
Finally, back in StatisticSlot.entry: when a higher-priority request has already hit the threshold, occupying a future time window is signalled by throwing PriorityWaitException. StatisticSlot catches it and lets the request through directly, incrementing only the thread count, since the pass count was already booked in the future window. The relevant branch of StatisticSlot.entry (shown in full earlier) is:
        } catch (PriorityWaitException ex) {
            node.increaseThreadNum();
            if (context.getCurEntry().getOriginNode() != null) {
                // Add count for origin node.
                context.getCurEntry().getOriginNode().increaseThreadNum();
            }

            if (resourceWrapper.getEntryType() == EntryType.IN) {
                // Add count for global inbound entry node for global statistics.
                Constants.ENTRY_NODE.increaseThreadNum();
            }
            // Handle pass event with registered entry callback handlers.
            for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
                handler.onPass(context, resourceWrapper, node, count, args);
            }
        }

Cluster Flow Control

In cluster mode, each check sends a token request to the token server, which makes the flow-control decision for the whole cluster.
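
For reference, cluster mode is switched on per rule; a sketch (the flowId and threshold are hypothetical, and the flowId must be globally unique across the cluster):

    import com.alibaba.csp.sentinel.slots.block.flow.ClusterFlowConfig;
    import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;

    FlowRule rule = new FlowRule("queryOrder");
    rule.setCount(100);
    rule.setClusterMode(true); // enables the passClusterCheck branch

    ClusterFlowConfig clusterConfig = new ClusterFlowConfig();
    clusterConfig.setFlowId(10001L);
    clusterConfig.setFallbackToLocalWhenFail(true); // fall back to local check if the token server is unavailable
    rule.setClusterConfig(clusterConfig);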

FlowRuleChecker.passClusterCheck

    private static boolean passClusterCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                            boolean prioritized) {
        try {
            TokenService clusterService = pickClusterService();
            if (clusterService == null) {
                return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
            }
            long flowId = rule.getClusterConfig().getFlowId();
            TokenResult result = clusterService.requestToken(flowId, acquireCount, prioritized);
            return applyTokenResult(result, rule, context, node, acquireCount, prioritized);
            // If client is absent, then fallback to local mode.
        } catch (Throwable ex) {
            RecordLog.warn("[FlowRuleChecker] Request cluster token unexpected failed", ex);
        }
        // Fallback to local flow control when token client or server for this rule is not available.
        // If fallback is not enabled, then directly pass.
        return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
    }

 

 
 
 
 