56-Sentinel-Algorithm

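1. Algorithm Overview

The source-code walkthrough of the sliding time window algorithm is split into two parts: collecting the statistics, and using the collected statistics. Before reading the source, though, you need to understand how the algorithm works.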

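1.1 Time Window Rate-Limiting Algorithm

The idea is that the system picks a zero point on the time axis and then divides the axis into consecutive windows of one fixed length. For that reason it is also called the "fixed time window algorithm".

When a request arrives, the system checks whether the statistics accumulated so far in the window containing the arrival time exceed the preset threshold. If they do not, the request passes; otherwise it is rate limited.

The algorithm has a flaw: the statistics of two consecutive windows may each stay within the threshold, while the statistics over a window-length span that straddles the boundary between them exceed it.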
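
The boundary flaw can be reproduced with a few lines of Java. This is only a sketch for illustration: FixedWindowLimiter and its methods are hypothetical names, not Sentinel classes, and the limiter is single-threaded on purpose.

```java
// Minimal fixed-window counter; all names here are hypothetical.
class FixedWindowLimiter {
    private final long windowLengthMs;
    private final int threshold;
    private long currentWindowStart = -1;
    private int count;

    FixedWindowLimiter(long windowLengthMs, int threshold) {
        this.windowLengthMs = windowLengthMs;
        this.threshold = threshold;
    }

    /** Returns true if the request arriving at timeMs is allowed. */
    boolean tryPass(long timeMs) {
        long windowStart = timeMs - timeMs % windowLengthMs;
        if (windowStart != currentWindowStart) {
            // A new fixed window begins: the previous count is discarded.
            currentWindowStart = windowStart;
            count = 0;
        }
        if (count + 1 > threshold) {
            return false;
        }
        count++;
        return true;
    }

    /** Sends n requests at the same instant and returns how many passed. */
    int burst(long timeMs, int n) {
        int passed = 0;
        for (int i = 0; i < n; i++) {
            if (tryPass(timeMs)) {
                passed++;
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        FixedWindowLimiter limiter = new FixedWindowLimiter(1000, 100);
        int late = limiter.burst(900, 100);   // near the end of window [0, 1000)
        int early = limiter.burst(1100, 100); // near the start of window [1000, 2000)
        System.out.println(late + early);     // prints 200
    }
}
```

With a threshold of 100 per 1000 ms, all 200 requests pass although they arrive within 200 ms of each other, because each burst lands in a different fixed window.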

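1.2 Sliding Time Window Rate-Limiting Algorithm

The sliding time window algorithm solves the problem of the fixed time window algorithm. It does not define fixed window start and end points. Instead, the arrival time of each request becomes the end of the window to count over, and the start is that time minus the window length. Such a window is called a "sliding time window".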
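
A pure sliding window can be sketched by remembering the timestamp of every passed request. The class below is a hypothetical illustration (it is not how Sentinel stores data) and assumes arrival times never decrease.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a pure sliding time window (not Sentinel code).
class SlidingLogLimiter {
    private final long windowLengthMs;
    private final int threshold;
    // One timestamp per passed request that may still fall inside a window.
    private final Deque<Long> passed = new ArrayDeque<>();

    SlidingLogLimiter(long windowLengthMs, int threshold) {
        this.windowLengthMs = windowLengthMs;
        this.threshold = threshold;
    }

    /** Decides for a request arriving at timeMs; arrival times must not decrease. */
    boolean tryPass(long timeMs) {
        // The window ends at the arrival time and starts one window length earlier.
        long windowStart = timeMs - windowLengthMs;
        while (!passed.isEmpty() && passed.peekFirst() <= windowStart) {
            passed.pollFirst(); // this request has slid out of the window
        }
        if (passed.size() + 1 > threshold) {
            return false;
        }
        passed.addLast(timeMs);
        return true;
    }

    public static void main(String[] args) {
        SlidingLogLimiter limiter = new SlidingLogLimiter(1000, 100);
        int passedCount = 0;
        for (int i = 0; i < 100; i++) {
            if (limiter.tryPass(900)) passedCount++;
        }
        for (int i = 0; i < 100; i++) {
            if (limiter.tryPass(1100)) passedCount++;
        }
        System.out.println(passedCount); // prints 100
    }
}
```

The second burst at 1100 ms is rejected because the window (100, 1100] still contains the 100 requests from 900 ms. The price is one stored timestamp per passed request and an evaluation of the window on every arrival.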

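The problem with this approach: a separate window has to be counted for every single request, and because the windows of neighbouring requests overlap heavily, the same traffic is counted over and over.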

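To address this problem, the system adopts a compromise: the whole time axis is split into "sample windows" whose length is smaller than the length of the sliding time window. (If the two lengths were equal, the scheme would turn back into the fixed time window algorithm.) The time window length is normally an integer multiple of the sample window length.

How, then, is it decided whether a request may pass? When a sample window reaches its end time, the traffic counted inside that sample window is recorded. When a request arrives, the system takes the traffic counted so far in the sample window containing the arrival time, fetches the recorded statistics of the other sample windows that belong to the time window containing the arrival time, and adds them up. If the sum does not exceed the threshold, the request passes; otherwise it is rate limited.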
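
The sample-window compromise can be sketched as a small ring of counters. Everything below is a hypothetical, single-threaded simplification written for this note. Only the "outdated" test (current time minus window start greater than the time window length) is taken from the LeapArray code quoted in section 4.4.

```java
import java.util.Arrays;

// Hypothetical single-threaded sketch of counting with sample windows.
class SampleWindowLimiter {
    private final int windowLengthMs; // length of one sample window
    private final int intervalMs;     // length of the sliding time window
    private final int threshold;
    private final long[] starts;      // start time of the sample window held by each slot
    private final int[] counts;       // passed requests recorded in each slot

    SampleWindowLimiter(int sampleCount, int intervalMs, int threshold) {
        this.windowLengthMs = intervalMs / sampleCount;
        this.intervalMs = intervalMs;
        this.threshold = threshold;
        this.starts = new long[sampleCount];
        this.counts = new int[sampleCount];
        Arrays.fill(starts, -1); // -1 marks a slot that was never used
    }

    boolean tryPass(long timeMs) {
        int idx = (int) ((timeMs / windowLengthMs) % starts.length);
        long start = timeMs - timeMs % windowLengthMs;
        if (starts[idx] != start) {
            // The slot still holds an older sample window: reuse it.
            starts[idx] = start;
            counts[idx] = 0;
        }
        int total = 0;
        for (int i = 0; i < starts.length; i++) {
            // Only sample windows that are not outdated contribute to the sum.
            if (starts[i] >= 0 && timeMs - starts[i] <= intervalMs) {
                total += counts[i];
            }
        }
        if (total + 1 > threshold) {
            return false;
        }
        counts[idx]++;
        return true;
    }
}
```

With two sample windows of 500 ms and a threshold of 100, a burst of 100 requests at 900 ms fills the sample window [500, 1000). A request at 1100 ms lands in [1000, 1500), still sees the 100 recorded requests and is rejected. A request at 1600 ms reuses the slot of [500, 1000) and passes.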

// Before reading on, go back and review the relationships between the Node types (2.c).

2. StatisticNode*

/**
 * The statistic node keep three kinds of real-time statistics metrics:
 *
 *   - metrics in second level ({@code rollingCounterInSecond})
 *   - metrics in minute level ({@code rollingCounterInMinute})
 *   - thread count
 *
 * Sentinel use sliding window to record and count the resource statistics in real-time.
 * The sliding window infrastructure behind the {@link ArrayMetric} is {@code LeapArray}.
 *
 *
 * case 1: When the first request comes in, Sentinel will create a new window bucket of
 * a specified time-span to store running statics, such as total response time(rt),
 * incoming request(QPS), block request(bq), etc. And the time-span is defined by sample count.
 *
 *
 * 	0      100ms
 *  +-------+--→ Sliding Windows
 * 	    ^
 * 	    |
 * 	  request
 *
 *
 * Sentinel use the statics of the valid buckets to decide whether this request can be passed.
 * For example, if a rule defines that only 100 requests can be passed,
 * it will sum all qps in valid buckets, and compare it to the threshold defined in rule.
 *
 *
 * case 2: continuous requests
 *
 *  0    100ms    200ms    300ms
 *  +-------+-------+-------+-----→ Sliding Windows
 *                      ^
 *                      |
 *                   request
 *
 *
 * case 3: requests keeps coming, and previous buckets become invalid
 *
 *  0    100ms    200ms	  800ms	   900ms  1000ms    1300ms
 *  +-------+-------+ ...... +-------+-------+ ...... +-------+-----→ Sliding Windows
 *                                                      ^
 *                                                      |
 *                                                    request
 *
 *
 * The sliding window should become:
 *
 * 300ms     800ms  900ms  1000ms  1300ms
 *  + ...... +-------+ ...... +-------+-----→ Sliding Windows
 *                                                      ^
 *                                                      |
 *                                                    request
 *
 */
public class StatisticNode implements Node {

  /**
   * =====> Defines a meter (ArrayMetric) that stores its data in an array!
   *
   * Holds statistics of the recent {@code INTERVAL} seconds. The {@code INTERVAL}
   * is divided into time spans by given {@code sampleCount}.
   *
   * -> SAMPLE_COUNT   number of sample windows, default 2
   * -> INTERVAL       time window length, default 1000ms
   */
  private transient volatile Metric rollingCounterInSecond =
      new ArrayMetric(SampleCountProperty.SAMPLE_COUNT, IntervalProperty.INTERVAL);

  /**
   * Holds statistics of the recent 60 seconds. The windowLengthInMs is deliberately
   * set to 1000 milliseconds, meaning each bucket per second, in this way
   * we can get accurate statistics of each second.
   */
  private transient Metric rollingCounterInMinute = new ArrayMetric(60, 60 * 1000, false);

  /**
   * The counter for thread count.
   */
  private LongAdder curThreadNum = new LongAdder();

  /**
   * The last timestamp when metrics were fetched.
   */
  private long lastFetchTime = -1;

  @Override
  public void addPassRequest(int count) {
    // =====> [b/c] Add the count of this access to the sliding counters
    rollingCounterInSecond.addPass(count);
    rollingCounterInMinute.addPass(count);
  }


}

2.1 ArrayMetric

/**
 * A meter class that stores its data in an array.
 *
 * The basic metric class in Sentinel using a {@link BucketLeapArray} internal.
 *
 */
public class ArrayMetric implements Metric {

  /**
   * The data is stored in this LeapArray<MetricBucket>.
   */
  private final LeapArray<MetricBucket> data;

  public ArrayMetric(int sampleCount, int intervalInMs) {
    this.data = new OccupiableBucketLeapArray(sampleCount, intervalInMs);
  }

  public ArrayMetric(int sampleCount, int intervalInMs, boolean enableOccupy) {
    if (enableOccupy) {
      this.data = new OccupiableBucketLeapArray(sampleCount, intervalInMs);
    } else {
      this.data = new BucketLeapArray(sampleCount, intervalInMs);
    }
  }

  // ...

}

2.2 BucketLeapArray

/**
 * The fundamental data structure for metric statistics in a time span.
 */
public class BucketLeapArray extends LeapArray<MetricBucket> {

  public BucketLeapArray(int sampleCount, int intervalInMs) {
    super(sampleCount, intervalInMs);
  }

  @Override
  public MetricBucket newEmptyBucket(long time) {
    return new MetricBucket();
  }

  @Override
  protected WindowWrap<MetricBucket> resetWindowTo(WindowWrap<MetricBucket> w, long startTime) {
    // Update the start time and reset value.
    w.resetTo(startTime);
    // Reset the multi-dimensional statistics to zero (see 2.3.b)
    w.value().reset();
    return w;
  }
}

2.3 LeapArray

/**
 * Basic data structure for statistic metrics in Sentinel.
 *
 * Leap array use sliding window algorithm to count data. Each bucket cover
 * {@code windowLengthInMs} time span, and the total time span is
 * {@link #intervalInMs}, so the total bucket amount is:
 * {@code sampleCount = intervalInMs / windowLengthInMs}.
 *
 * @param <T> type of statistic data
 */
public abstract class LeapArray<T> {

  /**
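   * Sample window length (milliseconds)
   */
  protected int windowLengthInMs;

  /**
   * Number of sample windows in one time window
   */
  protected int sampleCount;

  /**
   * Time window length (milliseconds)
   */
  protected int intervalInMs;

  /**
   * Time window length (seconds)
   */
  private double intervalInSecond;

  /**
   * =====> The array elements are WindowWrap sample windows; the type parameter T is actually MetricBucket here!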
   */
  protected final AtomicReferenceArray<WindowWrap<T>> array;

  /**
   * The conditional (predicate) update lock is used only when current bucket is deprecated.
   */
  private final ReentrantLock updateLock = new ReentrantLock();

  /**
   * The total bucket count is: {@code sampleCount = intervalInMs / windowLengthInMs}.
   *
   * @param sampleCount  bucket count of the sliding window
   * @param intervalInMs the total time interval of this {@link LeapArray} in milliseconds
   */
  public LeapArray(int sampleCount, int intervalInMs) {
    AssertUtil.isTrue(sampleCount > 0, "bucket count is invalid: " + sampleCount);
    AssertUtil.isTrue(intervalInMs > 0, "total time interval of the sliding window should be positive");
    AssertUtil.isTrue(intervalInMs % sampleCount == 0, "time span needs to be evenly divided");

    this.windowLengthInMs = intervalInMs / sampleCount;
    this.intervalInMs = intervalInMs;
    this.intervalInSecond = intervalInMs / 1000.0;
    this.sampleCount = sampleCount;

    this.array = new AtomicReferenceArray<>(sampleCount);
  }

  // ...

}
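
To make the constructor arithmetic concrete, here is a small sketch that applies the same checks and division to the two configurations used by StatisticNode. WindowConfig is a hypothetical helper written for this note, not part of Sentinel.

```java
// Hypothetical helper mirroring the arithmetic of the LeapArray constructor.
class WindowConfig {
    /** Returns the sample window length for the given bucket count and interval. */
    static int windowLengthInMs(int sampleCount, int intervalInMs) {
        if (sampleCount <= 0 || intervalInMs <= 0 || intervalInMs % sampleCount != 0) {
            throw new IllegalArgumentException("interval must be positive and evenly divisible");
        }
        return intervalInMs / sampleCount;
    }

    public static void main(String[] args) {
        // rollingCounterInSecond with the defaults: 2 sample windows over 1000 ms
        System.out.println(windowLengthInMs(2, 1000));       // prints 500
        // rollingCounterInMinute: 60 sample windows over 60 * 1000 ms
        System.out.println(windowLengthInMs(60, 60 * 1000)); // prints 1000
    }
}
```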

a. WindowWrap

/**
 * A sample window instance; the type parameter T is MetricBucket.
 *
 * Wrapper entity class for a period of time window.
 */
public class WindowWrap<T> {

  /**
   * Time length of a single window bucket in milliseconds (the sample window length).
   */
  private final long windowLengthInMs;

  /**
   * Start timestamp of the window in milliseconds (the sample window's start time).
   */
  private long windowStart;

  /**
   * Statistic data of the current sample window; its type is MetricBucket.
   */
  private T value;

  /**
   * @param windowLengthInMs  a single window bucket's time length in milliseconds.
   * @param windowStart       the start timestamp of the window
   * @param value      		  statistic data
   */
  public WindowWrap(long windowLengthInMs, long windowStart, T value) {
    this.windowLengthInMs = windowLengthInMs;
    this.windowStart = windowStart;
    this.value = value;
  }

  public long windowLength() {
    return windowLengthInMs;
  }

  public long windowStart() {
    return windowStart;
  }

  public T value() {
    return value;
  }

  public void setValue(T value) {
    this.value = value;
  }

  /**
   * Reset start timestamp of current bucket to provided time.
   *
   * @param startTime valid start timestamp
   * @return bucket after reset
   */
  public WindowWrap<T> resetTo(long startTime) {
    this.windowStart = startTime;
    return this;
  }

  /**
   * Check whether given timestamp is in current bucket.
   *
   * @param timeMillis valid timestamp in ms
   * @return true if the given time is in current bucket, otherwise false
   * @since 1.5.0
   */
  public boolean isTimeInWindow(long timeMillis) {
    return windowStart <= timeMillis && timeMillis < windowStart + windowLengthInMs;
  }

  @Override
  public String toString() {
    return "WindowWrap{" +
      "windowLengthInMs=" + windowLengthInMs +
      ", windowStart=" + windowStart +
      ", value=" + value +
      '}';
  }
}
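
Note that isTimeInWindow treats the window as a half-open range: the start is included and the end is excluded. The sketch below repeats that comparison in a hypothetical static helper so the boundaries can be checked in isolation.

```java
// Hypothetical helper repeating the comparison used by WindowWrap#isTimeInWindow.
class WindowBounds {
    static boolean isTimeInWindow(long windowStart, long windowLengthInMs, long timeMillis) {
        return windowStart <= timeMillis && timeMillis < windowStart + windowLengthInMs;
    }

    public static void main(String[] args) {
        // A 200 ms sample window starting at 800 covers [800, 1000).
        System.out.println(isTimeInWindow(800, 200, 800));  // prints true
        System.out.println(isTimeInWindow(800, 200, 999));  // prints true
        System.out.println(isTimeInWindow(800, 200, 1000)); // prints false
    }
}
```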

b. MetricBucket

/**
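 * Represents metrics data in a period of time span. Wrapper class for the (multi-dimensional) statistics.
 */
public class MetricBucket {

  /**
   * The statistics are stored here. They are multi-dimensional; the dimensions are the constants of the MetricEvent enum.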
   */
  private final LongAdder[] counters;

  private volatile long minRt;

  public MetricBucket() {
    MetricEvent[] events = MetricEvent.values();
    this.counters = new LongAdder[events.length];
    for (MetricEvent event : events) {
      counters[event.ordinal()] = new LongAdder();
    }
    initMinRt();
  }

  public MetricBucket reset(MetricBucket bucket) {
    for (MetricEvent event : MetricEvent.values()) {
      counters[event.ordinal()].reset();
      counters[event.ordinal()].add(bucket.get(event));
    }
    initMinRt();
    return this;
  }

  private void initMinRt() {
    this.minRt = SentinelConfig.statisticMaxRt();
  }

  /**
   * Reset the adders, i.e. set the statistics of every dimension to zero.
   *
   * @return new metric bucket in initial state
   */
  public MetricBucket reset() {
    for (MetricEvent event : MetricEvent.values()) {
      counters[event.ordinal()].reset();
    }
    initMinRt();
    return this;
  }

  public long get(MetricEvent event) {
    return counters[event.ordinal()].sum();
  }

  public MetricBucket add(MetricEvent event, long n) {
    counters[event.ordinal()].add(n);
    return this;
  }

  public long pass() {
    return get(MetricEvent.PASS);
  }

  public long occupiedPass() {
    return get(MetricEvent.OCCUPIED_PASS);
  }

  public long block() {
    return get(MetricEvent.BLOCK);
  }

  public long exception() {
    return get(MetricEvent.EXCEPTION);
  }

  public long rt() {
    return get(MetricEvent.RT);
  }

  public long minRt() {
    return minRt;
  }

  public long success() {
    return get(MetricEvent.SUCCESS);
  }

  public void addPass(int n) {
    // Add to the statistics of the pass dimension
    add(MetricEvent.PASS, n);
  }

  public void addOccupiedPass(int n) {
    add(MetricEvent.OCCUPIED_PASS, n);
  }

  public void addException(int n) {
    add(MetricEvent.EXCEPTION, n);
  }

  public void addBlock(int n) {
    add(MetricEvent.BLOCK, n);
  }

  public void addSuccess(int n) {
    add(MetricEvent.SUCCESS, n);
  }

  public void addRT(long rt) {
    add(MetricEvent.RT, rt);

    // Not thread-safe, but it's okay.
    if (rt < minRt) {
      minRt = rt;
    }
  }

  @Override
  public String toString() {
    return "p: " + pass() + ", b: " + block() + ", w: " + occupiedPass();
  }
}
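
The core trick of MetricBucket is one adder per enum constant, indexed by ordinal(). The sketch below shows that pattern in isolation. EventCounters and its two-value Event enum are hypothetical stand-ins, and it uses the JDK's java.util.concurrent.atomic.LongAdder.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical, trimmed-down illustration of the "one adder per dimension" layout.
class EventCounters {
    enum Event { PASS, BLOCK } // stand-in for MetricEvent

    private final LongAdder[] counters;

    EventCounters() {
        Event[] events = Event.values();
        counters = new LongAdder[events.length];
        for (Event event : events) {
            counters[event.ordinal()] = new LongAdder();
        }
    }

    void add(Event event, long n) {
        counters[event.ordinal()].add(n);
    }

    long get(Event event) {
        return counters[event.ordinal()].sum();
    }

    /** Sets every dimension back to zero. */
    void reset() {
        for (LongAdder counter : counters) {
            counter.reset();
        }
    }

    public static void main(String[] args) {
        EventCounters bucket = new EventCounters();
        bucket.add(Event.PASS, 3);
        bucket.add(Event.BLOCK, 1);
        System.out.println("p: " + bucket.get(Event.PASS) + ", b: " + bucket.get(Event.BLOCK));
    }
}
```

Indexing by ordinal() keeps lookups as cheap as an array access, and adding a new dimension only requires a new enum constant.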

c. MetricEvent

/**
 * Dimensions of the collected statistics
 */
public enum MetricEvent {

    /**
     * Normal pass.
     */
    PASS,

    /**
     * Normal block.
     */
    BLOCK,

    EXCEPTION,

    SUCCESS,

    RT,

    /**
     * Passed in future quota (pre-occupied, since 1.5.0).
     */
    OCCUPIED_PASS
}

d. LongAdder

/**
 * One or more variables that together maintain an initially zero
 * {@code long} sum.  When updates (method {@link #add}) are contended
 * across threads, the set of variables may grow dynamically to reduce
 * contention. Method {@link #sum} (or, equivalently, {@link
 * #longValue}) returns the current total combined across the
 * variables maintaining the sum.
 *
 * <p>This class is usually preferable to {@link AtomicLong} when
 * multiple threads update a common sum that is used for purposes such
 * as collecting statistics, not for fine-grained synchronization
 * control.  Under low update contention, the two classes have similar
 * characteristics. But under high contention, expected throughput of
 * this class is significantly higher, at the expense of higher space
 * consumption.
 *
 * <p>LongAdders can be used with a {@link
 * java.util.concurrent.ConcurrentHashMap} to maintain a scalable
 * frequency map (a form of histogram or multiset). For example, to
 * add a count to a {@code ConcurrentHashMap<String,LongAdder> freqs},
 * initializing if not already present, you can use {@code
 * freqs.computeIfAbsent(key, k -> new LongAdder()).increment();}
 *
 * <p>This class extends {@link Number}, but does <em>not</em> define
 * methods such as {@code equals}, {@code hashCode} and {@code
 * compareTo} because instances are expected to be mutated, and so are
 * not useful as collection keys.
 *
 * @since 1.8
 * @author Doug Lea
 */
public class LongAdder extends Striped64 implements Serializable {
  private static final long serialVersionUID = 7249069246863182397L;

  // ...

}

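Structure of this adder class: LongAdder extends Striped64, which keeps a base field plus a lazily created array of cells; sum() returns base plus the values of all cells.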
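
A quick way to see why LongAdder suits statistics is to let several threads increment one instance and read the total afterwards. The demo class below is a hypothetical sketch using the JDK's java.util.concurrent.atomic.LongAdder.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo: several threads update one LongAdder, the total is read once at the end.
class LongAdderDemo {
    static long countConcurrently(int threads, int incrementsPerThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    adder.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until every update is finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return adder.sum();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // prints 40000
    }
}
```

sum() is only guaranteed to be exact once concurrent updates have stopped, which is why the demo joins all threads before reading. For rate-limiting statistics a slightly stale sum is acceptable.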

3. Collecting the Statistics

The starting point is the following two method calls in StatisticSlot from the previous section (4.1.c):

// ...

// =====> Increase the thread count
node.increaseThreadNum();
// =====> [1] Increase the number of passed requests
node.addPassRequest(count);

// ...

3.1 DefaultNode

/**
 * A Node used to hold statistics for specific resource name in the specific context.
 * Each distinct resource in each distinct Context will corresponding to a DefaultNode.
 *
 * This class may have a list of sub DefaultNodes. Child nodes will be created when
 * calling SphU#entry() or SphO#entry() multiple times in the same Context.
 */
public class DefaultNode extends StatisticNode {

  @Override
  public void increaseThreadNum() {
    super.increaseThreadNum();
    this.clusterNode.increaseThreadNum();
  }

  @Override
  public void addPassRequest(int count) {
    // Update the statistics in the DefaultNode of the current entrance
    // =====> [2] StatisticNode#addPassRequest <this:DefaultNode>
    super.addPassRequest(count);
    // Update the global statistics in the ClusterNode of the current resource
    // =====> StatisticNode#addPassRequest <this:ClusterNode>
    this.clusterNode.addPassRequest(count);
  }
}
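
The effect of this double bookkeeping is that every per-context node keeps its own figures while the shared cluster node accumulates the total for the resource. A stripped-down sketch follows. CountingNode and ContextNode are hypothetical stand-ins for StatisticNode and DefaultNode, with a plain long in place of the sliding counters.

```java
// Hypothetical stand-in for StatisticNode: a plain counter instead of sliding windows.
class CountingNode {
    long pass;

    void addPassRequest(int count) {
        pass += count;
    }
}

// Hypothetical stand-in for DefaultNode: records locally and in the shared node.
class ContextNode extends CountingNode {
    private final CountingNode clusterNode;

    ContextNode(CountingNode clusterNode) {
        this.clusterNode = clusterNode;
    }

    @Override
    void addPassRequest(int count) {
        super.addPassRequest(count);       // statistics of this context only
        clusterNode.addPassRequest(count); // statistics of the resource across all contexts
    }
}

class NodeDemo {
    public static void main(String[] args) {
        CountingNode cluster = new CountingNode();
        ContextNode contextA = new ContextNode(cluster);
        ContextNode contextB = new ContextNode(cluster);
        contextA.addPassRequest(3);
        contextB.addPassRequest(4);
        System.out.println(contextA.pass + " " + contextB.pass + " " + cluster.pass); // prints 3 4 7
    }
}
```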

3.2 StatisticNode

@Override
public void addPassRequest(int count) {
  // =====> [3] Add the count of this access to the sliding counters
  rollingCounterInSecond.addPass(count);
  rollingCounterInMinute.addPass(count);
}

3.3 ArrayMetric

@Override
public void addPass(int count) {
  // =====> [4] Get the sample window containing the current time
  WindowWrap<MetricBucket> wrap = data.currentWindow();
  // =====> [7] Add this request's count to the statistics of the current sample window (see 2.3.b)
  wrap.value().addPass(count);
}

3.4 LeapArray

/**
 * Get the bucket at current timestamp.
 *
 * @return the bucket at current timestamp
 */
public WindowWrap<T> currentWindow() {
  // =====> [5] Get the sample window containing the current time
  return currentWindow(TimeUtil.currentTimeMillis());
}

// =====> [i] Compute which sample window the given time falls into
private int calculateTimeIdx(long timeMillis) {
  long timeId = timeMillis / windowLengthInMs;
  // Calculate current index so we can map the timestamp to the leap array.
  return (int)(timeId % array.length());
}

// =====> [ii] Compute the start time of the sample window containing the given time
protected long calculateWindowStart(long timeMillis) {
  return timeMillis - timeMillis % windowLengthInMs;
}

/**
 * =====> [6] Get bucket item at provided timestamp.
 *
 * @param  timeMillis a valid timestamp in milliseconds
 * @return current bucket item at provided timestamp if
 *         the time is valid; null if time is invalid.
 */
public WindowWrap<T> currentWindow(long timeMillis) {
  if (timeMillis < 0) {
    return null;
  }

  // =====> [i] Compute the index of the sample window containing the current time, i.e. its index in the LeapArray's array
  int idx = calculateTimeIdx(timeMillis);
  // =====> [ii] Calculate current bucket start time.
  long windowStart = calculateWindowStart(timeMillis);

  /*
   * Get bucket item at given time from the array.
   *
   * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
   * (2) Bucket is up-to-date, then just return the bucket.
   * (3) Bucket is deprecated, then reset current bucket and clean all deprecated buckets.
   */
  while (true) {
    // Get the sample window currently stored at that index
    WindowWrap<T> old = array.get(idx);
    // (1) If it is null, the sample window for the current time does not exist yet, so create one
    if (old == null) {
      /*
       *     B0       B1      B2    NULL      B4
       * ||_______|_______|_______|_______|_______||___
       * 200     400     600     800     1000    1200  timestamp
       *                             ^
       *                          time=888
       *            bucket is empty, so create new and update
       *
       * If the old bucket is absent, then we create a new bucket at {@code windowStart},
       * then try to update circular array via a CAS operation. Only one thread can
       * succeed to update, while other threads yield its time slice.
       */
      // Create a sample window
      WindowWrap<T> window = new WindowWrap<>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
      // Put the new window into the array via CAS
      if (array.compareAndSet(idx, null, window)) {
        // Successfully updated, return the created bucket.
        return window;
      } else {
        // Contention failed, the thread will yield its time slice to wait for bucket available.
        Thread.yield();
      }

    // (2) If the start time of the window in the array equals the computed window start time, they are the same sample window
    } else if (windowStart == old.windowStart()) {
      /*
       *     B0       B1      B2     B3      B4
       * ||_______|_______|_______|_______|_______||___
       * 200     400     600     800     1000    1200  timestamp
       *                             ^
       *                          time=888
       *            startTime of Bucket 3: 800, so it's up-to-date
       *
       * If current {@code windowStart} is equal to the start timestamp of old bucket,
       * that means the time is within the bucket, so directly return the bucket.
       */
      return old;

    // (3) If the start time of the window in the array < the computed window start time,
    //     the window in the array is outdated and has to be reset for the new start time
    } else if (windowStart > old.windowStart()) {
      /*
       *   (old)
       *             B0       B1      B2    NULL      B4
       * |_______||_______|_______|_______|_______|_______||___
       * ...    1200     1400    1600    1800    2000    2200  timestamp
       *                              ^
       *                           time=1676
       *          startTime of Bucket 2: 400, deprecated, should be reset
       *
       * If the start timestamp of old bucket is behind provided time, that means
       * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
       * Note that the reset and clean-up operations are hard to be atomic,
       * so we need a update lock to guarantee the correctness of bucket update.
       *
       * The update lock is conditional (tiny scope) and will take effect only when
       * bucket is deprecated, so in most cases it won't lead to performance loss.
       */
      if (updateLock.tryLock()) {
        try {
          // Successfully get the update lock, now we reset the bucket.
          // =====> Replace the old sample window (see 2.2)
          return resetWindowTo(old, windowStart);
        } finally {
          updateLock.unlock();
        }
      } else {
        // Contention failed, the thread will yield its time slice to wait for bucket available.
        Thread.yield();
      }

    // (4) The start time of the window in the array > the computed window start time. This normally does not happen unless someone has changed the system clock.
    } else if (windowStart < old.windowStart()) {
      // Should not go through here, as the provided time is already behind.
      return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
    }
  }
}
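
The two helper computations are easy to verify by hand. The sketch below repeats them in a hypothetical standalone class (LeapIndexDemo is not part of Sentinel) and evaluates them for the default second-level configuration of 2 sample windows of 500 ms each.

```java
// Hypothetical standalone copy of the two computations used by currentWindow.
class LeapIndexDemo {
    /** Slot of the circular array that the timestamp maps to. */
    static int timeIdx(long timeMillis, int windowLengthInMs, int sampleCount) {
        long timeId = timeMillis / windowLengthInMs;
        return (int) (timeId % sampleCount);
    }

    /** Start time of the sample window that contains the timestamp. */
    static long windowStart(long timeMillis, int windowLengthInMs) {
        return timeMillis - timeMillis % windowLengthInMs;
    }

    public static void main(String[] args) {
        int windowLengthInMs = 500;
        int sampleCount = 2;
        for (long t : new long[] {888, 1200, 1676}) {
            System.out.println("t=" + t
                + " idx=" + timeIdx(t, windowLengthInMs, sampleCount)
                + " windowStart=" + windowStart(t, windowLengthInMs));
        }
    }
}
```

Timestamps 888 and 1676 both map to slot 1, but with window start times 500 and 1500. If the slot still holds the window starting at 500 when the request at 1676 arrives, that is case (3): the stored start time is smaller than the computed one, so the slot is reset and reused.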

3.5 Flowchart

4. Using the Statistics

4.1 DefaultController

/**
 * Pass check for the "fail fast" flow control behavior
 * @param node resource node
 * @param acquireCount count to acquire
 * @param prioritized whether the request is prioritized
 * @return
 */
@Override
public boolean canPass(Node node, int acquireCount, boolean prioritized) {
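  // Get the statistics already collected in the current time window
  int curCount = avgUsedTokens(node);

  // If the collected count plus the count requested now > the configured threshold, return false: the check fails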
  // ...

}

private int avgUsedTokens(Node node) {
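  // If no node was selected, nothing has been counted, so return 0 directly
  if (node == null) {
    return DEFAULT_AVG_USED_TOKENS;
  }
  // If the threshold type is thread count, return the current number of threads;
  // if the threshold type is QPS, return the current QPS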
  return grade == RuleConstant.FLOW_GRADE_THREAD ? node.curThreadNum() : (int)(node.passQps());
}
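
The comparison that the comment in canPass describes can be tried out in isolation. The sketch below is a simplified assumption, not the elided method body: FailFastCheck is a hypothetical class, the threshold is passed in as a parameter, and the prioritized branch is ignored.

```java
// Hypothetical, simplified version of the fail-fast comparison (no prioritized handling).
class FailFastCheck {
    static boolean canPass(double passQps, int acquireCount, double threshold) {
        // avgUsedTokens casts the QPS to int, so fractions are truncated.
        int curCount = (int) passQps;
        // Rejected as soon as "already used + requested now" exceeds the threshold.
        return curCount + acquireCount <= threshold;
    }

    public static void main(String[] args) {
        System.out.println(canPass(99.0, 1, 100));  // prints true
        System.out.println(canPass(100.0, 1, 100)); // prints false
        System.out.println(canPass(99.9, 1, 100));  // prints true, 99.9 is truncated to 99
    }
}
```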

4.2 StatisticNode

@Override
public double passQps() {
  // rollingCounterInSecond.pass()                    number of passed requests counted in the current time window
  // rollingCounterInSecond.getWindowIntervalInSec()  time window length in seconds
  // Dividing the first by the second gives the QPS
  return rollingCounterInSecond.pass() / rollingCounterInSecond.getWindowIntervalInSec();
}

4.3 ArrayMetric

@Override
public long pass() {
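  // Refresh the sample window instance in the array for the current time
  data.currentWindow();

  long pass = 0;
  // =====> Get the statistics recorded by all sample windows in the current time window
  List<MetricBucket> list = data.values();

  // Take the pass dimension of every bucket in the list and sum it up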
  for (MetricBucket window : list) {
    pass += window.pass();
  }
  return pass;
}

4.4 LeapArray

/**
 * Get aggregated value list for entire sliding window.
 * The list will only contain value from "valid" buckets.
 *
 * @return aggregated value list for entire sliding window
 */
public List<T> values() {
  return values(TimeUtil.currentTimeMillis());
}

public List<T> values(long timeMillis) {
  if (timeMillis < 0) {
    return new ArrayList<T>();
  }
  int size = array.length();
  List<T> result = new ArrayList<T>(size);

  // Iterate over every sample window instance in the array
  for (int i = 0; i < size; i++) {
    WindowWrap<T> windowWrap = array.get(i);
    // Skip the instance if it is null or already outdated
    if (windowWrap == null || isWindowDeprecated(timeMillis, windowWrap)) {
      continue;
    }
    // Record the statistics of this sample window in result
    result.add(windowWrap.value());
  }
  return result;
}

public boolean isWindowDeprecated(long time, WindowWrap<T> windowWrap) {
  // If the current time minus the sample window's start time > the time window length, the sample window is outdated
  return time - windowWrap.windowStart() > intervalInMs;
}
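
Putting sections 4.2 to 4.4 together: outdated sample windows are skipped, the pass counts of the remaining ones are summed, and the sum is divided by the time window length in seconds. The sketch below replays that chain on plain arrays. ValidBucketsDemo and its parameters are hypothetical, and only the isDeprecated comparison is copied from the code above.

```java
// Hypothetical replay of values() + pass() + passQps() on plain arrays.
class ValidBucketsDemo {
    static boolean isDeprecated(long time, long windowStart, int intervalInMs) {
        return time - windowStart > intervalInMs;
    }

    static double passQps(long time, long[] windowStarts, long[] passCounts, int intervalInMs) {
        long pass = 0;
        for (int i = 0; i < windowStarts.length; i++) {
            if (isDeprecated(time, windowStarts[i], intervalInMs)) {
                continue; // outdated sample windows do not count
            }
            pass += passCounts[i];
        }
        return pass / (intervalInMs / 1000.0);
    }

    public static void main(String[] args) {
        long now = 1676;
        // Both sample windows are still valid: (40 + 25) / 1.0
        System.out.println(passQps(now, new long[] {1000, 1500}, new long[] {40, 25}, 1000)); // prints 65.0
        // The window starting at 500 is outdated (1676 - 500 = 1176 > 1000): 25 / 1.0
        System.out.println(passQps(now, new long[] {500, 1500}, new long[] {40, 25}, 1000));  // prints 25.0
    }
}
```

Because the comparison is a strict "greater than", a sample window whose start lies exactly one interval in the past is still counted.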

4.5 Flowchart

posted @ 2022-04-10 17:08  tree6x7