Reading Software Defined Traffic Measurement with OpenSketch
NSDI ’13
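Overview
- OpenSketch is a general, abstract measurement framework. Like the SDN architecture, it decouples the measurement control plane from the data plane. The data plane runs as a dynamically configurable three-stage pipeline. First, it hashes the traffic to reduce the amount of data that has to be measured. Second, in the classification stage, it classifies traffic by matching customized wildcard rules. Finally, in the counting stage, each flow maps to one or more counters, depending on the accuracy required, to collect, aggregate, and trace back flow statistics.
- The paper also notes that different sketch methods can be supported by flexibly combining the hashing, classification, and counting functions.
- The control plane's main job is to dynamically choose a suitable sketch for each task and to allocate resources according to the accuracy requirements and the resources currently available, so as to obtain the best measurement result. OpenSketch's layered design fits the SDN architecture. It supports fine-grained traffic measurement, and it can also extract IP addresses, MAC addresses, and similar fields from the flow identifier (the 5-tuple) to collect per-host traffic statistics, with low overhead and high accuracy. The pipeline design breaks a measurement algorithm down into a few steps, which makes it easier for network administrators to implement different measurement algorithms. Meanwhile, OpenSketch is now widely used in standardization work for data center networks and is being extended to commercial switches, so its commercial potential is large.
- However, OpenSketch requires hardware support in network switches, which is very costly for network operators and is a major obstacle to its adoption.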
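Why sketches are a better fit for SDN
- On the software side, sketches are simple to deploy, easy to update and iterate on, easy to extend with new functions, and adapt well to changing environments.
- The control plane has a global view, so it can provide rich dynamic configuration and resource allocation, and can trace results back dynamically.
- Hardware support in network switches is required, which is very costly for network operators and is a major obstacle to the adoption of OpenSketch.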
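Differences between deploying sketches on SDN and on traditional networks
- OpenSketch puts the data plane in hardware and the control plane in software.
- A control plane can configure measurement dynamically (sketch selection, resource allocation), whereas the methods used in traditional networks are relatively fixed.
- The control plane can do unified counting and data analysis.
- The control plane can install the sketch data structures automatically.
- Sketches are hard to implement in hardware, so the design has to use as few hash functions as possible to reduce the complexity of the hardware deployment.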
Background
- Most network management tasks in software-defined networks (SDN) involve two stages: measurement and control. While many efforts have been focused on network control APIs for SDN, little attention goes into measurement.
- The key challenge of designing a new measurement API is to strike a careful balance between generality (supporting a wide variety of measurement tasks) and efficiency (enabling high link speed and low cost).
- Flow-based measurements provide generic support for different measurement tasks, but consume too many resources.
- Many sketch-based algorithms are not deployed in practice because of their lack of generality.
- Sketches are compact data structures used in streaming algorithms to store summary information about the state of packets.
On Sketch
- Low memory usage
- Provable tradeoffs of memory and accuracy
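The memory/accuracy tradeoff can be made concrete with a Count-Min sketch, one of the sketches these notes mention later. The block below is a minimal sketch, not the paper's implementation: it uses the standard Count-Min sizing (width `ceil(e / epsilon)`, depth `ceil(ln(1 / delta))`), and the class name `CountMinSketch` and the use of salted BLAKE2b digests in place of pairwise-independent hash functions are assumptions made for illustration.

```python
import hashlib
import math


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, epsilon: float, delta: float):
        # Standard sizing: the overestimate is at most epsilon * N
        # (N = total count added) with probability at least 1 - delta.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _index(self, row: int, key: str) -> int:
        # Assumption: one salted digest per row stands in for the
        # independent hash functions a real switch would use.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(2, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        # Update one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key: str) -> int:
        # The minimum over rows never underestimates the true count.
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )
```

Halving `epsilon` doubles the width (and so the memory), while tightening `delta` only adds rows logarithmically, which is the kind of tradeoff a controller can reason about when it sizes a sketch.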
Contribution
- First, OpenSketch allows more customized and thus more efficient data collection with respect to choosing which flow to measure (using both hashing and wildcard rules).
- Second, OpenSketch makes measurement programming easier at the controllers by freeing operators from understanding the complex switch implementations and parameter tuning in diverse sketches. We build a measurement library which automatically configures the data plane pipeline for different sketches and allocates the switch memory across tasks to maximize accuracy.
- We rely on the software in the controller to implement these complex data structures and algorithms using simpler sketches in the data plane.
Solution
- We propose a software defined traffic measurement architecture OpenSketch, which separates the measurement data plane from the control plane.
- In the data plane, OpenSketch provides a simple three-stage pipeline (hashing, filtering, and counting), which can be implemented with commodity switch components and support many measurement tasks. In the control plane, OpenSketch provides a measurement library that automatically configures the pipeline and allocates resources for different measurement tasks.
OpenSketch Data Plane
- The data plane has two jobs: picking the packets to measure, and storing/exporting the measurement data.
Picking the packets to measure:
Hash
- Hashes can be used to provide a compact summary of the set of flows to measure
- To count the number of redundant packets with the same content, we can hash the packet body into a short fingerprint rather than store and compare the entire packet body every time. Hashes also enable a provable accuracy and memory tradeoff.
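The fingerprint idea above can be sketched in a few lines. This is an illustration under stated assumptions, not what a switch does in hardware: the helper names `fingerprint` and `count_redundant`, the 8-byte fingerprint size, and the choice of BLAKE2b are all hypothetical, and a short fingerprint can in principle collide for two different bodies.

```python
import hashlib
from collections import Counter


def fingerprint(payload: bytes, size: int = 8) -> bytes:
    """Return a short fixed-size fingerprint of a packet body."""
    return hashlib.blake2b(payload, digest_size=size).digest()


def count_redundant(payloads) -> int:
    """Count packets whose fingerprint has already been seen."""
    seen = Counter()
    redundant = 0
    for body in payloads:
        fp = fingerprint(body)
        if seen[fp] > 0:
            # Same fingerprint as an earlier packet: treat it as a repeat.
            redundant += 1
        seen[fp] += 1
    return redundant
```

Only the 8-byte fingerprints are stored, so memory grows with the number of distinct bodies rather than with their size.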
Classification
- Classification is also useful for focusing on some specific flows.
- We need a classification stage to measure different flows with different numbers of counters or with different levels of accuracy.
- For classifying flows, we can specify wildcard rules that match packets on flow fields and allow some bits in the flow fields to be “don’t care”.
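A wildcard rule of this kind can be modeled as a value plus a mask, where mask bits set to 0 are the "don't care" bits. The block below is a minimal software sketch of that matching logic. `WildcardRule`, `prefix_rule`, and `classify` are hypothetical names, and the first-match-wins order and the default index of 0 are assumptions chosen for the illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class WildcardRule:
    value: int  # bit pattern to match
    mask: int   # 1 = bit must match, 0 = don't care
    index: int  # index handed to the counting stage on a match

    def matches(self, field: int) -> bool:
        # Compare only the bits the mask cares about.
        return (field & self.mask) == (self.value & self.mask)


def prefix_rule(cidr: str, index: int) -> WildcardRule:
    """Build a rule from an IPv4 prefix such as 192.168.1.0/24."""
    net = ipaddress.ip_network(cidr)
    return WildcardRule(int(net.network_address), int(net.netmask), index)


def classify(rules, dst_ip: str, default: int = 0) -> int:
    """Return the index of the first matching rule, else `default`."""
    field = int(ipaddress.ip_address(dst_ip))
    for rule in rules:  # assumption: rules are listed in priority order
        if rule.matches(field):
            return rule.index
    return default
```

For example, with `rules = [prefix_rule("192.168.1.0/24", 1)]`, a packet to 192.168.1.77 is classified with index 1, while a packet to 10.0.0.1 falls through to the default.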
Storing and exporting the data:
- OpenSketch uses a small table with complex indexing.
- To get such flexibility and memory saving, OpenSketch requires more complex indexing using the hashing and classification modules.
OpenSketch data plane:
- The OpenSketch data plane has three stages: a hashing stage to reduce the measurement data, a classification stage to select flows, and a counting stage to accumulate traffic statistics.
- First, the hashing stage picks the packet source field and calculates a single hash function.
- Next, the classification stage picks the packet destination field and filters all the packets matching the rule (dst : 192.168.1.0/24→1). Each rule has an index field, which can be used to calculate the counter location in the counting stage.
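The three stages in this example can be mimicked in software as follows. This is a rough sketch, not the hardware pipeline: the function names, the 64-entry bitmap, and the BLAKE2b hash are hypothetical, and the way the counting stage uses the index (index 1 sets a bit, index 0 is ignored) is an assumption, since the notes only say that the index helps calculate the counter location.

```python
import hashlib
import ipaddress

BITMAP_SIZE = 64  # hypothetical size of the counter table
SUBNET = ipaddress.ip_network("192.168.1.0/24")


def hash_stage(src_ip: str) -> int:
    """Stage 1: hash the packet source field with a single hash function."""
    digest = hashlib.blake2b(src_ip.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def classify_stage(dst_ip: str) -> int:
    """Stage 2: apply the rule dst 192.168.1.0/24 -> 1, otherwise 0."""
    return 1 if ipaddress.ip_address(dst_ip) in SUBNET else 0


def count_stage(bitmap: list, hash_value: int, index: int) -> None:
    """Stage 3: update the location derived from the hash and the index."""
    if index == 1:
        bitmap[hash_value % len(bitmap)] = 1


def process(bitmap: list, src_ip: str, dst_ip: str) -> None:
    """Run one packet through all three stages."""
    count_stage(bitmap, hash_stage(src_ip), classify_stage(dst_ip))
```

After feeding packets through `process`, the number of set bits in the bitmap gives a rough picture of how many distinct sources sent traffic to the subnet; packets to other destinations leave the bitmap untouched.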
Build on existing switch components
A few simple hash functions
- 4-8 three-wise or five-wise independent hash functions are enough for many measurement requirements, and can be implemented efficiently in hardware.
A few TCAM entries for classification:
Flexible counters in SRAM
- Store all the counters in the SRAM, because SRAMs are much cheaper, more energy-efficient, and thus larger than TCAMs.
Supporting diverse sketches
Bit checking operations
Picking packets with a given probability
Picking packets with different granularity:
OpenSketch Controller
- A sketch manager that automatically configures the sketches with the best memory-accuracy tradeoff.
- A resource allocator that divides switch memory resources across measurement tasks.
- For tasks that are not directly supported by sketches, we can still install simpler sketches and implement the complex data analysis part in software in the controller.
Combining Count-Min sketch and bitmap
Sampling source-destination pairs to reduce memory usage
Querying in the control plane.
Automatic config. with sketch manager
- Picking the right configurations in the measurement data plane is notoriously difficult, because it depends on the available resources at switches, the accuracy requirements of the measurement tasks, and the traffic distribution.
- The sketch manager automatically picks the right sketch.
- It also automatically installs new sketches.
Related work
- Flow-based measurements such as NetFlow [2] and sFlow [42] provide generic support for different measurement tasks, but consume too many resources.
- OpenSketch redesigns the measurement APIs at switches to be both generic and efficient.
- It lets operators choose which flows to measure.
- It uses a three-stage data plane pipeline.
- It makes measurement programming easier at the controllers by freeing operators from understanding the complex switch implementations and parameter tuning in diverse sketches. We build a measurement library which automatically configures the data plane pipeline for different sketches and allocates the switch memory across tasks to maximize accuracy.
- The prototype on NetFPGA shows no additional overhead on the switch data plane.
- We rely on the software in the controller to implement these complex data structures and algorithms using simpler sketches in the data plane.
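How deploying sketches on SDN differs from traditional networks
- On the software side, sketch-based measurement methods are easy to update and iterate on and easy to extend with new functions. The feature set is richer, and the low cost makes adoption easier.
- The control plane has a global view, so it can provide rich dynamic configuration (choosing the measurement method) and resource allocation, and can analyze the data in one place.
- On traditional hardware, sketches are complex to implement, so as few hash functions as possible must be used to reduce deployment complexity.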