Compute Express Link™ (CXL™)

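1. CXL memory latency

According to the Zhihu article "基于 CXL 的大内存池化" by Macan, approximate access latencies are:

Memory attached directly to the memory controller: ~100 ns
Memory reached over a NUMA link: ~180 ns
CXL memory: ~170-250 ns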
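
To put these figures in perspective, here is a minimal Python sketch that computes how much slower each tier is than controller-attached DRAM, and a rough average latency when some fraction of accesses lands in CXL memory. Only the latency figures come from the cited article; the function names, the weighted-mean model, and the 20% fraction are illustrative assumptions.

```python
# Approximate latencies in nanoseconds, as quoted above.
LOCAL_DRAM_NS = 100
NUMA_REMOTE_NS = 180
CXL_NS_RANGE = (170, 250)  # CXL latency is given as a range


def slowdown(latency_ns, baseline_ns=LOCAL_DRAM_NS):
    """Return latency relative to the baseline (1.0 means equal)."""
    return latency_ns / baseline_ns


def average_latency(cxl_fraction, cxl_ns):
    """Weighted mean latency when `cxl_fraction` of accesses hit CXL memory
    and the rest hit local DRAM. Assumption: ignores CPU caches, bandwidth
    limits, and queuing, so it is only a rough estimate."""
    if not 0.0 <= cxl_fraction <= 1.0:
        raise ValueError("cxl_fraction must be between 0 and 1")
    return (1.0 - cxl_fraction) * LOCAL_DRAM_NS + cxl_fraction * cxl_ns


if __name__ == "__main__":
    print("NUMA link slowdown:", slowdown(NUMA_REMOTE_NS))
    for cxl_ns in CXL_NS_RANGE:
        print(f"CXL at {cxl_ns} ns: {slowdown(cxl_ns):.2f}x local DRAM, "
              f"{average_latency(0.2, cxl_ns):.0f} ns average at 20% CXL accesses")
```

Under this simple model, CXL memory sits in roughly the same latency class as memory reached over a NUMA link.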

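2. What CXL is and what it is used for

CXL multiplexes three protocols on top of PCIe 5.0: CXL.io, CXL.cache, and CXL.mem.

CXL.io is used for device discovery, configuration, register access, interrupts, and so on.
CXL.cache is used when a device accesses host memory; it lets the device coherently cache data from that memory.
CXL.mem handles accesses from the processor to memory on the device.

CXL maintains a coherent memory space between the CPU and devices.

CXL has three main usage scenarios, one per device type.

Type 1 is typically a caching device such as a network card (NIC); it usually supports only the io and cache protocols.
Type 2 is a memory-equipped accelerator, common in GPU and AI applications; it supports all three protocols: io, cache, and mem.
Type 3 is typically a memory buffer, used to expand memory bandwidth or capacity; it supports the io and mem protocols.

Every device type must support CXL.io, because it carries discovery, configuration, register access, and interrupts; the host needs it to configure the device and access its registers.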
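
The mapping from device type to protocol set can be written down as a small lookup table. The following Python sketch is only an illustration of the classification described above; the names `Protocol`, `DEVICE_TYPE_PROTOCOLS`, and `classify` are hypothetical and do not come from any CXL software stack.

```python
from enum import Enum


class Protocol(Enum):
    IO = "CXL.io"
    CACHE = "CXL.cache"
    MEM = "CXL.mem"


# Protocol sets per device type, as listed in the text above.
DEVICE_TYPE_PROTOCOLS = {
    1: frozenset({Protocol.IO, Protocol.CACHE}),
    2: frozenset({Protocol.IO, Protocol.CACHE, Protocol.MEM}),
    3: frozenset({Protocol.IO, Protocol.MEM}),
}


def classify(protocols):
    """Return the device type whose protocol set matches exactly, else None."""
    wanted = frozenset(protocols)
    for device_type, supported in DEVICE_TYPE_PROTOCOLS.items():
        if supported == wanted:
            return device_type
    return None


# Every device type includes CXL.io, matching the rule stated above.
assert all(Protocol.IO in p for p in DEVICE_TYPE_PROTOCOLS.values())

if __name__ == "__main__":
    print(classify([Protocol.IO, Protocol.MEM]))   # io + mem matches Type 3
    print(classify([Protocol.CACHE]))              # no CXL.io, so no match
```

A protocol set without CXL.io matches no device type, which mirrors the point that io is mandatory.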

The above is taken from the Zhihu article "CXL简介" by 追寻内心的宁静.

3. CXL device types

The following is from the Wikipedia article "Compute Express Link":

Type 1 (CXL.io and CXL.cache) – specialised accelerators (such as smart NIC) with no local memory. Devices rely on coherent access to host CPU memory.
Type 2 (CXL.io, CXL.cache and CXL.mem) – general-purpose accelerators (GPU, ASIC or FPGA) with high-performance GDDR or HBM local memory. Devices can coherently access host CPU's memory and/or provide coherent or non-coherent access to device local memory from the host CPU.
Type 3 (CXL.io and CXL.mem) – memory expansion boards and persistent memory. Devices provide host CPU with low-latency access to local DRAM or byte-addressable non-volatile storage.

Type 2 devices implement two memory coherence modes, managed by device driver. In device bias mode, device directly accesses local memory and no caching is performed by the CPU; in host bias mode, the host CPU's cache controller handles all access to device memory. Coherence mode can be set individually for each 4 KB page, stored in a translation table in local memory of Type 2 devices. Unlike other CPU-to-CPU memory coherency protocols, this arrangement only requires the host CPU memory controller to implement the cache agent; such asymmetric approach reduces implementation complexity and reduces latency.

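In brief (here "host memory" means memory attached to the host CPU, and "device memory" means the device's local memory):

Type 1: specialised accelerators with no local memory (for example, smart NICs). The device has coherent access to host memory.
Type 2: general-purpose accelerators (GPU/ASIC/FPGA) with high-performance GDDR or HBM. The device can coherently access host memory, and/or the host CPU can access device memory coherently or non-coherently.
Type 3: memory expansion and persistent memory. The device gives the host CPU low-latency access to DRAM or non-volatile storage.

Type 2 devices implement two memory coherence modes:

Device bias mode: the device accesses its local memory directly, and the host CPU does not cache it.
Host bias mode: the host CPU's cache controller handles all accesses to the device's local memory.

The coherence mode can be set individually for each 4 KB page and is stored in a translation table in the Type 2 device's local memory. Unlike other CPU-to-CPU memory coherency protocols, this arrangement only requires the host CPU memory controller to implement the cache agent; this asymmetric approach reduces implementation complexity and latency.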
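
The per-page bias setting can be pictured as a table indexed by page number. The Python sketch below is a toy model under stated assumptions: the class `BiasTable`, its methods, and the default of host bias are hypothetical choices for illustration, and real hardware and drivers do not expose this interface.

```python
PAGE_SIZE = 4096  # the bias mode is tracked per 4 KB page

HOST_BIAS = "host"
DEVICE_BIAS = "device"


class BiasTable:
    """Toy per-page bias table for a Type 2 device's local memory."""

    def __init__(self, memory_bytes, default=HOST_BIAS):
        # Round up so a partial last page still gets an entry.
        num_pages = (memory_bytes + PAGE_SIZE - 1) // PAGE_SIZE
        self._bias = [default] * num_pages

    def _page(self, addr):
        """Map a device-memory address to its page index."""
        page = addr // PAGE_SIZE
        if addr < 0 or page >= len(self._bias):
            raise IndexError("address outside device memory")
        return page

    def set_bias(self, addr, mode):
        """Set the bias mode of the page containing `addr`."""
        if mode not in (HOST_BIAS, DEVICE_BIAS):
            raise ValueError("unknown bias mode")
        self._bias[self._page(addr)] = mode

    def get_bias(self, addr):
        return self._bias[self._page(addr)]

    def device_access_path(self, addr):
        """Describe how a device access to its own memory is handled."""
        if self.get_bias(addr) == DEVICE_BIAS:
            return "direct"      # device bias: device accesses local memory directly
        return "via_host"        # host bias: host cache controller handles the access


if __name__ == "__main__":
    table = BiasTable(memory_bytes=16 * PAGE_SIZE)
    table.set_bias(0x2000, DEVICE_BIAS)          # flips only the page holding 0x2000
    print(table.device_access_path(0x2FFF))      # same page as 0x2000
    print(table.device_access_path(0x3000))      # next page, still host bias
```

Because the mode is stored per page, a driver could, in principle, switch only the pages an accelerator is actively working on to device bias and leave the rest under host control.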

4. Other links

1. Compute Express Link 标准介绍
2. 聊一聊CXL - Zhihu article by 夏晶晶
3. 基于PCIe 5.0的CXL是什么？ - Zhihu article by 老狼

posted @ 2022-12-12 16:19 by 又是火星人
