FFT IP Core
The FFT core provides four architecture options to offer a trade-off between core size and
transform time.
• Pipelined Streaming I/O – Allows continuous data processing.
• Radix-4 Burst I/O – Loads and processes data separately, using an iterative approach. It
is smaller in size than the pipelined solution, but has a longer transform time.
• Radix-2 Burst I/O – Uses the same iterative approach as Radix-4, but the butterfly is
smaller. This means it is smaller in size than the Radix-4 solution, but the transform
time is longer.
• Radix-2 Lite Burst I/O – Based on the Radix-2 architecture, this variant uses a
time-multiplexed approach to the butterfly for an even smaller core, at the cost of
longer transform time.
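For the full-precision unscaled architecture, every significant integer bit in the datapath is retained, and any fractional bits produced during computation are truncated or rounded. With fixed-point arithmetic, the data width therefore grows through the successive multiplication stages, and the output width is (input width + log2(transform length) + 1) bits.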
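As a quick check of the output width formula above, here is a minimal Python sketch. The function name `unscaled_output_width` is made up for illustration and is not part of the core; it assumes the transform length is a power of two.

```python
def unscaled_output_width(input_width: int, n_points: int) -> int:
    """Output width in bits for the unscaled architecture:
    input width + log2(transform length) + 1."""
    # Assumption: the transform length is a power of two.
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("transform length must be a power of two")
    log2_n = n_points.bit_length() - 1  # exact log2 for a power of two
    return input_width + log2_n + 1


# A 16-bit input with a 1024-point transform gives 16 + 10 + 1 bits.
print(unscaled_output_width(16, 1024))  # prints 27
```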
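For block floating point, every data point in a frame is scaled by the same factor, which the core reports on its block exponent (Block Exponent) output. Scaling is applied only when the FFT IP core detects that an overflow would otherwise occur.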
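The idea of one shared scaling factor per frame, applied only when an overflow would occur, can be illustrated with a simplified software model. This is only a sketch of the concept, not the core's internal algorithm: the function `block_scale` and its whole-frame loop are assumptions made for illustration, whereas the real core makes this decision stage by stage in hardware.

```python
def block_scale(frame: list[int], width: int) -> tuple[list[int], int]:
    """Shift the whole frame right until every sample fits in a signed
    `width`-bit word. Returns the scaled frame and the number of shifts,
    which plays the role of the block exponent in this toy model."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    exponent = 0
    # Scale only while at least one sample would overflow.
    while any(not lo <= sample <= hi for sample in frame):
        frame = [sample >> 1 for sample in frame]  # same shift for every sample
        exponent += 1
    return frame, exponent


scaled, exponent = block_scale([100, -300, 50], width=8)
print(scaled, exponent)
```

In this model, the original magnitudes are recovered approximately by multiplying each scaled sample by 2 to the power of the returned exponent.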
Burst I/O Architectures
The scaling performed during successive stages can be set using the appropriate
SCALE_SCH field in the Configuration channel. For the Radix-4 Burst I/O and Radix-2
architectures, the value of the SCALE_SCH field is used as pairs of bits [... N4, N3, N2, N1,
N0], each pair representing the scaling value for the corresponding stage. Stages are
computed starting with stage 0 as the two LSBs. There are log4(point size) stages for
Radix-4 and log2(point size) stages for Radix-2. In each stage, the data can be shifted by 0,
1, 2, or 3 bits, which corresponds to SCALE_SCH values of 00, 01, 10, and 11. For example,
for Radix-4, when N = 1024, [01 10 00 11 10] translates to a right shift by 2 for stage 0, a shift
by 3 for stage 1, no shift for stage 2, a shift by 2 for stage 3, and a shift by 1 for stage 4 (there
are log4(1024) = 5 Radix-4 stages). This scaling schedule scales by a total of 8 bits, which
gives a scaling factor of 1/256. The conservative schedule SCALE_SCH = [10 10 10 10 11]
completely avoids overflows in the Radix-4 Burst I/O architecture. For the Radix-2 Burst I/O
and Radix-2 Lite Burst I/O architectures, the conservative scaling schedule of [01 01 01 01
01 01 01 01 01 10] prevents overflow for N = 1024 (there are log2(1024) = 10 Radix-2
stages).
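The bit-pair convention above can be checked with a short Python sketch. The helper `decode_scale_sch` is a hypothetical name used only for illustration. It takes the schedule written as in the text (most significant pair first) and returns the per-stage right shifts, starting with stage 0.

```python
def decode_scale_sch(pairs: str) -> list[int]:
    """Decode a schedule written MSB pair first, e.g. '01 10 00 11 10',
    into per-stage right shifts ordered stage 0, stage 1, ..."""
    fields = pairs.split()
    for field in fields:
        if len(field) != 2 or set(field) - {"0", "1"}:
            raise ValueError(f"not a 2-bit field: {field!r}")
    # Stage 0 occupies the two LSBs, i.e. the last pair as written.
    return [int(field, 2) for field in reversed(fields)]


shifts = decode_scale_sch("01 10 00 11 10")
total = sum(shifts)
print(shifts)          # prints [2, 3, 0, 2, 1]
print(total, 2**total)  # prints 8 256, i.e. a scaling factor of 1/256
```

The same helper applied to the conservative Radix-4 schedule "10 10 10 10 11" gives a total shift of 11 bits.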
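This article uses the scaled fixed-point architecture. Compared with the full-precision unscaled architecture, it greatly reduces the use of the FPGA's XtremeDSP slices and block RAM; compared with block floating point, it allows the scaling to be adjusted flexibly. The scaling schedule (Scale_SCH) is defined entirely by the user. Each stage is scaled by a factor of 1, 2, 4, or 8, which corresponds to a right shift of 0, 1, 2, or 3 bits. If the scaling is insufficient, the butterfly output exceeds its dynamic range and the data overflows.

For the Burst I/O architectures, Scale_SCH assigns one 2-bit value to each stage, with the value for stage 0 in the two least significant bits, in the form [N4, N3, N2, N1, N0]. Each 2-bit value gives the scaling for the corresponding stage. Example: for Radix-4 with transform length N = 1024, Scale_SCH = [01 10 00 11 10] means a right shift of 2 for stage 0, 3 for stage 1, 0 for stage 2, 2 for stage 3, and 1 for stage 4. Rules of thumb that prevent overflow: for 1024-point Radix-4 Burst I/O, Scale_SCH = [10 10 10 10 11]; for 1024-point Radix-2, Scale_SCH = [01 01 01 01 01 01 01 01 01 10].

For the Pipelined Streaming I/O architecture, adjacent pairs of Radix-2 stages are grouped together: stages 0 and 1 form group 0, stages 2 and 3 form group 1, and so on. Scale_SCH assigns one 2-bit value to each group, with the value for group 0 in the two least significant bits, in the form [N4, N3, N2, N1, N0]. Each 2-bit value gives the scaling for the corresponding group, so the two Radix-2 stages in a group share one scaling value. Example: for N = 1024, Scale_SCH = [10 10 00 01 11] means a right shift of 3 for group 0 (stages 0 and 1), 1 for group 1 (stages 2 and 3), no shift for group 2 (stages 4 and 5), 2 for group 3 (stages 6 and 7), and 2 for group 4 (stages 8 and 9). When the transform length N is not a power of 4, the last group contains only one Radix-2 stage, and its value can only be 00 or 01. Rules of thumb that prevent overflow: for N = 512, Scale_SCH = [01 10 10 10 11]; for N = 1024, Scale_SCH = [10 10 10 10 11].

The width of Scale_SCH is 2*ceil(0.5*log2(N)) bits for the Pipelined Streaming I/O and Radix-4 Burst I/O architectures, and 2*log2(N) bits for the Radix-2 Burst I/O and Radix-2 Lite Burst I/O architectures, where N is the transform length.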
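The width formulas and the Streaming I/O grouping rule can be tried out with the following Python sketch. All function names here (`scale_sch_width`, `pack_scale_sch`, `check_streaming_schedule`) are hypothetical helpers written for illustration, not part of any vendor tool. Shifts are listed with stage or group 0 first, matching the position of the two least significant bits.

```python
import math


def _log2_exact(n_points: int) -> int:
    """log2 of a power-of-two transform length."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("transform length must be a power of two")
    return n_points.bit_length() - 1


def scale_sch_width(n_points: int, radix2_burst: bool = False) -> int:
    """Scale_SCH width in bits: 2*log2(N) for Radix-2 / Radix-2 Lite Burst I/O,
    2*ceil(0.5*log2(N)) for Pipelined Streaming I/O and Radix-4 Burst I/O."""
    stages = _log2_exact(n_points)
    return 2 * stages if radix2_burst else 2 * math.ceil(stages / 2)


def pack_scale_sch(shifts: list[int]) -> int:
    """Pack per-stage (or per-group) shifts into one integer, index 0 in the LSBs."""
    value = 0
    for index, shift in enumerate(shifts):
        if not 0 <= shift <= 3:
            raise ValueError("each shift must be 0, 1, 2, or 3")
        value |= shift << (2 * index)
    return value


def check_streaming_schedule(shifts: list[int], n_points: int) -> None:
    """Check a Streaming I/O schedule: one entry per group, and a lone final
    Radix-2 stage (N not a power of 4) may only use 00 or 01."""
    stages = _log2_exact(n_points)
    if len(shifts) != math.ceil(stages / 2):
        raise ValueError("wrong number of groups for this transform length")
    if stages % 2 == 1 and shifts[-1] > 1:
        raise ValueError("last group holds one Radix-2 stage: only 00 or 01")


# Streaming I/O example from the text: [10 10 00 01 11] for N = 1024.
example = [3, 1, 0, 2, 2]  # group 0 first
check_streaming_schedule(example, 1024)
print(format(pack_scale_sch(example), "010b"))  # prints 1010000111
print(scale_sch_width(1024), scale_sch_width(1024, radix2_burst=True))  # prints 10 20
```

Packing with index 0 in the LSBs keeps the integer directly comparable with the bracketed notation used in the text when it is printed in binary.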