Color space converter: RGB to YCbCr

2011-03-23 22:38 yucan

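Today I finished the VHDL module for RGB to YCbCr color space conversion. On the whole it was not very hard. I did not write the code myself, since a typical application like this usually has good ready-made implementations. On the OpenCores website I found the Color Converter IP Core, but that code is too complex (it has many tunable parameters) and closely resembles the CSC in Altera's VIP IP cores. I also found two versions of RGB2YCbCr VHDL code from Xilinx. One is "Color-Space Converter: RGB to YCrCb", XAPP930; the code package comes with its own documentation, and all of it can be downloaded from the Xilinx website. The other I downloaded from a Chinese forum. It is somewhat simpler and clearly structured, and it is also from Xilinx, so I chose this version and studied it carefully. Since I have not written much VHDL, reading good code at this stage and learning sound VHDL coding conventions is undoubtedly worthwhile.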

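      The program uses a four-stage pipeline for high-speed color conversion, trading area for speed. The algorithm is explained in the code's comments: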
-- Assumptions:
-- (a) R,G,B are 8-bit gamma-corrected values with full 0-255 range
-- (b) ITU-R BT.601-2 component video standards (SDTV), 4:4:4 encoding
-- (c) possible YCbCr range is 0-255, but ITU specifies:
--      Y has a range of 16-235, 8-bit
--      Cb has a range of 16-240, 8-bit
--      Cr has a range of 16-240, 8-bit
-- (d) code 0 and 255 are used for synchronization when 4:2:2
--
-- These are the conversion formulas in their general form:
-- Analog Equations (normalized color-difference):
-- Y601 = 16 +    219 * ( 0.299*R + 0.587*G + 0.114*B )
-- Cb   = 128 + 224*Kb * ( B - (0.299*R + 0.587*G + 0.114*B) ), where Kb=0.5/0.886
-- Cr   = 128 + 224*Kr * ( R - (0.299*R + 0.587*G + 0.114*B) ), where Kr=0.5/0.701
-- where,
-- R,G,B are in [0, +1] range
-- Y601 is in [16, 235] range (0-219, offset=16)
-- Cb, Cr are in [16, 240] range (+/- 112, offset=128)
-- Component valid range: 0 (sync), 1 - 254 (valid video), 255 (sync)
--
--
-- Digital Equations (R,G,B are 8-bit as specified in ITU-R BT.601)
-- Y601 = 16 + ( 0.257*R + 0.504*G + 0.098*B)
-- Cb   = 128 + (-0.148*R - 0.291*G + 0.439*B)
-- Cr   = 128 + ( 0.439*R - 0.368*G - 0.071*B)
-- where,
-- R,G,B are in [0, 255] range, 8-bit values
-- Y601 is in [16, 235] range (0-219, offset=16)
-- Cb, Cr are in [16, 240] range (+/- 112, offset=128)
-- Component valid range: 0 (sync), 1 - 254 (valid video), 255 (sync)
--
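-- The algorithm itself is very simple, but to suit FPGA hardware and raise the computation speed, the formulas need to be transformed as follows: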
-- ------------------------------------------------------------------------------
-- [Initially] Implemented CSC function: development of 8-bit quantized equations
-- Specific grouping of terms eases hardware implementation.
-- ------------------------------------------------------------------------------
--
-- Y601 = 16 + ( 0.257*R + 0.504*G + 0.098*B)
--       = 16 + (1/256) * [ 256 * (0.257*R + 0.504*G + 0.098*B) ]
--       = 16 + (1/256) * [ 65.792*R + 129.024*G + 25.088*B ]
--       = 16 + (1/256) * [ 66*R + 129*G + 25*B ]
--       = (1/256) * [ 16*256 + ( 66*R + 129*G + 25*B ) ]
--       = (1/256) * [ ( 16*256 + 129*G )+ ( 66*R + 25*B ) ]
--
-- Cb   = 128 + (-0.148*R - 0.291*G + 0.439*B)
--       = 128 + (1/256) * [ 256 * (-0.148*R - 0.291*G + 0.439*B) ]
--       = 128 + (1/256) * [ -37.888*R - 74.496*G + 112.384*B ]
--       = 128 + (1/256) * [ -38*R - 74*G + 112*B ]
--       = (1/256) * [ 128*256 + ( -38*R + -74*G + 112*B ) ]
--       = (1/256) * [ ( 128*256 + 112*B ) - ( 38*R + 74*G ) ]
--
-- Cr   = 128 + ( 0.439*R - 0.368*G - 0.071*B )
--       = 128 + (1/256) * [ 256 * ( 0.439*R - 0.368*G - 0.071*B) ]
--       = 128 + (1/256) * [ 112.384*R - 94.208*G - 18.176*B ]
--       = 128 + (1/256) * [ 112*R - 94*G - 18*B ]
--       = (1/256) * [ 128*256 + ( 112*R + -94*G + -18*B ) ]
--       = (1/256) * [ ( 128*256 + 112*R ) - ( 94*G + 18*B ) ]
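
To check the integer derivation above, here is a minimal software reference model in Python. It is only a sketch, not the Xilinx VHDL: the function name `rgb_to_ycbcr_q8` is made up for illustration, and the `+ 128` before the shift is my assumption about how the rounding mentioned in the comments is done (round to nearest). The grouping of terms follows the last line of each derivation.

```python
def rgb_to_ycbcr_q8(r, g, b):
    """Integer RGB -> YCbCr using the 8-bit quantized coefficients.

    Assumption: rounding is round-to-nearest, done by adding 128
    (half of 256) before the divide-by-256 shift.
    """
    for c in (r, g, b):
        if not 0 <= c <= 255:
            raise ValueError("R, G, B must be 8-bit values in 0..255")

    # Same grouping as the derivation: (offset + one product) +/- (two products)
    y_acc = (16 * 256 + 129 * g) + (66 * r + 25 * b)
    cb_acc = (128 * 256 + 112 * b) - (38 * r + 74 * g)
    cr_acc = (128 * 256 + 112 * r) - (94 * g + 18 * b)

    # Divide by 256 with rounding; the accumulators are never negative here
    return tuple((acc + 128) >> 8 for acc in (y_acc, cb_acc, cr_acc))


if __name__ == "__main__":
    for pixel in [(0, 0, 0), (255, 255, 255), (255, 0, 0)]:
        print(pixel, "->", rgb_to_ycbcr_q8(*pixel))
```

Black and white land on the ends of the nominal Y range (16 and 235) with both chroma components at 128, which is a quick sanity check of the offsets.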

-- Resources required by each parallel component:
-- Y601 will need three constant multipliers, three adders
-- Cb will need three constant multipliers, two adders and one subtractor
-- Cr will need three constant multipliers, two adders and one subtractor
-- Round up performed to bring output values to 8-bit...
--
-- However, the above CSC function suffers from 8-bit quantized coefficients...
-- A better accuracy is achieved by increasing the precision of the coefficients.
-- That will influence the implemented area used for the core (initial 8-bit
-- coefficient uses 150 slices and runs at 100MHz, but shows last digit error).
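
The last-digit error mentioned in the comments can be reproduced in software. The sketch below is my own illustration, not part of the Xilinx code: it quantizes the digital-equation coefficients to a chosen number of fractional bits and compares the result with a floating-point evaluation of the same equations. The helper names, the round-to-nearest step, and the choice of 16 bits as the "higher precision" case are all assumptions.

```python
import math

# Coefficients and offsets of the digital equations (Y601, Cb, Cr)
COEFFS = (
    (0.257, 0.504, 0.098),
    (-0.148, -0.291, 0.439),
    (0.439, -0.368, -0.071),
)
OFFSETS = (16, 128, 128)


def quantize(frac_bits):
    """Scale each coefficient by 2**frac_bits and round to an integer."""
    scale = 1 << frac_bits
    return [[round(c * scale) for c in row] for row in COEFFS]


def convert_fixed(rgb, frac_bits):
    """Fixed-point conversion with round-to-nearest (an assumption)."""
    half = 1 << (frac_bits - 1)
    out = []
    for row, off in zip(quantize(frac_bits), OFFSETS):
        acc = (off << frac_bits) + sum(k * v for k, v in zip(row, rgb)) + half
        out.append(acc >> frac_bits)
    return tuple(out)


def convert_float(rgb):
    """Floating-point reference, rounded to the nearest integer."""
    return tuple(
        math.floor(off + sum(c * v for c, v in zip(row, rgb)) + 0.5)
        for row, off in zip(COEFFS, OFFSETS)
    )


if __name__ == "__main__":
    pixel = (200, 0, 0)
    print("float :", convert_float(pixel))
    print("8-bit :", convert_fixed(pixel, 8))
    print("16-bit:", convert_fixed(pixel, 16))
```

For the pixel (200, 0, 0) the 8-bit coefficients give Y = 68 while the floating-point reference and the 16-bit version both give Y = 67, because 66/256 slightly overestimates 0.257. Wider coefficients remove this error at the cost of wider multipliers and adders.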

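Before this, I spent half a day wrestling with the Video and Image Processing IP cores. This suite includes ready-made, highly optimized modules for color space conversion, gamma correction, and so on, and if I could have used them directly my work would have been much easier. At first I naively believed I could, until I hit compilation errors in a third-party EDA simulation tool and suddenly realized that there is no free lunch. The license does not support third-party EDA tools and only allows a time-limited (one hour) sof file to be generated; without a license, OpenCore does not allow a netlist to be generated. The reason is that when developing with Quartus II, not only does the software itself need to be cracked, but some of the IP cores bundled with it need to be cracked as well.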

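In the afternoon I settled down to study the VHDL code. At first I did not really understand how the four-stage pipeline was implemented, and I thought the four-clock delay in the simulation results was caused by multiplier latency. That is not the case: in this program a multiplier completes its operation within one clock cycle. Four clock cycles make up one instruction cycle of the pipeline, and at any moment four instructions are executing, each in a different stage. It took me more than an hour to sort out these concepts, and I came away with a better understanding of pipelined operation.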
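
To make the latency-versus-throughput point concrete, here is a small Python model of a four-stage pipeline. It is a sketch under my own assumptions: the split into multiply, partial sums, combine, and scale stages is a guess that matches the grouping in the equations, not a description of the actual Xilinx code, and the `+ 128` rounding is likewise assumed. Each call to `clock` stands for one rising clock edge.

```python
def stage_multiply(rgb):
    """Stage 1: all nine constant multiplications in parallel."""
    r, g, b = rgb
    return (66 * r, 129 * g, 25 * b, 38 * r, 74 * g, 112 * b,
            112 * r, 94 * g, 18 * b)


def stage_partial_sums(p):
    """Stage 2: offset plus one product, and the sum of the other two."""
    yr, yg, yb, cbr, cbg, cbb, crr, crg, crb = p
    return (16 * 256 + yg, yr + yb,
            128 * 256 + cbb, cbr + cbg,
            128 * 256 + crr, crg + crb)


def stage_combine(s):
    """Stage 3: final add for Y, final subtract for Cb and Cr."""
    return (s[0] + s[1], s[2] - s[3], s[4] - s[5])


def stage_scale(acc):
    """Stage 4: round and divide by 256 (rounding mode is an assumption)."""
    return tuple((a + 128) >> 8 for a in acc)


STAGES = (stage_multiply, stage_partial_sums, stage_combine, stage_scale)


class Pipeline:
    def __init__(self):
        # One output register per stage; None means "no valid data yet"
        self.regs = [None] * len(STAGES)

    def clock(self, pixel):
        """One clock edge: every stage consumes the previous stage's register."""
        inputs = [pixel] + self.regs[:-1]
        self.regs = [None if x is None else f(x)
                     for f, x in zip(STAGES, inputs)]
        return self.regs[-1]


def run(pixels):
    """Feed the pixels, then enough empty cycles to drain the pipeline."""
    pipe = Pipeline()
    stream = list(pixels) + [None] * len(STAGES)
    return [pipe.clock(p) for p in stream]


if __name__ == "__main__":
    for n, out in enumerate(run([(0, 0, 0), (255, 255, 255), (255, 0, 0)]), 1):
        print("clock", n, "->", out)
```

The first result appears on the fourth clock, and after that one result comes out every clock, even though each stage does only a single cycle of work. That is the four-clock delay seen in simulation: it comes from the four registers in series, not from a slow multiplier.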
