H.264 Study Notes, Part 1 (Coding Structure Analysis)

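Before studying H.264, it is worth reading the Wikipedia article on H.264, which covers its development history, main features, references, and related websites.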

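The main materials for studying H.264 are two reference manuals (one on the syntax structure, the other the JM development manual) and the software reference model JM.x.y (Joint Model, the H.264 reference codec implementation).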

I. Coding Framework

1 The H.264 encoding framework is shown in Figure 1:

The original English description reads:

The Encoder includes two dataflow paths, a “forward” path (left to right, shown in blue) and a “reconstruction” path (right to left, shown in magenta).
1 Encoder (forward path)
An input frame Fn is presented for encoding. The frame is processed in units of a macroblock (corresponding to 16x16 pixels in the original image). Each macroblock is encoded in intra or inter mode. In either case, a prediction macroblock P is formed based on a reconstructed frame. In Intra mode, P is formed from samples in the current frame n that have previously encoded, decoded and reconstructed (uF’n in the Figures; note that the unfiltered samples are used to form P). In Inter mode, P is formed by motion-compensated prediction from one or more reference frame(s). In the Figures, the reference frame is shown as the previous encoded frame F’n-1; however, the prediction for each macroblock may be formed from one or two past or future frames (in time order) that have already been encoded and reconstructed.
The prediction P is subtracted from the current macroblock to produce a residual or difference macroblock Dn. This is transformed (using a block transform) and quantized to give X, a set of quantized transform coefficients. These coefficients are re-ordered and entropy encoded. The entropy-encoded coefficients, together with side information required to decode the macroblock (such as the macroblock prediction mode, quantizer step size, motion vector information describing how the macroblock was motion-compensated, etc) form the compressed bitstream. This is passed to a Network Abstraction Layer (NAL) for transmission or storage.
2 Encoder (reconstruction path)
The quantized macroblock coefficients X are decoded in order to reconstruct a frame for encoding of further macroblocks. The coefficients X are re-scaled (Q^-1) and inverse transformed (T^-1) to produce a difference macroblock Dn’. This is not identical to the original difference macroblock Dn; the quantization process introduces losses and so Dn’ is a distorted version of Dn.
The prediction macroblock P is added to Dn’ to create a reconstructed macroblock uF’n (a distorted version of the original macroblock). A filter is applied to reduce the effects of blocking distortion and reconstructed reference frame is created from a series of macroblocks F’n.


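In other words, a macroblock to be encoded is first taken from the current input frame F_n and coded in either intra or inter mode, producing a prediction macroblock P. In intra mode, P is formed from previously reconstructed, unfiltered samples of the current frame. In inter mode, P is obtained from one or more reference pictures by motion estimation (ME) followed by motion compensation (MC). Subtracting P from the current macroblock gives the residual block D_n, which is integer transformed (T) and quantized (Q) to yield a set of coefficients X. Reordering and entropy coding X completes the encoding of one macroblock. The entropy-coded data, together with the side information needed to decode the macroblock (prediction mode, quantizer step size, motion vector information, and so on), forms the compressed bitstream of that macroblock. The bitstreams of all macroblocks in a slice, plus the slice header, form the coded slice, which is then passed to the Network Abstraction Layer (NAL) for transmission or storage. The picture parameter set (PPS) and the sequence parameter set (SPS) are transmitted separately.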

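The reconstruction branch provides the data used for prediction during encoding and keeps it consistent with what the decoder will reconstruct. Its flow is as follows: the coefficients X are rescaled (inverse quantized) and inverse transformed to give D'_n, an approximation of the residual macroblock D_n. Adding the prediction P to D'_n gives the unfiltered reconstructed macroblock uF'_n. A loop filter is then applied to reduce blocking artifacts, giving the final reconstructed macroblock F'_n. Once all macroblocks of the current picture have been reconstructed, they form the reconstructed picture.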
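The forward and reconstruction paths described here can be sketched on a single 4x4 block. This is only an illustration under stated assumptions: the function names `encode_block` and `reconstruct_block` are hypothetical, the quantizer is a plain uniform one on normalized coefficients (a real encoder uses integer scaling tables and folds the normalization into quantization), the inverse transform is done in floating point, and reordering, entropy coding and the loop filter are left out.

```python
import math

# Forward core matrix of the H.264 4x4 integer transform.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]
# CF * CF^T = diag(4, 10, 4, 10): squared norms of the rows of CF.
ROW_NORM_SQ = [4, 10, 4, 10]


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a matrix stored as nested lists."""
    return [list(col) for col in zip(*m)]


def encode_block(cur, pred, qstep):
    """Forward path: residual D_n -> transform T -> quantization Q -> levels X."""
    residual = [[cur[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    coeffs = matmul(matmul(CF, residual), transpose(CF))
    # Assumption: normalize each coefficient, then quantize uniformly.
    return [[round(coeffs[i][j]
                   / math.sqrt(ROW_NORM_SQ[i] * ROW_NORM_SQ[j]) / qstep)
             for j in range(4)] for i in range(4)]


def reconstruct_block(levels, pred, qstep):
    """Reconstruction path: rescale (Q^-1) -> inverse transform (T^-1) -> add P."""
    # Rescaling and the exact inverse CF^T D^-1 W D^-1 CF combine into
    # one division: level * qstep / sqrt(d_i * d_j).
    scaled = [[levels[i][j] * qstep
               / math.sqrt(ROW_NORM_SQ[i] * ROW_NORM_SQ[j])
               for j in range(4)] for i in range(4)]
    residual = matmul(matmul(transpose(CF), scaled), CF)  # D'_n
    # uF'_n = P + D'_n, rounded and clipped to the 8-bit sample range.
    return [[min(255, max(0, round(pred[i][j] + residual[i][j])))
             for j in range(4)] for i in range(4)]
```

With a very small `qstep` the reconstruction matches the input block, and with a very large one every level becomes zero and the reconstruction falls back to the prediction P, which is the loss that quantization introduces between D_n and D'_n.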

2 Terminology for the stages above

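(1) Intra prediction

Neighbouring samples within a picture are spatially correlated (spatial redundancy), so the picture can be compressed to reduce the storage space or transmission bandwidth it needs. Coding that exploits spatial redundancy is called intra coding. In H.264 intra prediction, chroma and luma are predicted separately. There are 9 prediction modes for 4x4 luma blocks, 4 modes for 16x16 luma blocks, and 4 modes for 8x8 chroma blocks.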
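Three of the nine 4x4 luma modes (vertical, horizontal and DC) are simple enough to sketch. The helper names below are hypothetical, both the row of samples above the block and the column to its left are assumed to be available, and the mode is chosen by plain SAD, whereas a real encoder would normally use a rate-distortion cost.

```python
def predict_4x4(mode, top, left):
    """Build a 4x4 prediction from the 4 samples above and the 4 to the left."""
    if mode == "vertical":
        # Every row repeats the samples above the block.
        return [list(top) for _ in range(4)]
    if mode == "horizontal":
        # Every row is filled with the sample to its left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":
        # Rounded mean of the 8 neighbouring samples.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_mode(block, top, left):
    """Pick the candidate mode whose prediction is closest to the block."""
    candidates = ("vertical", "horizontal", "dc")
    return min(candidates, key=lambda m: sad(block, predict_4x4(m, top, left)))
```

A block whose columns continue the samples above it is matched exactly by the vertical mode, so only the mode index needs to be coded and the residual is zero.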

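(2) Inter prediction

H.264 inter prediction predicts from previously coded frames or fields using block-based motion compensation. The main inter-coded types are P and B.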
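Block-based motion estimation can be illustrated with a full search over a small window. This is a minimal sketch, not how JM does it: `motion_search` and `get_block` are hypothetical names, frames are plain nested lists of luma samples, the search is restricted to integer positions in a single reference frame, and the cost is plain SAD. H.264 itself also allows quarter-sample motion vectors and multiple reference frames.

```python
def get_block(frame, x, y, size):
    """Cut a size x size block whose top-left corner is at column x, row y."""
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def motion_search(cur_frame, ref_frame, x, y, size=4, search_range=2):
    """Full search: return ((dx, dy), cost) of the best match in ref_frame."""
    height, width = len(ref_frame), len(ref_frame[0])
    target = get_block(cur_frame, x, y, size)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(target, get_block(ref_frame, rx, ry, size))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return (dx, dy), cost
```

The returned vector tells the motion-compensation step where to fetch the prediction block P from: `get_block(ref_frame, x + dx, y + dy, size)`.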

(3) Transform

(4) Quantization

(5) Entropy coding

II. Bitstream Structure

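1 One view is that H.264 uses a layered design: the codec is split into a Network Abstraction Layer (NAL) and a Video Coding Layer (VCL), which separates compression from network transport and lets the coding layer be ported to different network architectures. The basic unit of the NAL is the NALU. From top to bottom, the VCL comprises [sequence [group of pictures n [picture n [slice group n [slice n [macroblock group n [macroblock n [block n]]]]]]]].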

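The structure of a NALU is shown in Figure 2

Figure 2 NAL unit sequence structure

Each NAL unit consists of a header and an RBSP. The header is 8 bits long, as shown in Figure 3

Figure 3 NAL header structure

F must be 0 under the H.264 specification, NRI indicates the importance of the NALU, and Type identifies the kind of information carried by the RBSP. The types are listed in Figure 4

Figure 4 NAL header types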
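The one-byte NAL header can be unpacked with a few bit operations: F is the top bit, NRI the next two bits, and Type the low five bits. The sketch below uses the standard's syntax element names (`forbidden_zero_bit`, `nal_ref_idc`, `nal_unit_type`) as dictionary keys; the function name `parse_nal_header` is hypothetical, and the lookup table lists only a few common types rather than the full set.

```python
# A few common nal_unit_type values (not the complete table).
NAL_UNIT_TYPES = {
    1: "coded slice of a non-IDR picture",
    5: "coded slice of an IDR picture",
    6: "supplemental enhancement information (SEI)",
    7: "sequence parameter set (SPS)",
    8: "picture parameter set (PPS)",
    9: "access unit delimiter",
}


def parse_nal_header(byte):
    """Split the 8-bit NAL header into its F, NRI and Type fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("NAL header must be a single byte")
    return {
        "forbidden_zero_bit": byte >> 7,   # F: must be 0
        "nal_ref_idc": (byte >> 5) & 0x3,  # NRI: importance of the NALU
        "nal_unit_type": byte & 0x1F,      # Type: kind of RBSP carried
    }
```

For example, the header byte `0x67` decodes to F = 0, NRI = 3 and Type = 7, which is a sequence parameter set.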

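2 Another view is that H.264 syntax elements are organized into five levels: sequence, picture, slice, macroblock, and sub-macroblock. In this structure, the header of each level and its data stand in a strong manager-and-managed dependency: the header's syntax elements are the core of that level's data, and if the header is lost the data can hardly ever be decoded correctly, especially at the sequence and picture levels.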

Each picture is compressed by partitioning it as one or more slices; each slice consists of macroblocks, which are blocks of 16x16 luma samples with corresponding chroma samples. However, each macroblock is also divided into sub-macroblock partitions for motion-compensated prediction. The prediction partitions can have seven different sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 and 4x4. In past standards, motion compensation used entire macroblocks or, in the case of newer designs, 16x16 or 8x8 partitions, so the larger variety of partition shapes provides enhanced prediction accuracy. The spatial transform for the residual data is then either 8x8 (a size supported only in FRExt) or 4x4. In past major standards, the transform block size has always been 8x8, so the 4x4 block size provides an enhanced specificity in locating residual difference signals. The block size used for the spatial transform is always either the same or smaller than the block size used for prediction.


http://www.cnblogs.com/fengyv/archive/2006/01/04/2423977.html

http://www.cnblogs.com/xkfz007/archive/2012/07/31/2616659.html

posted on 2014-01-07 10:16 鹰之翔
