Video Coding Terms and Parameters Explained: A Comprehensive, Detailed Guide

http://blog.csdn.net/bytxl/article/details/50436875

GOP（Group of Pictures）

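GOP strategy affects encoding quality. A GOP (group of pictures) is a set of consecutive pictures. MPEG coding divides pictures (that is, frames) into three types: I, P, and B. An I-frame is intra-coded, a P-frame is forward-predicted, and a B-frame is bidirectionally interpolated. Put simply, an I-frame is a complete picture, while P-frames and B-frames record only changes relative to their reference frames, which ultimately trace back to an I-frame. Without the I-frame, the P-frames and B-frames cannot be decoded. This is why MPEG formats are hard to cut precisely, and why the head and tail of a clip have to be fine-tuned when editing. In general, a longer GOP with a higher proportion of B-frames gives better rate-distortion performance.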

In video coding, a group of pictures specifies the order in which intra- and inter-frames are arranged.

The GOP is a group of successive pictures within a coded video stream.

Each coded video stream consists of successive GOPs.

From the pictures contained in it, the visible frames are generated.

A GOP can contain the following picture types:

§ I-picture or I-frame (intra coded picture) - reference picture, which represents a fixed image and which is independent of other picture types. Each GOP begins with this type of picture.

§ P-picture or P-frame (predictive coded picture) - contains motion-compensated difference information from the preceding I- or P-frame.

§ B-picture or B-frame (bidirectionally predictive coded picture) - contains difference information from the preceding and following I- or P-frame within a GOP.

§ D-picture or D-frame (DC direct coded picture) - used for fast-forward playback.

A GOP always begins with an I-frame. Afterwards several P-frames follow, in each case with some frames distance. In the remaining gaps are B-frames. A few video codecs allow for more than one I-frame in a GOP.

The I-frames contain the full image and do not require any additional information to reconstruct it.

Therefore any errors within the GOP structure are corrected by the next I-frame.

B-frames within a GOP only propagate errors in H.264, where B-frames can be referenced by other pictures in order to increase compression efficiency.

The more I-frames the video stream has, the more editable it is. However, having more I-frames increases the stream size. In order to save bandwidth and disk space, videos prepared for internet broadcast often have only one I-frame per GOP.

The GOP structure is often referred to by two numbers, for example M=3, N=12. The first one tells the distance between two anchor frames (I or P). The second one tells the distance between two full images (I-frames): it is the GOP length (in other words, the GOP length is the distance between two I-frames).

For the example M=3, N=12, the GOP structure is IBBPBBPBBPBBI (the final I is the first frame of the next GOP).
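The M/N notation above can be turned into a frame-type pattern mechanically. The following is a minimal sketch, assuming display order and a simple fixed pattern; the function name `gop_pattern` is made up for this illustration, and real encoders may place frames adaptively.

```python
def gop_pattern(m: int, n: int) -> str:
    """Return the frame types, in display order, of one GOP.

    m is the distance between anchor frames (I or P),
    n is the GOP length (distance between two I-frames).
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    frames = []
    for i in range(n):
        if i == 0:
            frames.append("I")   # every GOP starts with an I-frame
        elif i % m == 0:
            frames.append("P")   # anchor frames every m pictures
        else:
            frames.append("B")   # gaps between anchors are B-frames
    return "".join(frames)


# Append the first I-frame of the next GOP to match the notation in the text.
print(gop_pattern(3, 12) + "I")  # IBBPBBPBBPBBI
```

With M=1 the same function yields a GOP without B-frames, for example `gop_pattern(1, 4)` gives `IPPP`.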

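QP (Quantization Parameter)

Surprisingly, Wikipedia has no explanation of this, at least not yet, so I had to consult other sources. The explanation is as follows:

In an H.264 codec, the quantization parameter QP and the quantization step size Qstep are related as follows:

Qstep takes 52 values in total (for luma coding).

QP is the index of the quantization step size Qstep and ranges from 0 to 51.

QP = 0, the minimum, gives the finest quantization; QP = 51, the maximum, gives the coarsest.

Qstep increases with QP, doubling for every increase of 6 in QP.

For chroma coding, the maximum QP is 39.

In my depth-video experiments I used QP values of 22, 27, 32, and 37; the results at QP 22 were the sharpest and those at QP 37 the blurriest.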
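The doubling rule can be sketched in a few lines. The six base step sizes below (for QP 0 to 5) are not given in the text above; they are the values commonly tabulated for H.264 and are used here as an assumption. The names `BASE_QSTEP` and `qstep` are illustrative only.

```python
# Commonly cited Qstep values for QP = 0..5 (assumption, not from the text above).
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Map a luma QP (0..51) to its quantization step size."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    # Every increase of 6 in QP doubles the step size.
    return BASE_QSTEP[qp % 6] * (2 ** (qp // 6))


for qp in (22, 27, 32, 37):
    print(qp, qstep(qp))
```

Under these assumed base values, the four QPs used in the experiment map to step sizes 8, 14, 26, and 44, which is consistent with QP 37 looking much coarser than QP 22.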

Bit Rate

In telecommunications and computing, bitrate (sometimes written bit rate, data rate or as a variable R[1]) is the number of bits that are conveyed or processed per unit of time.

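Bit rate is the number of data bits transferred per unit of time; the unit we usually use is kbps, kilobits per second. Loosely speaking, it can be thought of as analogous to a sampling rate: the higher it is per unit of time, the higher the precision and the closer the processed file is to the original, which means richer picture detail but also a lower compression ratio.

Bit rate x time = total size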

Multimedia encoding

In digital multimedia, bit rate often refers to the number of bits used per unit of playback time to represent a continuous medium such as audio or video after source coding (data compression).

The encoding bit rate of a multimedia file is the size of a multimedia file in bytes divided by the playback time of the recording (in seconds), multiplied by eight.
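Both directions of this relation (bit rate x time = total size, and file size / time x 8 = bit rate) are simple arithmetic. The sketch below uses hypothetical helper names and ignores container overhead and audio tracks.

```python
def total_size_bytes(bitrate_bps: float, duration_s: float) -> float:
    """Total size in bytes of a stream with the given average bit rate."""
    return bitrate_bps * duration_s / 8


def encoding_bitrate_bps(size_bytes: float, duration_s: float) -> float:
    """Average encoding bit rate: size in bytes / playback time * 8."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_s


# One minute of video at 2000 kbps (2,000,000 bits per second).
size = total_size_bytes(2_000_000, 60)
print(size)                            # 15000000.0 bytes
print(encoding_bitrate_bps(size, 60))  # 2000000.0 bits per second
```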

For realtime streaming multimedia, the encoding bit rate is the goodput that is required to avoid interruption:

Encoding bit rate = Required goodput

The term average bitrate is used in case of variable bitrate multimedia source coding schemes. In this context, the peak bit rate is the maximum number of bits required for any short-term block of compressed data.[12]

A theoretical lower bound for the encoding bit rate for lossless data compression is the source information rate, also known as the entropy rate.

Entropy rate ≤ Multimedia bit rate

PSNR (Peak Signal-to-Noise Ratio)

The phrase peak signal-to-noise ratio, often abbreviated PSNR, is an engineering term for the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation.

Because many signals have a very wide dynamic range, PSNR is usually expressed in terms of the logarithmic decibel scale.

The PSNR is most commonly used as a measure of quality of reconstruction of lossy compression codecs (e.g., for image compression). The signal in this case is the original data, and the noise is the error introduced by compression.

When comparing compression codecs it is used as an approximation to human perception of reconstruction quality, therefore in some cases one reconstruction may appear to be closer to the original than another, even though it has a lower PSNR (a higher PSNR would normally indicate that the reconstruction is of higher quality).

One has to be extremely careful with the range of validity of this metric; it is only conclusively valid when it is used to compare results from the same codec (or codec type) and same content.

The higher the PSNR, the less the distortion.
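The text above does not spell out the formula; the usual definition is PSNR = 10 * log10(MAX^2 / MSE), where MAX is the peak sample value and MSE is the mean squared error between original and reconstruction. A minimal sketch, assuming 8-bit samples (MAX = 255) passed as flat sequences:

```python
import math


def psnr(original, reconstructed, max_value=255):
    """PSNR in dB between two equal-length sample sequences."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error: the "noise" power introduced by compression.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10 * math.log10(max_value ** 2 / mse)


print(psnr([100, 100, 100, 100], [101, 99, 101, 99]))  # about 48.13 dB
```

For real images the same calculation is normally applied per plane (often luma only), so values from different tools are comparable only when they use the same convention.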

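Entropy Coding

Entropy coding is lossless coding: following the entropy principle, no information is lost in the encoding process. Information entropy is the average amount of information of a source (a measure of uncertainty). Common entropy codes include Shannon coding, Huffman coding, and arithmetic coding. In video coding, entropy coding converts the series of symbols that represent the video sequence into a compressed bitstream for transmission or storage. The input symbols may include quantized transform coefficients, motion vectors, header information (macroblock headers, picture headers, sequence headers, and so on), and side information (flag bits that are important for correct decoding).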
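To make "average amount of information" concrete, the sketch below estimates the Shannon entropy of a symbol sequence from its empirical symbol frequencies. It treats symbols as independent, which is a simplifying assumption; the function name is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(symbols) -> float:
    """Empirical entropy in bits per symbol of a non-empty sequence."""
    total = len(symbols)
    if total == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(symbols)
    # H = -sum(p * log2(p)) over the observed symbol probabilities.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy("aabb"))      # 1.0 bit per symbol
print(shannon_entropy("aaaaaaab"))  # well below 1 bit: a skewed source compresses better
```

An ideal entropy coder needs, on average, about this many bits per symbol, which is why skewed distributions such as quantized transform coefficients (mostly zeros) compress well.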

SODB

SODB stands for String of Data Bits: the rawest form of the coded data, with no additional data attached.

RBSP

RBSP stands for Raw Byte Sequence Payload. It appends trailing bits to the SODB (RBSP trailing bits: a single "1" bit followed by as many "0" bits as needed) so that the data is byte-aligned.
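The trailing-bit rule can be illustrated with a small sketch. It represents the SODB as a string of "0"/"1" characters purely for readability, which is an assumption of this example and not how an encoder stores bits.

```python
def sodb_to_rbsp(sodb_bits: str) -> bytes:
    """Append RBSP trailing bits (one '1', then '0's) and pack into bytes."""
    if any(ch not in "01" for ch in sodb_bits):
        raise ValueError("sodb_bits must contain only '0' and '1'")
    bits = sodb_bits + "1"            # stop bit
    bits += "0" * (-len(bits) % 8)    # alignment zeros up to a byte boundary
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


print(sodb_to_rbsp("101").hex())       # b0   -> 1011 0000
print(sodb_to_rbsp("11111111").hex())  # ff80 -> a full byte still gets the stop bit
```

Because the stop bit is always added, the last byte of an RBSP produced this way is never zero, and a decoder can find the end of the SODB by locating the last "1" bit.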

EBSP

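EBSP stands for Encapsulated Byte Sequence Payload. It adds emulation prevention bytes (0x03) to the RBSP, for the following reason. When NALUs are written in the Annex B byte stream format, each NALU must be preceded by a start code prefix (StartCodePrefix): 4 bytes (0x00000001) if the slice in that NALU starts a frame, otherwise 3 bytes (0x000001). To keep the NALU body from containing a byte sequence that conflicts with the start code, the encoder inserts one 0x03 byte whenever two consecutive zero bytes are followed by a byte in the range 0x00 to 0x03. The decoder removes the 0x03 bytes, an operation also known as unshelling. That is:

After RBSP trailing bits (a single "1" bit and as many "0" bits as needed for byte alignment) have been appended to the data bit string (SODB) of a whole frame, the RBSP is checked for any three consecutive bytes of the form "00000000 00000000 000000xx". Wherever such a three-byte pattern occurs, a 0x03 byte is inserted before the third byte to avoid start code emulation, which yields the EBSP stream. Done on a whole frame, this needs a buffer of nearly twice the size of the frame's bitstream. To reduce the memory requirement, start code emulation can instead be checked in each macroblock's SODB as soon as that macroblock has been encoded, keeping the count of zero bytes in the last two bytes of the SODB so that detection carries over to the first bytes of the next macroblock's SODB. For the last macroblock of a frame, the trailing stop bit is appended first and start code emulation is checked afterwards.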
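The insertion and removal of the 0x03 byte can be sketched as a pair of functions over byte strings. This is an illustrative sketch, not a reference implementation; the function names are made up, and it works on a whole buffer rather than macroblock by macroblock.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """RBSP -> EBSP: insert 0x03 before any byte <= 0x03 that follows two zero bytes."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(ebsp: bytes) -> bytes:
    """EBSP -> RBSP: drop each 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


payload = b"\x25\x00\x00\x01\x00\x00\x00\x00"
ebsp = add_emulation_prevention(payload)
print(ebsp.hex())  # 250000030100000300 00 pattern, no start code inside
assert remove_emulation_prevention(ebsp) == payload
```

After insertion the payload can no longer contain 0x000001, so a parser can safely search the byte stream for start codes.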

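NAL (Network Abstraction Layer) comes in essentially two kinds: 1. the NAL for H.320, carried as an ordered byte stream; 2. the NAL for IP networks, carried in RTP packets.

Purpose of the NAL: it is specified to format the VCL data and provide header information in a manner appropriate for conveyance on a variety of communication channels or storage media.

NAL processing consists of two basic steps:

1. Encapsulate the SODB output by the VCL into a nal_unit. The nal_unit is a generic encapsulation format that suits both ordered byte stream transport and IP packet switching.

2. Depending on the transport network (circuit-switched or packet-switched), wrap the nal_unit in the encapsulation format for that network.

Step 1 in detail: between the bitstream output by the VCL, the SODB (String Of Data Bits), and the nal_unit, there are three processing stages:

1. The SODB is byte-aligned and encapsulated as an RBSP (Raw Byte Sequence Payload).

2. To prevent the RBSP bytes from colliding with the SCP (start_code_prefix_one_3bytes, 0x000001) used in ordered byte stream transport, the RBSP is scanned three bytes at a time, and wherever a collision would occur an emulation_prevention_three_byte (0x03) is inserted before the third byte, as follows: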

nal_unit( NumBytesInNALunit )
{
    forbidden_zero_bit
    nal_ref_idc
    nal_unit_type
    NumBytesInRBSP = 0
    for( i = 1; i < NumBytesInNALunit; i++ )
    {
        if( i + 2 < NumBytesInNALunit && next_bits( 24 ) == 0x000003 )
        {
            rbsp_byte[ NumBytesInRBSP++ ]
            rbsp_byte[ NumBytesInRBSP++ ]
            i += 2
            emulation_prevention_three_byte /* equal to 0x03 */
        }
        else
            rbsp_byte[ NumBytesInRBSP++ ]
    }
}

3. A one-byte header (forbidden_zero_bit + nal_ref_idc + nal_unit_type) is added to the emulation-protected RBSP, which is then encapsulated as a nal_unit.
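The one-byte header can be unpacked with a few shifts and masks. The field widths used below (1, 2, and 5 bits) are not stated in the text above; they are the widths used by H.264 and are noted here as an assumption of the sketch. The function name is illustrative.

```python
def parse_nal_header(first_byte: int) -> dict:
    """Split the first byte of a nal_unit into its three header fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("first_byte must fit in one byte")
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # 1 bit, must be 0
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # 2 bits
        "nal_unit_type": first_byte & 0x1F,              # 5 bits
    }


print(parse_nal_header(0x67))
# {'forbidden_zero_bit': 0, 'nal_ref_idc': 3, 'nal_unit_type': 7}
```

In H.264, nal_unit_type 7 is a sequence parameter set and 5 is a slice of an IDR picture, so 0x67 and 0x65 are bytes commonly seen right after a start code.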

Step 2 in detail:

Case 1: ordered byte stream encapsulation
byte_stream_nal_unit( NumBytesInNALunit )
{
    while( next_bits( 24 ) != 0x000001 )
        zero_byte /* equal to 0x00 */
    if( more_data_in_byte_stream( ) )
    {
        start_code_prefix_one_3bytes /* equal to 0x000001 */
        nal_unit( NumBytesInNALunit )
    }
}
Case 2: RTP packetization for IP networks.
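For Case 1, a reader of the byte stream has to reverse this encapsulation by locating start codes. The following is a minimal sketch of such a splitter, assuming the whole stream is already in memory; the function name is made up, and it returns NAL units still in EBSP form (emulation prevention bytes not removed).

```python
from typing import List


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an ordered byte stream into NAL units at 0x000001 start codes."""
    nal_units = []
    start = None  # index of the first byte of the current NAL unit
    i = 0
    while i + 3 <= len(stream):
        if stream[i:i + 3] == b"\x00\x00\x01":
            if start is not None:
                # Zero bytes before a start code belong to the prefix, not the
                # NAL unit (a NAL unit never ends with 0x00), so strip them.
                nal_units.append(stream[start:i].rstrip(b"\x00"))
            start = i + 3
            i += 3
        else:
            i += 1
    if start is not None:
        nal_units.append(stream[start:].rstrip(b"\x00"))
    return nal_units


data = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce"
print([n.hex() for n in split_annexb(data)])  # ['6742', '68ce']
```

Both the 4-byte and the 3-byte prefix are handled by the same search, because the 4-byte form is just the 3-byte start code preceded by a zero_byte.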

Other H.264 Terms and Abbreviations

RDO: Rate Distortion Optimization
QP: Quantization Parameter
RC: Rate Control
ABR: Average Bitrate
VBV: Video Buffering Verifier
FMO: Flexible Macroblock Ordering, that is, the mapping between individual macroblocks and slice groups
SH: Slice Header
EC: Error Concealment
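ASO: Arbitrary Slice Order; slices may be coded in an order other than raster scan order
BAB: Binary Alpha Block, used to mark the boundary of a region
BAP: Body Animation Parameters
Block: a region of a macroblock on which a transform is performed (8x8, 4x4)
Block matching: motion estimation carried out on rectangular picture regions
Blocking (block artifacts): square or rectangular distortion regions in a picture
B-picture (B-frame): a picture (slice) coded using bidirectional motion-compensated prediction
CABAC: Context-based Adaptive Binary Arithmetic Coding
CAE: Context-based Arithmetic Encoding
CAVLC: Context Adaptive Variable Length Coding
Chrominance: the color difference component
CIF: Common Intermediate Format, a color image format, 352 x 288
Color space: a method of representing color images
DCT: Discrete Cosine Transform
Direct prediction: a coding mode in which no motion vector is transmitted
DPCM: Differential Pulse Code Modulation
DSCQS: Double Stimulus Continuous Quality Scale, a scale and method for measuring subjective quality
DWT: Discrete Wavelet Transform
Entropy coding: a coding method that reduces redundancy
Error concealment: post-processing of a decoded picture to remove or reduce visible error effects
Exp-Golomb: Exponential Golomb variable length codes
FAP: Facial Animation Parameters
FBA: Face and Body Animation
FGS: Fine Granular Scalability
Field: the even-numbered or odd-numbered lines of an interlaced video sequence
Flowgraph: a pictorial representation of a transform algorithm
FMO: Flexible Macroblock Ordering; macroblocks may be coded in an order other than raster scan order
Full search: a motion estimation algorithm
GMC: Global Motion Compensation, applied to an entire coded object
GOP: Group of Pictures, a set of coded video pictures
Huffman coding: a coding method that reduces redundancy
HVS: Human Visual System, the system by which humans observe and interpret video images
Hybrid (codec): a codec characterized by motion compensation and a transform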
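IEC: International Electrotechnical Commission, a standards body
Inter coding: coding of video frames using temporal prediction and compensation
Interlaced: video data represented as a series of fields
Intra coding: coding of video frames without temporal prediction
ISO: International Organization for Standardization
ITU: International Telecommunication Union
JPEG: Joint Photographic Experts Group, a committee of ISO
JPEG2000: an image coding standard
Latency: delay through a communication system
Level: a set of parameters conforming to certain constraints (applied to a profile)
Loop filter (in-loop filter): a spatial filter placed within the encoding or decoding feedback loop
Macroblock: the unit in which a frame is coded
Macroblock partition: a region of a macroblock that has its own motion vector
Macroblock sub-partition: a region of a macroblock partition that has its own motion vector
Media processor: a processor with features for media coding and processing
Motion compensation: prediction of a video frame using a motion model
Motion estimation: estimation of the relative motion between two or more video frames
Motion vector: a vector indicating the displacement of a block or region, used for motion compensation
NAL: Network Abstraction Layer
Objective quality: visual quality of a video picture as measured by an algorithm
OBMC: Overlapped Block Motion Compensation
Picture: a coded video frame
P-picture (P-frame): a picture coded using motion-compensated prediction from a reference frame
Profile: a set of tools for a video codec
Progressive: video data represented as a sequence of complete frames
PSNR: Peak Signal-to-Noise Ratio, an objective quality measure
QCIF: Quarter Common Intermediate Format, 176 x 144
Quantize: to reduce the precision of a scalar or vector quantity
Rate control: control of the bit rate of a coded video signal
Rate-distortion: a measure of codec performance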
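RBSP: Raw Byte Sequence Payload
RGB: the red/green/blue color space
Ringing (artifacts): "ripple"-like artifacts around sharp edges in a decoded picture
RTP: Real-time Transport Protocol
RVLC: Reversible Variable Length Code
Scalable coding: coding a signal into several layers
SI slice: an intra-coded slice used for switching between coded bitstreams
Slice: a region of a coded picture
SNHC: Synthetic Natural Hybrid Coding, coding of mixed synthetic and natural images
SP slice: an inter-coded slice used for switching between coded bitstreams
Sprite: a texture region that can be incorporated into a series of decoded frames
Statistical redundancy: redundancy due to the statistical distribution of data
Studio quality: lossless or near-lossless video quality
Subjective quality: video quality as perceived by the human eye
Subjective redundancy: redundancy due to data that is subjectively unimportant
Sub-pixel (motion compensation): motion-compensated prediction from a reference region formed by interpolating between integer pixel positions
Test model: a software model and document describing a reference implementation of a video coding standard
Texture: image or residual data
Tree-structured motion compensation: motion compensation characterized by flexible, variable partition sizes
TSS: Three Step Search, a motion estimation algorithm
VCEG: Video Coding Experts Group
VCL: Video Coding Layer
Video packet: a coded unit suitable for packetization
VLC: Variable Length Code
VLD: Variable Length Decoder
VLE: Variable Length Encoder
VLSI: Very Large Scale Integration
VO: Video Object
VOP: Video Object Plane
VQEG: Video Quality Experts Group
Weighted prediction: motion compensation in which prediction samples from two reference regions are scaled
YCbCr: a color space of luma, blue chroma, and red chroma components
YUV: a color space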