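Preface: The Speex website is http://speex.org/. The English manual is available under Documentation, either as a PDF or as online HTML. My limited English and my unfamiliarity with speech coding may have caused translation errors, so I include the original English for every paragraph. If you find a mistake, please point it out so we can all improve together.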

 

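PS: 1) If you repost this, please credit the source; many thanks. 2) If this infringes your copyright, let me know and it will be removed promptly.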

 

 

5.1 Encoding

5.2 Decoding

5.3 Codec options (speex_*_ctl)

5.4 Mode queries

5.5 Packing and in-band signalling

Supplement

Afterword

 

The libspeex library contains all the functions for encoding and decoding speech with the Speex codec. When linking on a UNIX system, one must add -lspeex -lm to the compiler command line. One important thing to know is that libspeex calls are reentrant, but not thread-safe. That means that it is fine to use calls from many threads, but calls using the same state from multiple threads must be protected by mutexes. Examples of code can also be found in Appendix A and the complete API documentation is included in the Documentation section of the Speex website (http://www.speex.org/).

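The libspeex library of the Speex codec contains all the functions for encoding and decoding speech. When linking on a UNIX system, add -lspeex -lm to the compiler command line. Note that libspeex calls are reentrant but not thread-safe. It is fine to call them from many threads, but calls that use the same state from multiple threads must be protected by mutexes. Appendix A contains code examples, and the complete API documentation is in the Documentation section of the Speex website (http://www.speex.org/).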

 

 

5.1 Encoding

In order to encode speech using Speex, one first needs to:
#include <speex/speex.h>
Then in the code, a Speex bit-packing struct must be declared, along with a Speex encoder state:
SpeexBits bits;

void *enc_state;
The two are initialized by:
speex_bits_init(&bits);
enc_state = speex_encoder_init(&speex_nb_mode);
For wideband coding, speex_nb_mode will be replaced by speex_wb_mode. In most cases, you will need to know the frame size used at the sampling rate you are using. You can get that value in the frame_size variable (expressed in samples, not bytes) with:
speex_encoder_ctl(enc_state,SPEEX_GET_FRAME_SIZE,&frame_size);
In practice, frame_size will correspond to 20 ms when using 8, 16, or 32 kHz sampling rate. There are many parameters that can be set for the Speex encoder, but the most useful one is the quality parameter that controls the quality vs bit-rate tradeoff.
This is set by:
speex_encoder_ctl(enc_state,SPEEX_SET_QUALITY,&quality);
where quality is an integer value ranging from 0 to 10 (inclusive). The mapping between quality and bit-rate for narrowband is given in Table 9.2.
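The 20 ms figure follows from the frame sizes: 160, 320 and 640 samples at 8, 16 and 32 kHz respectively. The hypothetical helpers below do that arithmetic. They also estimate how many bytes one frame occupies at a given bit-rate, which helps when choosing a buffer size such as MAX_NB_BYTES. The only assumption is the fixed 20 ms frame length.

```c
/* Duration in milliseconds of a frame of `samples` samples at `rate` Hz. */
int frame_duration_ms(int samples, int rate)
{
    return samples * 1000 / rate;
}

/* Bytes needed to hold one 20 ms frame at `bitrate` bits per second,
   rounded up to a whole byte. */
int bytes_per_frame(int bitrate)
{
    int bits = bitrate / 50;    /* 50 frames per second at 20 ms each */
    return (bits + 7) / 8;
}
```

At 15,000 bps a 20 ms frame carries 300 bits, so 38 bytes are needed once the final partial byte is counted.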
Once the initialization is done, for every input frame:
speex_bits_reset(&bits);
speex_encode_int(enc_state, input_frame, &bits);
nbBytes = speex_bits_write(&bits, byte_ptr, MAX_NB_BYTES);
where input_frame is a (short *) pointing to the beginning of a speech frame, byte_ptr is a (char *) where the encoded frame will be written, MAX_NB_BYTES is the maximum number of bytes that can be written to byte_ptr without causing an overflow, and nbBytes is the number of bytes actually written to byte_ptr (the encoded size in bytes). Before calling speex_bits_write, it is possible to find the number of bytes that need to be written by calling speex_bits_nbytes(&bits), which returns a number of bytes.
It is still possible to use the speex_encode() function, which takes a (float *) for the audio. However, this would make an eventual port to an FPU-less platform (like ARM) more complicated. Internally, speex_encode() and speex_encode_int() are processed in the same way. Whether the encoder uses the fixed-point version is only decided by the compile-time flags, not at the API level.
After you’re done with the encoding, free all resources with:
speex_bits_destroy(&bits);
speex_encoder_destroy(enc_state);
That’s about it for the encoder.

To encode speech with Speex, you first need to:

#include <speex/speex.h>

In the code, declare a Speex bit-packing struct and a Speex encoder state:

SpeexBits bits;

void * enc_state;

Initialize both:

speex_bits_init( &bits );

enc_state = speex_encoder_init( &speex_nb_mode );

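Replacing speex_nb_mode with speex_wb_mode switches to wideband encoding. In most cases you need to know the frame size used at your sampling rate. It is returned in the frame_size variable (in samples, not bytes) by: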

speex_encoder_ctl( enc_state, SPEEX_GET_FRAME_SIZE, &frame_size );

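In practice, frame_size corresponds to 20 ms at 8, 16 or 32 kHz sampling rates. The Speex encoder has many parameters that can be set, but the most useful is the quality parameter, which controls the trade-off between quality and bit-rate. It is set with: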
speex_encoder_ctl( enc_state, SPEEX_SET_QUALITY, &quality );

quality is an integer from 0 to 10 (inclusive). The mapping between quality and bit-rate for narrowband is shown in Table 9.2.

Once initialization is done, for every input frame:

speex_bits_reset( &bits );

speex_encode_int( enc_state, input_frame, &bits );

nbBytes = speex_bits_write( &bits, byte_ptr, MAX_NB_BYTES );

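Here input_frame is a short pointer to the beginning of a speech frame, and byte_ptr is a char pointer to which the encoded frame is written. MAX_NB_BYTES is the maximum number of bytes that can be written to byte_ptr without overflow, and nbBytes is the number of bytes actually written to byte_ptr (the encoded size in bytes). Before calling speex_bits_write, you can find out how many bytes need to be written by calling speex_bits_nbytes(&bits).

The speex_encode() function, which takes a float pointer to the audio, can still be used. However, it would make an eventual port to a platform without a floating-point unit (FPU), such as ARM, more complicated. Internally, speex_encode() and speex_encode_int() are processed in the same way. Whether the encoder uses fixed-point arithmetic is decided only by compile-time flags, not at the API level.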

When encoding is finished, free all resources:
speex_bits_destroy( &bits );

speex_encoder_destroy( enc_state );

That is about it for the encoder.

  

5.2 Decoding

In order to decode speech using Speex, you first need to:
#include <speex/speex.h>
You also need to declare a Speex bit-packing struct
SpeexBits bits;
and a Speex decoder state
void *dec_state;
The two are initialized by:
speex_bits_init(&bits);
dec_state = speex_decoder_init(&speex_nb_mode);
For wideband decoding, speex_nb_mode will be replaced by speex_wb_mode. If you need to obtain the size of the frames that will be used by the decoder, you can get that value in the frame_size variable (expressed in samples, not bytes) with:
speex_decoder_ctl(dec_state, SPEEX_GET_FRAME_SIZE, &frame_size);
There is also a parameter that can be set for the decoder: whether or not to use a perceptual enhancer. This can be set by:
speex_decoder_ctl(dec_state, SPEEX_SET_ENH, &enh);
where enh is an int with value 0 to have the enhancer disabled and 1 to have it enabled. As of 1.2-beta1, the default is now to enable the enhancer.
Again, once the decoder initialization is done, for every input frame:
speex_bits_read_from(&bits, input_bytes, nbBytes);
speex_decode_int(dec_state, &bits, output_frame);
where input_bytes is a (char *) containing the bit-stream data received for a frame, nbBytes is the size (in bytes) of that bit-stream, and output_frame is a (short *) and points to the area where the decoded speech frame will be written. A NULL value as the second argument indicates that we don’t have the bits for the current frame. When a frame is lost, the Speex decoder will do its best to "guess" the correct signal.
As for the encoder, the speex_decode() function can still be used, with a (float *) as the output for the audio. After you’re done with the decoding, free all resources with:
speex_bits_destroy(&bits);
speex_decoder_destroy(dec_state);

To decode speech with Speex, first include the speex.h header:

#include <speex/speex.h>

Declare a Speex bit-packing struct and a Speex decoder state:

SpeexBits bits;

void* dec_state;

Initialize them:
speex_bits_init( &bits );

dec_state = speex_decoder_init( &speex_nb_mode );

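Replacing speex_nb_mode with speex_wb_mode switches to wideband decoding. The frame size used by the decoder (in samples, not bytes) can be obtained in the frame_size variable: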

speex_decoder_ctl( dec_state, SPEEX_GET_FRAME_SIZE, &frame_size );

The decoder also has a setting for whether to use the perceptual enhancer:

speex_decoder_ctl( dec_state, SPEEX_SET_ENH, &enh );

enh is 0 to disable the enhancer and 1 to enable it. As of 1.2-beta1, the enhancer is enabled by default.

Once initialization is done, for every input frame:

speex_bits_read_from( &bits, input_bytes, nbBytes );

speex_decode_int( dec_state, &bits, output_frame );

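Here input_bytes is a char pointer containing the bit-stream data received for one frame, and nbBytes is the size of that bit-stream in bytes. output_frame is a short pointer to the area where the decoded speech frame is written. Passing NULL as the second argument means the bits for the current frame are missing because the frame was lost. The Speex decoder then does its best to guess the correct signal.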

As with the encoder, the speex_decode() function can still be used, with a float pointer as the audio output.

When decoding is finished, free all resources:

speex_bits_destroy( &bits );

speex_decoder_destroy( dec_state );

 

5.3 Codec options (speex_*_ctl)

The Speex encoder and decoder support many options and requests that can be accessed through the speex_encoder_ctl and speex_decoder_ctl functions. These functions are similar to the ioctl system call and their prototypes are:
void speex_encoder_ctl(void *encoder, int request, void *ptr);
void speex_decoder_ctl(void *decoder, int request, void *ptr);
Although these functions expose many options, the defaults are usually good for many applications, and optional settings should only be used when one understands them and knows that they are needed. A common error is to attempt to set many unnecessary settings.
Here is a list of the values allowed for the requests. Some only apply to the encoder or the decoder. Because the last argument is of type void *, the _ctl() functions are not type safe, and should thus be used with care. The type spx_int32_t is the same as the C99 int32_t type.
SPEEX_SET_ENH‡ Set perceptual enhancer to on (1) or off (0) (spx_int32_t, default is on)
SPEEX_GET_ENH‡ Get perceptual enhancer status (spx_int32_t)
SPEEX_GET_FRAME_SIZE Get the number of samples per frame for the current mode (spx_int32_t)
SPEEX_SET_QUALITY† Set the encoder speech quality (spx_int32_t from 0 to 10, default is 8)
SPEEX_GET_QUALITY† Get the current encoder speech quality (spx_int32_t from 0 to 10)
SPEEX_SET_MODE† Set the mode number, as specified in the RTP spec (spx_int32_t)
SPEEX_GET_MODE† Get the current mode number, as specified in the RTP spec (spx_int32_t)
SPEEX_SET_VBR† Set variable bit-rate (VBR) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_VBR† Get variable bit-rate (VBR) status (spx_int32_t)
SPEEX_SET_VBR_QUALITY† Set the encoder VBR speech quality (float 0.0 to 10.0, default is 8.0)
SPEEX_GET_VBR_QUALITY† Get the current encoder VBR speech quality (float 0 to 10)
SPEEX_SET_COMPLEXITY† Set the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)
SPEEX_GET_COMPLEXITY† Get the CPU resources allowed for the encoder (spx_int32_t from 1 to 10, default is 2)
SPEEX_SET_BITRATE† Set the bit-rate to use the closest value not exceeding the parameter (spx_int32_t in bits per second)
SPEEX_GET_BITRATE Get the current bit-rate in use (spx_int32_t in bits per second)
SPEEX_SET_SAMPLING_RATE Set real sampling rate (spx_int32_t in Hz)
SPEEX_GET_SAMPLING_RATE Get real sampling rate (spx_int32_t in Hz)
SPEEX_RESET_STATE Reset the encoder/decoder state to its original state, clearing all memories (no argument)
SPEEX_SET_VAD† Set voice activity detection (VAD) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_VAD† Get voice activity detection (VAD) status (spx_int32_t)
SPEEX_SET_DTX† Set discontinuous transmission (DTX) to on (1) or off (0) (spx_int32_t, default is off)
SPEEX_GET_DTX† Get discontinuous transmission (DTX) status (spx_int32_t)
SPEEX_SET_ABR† Set average bit-rate (ABR) to a value n in bits per second (spx_int32_t in bits per second)
SPEEX_GET_ABR† Get average bit-rate (ABR) setting (spx_int32_t in bits per second)
SPEEX_SET_PLC_TUNING† Tell the encoder to optimize encoding for a certain percentage of packet loss (spx_int32_t in percent)
SPEEX_GET_PLC_TUNING† Get the current tuning of the encoder for PLC (spx_int32_t in percent)
SPEEX_SET_VBR_MAX_BITRATE† Set the maximum bit-rate allowed in VBR operation (spx_int32_t in bits per second)
SPEEX_GET_VBR_MAX_BITRATE† Get the current maximum bit-rate allowed in VBR operation (spx_int32_t in bits per second)
SPEEX_SET_HIGHPASS Set the high-pass filter on (1) or off (0) (spx_int32_t, default is on)
SPEEX_GET_HIGHPASS Get the current high-pass filter status (spx_int32_t)
† applies only to the encoder
‡ applies only to the decoder

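The Speex encoder and decoder support many options and requests, accessed through the speex_encoder_ctl and speex_decoder_ctl functions, which are similar to the ioctl system call. Their prototypes are: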

void speex_encoder_ctl( void* encoder, int request, void* ptr );

void speex_decoder_ctl( void* decoder, int request, void* ptr );

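Even so, the defaults are good enough for most applications. Only use optional settings when you understand them and know that you need them; setting many unnecessary options is a common mistake.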

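Below are the values allowed for request. Some apply only to the encoder or only to the decoder. Because the last argument is a void pointer, the _ctl() functions are not type-safe and should be used with care. spx_int32_t is the same as the C99 int32_t type.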

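SPEEX_SET_ENH‡: set the perceptual enhancer on (1) or off (0) (spx_int32_t, default on)

SPEEX_GET_ENH‡: get the perceptual enhancer status (spx_int32_t)

SPEEX_GET_FRAME_SIZE: get the number of samples per frame for the current mode (spx_int32_t)

SPEEX_SET_QUALITY†: set the encoder speech quality (spx_int32_t, 0 to 10, default 8)

SPEEX_GET_QUALITY†: get the current encoder speech quality (spx_int32_t, 0 to 10)

SPEEX_SET_MODE†: set the mode number, as specified in the RTP spec (spx_int32_t)

SPEEX_GET_MODE†: get the current mode number, as specified in the RTP spec (spx_int32_t)

SPEEX_SET_VBR†: set variable bit-rate (VBR) on (1) or off (0) (spx_int32_t, default off)

SPEEX_GET_VBR†: get whether variable bit-rate is enabled (spx_int32_t)

SPEEX_SET_VBR_QUALITY†: set the encoder VBR speech quality (float, 0.0 to 10.0, default 8.0)

SPEEX_GET_VBR_QUALITY†: get the current encoder VBR speech quality (float, 0.0 to 10.0)

SPEEX_SET_COMPLEXITY†: set the CPU resources allowed for the encoder (spx_int32_t, 1 to 10, default 2)

SPEEX_GET_COMPLEXITY†: get the CPU resources allowed for the encoder (spx_int32_t, 1 to 10, default 2)

SPEEX_SET_BITRATE†: set the bit-rate to the closest value not exceeding the parameter (spx_int32_t, bits/s)

SPEEX_GET_BITRATE: get the bit-rate currently in use (spx_int32_t, bits/s)

SPEEX_SET_SAMPLING_RATE: set the real sampling rate (spx_int32_t, Hz)

SPEEX_GET_SAMPLING_RATE: get the real sampling rate (spx_int32_t, Hz)

SPEEX_RESET_STATE: reset the encoder/decoder to its original state, clearing all memories (no argument)

SPEEX_SET_VAD†: set voice activity detection (VAD) on (1) or off (0) (spx_int32_t, default off)

SPEEX_GET_VAD†: get whether voice activity detection is enabled (spx_int32_t)

SPEEX_SET_DTX†: set discontinuous transmission (DTX) on (1) or off (0) (spx_int32_t, default off)

SPEEX_GET_DTX†: get whether discontinuous transmission is enabled (spx_int32_t)

SPEEX_SET_ABR†: set the average bit-rate (ABR) to a value n (spx_int32_t, bits/s)

SPEEX_GET_ABR†: get the average bit-rate setting (spx_int32_t, bits/s)

SPEEX_SET_PLC_TUNING†: tell the encoder to optimize its encoding for a given percentage of packet loss (spx_int32_t, percent)

SPEEX_GET_PLC_TUNING†: get the encoder's current tuning for packet loss concealment (PLC) (spx_int32_t, percent)

SPEEX_SET_VBR_MAX_BITRATE†: set the maximum bit-rate allowed in VBR operation (spx_int32_t, bits/s)

SPEEX_GET_VBR_MAX_BITRATE†: get the maximum bit-rate allowed in VBR operation (spx_int32_t, bits/s)

SPEEX_SET_HIGHPASS: set the high-pass filter on (1) or off (0) (spx_int32_t, default on)

SPEEX_GET_HIGHPASS: get the high-pass filter status (spx_int32_t)

† applies only to the encoder

‡ applies only to the decoder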

 

5.4 Mode queries

Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Since modes are read-only, it is only possible to get information about a particular mode. The function used to do that is:
void speex_mode_query(SpeexMode *mode, int request, void *ptr);

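Speex modes have a query system similar to the speex_encoder_ctl and speex_decoder_ctl calls. Because modes are read-only, it is only possible to get information about a particular mode, using the following function: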

void speex_mode_query( SpeexMode* mode, int request, void* ptr );

 

The admissible values for request are (unless otherwise noted, the values are returned through ptr):
SPEEX_MODE_FRAME_SIZE Get the frame size (in samples) for the mode
SPEEX_SUBMODE_BITRATE Get the bit-rate for a submode number specified through ptr (integer in bps).

Admissible values for request (unless otherwise noted, values are returned through ptr):

SPEEX_MODE_FRAME_SIZE: get the frame size (in samples) for the mode

SPEEX_SUBMODE_BITRATE: get the bit-rate (integer, in bps) of the submode whose number is passed in through ptr

 

5.5 Packing and in-band signalling

Sometimes it is desirable to pack more than one frame per packet (or other basic unit of storage). The proper way to do it is to call speex_encode N times before writing the stream with speex_bits_write. In cases where the number of frames is not determined by an out-of-band mechanism, it is possible to include a terminator code. That terminator consists of the code 15 (decimal) encoded with 5 bits, as shown in Table 9.2. Note that as of version 1.0.2, calling speex_bits_write automatically inserts the terminator so as to fill the last byte. This doesn't involve any overhead and makes sure Speex can always detect when there are no more frames in a packet.

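Sometimes it is desirable to pack more than one frame per packet (or other basic unit of storage). The proper way is to call speex_encode N times before writing the stream with speex_bits_write. When the number of frames is not determined by an out-of-band mechanism, a terminator code can be included. As shown in Table 9.2, the terminator is the code 15 (decimal) encoded with 5 bits. Note that as of version 1.0.2, speex_bits_write automatically inserts the terminator to fill the last byte. This adds no overhead and ensures that Speex can always detect when there are no more frames in a packet.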

 

It is also possible to send in-band “messages” to the other side. All these messages are encoded as “pseudo-frames” of mode 14 which contain a 4-bit message type code, followed by the message. Table 5.1 lists the available codes, their meaning and the size of the message that follows. Most of these messages are requests that are sent to the encoder or decoder on the other end, which is free to comply or ignore them. By default, all in-band messages are ignored.

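It is also possible to send in-band "messages" to the other side. All such messages are encoded as "pseudo-frames" of mode 14, which contain a 4-bit message type code followed by the message. Table 5.1 lists the available codes, their meanings and the size of the message that follows. Most of these messages are requests sent to the encoder or decoder at the other end, which is free to comply with or ignore them. By default, all in-band messages are ignored.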

Table 5.1: In-band signalling codes

Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so that the decoder can skip it if it doesn’t know how to interpret it.

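Finally, applications may define custom in-band messages using mode 13. The size of the message in bytes is encoded with 5 bits, so the decoder can skip the message if it does not know how to interpret it.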
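The skipping rule for mode 13 can be illustrated with a plain C bit reader. This is not libspeex code: `bit_reader`, `read_bits` and `skip_custom_message` are hypothetical. Bits are assumed to be packed most-significant-bit first. The mode-13 header is assumed to have been read already, so the reader sits at the 5-bit size field.

```c
#include <stddef.h>

/* Hypothetical MSB-first bit reader, for illustration only. */
typedef struct {
    const unsigned char *data;
    size_t nbits;   /* total number of bits available */
    size_t pos;     /* index of the next bit to read */
} bit_reader;

/* Read `n` bits MSB-first; bits past the end read as 0. */
static unsigned read_bits(bit_reader *r, int n)
{
    unsigned v = 0;
    while (n-- > 0) {
        unsigned bit = 0;
        if (r->pos < r->nbits)
            bit = (r->data[r->pos / 8] >> (7 - r->pos % 8)) & 1u;
        r->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Skip an application-defined (mode 13) message: read its 5-bit byte
   count, then jump over that many bytes of payload. Returns the number
   of payload bytes skipped, or -1 if the message overruns the buffer. */
int skip_custom_message(bit_reader *r)
{
    unsigned nbytes = read_bits(r, 5);
    if (r->pos + 8u * nbytes > r->nbits)
        return -1;
    r->pos += 8u * nbytes;
    return (int)nbytes;
}
```

A decoder that does not recognise the message calls skip_custom_message and continues with whatever follows. One that does recognise it would read the payload instead of jumping over it.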

 

Supplement:

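The following figure and table belong to Chapter 9 (Speex narrowband mode), but this chapter refers to them, so they are reproduced here.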


Figure 9.2: Analysis-by-synthesis closed-loop optimization on a sub-frame.

 


Table 9.2: Quality versus bit-rate

 

Afterword

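Because of time constraints (my English is weak, and translating is hard for me), I skipped straight to Chapter 5. Chapter 3 (Compiling and Porting) and Chapter 4 (Command-line Encoder/Decoder) will be translated later.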

posted on 2011-09-06 14:47  stay