代码改变世界

FFMpeg笔记(三) 音频处理基本概念及音频重采样

2018-04-05 22:46  jiayayao  阅读(17450)  评论(0编辑  收藏  举报

    Android放音的采样率固定为44.1KHz,录音的采样率固定为8KHz,因此底层的音频设备驱动需要设置好这两个固定的采样率。如果上层传过来的采样率不符的话,需要进行resample重采样处理。

    几个名词:

1. 采样率

    采样设备每秒抽取样本的次数

2. 音频格式及量化精度(位宽)

    每种音频格式有不同的量化精度(位宽),位数越多,表示值就越精确,声音表现自然就越精准。FFMpeg中音频格式有以下几种,每种格式有其占用的字节数信息:

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar
    AV_SAMPLE_FMT_S64,         ///< signed 64 bits
    AV_SAMPLE_FMT_S64P,        ///< signed 64 bits, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

 3. 分片(plane)和打包(packed)

    以双声道为例,带P(plane)的数据格式在存储时,其左声道和右声道的数据是分开存储的,左声道的数据存储在data[0],右声道的数据存储在data[1],每个声道的所占用的字节数为linesize[0]和linesize[1],frame.data[i]或者frame.extended_data[i]表示第i个声道的数据;

    不带P(packed)的音频数据在存储时,是按照LRLRLR...的格式交替存储在data[0]中,linesize[0]表示总的数据量,frame.data[0]或frame.extended_data[0]包含所有的音频数据中

 

4. 声道分布(channel_layout)

    声道分布在FFmpeg\libavutil\channel_layout.h中有定义,一般来说用的比较多的是AV_CH_LAYOUT_STEREO(双声道)和AV_CH_LAYOUT_SURROUND(三声道),这两者的定义如下:

#define AV_CH_LAYOUT_STEREO            (AV_CH_FRONT_LEFT|AV_CH_FRONT_RIGHT)
#define AV_CH_LAYOUT_SURROUND          (AV_CH_LAYOUT_STEREO|AV_CH_FRONT_CENTER)

5. 音频帧的数据量计算

    一帧音频的数据量=channel数 * nb_samples样本数 * 每个样本占用的字节数

    如果该音频帧是FLTP格式的PCM数据,包含1024个样本,双声道,那么该音频帧包含的音频数据量是2*1024*4=8192字节。

6. 音频播放时间计算

    以采样率44100Hz来计算,每秒44100个sample,而正常一帧为1024个sample,可知每帧播放时间/1024=1000ms/44100,得到每帧播放时间=1024*1000/44100=23.2ms。

7. 音频重采样(resample)

    FFMpeg自带的resample例子:FFmpeg\doc\examples\resampling_audio.c,这里把最核心的resample代码贴一下,在工程中使用时,注意设置的各种参数,给定的输入数据都不能错。

int main(int argc, char **argv)
{
// 设置数据源src和dst声道布局 int64_t src_ch_layout
= AV_CH_LAYOUT_STEREO, dst_ch_layout = AV_CH_LAYOUT_SURROUND;
// 设置src和dst采样率
int src_rate = 48000, dst_rate = 44100; uint8_t **src_data = NULL, **dst_data = NULL; int src_nb_channels = 0, dst_nb_channels = 0; int src_linesize, dst_linesize; int src_nb_samples = 1024, dst_nb_samples, max_dst_nb_samples;
// 设置src和dst音频格式
enum AVSampleFormat src_sample_fmt = AV_SAMPLE_FMT_DBL, dst_sample_fmt = AV_SAMPLE_FMT_S16; const char *dst_filename = NULL; FILE *dst_file; int dst_bufsize; const char *fmt;
// 重采样上下文,包含resample信息
struct SwrContext *swr_ctx; double t; int ret; if (argc != 2) { fprintf(stderr, "Usage: %s output_file\n" "API example program to show how to resample an audio stream with libswresample.\n" "This program generates a series of audio frames, resamples them to a specified " "output format and rate and saves them to an output file named output_file.\n", argv[0]); exit(1); }
// resample后的数据保存到本地文件 dst_filename
= argv[1]; dst_file = fopen(dst_filename, "wb"); if (!dst_file) { fprintf(stderr, "Could not open destination file %s\n", dst_filename); exit(1); } /* create resampler context */ swr_ctx = swr_alloc(); if (!swr_ctx) { fprintf(stderr, "Could not allocate resampler context\n"); ret = AVERROR(ENOMEM); goto end; } /* set options */ // 将resample信息写入resample上下文
av_opt_set_int(swr_ctx,
"in_channel_layout", src_ch_layout, 0); av_opt_set_int(swr_ctx, "in_sample_rate", src_rate, 0); av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", src_sample_fmt, 0); av_opt_set_int(swr_ctx, "out_channel_layout", dst_ch_layout, 0); av_opt_set_int(swr_ctx, "out_sample_rate", dst_rate, 0); av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", dst_sample_fmt, 0); /* initialize the resampling context */ if ((ret = swr_init(swr_ctx)) < 0) { fprintf(stderr, "Failed to initialize the resampling context\n"); goto end; } /* allocate source and destination samples buffers */ src_nb_channels = av_get_channel_layout_nb_channels(src_ch_layout); ret = av_samples_alloc_array_and_samples(&src_data, &src_linesize, src_nb_channels, src_nb_samples, src_sample_fmt, 0); if (ret < 0) { fprintf(stderr, "Could not allocate source samples\n"); goto end; } /* compute the number of converted samples: buffering is avoided * ensuring that the output buffer will contain at least all the * converted input samples */ max_dst_nb_samples = dst_nb_samples = av_rescale_rnd(src_nb_samples, dst_rate, src_rate, AV_ROUND_UP); /* buffer is going to be directly written to a rawaudio file, no alignment */ dst_nb_channels = av_get_channel_layout_nb_channels(dst_ch_layout); ret = av_samples_alloc_array_and_samples(&dst_data, &dst_linesize, dst_nb_channels, dst_nb_samples, dst_sample_fmt, 0); if (ret < 0) { fprintf(stderr, "Could not allocate destination samples\n"); goto end; } t = 0; do { /* generate synthetic audio */
// 这里是自行生成源数据帧,实际工程中应该将解码后的PCM数据填入src_data中 fill_samples((double *)src_data[0], src_nb_samples, src_nb_channels, src_rate, &t); /* compute destination number of samples */ dst_nb_samples = av_rescale_rnd(swr_get_delay(swr_ctx, src_rate) + src_nb_samples, dst_rate, src_rate, AV_ROUND_UP); if (dst_nb_samples > max_dst_nb_samples) { av_freep(&dst_data[0]); ret = av_samples_alloc(dst_data, &dst_linesize, dst_nb_channels, dst_nb_samples, dst_sample_fmt, 1); if (ret < 0) break; max_dst_nb_samples = dst_nb_samples; } /* convert to destination format */ // 重采样操作
ret
= swr_convert(swr_ctx, dst_data, dst_nb_samples, (const uint8_t **)src_data, src_nb_samples); if (ret < 0) { fprintf(stderr, "Error while converting\n"); goto end; } dst_bufsize = av_samples_get_buffer_size(&dst_linesize, dst_nb_channels, ret, dst_sample_fmt, 1); if (dst_bufsize < 0) { fprintf(stderr, "Could not get sample buffer size\n"); goto end; } printf("t:%f in:%d out:%d\n", t, src_nb_samples, ret); fwrite(dst_data[0], 1, dst_bufsize, dst_file); } while (t < 10); if ((ret = get_format_from_sample_fmt(&fmt, dst_sample_fmt)) < 0) goto end; fprintf(stderr, "Resampling succeeded. Play the output file with the command:\n" "ffplay -f %s -channel_layout %"PRId64" -channels %d -ar %d %s\n", fmt, dst_ch_layout, dst_nb_channels, dst_rate, dst_filename); end: fclose(dst_file); if (src_data) av_freep(&src_data[0]); av_freep(&src_data); if (dst_data) av_freep(&dst_data[0]); av_freep(&dst_data); swr_free(&swr_ctx); return ret < 0; }