The USB microphone captures pcm_s16le data by default, i.e. PCM signed 16-bit little-endian. The raw PCM data can be saved with the following command:
ffmpeg -y -f alsa -thread_queue_size 2048 -ar 22050 -ac 1 -i hw:1,0 -f s16le -c:a copy -t 10 raw.pcm
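- Sample rate (rate): 8 kHz, 11.025 kHz, 16 kHz, 22.05 kHz, 37.8 kHz, 44.1 kHz, 48 kHz, etc.
- Number of samples (sample): the number of sample points per frame; 1024 for AAC, 1152 for MP3.
- Sample format (format): the numeric format of a single sample, e.g. U8, S16, FLT.
- Channels: 1 for mono, 2 for stereo.
- Channel data layout: interleaved (packed) or planar (non-interleaved).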
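As a quick check of how these parameters combine, the size of a raw PCM capture is sample rate x channels x bytes per sample x duration. The following is a minimal sketch in plain C; the helper name pcm_bytes is made up for illustration and is not part of FFmpeg.

```c
#include <stdint.h>

/* Hypothetical helper: expected size in bytes of a raw PCM capture. */
static int64_t pcm_bytes(int sample_rate, int channels,
                         int bytes_per_sample, int seconds)
{
    /* Widen first so long captures at high rates cannot overflow int. */
    return (int64_t)sample_rate * channels * bytes_per_sample * seconds;
}
```

For the capture command above (22050 Hz, mono, 16-bit, 10 seconds) this gives 22050 x 1 x 2 x 10 = 441000 bytes, so raw.pcm should come out at roughly that size.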
// Audio-related members (excerpt)
typedef struct AVFrame {
    uint8_t *data[AV_NUM_DATA_POINTERS];

    /**
     * For video, size in bytes of each picture line.
     * For audio, size in bytes of each plane.
     *
     * For audio, only linesize[0] may be set. For planar audio, each channel
     * plane must be the same size.
     *
     * For video the linesizes should be multiples of the CPUs alignment
     * preference, this is 16 or 32 for modern desktop CPUs.
     * Some code requires such alignment other code can be slower without
     * correct alignment, for yet other it makes no difference.
     *
     * @note The linesize may be larger than the size of usable data -- there
     * may be extra padding present for performance reasons.
     */
    int linesize[AV_NUM_DATA_POINTERS];

    /**
     * pointers to the data planes/channels.
     *
     * For video, this should simply point to data[].
     *
     * For planar audio, each channel has a separate data pointer, and
     * linesize[0] contains the size of each channel buffer.
     * For packed audio, there is just one data pointer, and linesize[0]
     * contains the total size of the buffer for all channels.
     *
     * Note: Both data and extended_data should always be set in a valid frame,
     * but for planar audio with more channels that can fit in data,
     * extended_data must be used in order to access all channels.
     */
    uint8_t **extended_data;

    /**
     * number of audio samples (per channel) described by this frame
     */
    int nb_samples;

    /**
     * format of the frame, -1 if unknown or unset
     * Values correspond to enum AVPixelFormat for video frames,
     * enum AVSampleFormat for audio)
     */
    int format;

    // ... other members omitted
} AVFrame;
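1.1 Audio data storage
The member extended_data normally points to data and acts as an extension of it. As shown above, data is an array of AV_NUM_DATA_POINTERS (8) pointers, so for planar audio data[] alone can address at most 8 channels; frames with more channels must be accessed through extended_data.
// Interleaved (packed) audio formats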
AV_SAMPLE_FMT_U8, ///< unsigned 8 bits
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_S32, ///< signed 32 bits
AV_SAMPLE_FMT_FLT, ///< float
AV_SAMPLE_FMT_DBL, ///< double
Packed data is stored only in AVFrame's uint8_t *data[0]; the samples are laid out as LRLRLR......
// Planar audio formats
AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP, ///< float, planar // FFmpeg's native AAC encoder only accepts this format
AV_SAMPLE_FMT_DBLP, ///< double, planar
Plane 0 is stored in uint8_t *data[0]; plane 1 is stored in uint8_t *data[1].
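To make the difference between the two layouts concrete, here is a small sketch in plain C that splits packed S16 samples (LRLRLR...) into one plane per channel, which is the arrangement the planar formats expect in data[0], data[1], and so on. The function name deinterleave_s16 is hypothetical; in real FFmpeg code this kind of conversion is normally left to libswresample.

```c
#include <stddef.h>
#include <stdint.h>

/* Split packed S16 samples (L R L R ...) into one plane per channel.
 * planes[c] must have room for nb_samples samples. */
static void deinterleave_s16(const int16_t *packed, int16_t **planes,
                             int nb_channels, int nb_samples)
{
    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < nb_channels; c++)
            planes[c][i] = packed[(size_t)i * nb_channels + c];
}
```

With a stereo input, planes[0] plays the role of data[0] (left) and planes[1] the role of data[1] (right).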
int av_samples_get_buffer_size(int *linesize,
int nb_channels, int nb_samples,
enum AVSampleFormat sample_fmt, int align);
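1.2 Audio frame data size
- int frame_size in AVCodecContext
//Audio only, Samples per packet.
// For a fixed-frame-size FFmpeg audio encoder, each call encodes exactly this many samples per channel
- int nb_samples in AVFrame
//number of audio samples (per channel) described by this frame
// For an FFmpeg audio frame, the number of samples (per channel) held in the frame
Typically, set AVFrame.nb_samples = AVCodecContext.frame_size;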
// For audio: If AV_CODEC_CAP_VARIABLE_FRAME_SIZE is set, then each frame can have any number of samples. If it is not set, frame->nb_samples must be equal to avctx->frame_size for all frames except the last.
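// In other words: the AV_CODEC_CAP_VARIABLE_FRAME_SIZE flag is found in AVCodecContext.codec->capabilities (read-only). When it is set, the encoder supports variable-size audio frames, and frames sent to it may contain any number of samples. When it is not set, every frame's nb_samples must equal the encoder's avctx->frame_size, except for the last frame, which may contain fewer samples.
Audio frame size during encoding: when the encoder's AV_CODEC_CAP_VARIABLE_FRAME_SIZE flag is set, the frame size is variable and AVCodecContext.frame_size may be 0; otherwise AVFrame.nb_samples must equal AVCodecContext.frame_size (the last frame may be smaller).
In the code being described, the first condition is "((stream.o_codec_ctx->codec->capabilities & AV_CODEC_CAP_VARIABLE_FRAME_SIZE) == 0)" and the second is "(stream.i_codec_ctx->frame_size != stream.o_codec_ctx->frame_size)". If the encoder does not support variable-size audio frames (first condition true) and the source frame size differs from the encoder's frame size (second condition true), an audio FIFO is needed so that every frame taken out of the FIFO has exactly the encoder's frame size. Frames read from the audio FIFO carry no timestamps, so the timestamps must be regenerated.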
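The FIFO idea described above can be sketched in plain C. This is only an illustration under simplifying assumptions (mono S16 samples, a fixed arbitrary capacity, pts counted in samples); the type SampleFifo and both functions are hypothetical names, and real FFmpeg code would normally use AVAudioFifo from libavutil/audio_fifo.h instead.

```c
#include <stdint.h>
#include <string.h>

#define FIFO_CAPACITY 8192   /* arbitrary capacity in samples for this sketch */

typedef struct {
    int16_t buf[FIFO_CAPACITY];
    int     size;       /* samples currently buffered */
    int64_t next_pts;   /* pts of the next output frame, counted in samples */
} SampleFifo;

/* Append nb samples; returns 0 on success, -1 if they do not fit. */
static int fifo_write(SampleFifo *f, const int16_t *in, int nb)
{
    if (nb < 0 || f->size + nb > FIFO_CAPACITY)
        return -1;
    memcpy(f->buf + f->size, in, (size_t)nb * sizeof(*in));
    f->size += nb;
    return 0;
}

/* Pop exactly frame_size samples if that many are buffered. The pts is
 * regenerated from the running sample count, since the FIFO itself keeps
 * no timestamps. Returns 1 if a frame was produced, 0 otherwise. */
static int fifo_read_frame(SampleFifo *f, int16_t *out, int frame_size,
                           int64_t *pts)
{
    if (frame_size <= 0 || f->size < frame_size)
        return 0;
    memcpy(out, f->buf, (size_t)frame_size * sizeof(*out));
    memmove(f->buf, f->buf + frame_size,
            (size_t)(f->size - frame_size) * sizeof(*out));
    f->size -= frame_size;
    *pts = f->next_pts;
    f->next_pts += frame_size;
    return 1;
}
```

Feeding 1152-sample input frames into such a FIFO and reading 1024-sample frames out yields frames with pts 0, 1024, 2048, ... regardless of how the input was chunked.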
pAudioFrame = av_frame_alloc();  // replaces avcodec_alloc_frame(), which was removed from FFmpeg
pAudioFrame->nb_samples = pAudioEncodeCtx->frame_size;
pAudioFrame->format = pAudioEncodeCtx->sample_fmt;
// Compute the size of the frame's data buffer from channels, nb_samples and sample_fmt
int size = av_samples_get_buffer_size(NULL, pAudioEncodeCtx->channels, pAudioEncodeCtx->frame_size, pAudioEncodeCtx->sample_fmt, 1);
uint8_t *frame_buf = (uint8_t *)av_malloc(size);
// Set the frame's data pointers and linesize from channels, sample_fmt and the buffer size
avcodec_fill_audio_frame(pAudioFrame, pAudioEncodeCtx->channels, pAudioEncodeCtx->sample_fmt, (const uint8_t *)frame_buf, size, 1);
while (1) {
    int readSize = fread(frame_buf, 1, size, fInputPCM);
    if (readSize <= 0) {
        break;  // end of input or read error
    }
    pAudioFrame->data[0] = frame_buf;  // sample data
    int got_frame = 0;
    // Legacy encode call; newer FFmpeg versions use avcodec_send_frame()/avcodec_receive_packet() instead
    int ret = avcodec_encode_audio2(pAudioEncodeCtx, &AudioPacket, pAudioFrame, &got_frame);
    if (ret < 0) {
        break;  // encoding failed
    }
    // ... write AudioPacket when got_frame is non-zero
}
- AAC
nb_samples and frame_size = 1024
Data per stereo frame: 1024 x 2 x av_get_bytes_per_sample(fltp) = 8192 bytes.
- MP3
nb_samples and frame_size = 1152
Data per stereo frame: 1152 x 2 x av_get_bytes_per_sample(s32p) = 9216 bytes.
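The per-frame figures above follow directly from samples x channels x bytes per sample. A minimal sketch in plain C, with hypothetical helper names and the bytes-per-sample value passed in by hand (4 for both fltp and s32p) instead of calling av_get_bytes_per_sample():

```c
/* Bytes in one audio frame: samples per channel x channels x bytes per sample. */
static int frame_bytes(int nb_samples, int channels, int bytes_per_sample)
{
    return nb_samples * channels * bytes_per_sample;
}

/* Duration of one frame in milliseconds at the given sample rate. */
static double frame_duration_ms(int nb_samples, int sample_rate)
{
    return 1000.0 * nb_samples / sample_rate;
}
```

frame_bytes(1024, 2, 4) reproduces the 8192 bytes of the AAC case and frame_bytes(1152, 2, 4) the 9216 bytes of the MP3 case. Assuming a 44.1 kHz stream (a rate chosen only for illustration), a 1024-sample frame lasts a little over 23 ms.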
A HE-AAC v1 or v2 audio frame contains 2048 PCM samples per channel (there is
also one mode with 1920 samples per channel but this is only for special purposes
such as DAB+ digital radio).
These bits/frame figures are average figures where each AAC frame generally has a different
size in bytes. To calculate the same for AAC-LC just use 1024 instead of 2048 PCM samples per
frame and channel.
For AAC-LD/ELD it is either 480 or 512 PCM samples per frame and channel.
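When the AAC profile is LC, frame_size and nb_samples are 1024; for HE they are 2048.
// Pay close attention to the last argument here, which is passed as pInputFrame->nb_samples * per_sample_in (note that in the FFmpeg prototype of avcodec_fill_audio_frame() the final parameter is align, the buffer size alignment). Taking AAC as an example, the profile in AVCodecContext can be LC, HE, and so on.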
ret = avcodec_fill_audio_frame(pInputFrame,Channel_in,SampleFormat_in,buf_in,buf_size_in,pInputFrame->nb_samples* per_sample_in);
1.3 Audio formats
// libavutil/channel_layout.h
#define AV_CH_FRONT_LEFT 0x00000001
#define AV_CH_FRONT_RIGHT 0x00000002
#define AV_CH_FRONT_CENTER 0x00000004
#define AV_CH_LOW_FREQUENCY 0x00000008
#define AV_CH_BACK_LEFT 0x00000010
#define AV_CH_BACK_RIGHT 0x00000020
#define AV_CH_FRONT_LEFT_OF_CENTER 0x00000040
#define AV_CH_FRONT_RIGHT_OF_CENTER 0x00000080
#define AV_CH_BACK_CENTER 0x00000100
#define AV_CH_SIDE_LEFT 0x00000200
#define AV_CH_SIDE_RIGHT 0x00000400
#define AV_CH_TOP_CENTER 0x00000800
#define AV_CH_TOP_FRONT_LEFT 0x00001000
#define AV_CH_TOP_FRONT_CENTER 0x00002000
#define AV_CH_TOP_FRONT_RIGHT 0x00004000
#define AV_CH_TOP_BACK_LEFT 0x00008000
#define AV_CH_TOP_BACK_CENTER 0x00010000
#define AV_CH_TOP_BACK_RIGHT 0x00020000
#define AV_CH_STEREO_LEFT 0x20000000 ///< Stereo downmix.
#define AV_CH_STEREO_RIGHT 0x40000000 ///< See AV_CH_STEREO_LEFT.
#define AV_CH_WIDE_LEFT 0x0000000080000000ULL
#define AV_CH_WIDE_RIGHT 0x0000000100000000ULL
#define AV_CH_SURROUND_DIRECT_LEFT 0x0000000200000000ULL
#define AV_CH_SURROUND_DIRECT_RIGHT 0x0000000400000000ULL
#define AV_CH_LOW_FREQUENCY_2 0x0000000800000000ULL
/** Channel mask value used for AVCodecContext.request_channel_layout
to indicate that the user requests the channel order of the decoder output
to be the native codec channel order. */
#define AV_CH_LAYOUT_NATIVE 0x8000000000000000ULL
/**
 * @}
 * @defgroup channel_mask_c Audio channel layouts
 * @{
 * */
// libavutil/channel_layout.h
/**
 * Return default channel layout for a given number of channels.
 */
int64_t av_get_default_channel_layout(int nb_channels);
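Because each AV_CH_* constant above occupies one bit, a channel layout mask is just the bitwise OR of its channels, and the channel count is the number of set bits. The sketch below shows this in plain C without FFmpeg headers; count_channels is a hypothetical name (the legacy FFmpeg API offers av_get_channel_layout_nb_channels() for the same purpose).

```c
#include <stdint.h>

/* Count the set bits in a channel mask: one bit per channel. */
static int count_channels(uint64_t mask)
{
    int n = 0;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

For example, front left | front right (0x1 | 0x2) counts as 2 channels, and the six lowest bits from the list above (0x3F) count as 6.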
// libavutil/samplefmt.h
enum AVSampleFormat {
AV_SAMPLE_FMT_U8, ///< unsigned 8 bits
AV_SAMPLE_FMT_S16, ///< signed 16 bits
AV_SAMPLE_FMT_S32, ///< signed 32 bits
AV_SAMPLE_FMT_FLT, ///< float
AV_SAMPLE_FMT_DBL, ///< double
AV_SAMPLE_FMT_U8P, ///< unsigned 8 bits, planar
AV_SAMPLE_FMT_S16P, ///< signed 16 bits, planar
AV_SAMPLE_FMT_S32P, ///< signed 32 bits, planar
AV_SAMPLE_FMT_FLTP, ///< float, planar // FFmpeg's native AAC encoder only accepts this format
AV_SAMPLE_FMT_DBLP, ///< double, planar
AV_SAMPLE_FMT_S64, ///< signed 64 bits
AV_SAMPLE_FMT_S64P, ///< signed 64 bits, planar
AV_SAMPLE_FMT_NB ///< Number of sample formats. DO NOT USE if linking dynamically
};
const char *av_get_sample_fmt_name(enum AVSampleFormat sample_fmt);
char *av_get_sample_fmt_string(char *buf, int buf_size, enum AVSampleFormat sample_fmt);
int av_get_bytes_per_sample(enum AVSampleFormat sample_fmt);
int av_samples_copy(uint8_t **dst, uint8_t * const *src, int dst_offset,
int src_offset, int nb_samples, int nb_channels,
enum AVSampleFormat sample_fmt);
// libavutil/samplefmt.h (declaration), libavutil/samplefmt.c (definition)
/**
 * Get the required buffer size for the given audio parameters.
 *
 * @param[out] linesize calculated linesize, may be NULL
 * @param nb_channels   the number of channels
 * @param nb_samples    the number of samples in a single channel
 * @param sample_fmt    the sample format
 * @param align         buffer size alignment (0 = default, 1 = no alignment)
 * @return              required buffer size, or negative error code on failure
 */
int av_samples_get_buffer_size(int *linesize, int nb_channels, int nb_samples,
                               enum AVSampleFormat sample_fmt, int align)
{
    int line_size;
    int sample_size = av_get_bytes_per_sample(sample_fmt);
    int planar      = av_sample_fmt_is_planar(sample_fmt);

    /* validate parameter ranges */
    if (!sample_size || nb_samples <= 0 || nb_channels <= 0)
        return AVERROR(EINVAL);

    /* auto-select alignment if not specified */
    if (!align) {
        if (nb_samples > INT_MAX - 31)
            return AVERROR(EINVAL);
        align = 1;
        nb_samples = FFALIGN(nb_samples, 32);
    }

    /* check for integer overflow */
    if (nb_channels > INT_MAX / align ||
        (int64_t)nb_channels * nb_samples > (INT_MAX - (align * nb_channels)) / sample_size)
        return AVERROR(EINVAL);

    line_size = planar ? FFALIGN(nb_samples * sample_size, align) :
                         FFALIGN(nb_samples * sample_size * nb_channels, align);
    if (linesize)
        *linesize = line_size;

    return planar ? line_size * nb_channels : line_size;
}
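To see what the align argument does in practice, the same calculation can be mirrored in plain C without linking FFmpeg. This is a sketch, not the library function: sketch_buffer_size and ALIGN_UP are hypothetical names, the sample size and planar flag are passed in directly instead of being derived from an AVSampleFormat, and errors are reported as -1 instead of AVERROR codes.

```c
#include <limits.h>
#include <stdint.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Sketch of the buffer size calculation; returns -1 on invalid input. */
static int sketch_buffer_size(int *linesize, int nb_channels, int nb_samples,
                              int sample_size, int planar, int align)
{
    int line_size;

    if (sample_size <= 0 || nb_samples <= 0 || nb_channels <= 0)
        return -1;
    if (!align) {                 /* default: round the sample count up to 32 */
        if (nb_samples > INT_MAX - 31)
            return -1;
        align = 1;
        nb_samples = ALIGN_UP(nb_samples, 32);
    }
    if (nb_channels > INT_MAX / align ||
        (int64_t)nb_channels * nb_samples >
            (INT_MAX - (align * nb_channels)) / sample_size)
        return -1;

    line_size = planar ? ALIGN_UP(nb_samples * sample_size, align)
                       : ALIGN_UP(nb_samples * sample_size * nb_channels, align);
    if (linesize)
        *linesize = line_size;
    return planar ? line_size * nb_channels : line_size;
}
```

For 1000 packed stereo S16 samples, align = 1 gives exactly 1000 x 2 x 2 = 4000 bytes, while align = 0 first rounds the sample count up to 1024 and gives 4096 bytes. For 1024 planar stereo float samples with align = 1, linesize is 4096 (one plane) and the total is 8192.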
/**
 * Allocate a samples buffer for nb_samples samples, and fill data pointers and
 * linesize accordingly.
 * The allocated samples buffer can be freed by using av_freep(&audio_data[0])
 * Allocated data will be initialized to silence.
 *
 * @see enum AVSampleFormat
 * The documentation for AVSampleFormat describes the data layout.
 *
 * @param[out] audio_data  array to be filled with the pointer for each channel
 * @param[out] linesize    aligned size for audio buffer(s), may be NULL
 * @param nb_channels      number of audio channels
 * @param nb_samples       number of samples per channel
 * @param align            buffer size alignment (0 = default, 1 = no alignment)
 * @return                 >=0 on success or a negative error code on failure
 * @todo return the size of the allocated buffer in case of success at the next bump
 * @see av_samples_fill_arrays()
 * @see av_samples_alloc_array_and_samples()
 */
int av_samples_alloc(uint8_t **audio_data, int *linesize, int nb_channels,
int nb_samples, enum AVSampleFormat sample_fmt, int align);
/**
 * Allocate a data pointers array, samples buffer for nb_samples
 * samples, and fill data pointers and linesize accordingly.
 *
 * This is the same as av_samples_alloc(), but also allocates the data
 * pointers array.
 *
 * @see av_samples_alloc()
 */
int av_samples_alloc_array_and_samples(uint8_t ***audio_data, int *linesize, int nb_channels,
int nb_samples, enum AVSampleFormat sample_fmt, int align);
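The resampler converts between audio sample formats, while the FIFO buffer stores audio samples that are waiting to be encoded.
Audio conversion uses a SwrContext, allocated with swr_alloc() or swr_alloc_set_opts(). When swr_alloc() is used, the parameters must be set through the AVOptions API. Call swr_init() to initialize the SwrContext, then perform the conversion by calling swr_convert() repeatedly. At the end of conversion the resampling buffer can be flushed by calling swr_convert() with NULL in and 0 in_count. Finally, release the context with swr_free().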
// the following code will setup conversion from planar float sample format to interleaved signed 16-bit integer,
// downsampling from 48kHz to 44.1kHz and downmixing from 5.1 channels to stereo (using the default mixing matrix).
SwrContext *swr = swr_alloc();
av_opt_set_channel_layout(swr, "in_channel_layout", AV_CH_LAYOUT_5POINT1, 0);
av_opt_set_channel_layout(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0);
av_opt_set_int(swr, "in_sample_rate", 48000, 0);
av_opt_set_int(swr, "out_sample_rate", 44100, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);
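swr_init(swr);  // must be called after the options are set and before swr_convert()
uint8_t **input;
int in_samples;
// get_input() and handle_output() are placeholders to be supplied by the caller
while (get_input(&input, &in_samples)) {
    uint8_t *output;
    int out_samples = av_rescale_rnd(swr_get_delay(swr, 48000) + in_samples, 44100, 48000, AV_ROUND_UP);
    av_samples_alloc(&output, NULL, 2, out_samples, AV_SAMPLE_FMT_S16, 0);
    out_samples = swr_convert(swr, &output, out_samples, (const uint8_t **)input, in_samples);
    handle_output(output, out_samples);
    av_freep(&output);  // release the buffer allocated by av_samples_alloc()
}
swr_free(&swr);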
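The out_samples estimate in the loop above is a rounded-up rescale: (delay + in_samples) x out_rate / in_rate. A minimal plain-C sketch of that arithmetic follows; max_out_samples is a hypothetical name, and it assumes non-negative inputs and a positive in_rate, whereas av_rescale_rnd() also guards against overflow.

```c
#include <stdint.h>

/* Upper bound on output samples per channel:
 * ceil((delay + in_samples) * out_rate / in_rate). */
static int64_t max_out_samples(int64_t delay, int64_t in_samples,
                               int64_t out_rate, int64_t in_rate)
{
    return ((delay + in_samples) * out_rate + in_rate - 1) / in_rate;
}
```

With no buffered delay, 1024 input samples at 48 kHz need room for 941 output samples at 44.1 kHz (1024 x 44100 / 48000 = 940.8, rounded up); with 16 samples of delay the bound becomes 956.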
3.1 Resampling functions
struct SwrContext *swr_alloc(void);
struct SwrContext *swr_alloc_set_opts(struct SwrContext *s,
int64_t out_ch_layout, enum AVSampleFormat out_sample_fmt, int out_sample_rate,
int64_t in_ch_layout, enum AVSampleFormat in_sample_fmt, int in_sample_rate,
int log_offset, void *log_ctx);
int swr_init(struct SwrContext *s);
void swr_free(struct SwrContext **s);
/** Convert audio.
* in and in_count can be set to 0 to flush the last few samples out at the
* end.
* If more input is provided than output space, then the input will be buffered.
* You can avoid this buffering by using swr_get_out_samples() to retrieve an
* upper bound on the required number of output samples for the given number of
* input samples. Conversion will run directly without copying whenever possible.
* @param s allocated Swr context, with parameters set
* @param out output buffers, only the first one need be set in case of packed audio
* @param out_count amount of space available for output in samples per channel
* @param in input buffers, only the first one need to be set in case of packed audio
* @param in_count number of input samples available in one channel
* @return number of samples output per channel, negative value on error
*/
int swr_convert(struct SwrContext *s,
uint8_t **out, int out_count,
const uint8_t **in , int in_count);
/**
 * Gets the delay the next input sample will experience relative to the next output sample.
 *
 * Swresample can buffer data if more input has been provided than available
 * output space, also converting between sample rates needs a delay.
 * This function returns the sum of all such delays.
 * The exact delay is not necessarily an integer value in either input or
 * output sample rate. Especially when downsampling by a large value, the
 * output sample rate may be a poor choice to represent the delay, similarly
 * for upsampling and the input sample rate.
 *
 * @param s     swr context
 * @param base  timebase in which the returned delay will be:
 *              @li if it's set to 1 the returned delay is in seconds
 *              @li if it's set to 1000 the returned delay is in milliseconds
 *              @li if it's set to the input sample rate then the returned
 *                  delay is in input samples
 *              @li if it's set to the output sample rate then the returned
 *                  delay is in output samples
 *              @li if it's the least common multiple of in_sample_rate and
 *                  out_sample_rate then an exact rounding-free delay will be
 *                  returned
 * @returns     the delay in 1 / @c base units.
 */
int64_t swr_get_delay(struct SwrContext *s, int64_t base);
4. Compilation
==16296== LEAK SUMMARY:
==16296== definitely lost: 0 bytes in 0 blocks
==16296== indirectly lost: 0 bytes in 0 blocks
==16296== possibly lost: 22,748 bytes in 1,216 blocks
==16296== still reachable: 164 bytes in 6 blocks
Reference: alsa - mem leak? - stackoverflow