WebRTC Audio Processing Algorithms

Audio processing consists of the following modules:

1. Noise suppression (NS)

2. Acoustic echo cancellation (AEC)

3. Acoustic echo control for mobile (AECM)

4. Automatic gain control (AGC)

5. Voice activity detection (VAD)

1. Noise suppression (NS) - noise_suppression.h

Principle:

The Wiener filter

The input signal is passed through a linear time-invariant system whose output is made to approximate a desired signal as closely as possible, i.e. the estimation error is minimized. The optimal filter that minimizes this estimation error is called the Wiener filter.

Basic idea:

For each received frame of noisy speech, starting from an initial noise estimate for that frame:

1. Define a speech-probability function.

2. Measure classification features of the noisy frame.

3. Use the measured features to compute a multi-feature speech probability for the frame.

4. Weight the computed speech probability with dynamic factors (signal classification features and threshold parameters).

5. Use the resulting feature-based speech probability to update each frame's speech-probability function.

6. Use the updated per-frame speech-probability function to refresh the initial noise estimate (the quantile noise estimate of each of the consecutive frames).

 

Features

Spectral flatness (Compute Spectral Flatness)

Assumption: speech has more harmonic structure than noise. The speech spectrum tends to show peaks at the fundamental frequency (pitch) and its harmonics, whereas noise is relatively flat.

Formula:

    Flatness = exp( (1/N) * Σ_k log|Y(k)| ) / ( (1/N) * Σ_k |Y(k)| )

i.e. the ratio of the geometric mean of the magnitude spectrum to its arithmetic mean.

N: number of frequency bins after the STFT

B: number of frequency bands

k: frequency-bin index

j: band index

Each band contains many frequency bins; for example, 128 bins can be split into 4 bands (low, mid-low, mid-high, high) of 32 bins each.

For noise the flatness value is large and roughly constant, while for speech it is small and time-varying.

Corresponding code: the spectral flatness feature is stored in featureData[0]

// Compute spectral flatness on input spectrum
// magnIn is the magnitude spectrum
// spectral flatness is returned in inst->featureData[0]
void WebRtcNs_ComputeSpectralFlatness(NSinst_t* inst, float* magnIn) {
  int i;
  int shiftLP = 1; // option to remove first bin(s) from spectral measures
  float avgSpectralFlatnessNum, avgSpectralFlatnessDen, spectralTmp;

  // compute spectral measures for flatness
  // Skip the first bin, i.e. the DC bin. "Den" is short for denominator;
  // avgSpectralFlatnessDen is the denominator of the formula above.
  avgSpectralFlatnessNum = 0.0;
  avgSpectralFlatnessDen = inst->sumMagn;
  for (i = 0; i < shiftLP; i++) {
    avgSpectralFlatnessDen -= magnIn[i];
  }
  // compute log of ratio of the geometric to arithmetic mean: check for log(0) case
  // TAVG is short for time-average. If a magnitude is zero (degenerate energy),
  // decay the previous flatness value and return early.
  for (i = shiftLP; i < inst->magnLen; i++) {
    if (magnIn[i] > 0.0) {
      avgSpectralFlatnessNum += (float)log(magnIn[i]);
    } else {
      inst->featureData[0] -= SPECT_FL_TAVG * inst->featureData[0];
      return;
    }
  }
  // normalize
  avgSpectralFlatnessDen = avgSpectralFlatnessDen / inst->magnLen;
  avgSpectralFlatnessNum = avgSpectralFlatnessNum / inst->magnLen;

  // ratio and inverse log: check for case of log(0)
  spectralTmp = (float)exp(avgSpectralFlatnessNum) / avgSpectralFlatnessDen;

  // time-avg update of spectral flatness feature
  inst->featureData[0] += SPECT_FL_TAVG * (spectralTmp - inst->featureData[0]);
  // done with flatness feature
}

Code logic (flowchart omitted)

Compute Spectral Difference

Assumption: the noise spectrum is more stationary than the speech spectrum, so the overall shape of the noise spectrum tends to stay the same at any given stage. This feature measures the deviation of the input spectrum from the shape of the noise spectrum.

The deviation is measured through a correlation coefficient:

    corr(x, y)^2 = cov(x, y)^2 / (D(x) * D(y))

Code:

// Compute the difference measure between input spectrum and a template/learned noise spectrum
// magnIn is the input spectrum
// the reference/template spectrum is inst->magnAvgPause[i]
// returns (normalized) spectral difference in inst->featureData[4]
void WebRtcNs_ComputeSpectralDifference(NSinst_t* inst, float* magnIn) {
  // avgDiffNormMagn = var(magnIn) - cov(magnIn, magnAvgPause)^2 / var(magnAvgPause)
  int i;
  float avgPause, avgMagn, covMagnPause, varPause, varMagn, avgDiffNormMagn;

  avgPause = 0.0;
  avgMagn = inst->sumMagn;
  // compute average quantities
  for (i = 0; i < inst->magnLen; i++) {
    // conservative smooth noise spectrum from pause frames
    avgPause += inst->magnAvgPause[i];
  }
  avgPause = avgPause / ((float)inst->magnLen);
  avgMagn = avgMagn / ((float)inst->magnLen);
  covMagnPause = 0.0;
  varPause = 0.0;
  varMagn = 0.0;
  // compute variance and covariance quantities
  // covMagnPause is the covariance; varPause and varMagn are the respective variances
  for (i = 0; i < inst->magnLen; i++) {
    covMagnPause += (magnIn[i] - avgMagn) * (inst->magnAvgPause[i] - avgPause);
    varPause += (inst->magnAvgPause[i] - avgPause) * (inst->magnAvgPause[i] - avgPause);
    varMagn += (magnIn[i] - avgMagn) * (magnIn[i] - avgMagn);
  }
  covMagnPause = covMagnPause / ((float)inst->magnLen);
  varPause = varPause / ((float)inst->magnLen);
  varMagn = varMagn / ((float)inst->magnLen);
  // update of average magnitude spectrum
  inst->featureData[6] += inst->signalEnergy;
  // residual variance after removing the part explained by the noise template
  avgDiffNormMagn = varMagn - (covMagnPause * covMagnPause) / (varPause + (float)0.0001);
  // normalize and compute time-avg update of difference feature
  avgDiffNormMagn = (float)(avgDiffNormMagn / (inst->featureData[5] + (float)0.0001));
  inst->featureData[4] += SPECT_DIFF_TAVG * (avgDiffNormMagn - inst->featureData[4]);
}

 

Compute SNR

The prior and posterior SNR are computed from the quantile noise estimate.

Posterior SNR: the instantaneous SNR, i.e. the observed input power compared with the noise power:

Y: input (noisy) spectrum

N: noise spectrum

    SNR_post(k) = |Y(k)|^2 / |N(k)|^2

The prior SNR is the expected value of the clean-signal power relative to the noise power.

X: the clean input (speech) signal

    SNR_prior(k) = E[ |X(k)|^2 ] / |N(k)|^2

In its actual computation WebRTC uses magnitudes rather than squared magnitudes.

Since the clean signal is unknown, the prior SNR is estimated as a weighted average (the decision-directed approach) of the previous frame's estimated prior SNR and the instantaneous SNR:

γ_dd: time-smoothing parameter; the larger it is, the smoother the estimate but the higher the latency

H: the previous frame's (smoothed) Wiener filter

    SNR_prior(m, k) = γ_dd * |H(m-1, k) * Y(m-1, k)| / |N(k)| + (1 - γ_dd) * SNR_post(m, k)

 Related code computing snrLocPrior, snrPrior, and snrLocPost (the leading numbers are line numbers in the NS source file):

  1011        // directed decision update of snrPrior
  1012:       snrLocPrior[i] = DD_PR_SNR * previousEstimateStsa[i] + ((float)1.0 - DD_PR_SNR)* snrLocPost[i];
  1013        // post and prior snr needed for step 2

  1106        // directed decision update of snrPrior
  1107:       snrPrior = DD_PR_SNR * previousEstimateStsa[i] + ((float)1.0 - DD_PR_SNR)
  1108                   * currentEstimateStsa;

 

Average LR (likelihood-ratio) factor - featureData[3]

 

  // compute feature based on average LR factor
  // this is the average over all frequencies of the smooth log lrt
  logLrtTimeAvgKsum = 0.0;
  for (i = 0; i < inst->magnLen; i++) {
    tmpFloat1 = (float)1.0 + (float)2.0 * snrLocPrior[i];
    tmpFloat2 = (float)2.0 * snrLocPrior[i] / (tmpFloat1 + (float)0.0001);
    besselTmp = (snrLocPost[i] + (float)1.0) * tmpFloat2;
    inst->logLrtTimeAvg[i] += LRT_TAVG * (besselTmp - (float)log(tmpFloat1)
                                          - inst->logLrtTimeAvg[i]);
    logLrtTimeAvgKsum += inst->logLrtTimeAvg[i];
  }
  logLrtTimeAvgKsum = (float)logLrtTimeAvgKsum / (inst->magnLen);
  inst->featureData[3] = logLrtTimeAvgKsum;
  // done with computation of LR factor

 

 

 

 

/*
 *  Copyright (c) 2012 The WebRTC project authors. All Rights Reserved.
 *
 *  Use of this source code is governed by a BSD-style license
 *  that can be found in the LICENSE file in the root of the source
 *  tree. An additional intellectual property rights grant can be found
 *  in the file PATENTS.  All contributing project authors may
 *  be found in the AUTHORS file in the root of the source tree.
 */

#ifndef WEBRTC_MODULES_AUDIO_PROCESSING_NS_INCLUDE_NOISE_SUPPRESSION_H_
#define WEBRTC_MODULES_AUDIO_PROCESSING_NS_INCLUDE_NOISE_SUPPRESSION_H_

#include "typedefs.h"

typedef struct NsHandleT NsHandle;

#ifdef __cplusplus
extern "C" {
#endif

/*
 * This function creates an instance to the noise suppression structure
 *
 * Input:
 *      - NS_inst       : Pointer to noise suppression instance that should be
 *                        created
 *
 * Output:
 *      - NS_inst       : Pointer to created noise suppression instance
 *
 * Return value         :  0 - Ok
 *                        -1 - Error
 */
// Input: a pointer for the NS instance to be created. Output: the created
// instance. Returns success or failure.
int WebRtcNs_Create(NsHandle** NS_inst);

/*
 * This function frees the dynamic memory of a specified noise suppression
 * instance.
 *
 * Input:
 *      - NS_inst       : Pointer to NS instance that should be freed
 *
 * Return value         :  0 - Ok
 *                        -1 - Error
 */
// Frees the NS instance's memory.
int WebRtcNs_Free(NsHandle* NS_inst);

/*
 * This function initializes a NS instance and has to be called before any other
 * processing is made.
 *
 * Input:
 *      - NS_inst       : Instance that should be initialized
 *      - fs            : sampling frequency
 *
 * Output:
 *      - NS_inst       : Initialized instance
 *
 * Return value         :  0 - Ok
 *                        -1 - Error
 */
// Initializes the NS instance with the sampling rate; must be called before
// any other processing.
int WebRtcNs_Init(NsHandle* NS_inst, uint32_t fs);

/*
 * This changes the aggressiveness of the noise suppression method.
 *
 * Input:
 *      - NS_inst       : Noise suppression instance.
 *      - mode          : 0: Mild, 1: Medium , 2: Aggressive
 *
 * Output:
 *      - NS_inst       : Updated instance.
 *
 * Return value         :  0 - Ok
 *                        -1 - Error
 */
// Sets the noise suppression level.
// Input: the instance to configure and the suppression level.
// Output: the updated instance.
// mode 0: mild, 1: medium, 2: aggressive
int WebRtcNs_set_policy(NsHandle* NS_inst, int mode);

/*
 * This functions does Noise Suppression for the inserted speech frame. The
 * input and output signals should always be 10ms (80 or 160 samples).
 *
 * Input
 *      - NS_inst       : Noise suppression instance.
 *      - spframe       : Pointer to speech frame buffer for L band
 *      - spframe_H     : Pointer to speech frame buffer for H band
 *      - fs            : sampling frequency
 *
 * Output:
 *      - NS_inst       : Updated NS instance
 *      - outframe      : Pointer to output frame for L band
 *      - outframe_H    : Pointer to output frame for H band
 *
 * Return value         :  0 - OK
 *                        -1 - Error
 */
// Performs noise suppression on the inserted speech frame; the input and
// output signals are always 10 ms long (80 or 160 samples).
// Input: the NS instance, a pointer to the L-band speech frame buffer,
// a pointer to the H-band speech frame buffer, and the corresponding
// output pointers.

int WebRtcNs_Process(NsHandle* NS_inst,
                     short* spframe,
                     short* spframe_H,
                     short* outframe,
                     short* outframe_H);

/* Returns the internally used prior speech probability of the current frame.
 * There is a frequency bin based one as well, with which this should not be
 * confused.
 *
 * Input
 *      - handle        : Noise suppression instance.
 *
 * Return value         : Prior speech probability in interval [0.0, 1.0].
 *                        -1 - NULL pointer or uninitialized instance.
 */
// Returns the prior speech probability of the current frame. Input: the
// instance. Returns the probability.
float WebRtcNs_prior_speech_probability(NsHandle* handle);

#ifdef __cplusplus
}
#endif

#endif  // WEBRTC_MODULES_AUDIO_PROCESSING_NS_INCLUDE_NOISE_SUPPRESSION_H_
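A typical calling sequence for this API might look like the sketch below. It is an illustration, not a definitive implementation: the helper `denoise_stream` is made up, a 16 kHz mono stream is assumed, and passing NULL for the H-band buffers when no high band is present is an assumption about the API rather than something the header states.

```c
#include <stddef.h>
#include "noise_suppression.h"

/* Illustrative helper (not part of WebRTC): denoise num_frames consecutive
 * 10 ms frames at 16 kHz (160 samples per frame), L band only. */
int denoise_stream(short* in, short* out, int num_frames) {
  NsHandle* ns = NULL;
  if (WebRtcNs_Create(&ns) != 0) return -1;
  if (WebRtcNs_Init(ns, 16000) != 0) return -1;   /* sampling rate in Hz */
  if (WebRtcNs_set_policy(ns, 1) != 0) return -1; /* 1: medium suppression */
  for (int n = 0; n < num_frames; n++) {
    /* Assumption: NULL H-band pointers are acceptable at this rate. */
    if (WebRtcNs_Process(ns, in + n * 160, NULL,
                         out + n * 160, NULL) != 0) {
      WebRtcNs_Free(ns);
      return -1;
    }
  }
  return WebRtcNs_Free(ns);
}
```

This compiles only when linked against the WebRTC NS library that provides noise_suppression.h.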

 

Reference: https://blog.csdn.net/godloveyuxu/article/details/73657931

posted @ 2022-05-18 18:12  Aemnprsu_wx