MFCC vs fbank

There is some debate in the community about whether to apply the DCT at all, rather than using the log Mel filterbank features directly, particularly for acoustic models based on deep neural networks. Some research groups, such as Google, use filterbank features (fbanks), while Kaldi mostly uses MFCCs, especially in its TDNN chain models. The traditional motivation for the DCT comes from GMM-based models: filterbank energies are correlated across bins, which makes them a poor match for a Gaussian mixture with diagonal covariance, so a discrete cosine transform (DCT) is applied to decorrelate them.
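To make the decorrelation argument concrete, here is a minimal NumPy sketch. It is not a real feature pipeline: the "filterbank" frames are synthetic Gaussian vectors whose neighbouring bins correlate as `rho^|i-j|`, and the names `dct_matrix`, `off_diagonal_fraction`, `rho`, and the sizes used are all assumptions made for illustration. It measures how much of the covariance lies off the diagonal before and after an orthonormal DCT-II.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # output (cepstral) index
    m = np.arange(n)[None, :]  # input (filterbank bin) index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    mat[0, :] /= np.sqrt(2.0)  # rescale the first row so all rows are orthonormal
    return mat


def off_diagonal_fraction(cov: np.ndarray) -> float:
    """Fraction of the squared covariance mass that lies off the diagonal."""
    total = np.sum(cov ** 2)
    diagonal = np.sum(np.diag(cov) ** 2)
    return float((total - diagonal) / total)


rng = np.random.default_rng(0)
num_frames, num_bins, rho = 5000, 40, 0.9  # illustrative values only

# Synthetic stand-in for log Mel energies: bins i and j correlate as rho^|i-j|.
idx = np.arange(num_bins)
target_cov = rho ** np.abs(idx[:, None] - idx[None, :])
fbank = rng.standard_normal((num_frames, num_bins)) @ np.linalg.cholesky(target_cov).T

# Apply the DCT along the bin axis of every frame.
cepstra = fbank @ dct_matrix(num_bins).T

fbank_frac = off_diagonal_fraction(np.cov(fbank, rowvar=False))
cepstra_frac = off_diagonal_fraction(np.cov(cepstra, rowvar=False))
print(f"off-diagonal fraction, fbank:   {fbank_frac:.3f}")
print(f"off-diagonal fraction, cepstra: {cepstra_frac:.3f}")
```

On this toy data the off-diagonal fraction should drop sharply after the DCT, which is the property a diagonal-covariance GMM relies on. Real log Mel energies are not exactly of this form, so the numbers only illustrate the effect.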

Here is Dan Povey’s take on this:

The reason we use MFCC is because they are more easily compressible, being decorrelated; we dump them to disk with compression to 1 byte per coefficient. But we dump all the coefficients, so it’s equivalent to filterbanks times a full-rank matrix, no information is lost.
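The "no information is lost" point can be checked with a small NumPy sketch. It assumes an orthonormal DCT-II and as many cepstral coefficients as Mel bins (40 here is an illustrative choice), uses random numbers in place of real log Mel energies, and does not reproduce Kaldi's feature extraction or its on-disk compression. The helper `dct_matrix` and all variable names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # output (cepstral) index
    m = np.arange(n)[None, :]  # input (filterbank bin) index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    mat[0, :] /= np.sqrt(2.0)  # rescale the first row so all rows are orthonormal
    return mat


rng = np.random.default_rng(0)
num_frames, num_bins = 100, 40  # illustrative sizes
log_fbank = rng.standard_normal((num_frames, num_bins))  # stand-in for log Mel energies

dct = dct_matrix(num_bins)

# Keeping every coefficient: MFCC = fbank times a square, full-rank matrix.
mfcc_full = log_fbank @ dct.T

# The matrix is orthonormal, so multiplying by its transpose undoes the DCT.
recovered = mfcc_full @ dct
full_error = np.max(np.abs(recovered - log_fbank))

# For contrast: keeping only the first 13 coefficients discards information.
num_ceps = 13
approx = mfcc_full[:, :num_ceps] @ dct[:num_ceps, :]
truncated_error = np.max(np.abs(approx - log_fbank))

print(f"max error, all {num_bins} coefficients: {full_error:.2e}")
print(f"max error, first {num_ceps} coefficients: {truncated_error:.2e}")
```

With all coefficients kept, the filterbank features come back up to floating-point rounding, so a network fed full-dimensional MFCCs sees a linear re-encoding of the same information. The truncated case is included only to show where the classic low-dimensional MFCC setup differs.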

 

Reference: A note on MFCCs and delta features (desh2608.github.io)

posted @ 2022-07-21 15:21 koala999