PB 9+10+11

9 Homomorphic Systems and Cepstrum Analysis of Speech

Definitions

Z-Transform Analysis

Discrete-Time Model for Speech Production

The Cepstrum of Speech

Relation to LPC

Application to Pitch Detection

Application to Analysis/Synthesis Coding

Applications to Speech Pattern Recognition

Mel-Frequency Cepstrum Coefficients (MFCC)

The basic idea is to compute a frequency analysis with a filter bank whose filter spacing and bandwidths approximate critical bands. For a 4 kHz bandwidth, approximately 20 filters are used.
A short-time Fourier analysis is done first, resulting in a DFT \(X_{m}[k]\) for the m-th frame.
Then the DFT values are grouped together in critical bands and weighted by triangular weighting functions as depicted in Fig.
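The grouping of DFT bins into triangular bands can be sketched as follows. This is a minimal illustration, not the exact filter layout of the figure: the function names, the 512-point DFT, the 8 kHz sampling rate, and the mel mapping `2595*log10(1 + f/700)` are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    # One commonly used mel mapping (an assumption; the notes do not fix one).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(num_filters, nfft, fs):
    """Return weights V of shape (num_filters, nfft // 2 + 1).

    Row r holds the triangular weighting function for the r-th band,
    evaluated at the non-negative DFT bin frequencies.
    """
    # num_filters + 2 band edges, equally spaced on the mel axis from 0 to fs/2.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), num_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_freqs = np.arange(nfft // 2 + 1) * fs / nfft

    V = np.zeros((num_filters, bin_freqs.size))
    for r in range(num_filters):
        lo, center, hi = hz_edges[r], hz_edges[r + 1], hz_edges[r + 2]
        rising = (bin_freqs - lo) / (center - lo)    # 0 at lo, 1 at center
        falling = (hi - bin_freqs) / (hi - center)   # 1 at center, 0 at hi
        V[r] = np.maximum(0.0, np.minimum(rising, falling))
    return V

# Example: 20 bands covering 0 to 4 kHz for a 512-point DFT at fs = 8 kHz.
V = triangular_filterbank(20, 512, 8000.0)
```

Each row of `V` is nonzero only over a contiguous range of DFT indices, which plays the role of the band limits of that filter.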

Note that the bandwidths are constant for center frequencies below 1 kHz and then increase exponentially up to 4 kHz (half the sampling rate), resulting in 24 filters. The mel-spectrum of the m-th frame is defined for $r=1,2,\ldots,R$ as

$$MF_{m}[r]=\frac{1}{A_{r}}\sum_{k=L_{r}}^{U_{r}}|V_{r}[k]X_{m}[k]|^2$$

where $V_{r}[k]$ is the weighting function for the r-th filter, ranging from DFT index $L_{r}$ to $U_{r}$, and

$$A_{r}=\sum_{k=L_{r}}^{U_{r}}|V_{r}[k]|^2$$

is a normalizing factor for the r-th mel-filter. This normalization is built into the weighting functions in Fig. It is needed so that a perfectly flat input Fourier spectrum produces a flat mel-spectrum.

For each frame, a discrete cosine transform (DCT) of the log of the magnitude of the mel-filter outputs is computed to form the function $mfcc_{m}[n]$:

$$mfcc_{m}[n]=\frac{1}{R}\sum_{r=1}^{R}\log(MF_{m}[r])\cos\left[\frac{2\pi}{R}\left(r+\frac{1}{2}\right)n\right]$$
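The two formulas can be checked numerically with a short sketch. The function names and the equal-width toy filter bank below are hypothetical and exist only to make the example self-contained; a real front end would use mel-spaced filters. The cosine kernel is written exactly as in the formula above, which differs from the argument convention of typical library DCT routines, so the values need not match those.

```python
import numpy as np

def mel_spectrum(X, V):
    """MF[r] = (1/A_r) * sum_k |V_r[k] X[k]|^2, with A_r = sum_k |V_r[k]|^2.

    X: DFT values of one frame, shape (K,).
    V: filter weights, shape (R, K); zero outside each band [L_r, U_r],
       so summing over all k equals summing over the band only.
    """
    power = np.abs(X) ** 2        # |X_m[k]|^2
    weights = np.abs(V) ** 2      # |V_r[k]|^2
    A = weights.sum(axis=1)       # normalizing factor per filter
    return (weights @ power) / A

def mfcc_from_mel(MF, num_coeffs):
    """mfcc[n] = (1/R) * sum_{r=1..R} log(MF[r]) * cos((2*pi/R) * (r + 1/2) * n)."""
    R = MF.size
    r = np.arange(1, R + 1)
    n = np.arange(num_coeffs)
    basis = np.cos(2.0 * np.pi / R * np.outer(n, r + 0.5))   # shape (num_coeffs, R)
    return (basis @ np.log(MF)) / R

# Toy filter bank (assumption): 6 equal-width triangles over 65 DFT bins.
K, R = 65, 6
k = np.arange(K)
centers = np.linspace(8, 56, R)
V = np.maximum(0.0, 1.0 - np.abs(k - centers[:, None]) / 8.0)

# A perfectly flat spectrum should give a flat mel-spectrum.
X_flat = 3.0 * np.ones(K)
MF = mel_spectrum(X_flat, V)
coeffs = mfcc_from_mel(MF, num_coeffs=4)
```

With the flat input, every entry of `MF` equals the input power $|X|^2 = 9$, which is the property the normalization by $A_r$ is meant to guarantee, and all coefficients other than `coeffs[0]` vanish up to rounding.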
posted @ 2022-09-16 20:28  prettysky