2020, Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework

DOI:10.1016/j.dsp.2020.102943
paper

multi-spectrogram:

log-Mel spectrogram (log-Mel)
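STFT spectra \(S(f,t)=\sum_{n=0}^{N-1}x_{t}[n]\,w[n]\,e^{-i2\pi n f/f_{s}}\), where \(x_t\) is the \(t\)-th frame of \(N\) samples, \(w\) the analysis window and \(f_s\) the sampling rate
Mel frequency warping \(f_{mel}=2595\log_{10}(1+f/700)\)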
simulates the overall frequency selectivity of the human auditory system
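A minimal NumPy sketch of the log-Mel pipeline above. It assumes a Hann window, 1024-sample frames with a 512-sample hop, 64 triangular Mel filters and a natural-log floor `eps`. These settings and all function names are illustrative assumptions, not the paper's configuration. `stft_frame` evaluates the STFT formula literally for one frame. `log_mel_spectrogram` uses `np.fft.rfft`, which computes the same sum at the bin frequencies \(f = k f_s / N\).

```python
import numpy as np

def stft_frame(x_t, window, fs, freqs):
    """Evaluate S(f, t) = sum_n x_t[n] w[n] exp(-i 2 pi n f / fs) directly for one frame."""
    n = np.arange(len(x_t))
    # One row of complex exponentials per requested frequency f (in Hz)
    kernel = np.exp(-2j * np.pi * np.outer(freqs, n) / fs)
    return kernel @ (x_t * window)

def hz_to_mel(f):
    # Mel warping: f_mel = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, fs):
    """Triangular filters whose center frequencies are evenly spaced on the Mel scale."""
    fft_freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_mels + 2))
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x, fs, n_fft=1024, hop=512, n_mels=64, eps=1e-10):
    """Log Mel energies, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2     # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, fs).T    # (n_frames, n_mels)
    return np.log(mel + eps).T
```

The Mel constants make 1000 Hz map to roughly 1000 mel. The log compresses the dynamic range, in loose analogy to perceived loudness.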
Gammatonegram (Gamma)
STFT spectra
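Gammatone weighting by \(g(t)=t^{P-1}e^{-2\pi b t}\cos(2\pi f t+\theta)\), with filter order \(P\), bandwidth parameter \(b\), center frequency \(f\) and phase \(\theta\)
models the frequency-selective cochlear activation response of the human inner ear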
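A sketch of the Gammatonegram under stated assumptions. It uses order \(P=4\) and bandwidth \(b = 1.019\,\mathrm{ERB}(f_c)\) with the Glasberg-Moore ERB. The STFT bins are weighted by the magnitude spectrum of each filter's sampled impulse response. The function names, the bandwidth rule and the choice of center frequencies (passed in by the caller here; ERB-rate spacing is common in practice) are assumptions for illustration. The paper may compute its gammatone weights differently.

```python
import numpy as np

def erb(f):
    # Glasberg & Moore equivalent rectangular bandwidth in Hz (assumed bandwidth rule)
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, n_samples, order=4, theta=0.0):
    """Sampled g(t) = t^(P-1) exp(-2 pi b t) cos(2 pi fc t + theta), peak-normalized."""
    t = np.arange(n_samples) / fs
    b = 1.019 * erb(fc)  # assumed bandwidth scaling, commonly used for 4th-order filters
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + theta)
    return g / np.max(np.abs(g))

def gammatone_weights(center_freqs, n_fft, fs, order=4):
    """One row per filter: |FFT of g(t)| on the STFT bin grid, each row scaled to peak 1."""
    rows = []
    for fc in center_freqs:
        mag = np.abs(np.fft.rfft(gammatone_ir(fc, fs, n_fft, order)))
        rows.append(mag / mag.max())
    return np.array(rows)  # shape (n_filters, n_fft // 2 + 1)

def stft_power(x, n_fft=1024, hop=512):
    """Power spectrogram |STFT|^2 with a Hann window, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def gammatonegram(x, fs, center_freqs, n_fft=1024, hop=512, eps=1e-10):
    """Log gammatone-weighted STFT energies, shape (n_filters, n_frames)."""
    W = gammatone_weights(center_freqs, n_fft, fs)
    return np.log(stft_power(x, n_fft, hop) @ W.T + eps).T
```

Compared with the triangular Mel filters, each gammatone weight row has an asymmetric, smoothly decaying shape. Its bandwidth grows with center frequency according to the ERB rule.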
Constant Q Transform (CQT)
models the geometric relationship of pitch: bin center frequencies are geometrically spaced, so every octave gets the same number of bins,
which makes it likely to be effective for comparing natural and artificial sounds, and suitable for signals whose frequencies span several octaves
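A direct, deliberately slow constant-Q transform for one analysis frame, following the textbook definition. The center frequencies are \(f_k = f_{min} 2^{k/B}\) and the quality factor is \(Q = 1/(2^{1/B}-1)\). Each bin uses a Hann window of \(N_k = \lceil Q f_s / f_k \rceil\) samples. The defaults (\(f_{min}\approx 32.7\) Hz, \(B=12\) bins per octave, 84 bins, i.e. 7 octaves) and the function names are assumptions. A production pipeline would more likely use an optimized implementation such as `librosa.cqt`.

```python
import numpy as np

def cqt_frequencies(f_min, n_bins, bins_per_octave):
    # Geometrically spaced centers: f_k = f_min * 2^(k / B)
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)

def naive_cqt_frame(x, fs, f_min=32.7, n_bins=84, bins_per_octave=12):
    """Direct constant-Q transform of the samples starting at x[0].

    Bin k uses a Hann window of N_k = ceil(Q * fs / f_k) samples, so every bin spans
    about Q cycles of its own center frequency and f_k / bandwidth stays constant.
    """
    Q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = cqt_frequencies(f_min, n_bins, bins_per_octave)
    out = np.zeros(n_bins, dtype=complex)
    for k, fk in enumerate(freqs):
        n_k = int(np.ceil(Q * fs / fk))
        if n_k > len(x):
            raise ValueError(f"need at least {n_k} samples for the {fk:.1f} Hz bin")
        n = np.arange(n_k)
        # Complex exponential at roughly fk Hz (Q / n_k ~= fk / fs), Hann-windowed
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * Q * n / n_k)
        out[k] = np.dot(x[:n_k], kernel) / n_k
    return freqs, out
```

Because \(N_k\) shrinks as \(f_k\) grows, low bins get fine frequency resolution and high bins get fine time resolution. This trade-off is what makes the CQT suited to signals spanning several octaves.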

posted @ 2022-12-10 12:21 prettysky
