2020, Robust acoustic scene classification using a multi-spectrogram encoder-decoder framework


DOI:10.1016/j.dsp.2020.102943
paper


multi-spectrogram:

  • log-Mel spectrogram (log-Mel)
    STFT spectra \(S(f,t)=\sum_{n=0}^{N-1}x_{t}[n]w[n]e^{-i2\pi n f/f_{s}}\)
    Mel frequency warping \(f_{mel}=2595\log_{10}(1+f/700)\)
    simulates the overall frequency selectivity of the human auditory system
  • Gammatonegram (Gamma)
    STFT spectra
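    Gammatone weighting by \(g(t)=t^{P-1}e^{-2\pi bt}\cos(2\pi ft+\theta)\), where \(P\) is the filter order, \(b\) the bandwidth, \(f\) the center frequency and \(\theta\) the phase
    models the frequency-selective cochlear activation response of the human inner ear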
  • Constant Q Transform (CQT)
    models the geometric relationship of pitch,
    which makes it likely to be effective for comparing natural and artificial sounds, and suitable for frequencies that span several octaves
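
The three frequency scales above can be sketched in a few lines of plain Python. This is only an illustration of the formulas in these notes, not the paper's implementation: the function names, the default filter order of 4, and the 12 bins per octave are assumptions chosen for the example.

```python
import math


def hz_to_mel(f_hz):
    """Mel warping from the notes: f_mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Inverse of hz_to_mel, obtained by solving the formula for f."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


def gammatone_ir(t, f_c, b, order=4, phase=0.0):
    """Gammatone impulse response g(t) = t^(P-1) e^(-2 pi b t) cos(2 pi f t + theta).

    order (P) = 4 is an assumed default, not a value taken from the paper.
    """
    if t < 0.0:
        return 0.0  # the filter is causal
    envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
    return envelope * math.cos(2.0 * math.pi * f_c * t + phase)


def cqt_center_frequencies(f_min, n_bins, bins_per_octave=12):
    """Geometrically spaced CQT bin centers: f_k = f_min * 2^(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


if __name__ == "__main__":
    print(hz_to_mel(1000.0))                      # close to 1000 mel
    print(cqt_center_frequencies(32.7, 25)[::12])  # one value per octave
```

The Mel scale is roughly linear at low frequencies and logarithmic at high ones, while the CQT centers are geometric throughout, which is why each octave gets the same number of bins.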
posted @ 2022-12-10 12:21  prettysky