2017, An Investigation of High-Resolution Modeling Units of Deep Neural Networks for Acoustic Scene Classification
DOI:10.1109/IJCNN.2017.7966232
paper
low-level time-based and frequency-based audio descriptors
frequency-band energy features (energy/frequency)
auditory filter banks (Gammatone, Mel filters)
cepstral features (MFCC)
spatial features (ITD: interaural time difference, ILD: interaural level difference)
voicing features (f0: fundamental frequency)
i-vector
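
To make the spatial features in the list above concrete, here is a minimal sketch of how ITD and ILD could be estimated from a two-channel signal. It is only an illustration, not the paper's method: the function names `ild_db` and `itd_seconds`, the `max_lag` parameter, and the synthetic test signal are all assumptions made for this example. ILD is taken as the left-to-right energy ratio in dB, and ITD as the lag of the cross-correlation peak.

```python
import math


def ild_db(left, right):
    """Interaural level difference: left-to-right energy ratio in dB."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


def itd_seconds(left, right, sample_rate, max_lag):
    """Interaural time difference from the cross-correlation peak.

    A positive value means the right channel lags the left channel.
    """
    n = min(len(left), len(right))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        score = sum(left[i] * right[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate


# Synthetic stereo signal (assumed for illustration): a Gaussian-windowed
# 500 Hz tone on the left, and the same tone delayed by 8 samples and
# attenuated by a factor of 0.5 on the right.
SAMPLE_RATE = 16000
DELAY = 8
left = [math.exp(-((i - 100) / 20.0) ** 2)
        * math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
        for i in range(200)]
right = [0.5 * left[i - DELAY] if i >= DELAY else 0.0 for i in range(200)]

itd = itd_seconds(left, right, SAMPLE_RATE, max_lag=16)  # 8 samples = 0.5 ms
ild = ild_db(left, right)  # about 6.02 dB for an amplitude ratio of 2
```

The brute-force correlation loop is kept for readability; for real recordings an FFT-based cross-correlation, usually computed per frequency band, would be the more practical choice.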