PA 4+5
4 Perception of Speech and Sound
4.1 Basic Psychoacoustic Quantities
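Physical characteristics of sound:
- Frequency
- Amplitude
- Waveform
Basic attributes of auditory perception:
- Pitch: distinguishes, e.g., male from female voices (determined largely by the length and thickness of the vocal folds)
- Loudness: related to sound intensity and commonly expressed via the level in decibels (dB)
- Timbre (tone quality): reflects the spectral composition of a complex sound wave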
4.1.1 Mapping of Intensity into Loudness
Intensity unit, SPL: The normalized threshold in quiet at 1 kHz, averaged over a large number of normal-hearing subjects, is defined as 0 dB sound pressure level (SPL), which corresponds to 20 µPa.
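This reference pressure enters the standard level definition \(L = 20 \log_{10}(p_\mathrm{rms}/p_0)\) dB with \(p_0 = 20\,\mu\mathrm{Pa}\). The formula is the standard definition rather than something spelled out in these notes, and the function names below are illustrative; a minimal Python sketch:

```python
import math

P_REF = 20e-6  # reference RMS pressure in Pa, i.e. 0 dB SPL


def spl_db(p_rms_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS sound pressure in pascals."""
    if p_rms_pa <= 0:
        raise ValueError("RMS pressure must be positive")
    return 20.0 * math.log10(p_rms_pa / P_REF)


def pressure_from_spl(level_db: float) -> float:
    """Inverse mapping: RMS pressure in Pa for a level in dB SPL."""
    return P_REF * 10.0 ** (level_db / 20.0)
```

For example, an RMS pressure of 1 Pa corresponds to about 94 dB SPL, and every factor of 10 in pressure adds 20 dB.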
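Loudness level unit, phon: can only be assigned to a sinusoid. A tone has a loudness level of N phon if it is judged equally loud as a 1 kHz sinusoid at N dB SPL.
Measures that express the speech SPL in a way that approximates the human loudness impression:
- Unweighted root-mean-square (RMS) level.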
- A-weighted signal level.
- Loudness in sone.
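The three measures can be sketched in Python under stated assumptions. The A-weighting below uses the analytic A-curve from IEC 61672-1 (not given in these notes), and the phon-to-sone rule (40 phon = 1 sone, loudness doubling per 10 phon) is the common approximation valid above roughly 40 phon; computing sone values for real speech would require a full loudness model (e.g., Zwicker's method in ISO 532-1), which this sketch does not implement. The function names are hypothetical.

```python
import math

P_REF = 20e-6  # 0 dB SPL reference pressure in Pa


def rms_level_db(samples_pa):
    """Unweighted RMS level in dB SPL of a sampled signal given in pascals."""
    rms = math.sqrt(sum(x * x for x in samples_pa) / len(samples_pa))
    return 20.0 * math.log10(rms / P_REF)


def a_weighting_db(f_hz):
    """A-weighting gain in dB at frequency f_hz, using the analytic A-curve
    (pole frequencies rounded to 20.6, 107.7, 737.9 and 12194 Hz)."""
    f2 = f_hz * f_hz
    r_a = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # +2.00 dB normalizes the curve to approximately 0 dB at 1 kHz
    return 20.0 * math.log10(r_a) + 2.00


def phon_to_sone(loudness_level_phon):
    """Loudness in sone: 40 phon = 1 sone, doubling per +10 phon
    (approximation, valid roughly above 40 phon)."""
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)
```

For a pure tone, the A-weighted level is the unweighted level plus `a_weighting_db(f)`: a 100 Hz tone is attenuated by about 19 dB, while a 1 kHz tone is left essentially unchanged. For broadband speech, the weighting would instead be applied as a filter before the RMS computation.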
4.1.2 Pitch
The auditory critical bandwidth, i.e., the bandwidth corresponding to 1 bark, can be approximated as a function of the frequency \(f_{0}\) (in Hz) as
\(\begin{aligned}
1 \text { bark } & =100 \mathrm{~mel} \\
& \approx 100 \mathrm{~Hz} \text { for frequencies below } 500 \mathrm{~Hz} \\
& \approx \frac{1}{5} f_0 \text { for frequencies above } 500 \mathrm{~Hz} .
\end{aligned}\)
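The piecewise rule can be written directly as a function. For comparison, the sketch also includes the smooth critical-bandwidth approximation and the Hz-to-bark mapping of Zwicker and Terhardt (1980); these two formulas are an addition, not part of the notes, and all function names are illustrative.

```python
import math


def critical_bandwidth_rule_of_thumb(f0_hz: float) -> float:
    """Width of one bark in Hz: ~100 Hz below 500 Hz, ~f0/5 above."""
    return 100.0 if f0_hz < 500.0 else f0_hz / 5.0


def critical_bandwidth_zwicker(f0_hz: float) -> float:
    """Smooth critical-bandwidth approximation (Zwicker & Terhardt, 1980)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f0_hz / 1000.0) ** 2) ** 0.69


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate z in bark (Zwicker & Terhardt, 1980)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)
```

At 1 kHz the rule of thumb gives 200 Hz while the smooth formula gives about 162 Hz, so the piecewise rule is only a coarse approximation; 1 kHz lies at roughly 8.5 bark.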
4.1.3 Temporal Analysis and Modulation Perception
4.1.4 Binaural Hearing
4.1.5 Binaural Noise Suppression
4.2 Acoustical Information Required for Speech Perception
4.2.1 Speech Intelligibility and Speech Reception Threshold (SRT)
4.2.2 Measurement Methods
4.2.3 Factors Influencing Speech Intelligibility
4.2.4 Prediction Methods
Internal Representation Approach and Higher-Order Temporal-Spectral Features
- Auditory spectrogram: The basic internal representation assumes that the speech sound is separated into a number of frequency bands and that the compressed, frequency-channel-specific intensity is represented over time. The compression can, for example, be logarithmic, which is also required to represent human intensity resolution and loudness mapping.
- Modulation spectrogram: One important property of the internal representation is the temporal analysis within each audio frequency band using the modulation filter bank concept. Temporal envelope fluctuations in each audio frequency channel are spectrally analyzed, using either a fixed set of modulation filters or a complete spectral analysis, to yield the modulation spectrum in each frequency band. This representation yields the so-called amplitude modulation spectrogram for each instant of time.
- Temporal/spectral ripple or Gabor feature approach: This approach generalizes the temporal-domain modulation-frequency feature detectors outlined above by also considering the spectral analysis of ripples in the frequency domain, as well as a ripple-frequency analysis for combined temporal and spectral modulation.
Such a temporal-spectral ripple analysis is motivated by physiological findings on auditory receptive fields in ferrets, as well as by psychoacoustical findings of Kaernbach, who demonstrated a sensitivity to combinations of spectral and temporal variations.
An elegant way to formalize the sensitivity to joint temporal and spectral energy variations is the Gabor feature concept that considers features with a limited spectro-temporal extent tuned to a certain combination of temporal modulation frequency and spectral ripple frequency.
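The three representations above can be sketched roughly in numpy. This is a simplified illustration under explicit assumptions, not the model described in the notes: STFT power pooled into hypothetical log-spaced bands stands in for an auditory filter bank (e.g., gammatone), conversion to dB is the logarithmic compression, and the modulation spectrum is a complete FFT of each band's envelope over the whole signal (no modulation filter bank and no sliding window, so it is not yet a modulation spectrogram over time). A single Gabor feature is modeled as a complex 2D carrier tuned to a temporal modulation frequency `omega_t` (rad/frame) and a spectral ripple frequency `omega_f` (rad/channel) under a Hann envelope, with its DC response removed. All function names, sizes and defaults are illustrative.

```python
import numpy as np


def auditory_spectrogram(x, fs, n_bands=16, frame_len=0.025, hop=0.010, f_lo=100.0):
    """Crude auditory spectrogram: STFT power pooled into log-spaced bands,
    then log-compressed (dB). Returns (frames x bands) and the frame rate in Hz."""
    n_win, n_hop = int(round(frame_len * fs)), int(round(hop * fs))
    window = np.hanning(n_win)
    n_frames = 1 + (len(x) - n_win) // n_hop
    frames = np.stack([x[i * n_hop:i * n_hop + n_win] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    edges = np.geomspace(f_lo, fs / 2.0, n_bands + 1)  # hypothetical band layout
    bands = np.stack([power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
                      for lo, hi in zip(edges[:-1], edges[1:])], axis=1)
    return 10.0 * np.log10(bands + 1e-12), fs / n_hop


def modulation_spectrum(spec, frame_rate):
    """Magnitude spectrum of each band's mean-removed envelope over the whole signal."""
    env = spec - spec.mean(axis=0, keepdims=True)
    mod_freqs = np.fft.rfftfreq(spec.shape[0], d=1.0 / frame_rate)
    return mod_freqs, np.abs(np.fft.rfft(env, axis=0))


def gabor_kernel(omega_t, omega_f, size_t=25, size_f=9):
    """Spectro-temporal Gabor kernel: complex 2D carrier times a 2D Hann envelope."""
    t = np.arange(size_t) - size_t // 2
    f = np.arange(size_f) - size_f // 2
    envelope = np.outer(np.hanning(size_t + 2)[1:-1], np.hanning(size_f + 2)[1:-1])
    kernel = envelope * np.exp(1j * (omega_t * t[:, None] + omega_f * f[None, :]))
    # Remove the DC response so a flat, stationary spectrogram gives zero output
    return kernel - envelope * kernel.sum() / envelope.sum()


def gabor_feature(spec, kernel):
    """Real part of the 2D convolution of a (time x channel) spectrogram with
    the kernel, cropped to the input size (zero padding at the borders)."""
    nt, nf = spec.shape
    kt, kf = kernel.shape
    shape = (nt + kt - 1, nf + kf - 1)
    full = np.fft.ifft2(np.fft.fft2(spec, shape) * np.fft.fft2(kernel, shape))
    return np.real(full[kt // 2:kt // 2 + nt, kf // 2:kf // 2 + nf])
```

For a 1 kHz tone amplitude-modulated at 4 Hz, the modulation spectrum of the band containing 1 kHz peaks near 4 Hz. Flipping the sign of `omega_f` makes the Gabor filter prefer a ripple drifting in the opposite spectral direction; a complete feature set would evaluate a grid of (`omega_t`, `omega_f`) pairs.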