MELP Standard: U.S. Department of Defense (DoD) Telecommunications and Systems Standard

ANALOG-TO-DIGITAL CONVERSION OF VOICE BY 2,400 BIT/SECOND MIXED EXCITATION LINEAR PREDICTION (MELP)

MELP (Mixed-Excitation Linear Predictive)

The MELP (Mixed-Excitation Linear Predictive) algorithm grew out of work done in the mid-1990s on another military coder known as LPC-10. The MELP model was developed to address the deficiencies of LPC-10. The first MELP standard was released in 1999 and is commonly known as the 3005 standard. This original MELP specified only the 2400 bps mode of operation, with an optional postfilter available for the decoder. A MELP frame interval is 22.5 ms in duration and contains 180 voice samples at a sampling rate of 8,000 Hz (8 kHz). The recommended analog bandwidth is nominally 100 Hz to 3800 Hz. MELP can operate with a more band-limited signal, but with degraded performance. See also MELPe, the Mixed-Excitation Linear Predictive enhanced algorithm.
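The frame figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the constant and function names are made up for this example, and the bits-per-frame value is derived from the quoted bit rate and frame length rather than stated in the text.

```python
SAMPLE_RATE_HZ = 8000   # sampling rate quoted above (8 kHz)
FRAME_SAMPLES = 180     # voice samples per MELP frame
BIT_RATE_BPS = 2400     # coder bit rate


def frame_duration_ms(samples=FRAME_SAMPLES, rate_hz=SAMPLE_RATE_HZ):
    """Frame length in milliseconds."""
    return 1000.0 * samples / rate_hz


def bits_per_frame(bit_rate_bps=BIT_RATE_BPS, samples=FRAME_SAMPLES, rate_hz=SAMPLE_RATE_HZ):
    """Bits available to describe one frame at the given bit rate."""
    return bit_rate_bps * samples // rate_hz


if __name__ == "__main__":
    print(frame_duration_ms())  # 22.5
    print(bits_per_frame())     # 54
```

At 2400 bps, each 22.5 ms frame therefore has to be described in 54 bits.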

History

MELP was selected as the new 2400 bps Federal Standard speech vocoder by the United States Department of Defense (DoD) Digital Voice Processing Consortium (DDVPC) after an extensive multi-year testing program. The selection tests concentrated on four areas: intelligibility, voice quality, talker recognizability, and communicability. The selection criteria also included hardware parameters such as processing power, memory usage, and delay. MELP was selected as the best of the seven candidates and even outperformed the FS1016 4800 bps vocoder, which runs at twice the bit rate.

Advantages

MELP is robust in difficult background noise environments such as those frequently encountered in commercial and military communication systems. Its computational requirements are modest, which translates into relatively low power consumption, an important consideration for portable systems. MELP uses extensive lookup tables and models of the human voice to extract and regenerate speech. Further, the codec is tuned for the English language, and speakers of non-Germanic languages generally rate the coder lower than English speakers do.

Features

Traditional pitch-excited LPC vocoders use either a periodic pulse train or white noise as the excitation for an all-pole synthesis filter. These vocoders produce intelligible speech at very low bit rates, but they sometimes sound mechanical or buzzy and are prone to annoying thumps and tonal noises. These problems arise from the inability of a simple pulse train to reproduce all kinds of voiced speech. The MELP vocoder uses a mixed-excitation model that can produce more natural-sounding speech because it can represent a richer ensemble of possible speech characteristics.
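To make the traditional two-state model concrete, here is a minimal sketch of a pitch-excited synthesizer: each frame is driven by either a pulse train or white noise, and the result is passed through an all-pole filter. The function names, the coefficient convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the example values are assumptions made for illustration; they are not taken from any standard.

```python
import random


def pulse_train(n, pitch_period):
    """Unit impulses spaced pitch_period samples apart (voiced excitation)."""
    return [1.0 if i % pitch_period == 0 else 0.0 for i in range(n)]


def white_noise(n, seed=0):
    """Gaussian white noise (unvoiced excitation)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def all_pole_filter(excitation, a):
    """Apply 1/A(z): y[n] = x[n] - sum_k a[k-1] * y[n-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def synthesize_frame(voiced, pitch_period, a, n=180):
    """Hard voiced/unvoiced switch: the whole frame gets one excitation type."""
    exc = pulse_train(n, pitch_period) if voiced else white_noise(n)
    return all_pole_filter(exc, a)


if __name__ == "__main__":
    frame = synthesize_frame(voiced=True, pitch_period=60, a=[-0.9])
    print(frame[:3])
```

The hard switch in `synthesize_frame` is the limitation the paragraph above describes: a frame is treated as entirely periodic or entirely noisy, with nothing in between.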

Many modifications were made to LPC-10 in order to improve speech quality. These include:

Mixed pulse and noise excitation
Periodic or aperiodic impulses
Adaptive spectral enhancement
Pulse dispersion filter
Fourier magnitude modeling

Mixed Pulse and Noise Excitation

The mixed excitation is implemented using a multi-band mixing model. This model can simulate frequency-dependent voicing strength using a novel adaptive filtering structure based on a fixed filterbank. The primary effect of this multi-band mixed excitation is to reduce the buzz usually associated with LPC vocoders, especially in broadband acoustic noise.

Periodic/Aperiodic Impulses

When the input speech is voiced, the MELP vocoder can synthesize speech using either periodic or aperiodic pulses. Aperiodic pulses are most often used during transition regions between voiced and unvoiced segments of the speech signal. This feature allows the synthesizer to reproduce erratic glottal pulses without introducing tonal noises.
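The difference between the two pulse modes can be sketched as follows. In periodic mode the pulses are evenly spaced; in aperiodic mode each pitch period is randomly stretched or shrunk. The 25% jitter bound, the uniform distribution, and the function name are assumptions for illustration only.

```python
import random


def pulse_positions(n_samples, pitch_period, aperiodic, jitter=0.25, seed=0):
    """Sample indices of excitation pulses within one frame.

    With aperiodic=True, each period is scaled by a random factor in
    [1 - jitter, 1 + jitter] (an assumed jitter model).
    """
    rng = random.Random(seed)
    positions = []
    pos = 0.0
    while pos < n_samples:
        positions.append(int(pos))
        scale = 1.0 + rng.uniform(-jitter, jitter) if aperiodic else 1.0
        pos += pitch_period * scale
    return positions


if __name__ == "__main__":
    print(pulse_positions(180, 60, aperiodic=False))  # [0, 60, 120]
    print(pulse_positions(180, 60, aperiodic=True))
```

Irregular spacing breaks up the strict periodicity that would otherwise be heard as a tone.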

Adaptive Spectral Enhancement

The adaptive spectral enhancement filter is based on the poles of the LPC vocal tract filter and is used to enhance the formant structure in the synthetic speech. This filter improves the match between synthetic and natural bandpass waveforms, and introduces a more natural quality to the speech output.
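A common way to build a formant-enhancement filter from LPC poles is bandwidth expansion: the filter A(z/beta) / A(z/alpha), with beta < alpha < 1, reuses the LPC coefficients with its poles and zeros pulled toward the origin by different amounts. The sketch below follows that approach as an assumption. The values alpha = 0.8 and beta = 0.5 and the function names are hypothetical, and any spectral tilt correction a real decoder might apply is omitted.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_k is scaled by gamma**k, which moves
    the roots of A(z) toward the origin when gamma < 1."""
    return [ak * gamma ** k for k, ak in enumerate(a, start=1)]


def pole_zero_filter(x, num, den):
    """y[n] = x[n] + sum_k num[k-1]*x[n-k] - sum_k den[k-1]*y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, bk in enumerate(num, start=1):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k, ak in enumerate(den, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def spectral_enhance(x, a, alpha=0.8, beta=0.5):
    """Apply A(z/beta) / A(z/alpha), where A(z) = 1 + sum_k a[k-1] z^-k."""
    return pole_zero_filter(x, bandwidth_expand(a, beta), bandwidth_expand(a, alpha))


if __name__ == "__main__":
    # One-pole toy example: A(z) = 1 - 0.9 z^-1
    print(spectral_enhance([1.0, 0.0, 0.0], [-0.9]))
```

The filter is adaptive in the sense that it is recomputed from the LPC coefficients of each frame, so the emphasis follows the formants as they move.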

Pulse Dispersion Filter

The pulse dispersion is implemented using a fixed pulse dispersion filter based on a spectrally flattened triangle pulse. This filter has the effect of spreading the excitation energy within a pitch period. This, in turn, reduces the harsh quality of the synthetic speech.
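As a rough illustration of spectral flattening, the sketch below takes the DFT of a triangle pulse, sets every bin to unit magnitude while keeping its phase, and transforms back. The result passes all frequencies equally but smears an impulse in time. The asymmetric pulse shape (12-sample rise, 4-sample fall), the 65-tap length, and the DFT-based flattening are all assumptions; the source does not say how the actual filter coefficients were derived.

```python
import cmath


def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Inverse DFT, keeping the real part (input is assumed conjugate-symmetric)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def triangle_pulse(taps, rise, fall):
    """Asymmetric triangle: linear rise over `rise` samples, fall over `fall`."""
    pulse = []
    for t in range(taps):
        if t <= rise:
            pulse.append(t / rise)
        elif t < rise + fall:
            pulse.append((rise + fall - t) / fall)
        else:
            pulse.append(0.0)
    return pulse


def dispersion_filter(taps=65, rise=12, fall=4):
    """Keep the triangle pulse's phase, force every DFT bin to magnitude 1."""
    spectrum = dft(triangle_pulse(taps, rise, fall))
    flat = [c / abs(c) if abs(c) > 1e-12 else complex(1.0) for c in spectrum]
    return idft_real(flat)


if __name__ == "__main__":
    h = dispersion_filter()
    print(sum(v * v for v in h))    # total energy stays close to 1
    print(max(abs(v) for v in h))   # peak is below 1: energy is spread out
```

Filtering a unit impulse with `h` returns `h` itself, so the two printed values show the intended effect directly: the same energy, but a lower peak spread over more samples. A symmetric triangle would not work in this sketch, since its phase is linear and flattening it yields a pure delay.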

Fourier Magnitude Modeling

Ten Fourier magnitudes are coded with an 8-bit vector quantizer. The index of the code vector that minimizes the weighted Euclidean distance between the input and code vectors is transmitted.
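The codebook search described above amounts to a nearest-neighbor lookup under a weighted distance. The sketch below uses a toy codebook and made-up weights; an 8-bit quantizer would have 256 ten-dimensional entries, and the actual codebook and weighting are not given in the text.

```python
def nearest_codevector(x, codebook, weights):
    """Return the index of the code vector closest to x.

    The squared weighted Euclidean distance is compared, which picks the
    same index as the distance itself.
    """
    best_index = -1
    best_dist = float("inf")
    for i, code in enumerate(codebook):
        dist = sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, code))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


if __name__ == "__main__":
    codebook = [[0.0, 1.0], [1.0, 0.0]]  # toy 2-entry codebook
    x = [1.0, 1.0]
    print(nearest_codevector(x, codebook, weights=[10.0, 1.0]))  # 1
    print(nearest_codevector(x, codebook, weights=[1.0, 10.0]))  # 0
```

Only the winning index is sent, so with 256 entries the ten magnitudes cost 8 bits per frame. The weights let errors in some harmonics count more than errors in others.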

Configurations

DAA interface using linear codec at 8.0 kHz sample rate
Direct interface to 8.0 kHz PCM data stream (A-law or μ-law)
North American/International Telephony (including caller ID) support available
Simultaneous DTMF detector operation available (fewer than 10 talk-off hits on the Bellcore test tape set)
MF tone detectors, general purpose programmable tone detectors/generators available
Data/Facsimile/Voice Distinction available
Common compressed speech frame stream interface to support systems with multiple speech coders
Dynamic speech coder selection if multiple speech codecs are available
Can be integrated with G.168 Echo Canceller and Tone Detection/Regeneration modules
Multiple ports can be executed on a single DSP