Advanced Audio Coding (AAC) is a standardized, lossy compression and encoding scheme for digital audio. Designed to be the successor of the MP3 format, AAC generally achieves better sound quality than MP3 at similar bit rates.[2]
AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications.[3][4] High-Efficiency Advanced Audio Coding (HE-AAC), the part of AAC defined in MPEG-4 Audio, has also been adopted into digital radio standards such as DAB+ and Digital Radio Mondiale, as well as the mobile television standards DVB-H and ATSC-M/H.
AAC supports up to 48 full-bandwidth (up to 96 kHz) audio channels in one stream, plus 16 low frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams. Stereo quality is adequate for modest requirements at 96 kbit/s in joint stereo mode; however, hi-fi transparency demands data rates of at least 128 kbit/s (VBR). The MPEG-2 audio tests showed that AAC meets the ITU requirements referred to as "transparent" at 128 kbit/s for stereo and 320 kbit/s for 5.1 audio.
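As a rough worked check of these figures, the sketch below divides the quoted transparent bit rates by the number of full-bandwidth channels and compares the stereo rate with uncompressed PCM. The 44.1 kHz, 16-bit PCM reference and the decision to ignore the LFE channel in 5.1 audio are assumptions made for illustration; they do not come from the MPEG tests.

```python
# Bit rates quoted in the text (kbit/s).
STEREO_TRANSPARENT_KBPS = 128
SURROUND_TRANSPARENT_KBPS = 320


def per_channel_kbps(total_kbps, full_bandwidth_channels):
    """Average bit rate available to each full-bandwidth channel."""
    return total_kbps / full_bandwidth_channels


def pcm_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Stereo: two full-bandwidth channels.
stereo_per_channel = per_channel_kbps(STEREO_TRANSPARENT_KBPS, 2)

# 5.1: five full-bandwidth channels; the band-limited LFE channel is
# ignored here (assumption, it needs comparatively few bits).
surround_per_channel = per_channel_kbps(SURROUND_TRANSPARENT_KBPS, 5)

# Assumed reference: 44.1 kHz, 16-bit stereo PCM.
reference_kbps = pcm_kbps(44100, 16, 2)
compression_ratio = reference_kbps / STEREO_TRANSPARENT_KBPS

print(f"stereo: {stereo_per_channel:.0f} kbit/s per channel")
print(f"5.1:    {surround_per_channel:.0f} kbit/s per full-bandwidth channel")
print(f"PCM reference: {reference_kbps:.1f} kbit/s, ratio {compression_ratio:.1f}:1")
```

Under these assumptions both configurations work out to about 64 kbit/s per full-bandwidth channel, and 128 kbit/s stereo is roughly an 11:1 reduction relative to the assumed PCM reference.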
AAC is also the default or standard audio format for iPhone, iPod, iPad, Nintendo DSi, iTunes, DivX Plus Web Player and PlayStation 3. It is supported on PlayStation Portable, Wii (with the Photo Channel 1.1 update installed for Wii consoles purchased before late 2007), Sony Walkman MP3 series and later, mobile phones made by Sony Ericsson and Nokia, and Android- and WebOS-based mobile phones. AAC has also seen moderate adoption in in-dash car audio, especially in high-end units such as the Pioneer AVIC series.
History
AAC was developed with the cooperation and contributions of companies including AT&T Bell Laboratories, Fraunhofer IIS, Dolby Laboratories, Sony Corporation and Nokia. It was officially declared an international standard by the Moving Picture Experts Group in April 1997. It is specified both as Part 7 of the MPEG-2 standard, and Subpart 4 in Part 3 of the MPEG-4 standard.[5]
Standardization
In 1997, AAC was first introduced as MPEG-2 Part 7, formally known as ISO/IEC 13818-7:1997. This part of MPEG-2 was a new part, since MPEG-2 already included MPEG-2 Part 3, formally known as ISO/IEC 13818-3: MPEG-2 BC (Backwards Compatible).[6][7] Therefore, MPEG-2 Part 7 is also known as MPEG-2 NBC (Non-Backward Compatible), because it is not compatible with the MPEG-1 audio formats (MP1, MP2 and MP3).[6][8][9][10]
MPEG-2 Part 7 defined three profiles: Low-Complexity profile (AAC-LC / LC-AAC), Main profile (AAC Main) and Scalable Sampling Rate profile (AAC-SSR). AAC-LC profile consists of a base format very much like AT&T's Perceptual Audio Coding (PAC) coding format,[11][12][13] with the addition of temporal noise shaping (TNS),[14] the Dolby Kaiser Window (described below), a nonuniform quantizer, and a reworking of the bitstream format to handle up to 16 stereo channels, 16 mono channels, 16 low-frequency effect (LFE) channels and 16 commentary channels in one bitstream. The Main profile adds a set of recursive predictors that are calculated on each tap of the filterbank. The SSR uses a 4-band PQMF filterbank, with four shorter filterbanks following, in order to allow for scalable sampling rates.
In 1999, MPEG-2 Part 7 was updated and included in the MPEG-4 family of standards, where it became known as MPEG-4 Part 3, MPEG-4 Audio or ISO/IEC 14496-3:1999. This update included several improvements. One of these was the addition of Audio Object Types, which are used to allow interoperability with a diverse range of other audio formats such as TwinVQ, CELP, HVXC, Text-To-Speech Interface and MPEG-4 Structured Audio. Another notable addition in this version of the AAC standard is Perceptual Noise Substitution (PNS). In that regard, the AAC profiles (AAC-LC, AAC Main and AAC-SSR) are combined with perceptual noise substitution and are defined in the MPEG-4 audio standard as Audio Object Types.[15] MPEG-4 Audio Object Types are combined in four MPEG-4 Audio profiles: Main (which includes most of the MPEG-4 Audio Object Types), Scalable (AAC LC, AAC LTP, CELP, HVXC, TwinVQ, Wavetable Synthesis, TTSI), Speech (CELP, HVXC, TTSI) and Low Rate Synthesis (Wavetable Synthesis, TTSI).[16][17]
The reference software for MPEG-4 Part 3 is specified in MPEG-4 Part 4 and the conformance bit-streams are specified in MPEG-4 Part 5. MPEG-4 Audio remains backward-compatible with MPEG-2 Part 7.[18]
The MPEG-4 Audio Version 2 (ISO/IEC 14496-3:1999/Amd 1:2000) defined new audio object types: the low delay AAC (AAC-LD) object type, the bit-sliced arithmetic coding (BSAC) object type, parametric audio coding using harmonic and individual lines plus noise (HILN), and error resilient (ER) versions of object types.[19][20][21] It also defined four new audio profiles: High Quality Audio Profile, Low Delay Audio Profile, Natural Audio Profile and Mobile Audio Internetworking Profile.[22]
The HE-AAC Profile (AAC LC with SBR) and AAC Profile (AAC LC) were first standardized in ISO/IEC 14496-3:2001/Amd 1:2003.[23] The HE-AAC v2 Profile (AAC LC with SBR and Parametric Stereo) was first specified in ISO/IEC 14496-3:2005/Amd 2:2006.[24][25][26] The Parametric Stereo audio object type used in HE-AAC v2 was first defined in ISO/IEC 14496-3:2001/Amd 2:2004.[27][28][29]
The current version of the AAC standard is defined in ISO/IEC 14496-3:2009.[30]
AAC+ v2 is also standardized by ETSI (European Telecommunications Standards Institute) as TS 102005.[27]
The MPEG-4 Part 3 standard also contains other ways of compressing sound. These include lossless compression formats, synthetic audio and low bit-rate compression formats generally used for speech.
AAC's improvements over MP3
Advanced Audio Coding is designed to be the successor of MPEG-1 Audio Layer 3, known as the MP3 format, which was specified by ISO/IEC in 11172-3 (MPEG-1 Audio) and 13818-3 (MPEG-2 Audio).
Blind tests show that AAC demonstrates greater sound quality and transparency than MP3 for files coded at the same bit rate.[2]
Improvements include:
More sample frequencies (from 8 to 96 kHz) than MP3 (16 to 48 kHz)
Up to 48 channels (MP3 supports up to two channels in MPEG-1 mode and up to 5.1 channels in MPEG-2 mode)
Arbitrary bit-rates and variable frame length. Standardized constant bit rate with bit reservoir.
Higher efficiency and simpler filterbank (rather than MP3's hybrid coding, AAC uses a pure MDCT)
Higher coding efficiency for stationary signals (AAC uses a blocksize of 1024 or 960 samples, allowing more efficient coding than MP3's 576 sample blocks)
Higher coding accuracy for transient signals (AAC uses a blocksize of 128 or 120 samples, allowing more accurate coding than MP3's 192 sample blocks)
Can use a Kaiser-Bessel derived window function to reduce spectral leakage at the expense of widening the main lobe
Much better handling of audio frequencies above 16 kHz
More flexible joint stereo (different methods can be used in different frequency ranges)
Adds modules (tools) to increase compression efficiency: TNS, Backwards Prediction, PNS, etc. These modules can be combined to constitute different encoding profiles.
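The block sizes in the list above can be turned into time spans and frequency spacings, which shows why long blocks suit stationary signals and short blocks suit transients. The following is a minimal sketch, assuming a 44.1 kHz sample rate and treating each block size as the number of spectral coefficients produced per block; both are illustrative assumptions, and `block_metrics` is a hypothetical helper.

```python
def block_metrics(block_size, sample_rate_hz=44100):
    """Return (time span in ms, spacing between spectral lines in Hz).

    Assumes the block yields `block_size` coefficients spread evenly
    over the range 0 .. sample_rate_hz / 2.
    """
    duration_ms = 1000.0 * block_size / sample_rate_hz
    line_spacing_hz = (sample_rate_hz / 2) / block_size
    return duration_ms, line_spacing_hz


# Block sizes quoted in the list above.
BLOCK_SIZES = {
    "AAC long": 1024,
    "MP3 long": 576,
    "AAC short": 128,
    "MP3 short": 192,
}

for name, size in BLOCK_SIZES.items():
    duration_ms, spacing_hz = block_metrics(size)
    print(f"{name:9s} {size:5d} samples  {duration_ms:6.2f} ms  {spacing_hz:7.2f} Hz/line")
```

At the assumed sample rate, the AAC long block spans about 23.2 ms with roughly 21.5 Hz between spectral lines, against about 13.1 ms and 38.3 Hz for MP3, so AAC resolves frequency more finely. The AAC short block spans about 2.9 ms against about 4.4 ms for MP3, so AAC localizes transients more tightly in time.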
Overall, the AAC format allows developers more flexibility to design codecs than MP3 does, and corrects many of the design choices made in the original MPEG-1 audio specification. This increased flexibility often leads to more concurrent encoding strategies and, as a result, to more efficient compression. However, in terms of whether AAC is better than MP3, the advantages of AAC are not entirely decisive, and the MP3 specification, although antiquated, has proven surprisingly robust in spite of considerable flaws. AAC and HE-AAC are better than MP3 at low bit rates (typically less than 128 kilobits per second)[citation needed]. This is especially true at very low bit rates where the superior stereo coding, pure MDCT, and better transform window sizes leave MP3 unable to compete. However, as bit rate increases, the efficiency of an audio format becomes less important relative to the efficiency of the encoder's implementation, and the intrinsic advantage AAC holds over MP3 no longer dominates audio quality[citation needed].
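To illustrate what a pure MDCT filterbank with a Kaiser-Bessel derived (KBD) window involves, here is a minimal NumPy sketch. It is a direct matrix implementation written for clarity, not the filterbank of any real AAC codec: the block size of 16, the window parameter `alpha=4.0`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def kbd_window(n_coeffs, alpha=4.0):
    """Kaiser-Bessel derived window of length 2 * n_coeffs."""
    kaiser = np.kaiser(n_coeffs + 1, np.pi * alpha)
    half = np.sqrt(np.cumsum(kaiser[:n_coeffs]) / kaiser.sum())
    return np.concatenate([half, half[::-1]])


def mdct_basis(n_coeffs):
    """(2N x N) matrix of MDCT cosine basis functions (N must be even)."""
    n = np.arange(2 * n_coeffs)[:, None]
    k = np.arange(n_coeffs)[None, :]
    return np.cos(np.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))


def analyze(signal, window, basis):
    """Windowed MDCT of 50%-overlapping blocks; N coefficients per block."""
    n_coeffs = basis.shape[1]
    n_blocks = len(signal) // n_coeffs - 1
    return [
        (window * signal[i * n_coeffs:(i + 2) * n_coeffs]) @ basis
        for i in range(n_blocks)
    ]


def synthesize(spectra, window, basis):
    """Inverse MDCT, synthesis windowing and overlap-add."""
    n_coeffs = basis.shape[1]
    out = np.zeros((len(spectra) + 1) * n_coeffs)
    for i, spectrum in enumerate(spectra):
        # The 2/N scaling matches a window whose squared halves sum to 1.
        block = (2.0 / n_coeffs) * window * (basis @ spectrum)
        out[i * n_coeffs:(i + 2) * n_coeffs] += block
    return out


N = 16  # illustrative block size, far smaller than AAC's 1024
rng = np.random.default_rng(0)
signal = rng.standard_normal(8 * N)
window = kbd_window(N)
basis = mdct_basis(N)

restored = synthesize(analyze(signal, window, basis), window, basis)

# Only samples covered by two overlapping blocks are reconstructed.
max_error = np.max(np.abs(restored[N:-N] - signal[N:-N]))
print("max reconstruction error:", max_error)
```

Each block of 2N input samples yields only N coefficients, yet the overlap-add of neighbouring blocks cancels the time-domain aliasing because the squared halves of the window sum to one. In a real coder the coefficients would be quantized between analysis and synthesis, which is where the loss occurs.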
Also, because of its popularity, the MP3 format has been explored far more thoroughly than AAC, and fewer AAC codecs than MP3 codecs are available.[31]
Modular encoding
AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance and the acceptable output, implementers may create profiles to define which of a specific set of tools they want to use for a particular application.
The MPEG-2 Part 7 standard (Advanced Audio Coding) was first published in 1997 and offers three default profiles:[1][33]
Low Complexity (LC) – the simplest and most widely used and supported;
Main Profile (Main) – like the LC profile, with the addition of backwards prediction;
Scalable Sample Rate (SSR) (MPEG-4 AAC-SSR) – a.k.a. Sample-Rate Scalable (SRS).
The MPEG-4 Part 3 standard (MPEG-4 Audio) defined various new compression tools (a.k.a. Audio Object Types) and their usage in brand new profiles. AAC is not used in some of the MPEG-4 Audio profiles. The MPEG-2 Part 7 AAC LC profile, AAC Main profile and AAC SSR profile are combined with Perceptual Noise Substitution and defined in the MPEG-4 Audio standard as Audio Object Types (under the name AAC LC, AAC Main and AAC SSR). These are combined with other Object Types in MPEG-4 Audio profiles.[15] Here is a list of some audio profiles defined in the MPEG-4 standard:[24][34]
Main Audio Profile – defined in 1999, uses most of the MPEG-4 Audio Object Types (AAC Main, AAC-LC, AAC-SSR, AAC-LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, Main synthesis)
Scalable Audio Profile – defined in 1999, uses AAC-LC, AAC-LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI
Speech Audio Profile – defined in 1999, uses CELP, HVXC, TTSI
Synthetic Audio Profile – defined in 1999, TTSI, Main synthesis
High Quality Audio Profile – defined in 2000, uses AAC-LC, AAC-LTP, AAC Scalable, CELP, ER-AAC-LC, ER-AAC-LTP, ER-AAC Scalable, ER-CELP
Low Delay Audio Profile – defined in 2000, uses CELP, HVXC, TTSI, ER-AAC-LD, ER-CELP, ER-HVXC
Mobile Audio Internetworking Profile – defined in 2000, uses ER-AAC-LC, ER-AAC Scalable, ER-TwinVQ, ER-BSAC, ER-AAC-LD
AAC Profile – defined in 2003, uses AAC-LC
High Efficiency AAC Profile – defined in 2003, uses AAC-LC, SBR
High Efficiency AAC v2 Profile – defined in 2006, uses AAC-LC, SBR, PS
(One of many improvements in MPEG-4 Audio is the Long Term Prediction (LTP) Object Type, which improves on the Main profile by using a forward predictor with lower computational complexity.[18])
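The modular structure described above, in which each profile is a named set of Audio Object Types, can be modelled as a simple lookup table. The sketch below transcribes the profile list above into a Python dictionary. The names `MPEG4_AUDIO_PROFILES`, `profiles_supporting` and `profile_covers` are hypothetical helpers for illustration and are not part of any standard or library.

```python
# Profile name -> (year defined, set of Audio Object Types), per the list above.
MPEG4_AUDIO_PROFILES = {
    "Main Audio": (1999, {"AAC Main", "AAC-LC", "AAC-SSR", "AAC-LTP",
                          "AAC Scalable", "TwinVQ", "CELP", "HVXC", "TTSI",
                          "Main synthesis"}),
    "Scalable Audio": (1999, {"AAC-LC", "AAC-LTP", "AAC Scalable", "TwinVQ",
                              "CELP", "HVXC", "TTSI"}),
    "Speech Audio": (1999, {"CELP", "HVXC", "TTSI"}),
    "Synthetic Audio": (1999, {"TTSI", "Main synthesis"}),
    "High Quality Audio": (2000, {"AAC-LC", "AAC-LTP", "AAC Scalable", "CELP",
                                  "ER-AAC-LC", "ER-AAC-LTP",
                                  "ER-AAC Scalable", "ER-CELP"}),
    "Low Delay Audio": (2000, {"CELP", "HVXC", "TTSI", "ER-AAC-LD",
                               "ER-CELP", "ER-HVXC"}),
    "Mobile Audio Internetworking": (2000, {"ER-AAC-LC", "ER-AAC Scalable",
                                            "ER-TwinVQ", "ER-BSAC",
                                            "ER-AAC-LD"}),
    "AAC": (2003, {"AAC-LC"}),
    "High Efficiency AAC": (2003, {"AAC-LC", "SBR"}),
    "High Efficiency AAC v2": (2006, {"AAC-LC", "SBR", "PS"}),
}


def profiles_supporting(object_type):
    """Names of all profiles whose tool set includes `object_type`."""
    return sorted(
        name
        for name, (_, tools) in MPEG4_AUDIO_PROFILES.items()
        if object_type in tools
    )


def profile_covers(profile, required_types):
    """True if every required object type belongs to `profile`."""
    _, tools = MPEG4_AUDIO_PROFILES[profile]
    return set(required_types) <= tools


print(profiles_supporting("SBR"))
print(profile_covers("AAC", ["AAC-LC", "SBR"]))
```

A stream that uses AAC-LC together with SBR falls outside the plain AAC Profile but inside both High Efficiency AAC profiles, which is the kind of compatibility question a profile definition is meant to answer.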
Depending on the AAC profile and the MP3 encoder, 96 kbit/s AAC can give perceptual quality nearly equal to or better than that of 128 kbit/s MP3.[35]
Extensions and improvements
Some extensions have been added to the first AAC standard (defined in MPEG-2 Part 7 in 1997):
Perceptual Noise Substitution (PNS), added in MPEG-4 in 1999. It allows the coding of noise as pseudorandom data;
Long Term Predictor (LTP), added in MPEG-4 in 1999. It is a forward predictor with lower computational complexity.[18]
Error Resilience (ER), added in MPEG-4 Audio version 2 in 2000, used for transport over error prone channels;[38]
AAC-LD (Low Delay), defined in 2000, used for real-time conversation applications;
High Efficiency AAC (HE-AAC), a.k.a. aacPlus v1 or AAC+, the combination of SBR (Spectral Band Replication) and AAC LC; used for low bitrates; defined in 2003;
HE-AAC v2, a.k.a. aacPlus v2 or eAAC+, the combination of Parametric Stereo (PS) and HE-AAC; used for even lower bitrates; defined in 2004 and 2006;
MPEG-4 Scalable To Lossless (SLS), defined in 2006, can supplement an AAC stream to provide a lossless decoding option, such as in Fraunhofer IIS's "HD-AAC" product.
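The idea behind Perceptual Noise Substitution in the list above, coding noise as pseudorandom data, can be sketched in a few lines. This is a conceptual illustration only and not the normative MPEG-4 decoding process. It assumes that a band has already been classified as noise-like, that only the band's energy is transmitted, and that the decoder regenerates the band from its own random generator. The function names are hypothetical.

```python
import numpy as np


def pns_encode_band(coeffs):
    """Encoder side: keep only the energy of a noise-like band."""
    return float(np.sum(np.square(coeffs)))


def pns_decode_band(energy, width, rng):
    """Decoder side: fill the band with pseudorandom values of equal energy."""
    noise = rng.standard_normal(width)
    noise_energy = np.sum(np.square(noise))
    return noise * np.sqrt(energy / noise_energy)


# A hypothetical noise-like band of 32 spectral coefficients.
encoder_rng = np.random.default_rng(1)
original_band = 0.3 * encoder_rng.standard_normal(32)

transmitted_energy = pns_encode_band(original_band)

# The decoder uses its own generator; the individual values differ,
# only the energy of the band is preserved.
decoder_rng = np.random.default_rng(2)
decoded_band = pns_decode_band(transmitted_energy, len(original_band), decoder_rng)

print("original energy:", transmitted_energy)
print("decoded energy: ", float(np.sum(np.square(decoded_band))))
```

One number replaces 32 coefficients in this sketch, which is where the saving comes from. The decoded band is not a waveform match for the original, so the approach only makes sense for content that listeners perceive as noise.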