MDCT

  

Copyright © Philip M. Parker, INSEAD. Terms of Use.

MDCT

Specialty Definition: Modified discrete cosine transform

(From Wikipedia, the free Encyclopedia)

The modified discrete cosine transform (MDCT) is a frequency transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are 50% overlapped. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. Thus, an MDCT is employed in MP3, AC-3, Ogg Vorbis, and AAC for audio compression, for example.

(There also exists an analogous transform, the MDST, based on the discrete sine transform, as well as other forms of the MDCT based on different types of DCT.)

In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter (PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF filter bank. Such a combination of a filter bank with an MDCT is called a hybrid filter bank or a subband MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used) MPEG-4 AAC-SSR variant (by Sony) uses a four-band PQF bank followed by an MDCT. ATRAC uses stacked quadrature mirror filters (QMF) followed by an MDCT.

Definition

As a lapped transform, the MDCT is a bit unusual compared to other frequency transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a linear function F : R2n -> Rn (where R denotes the set of real numbers). The 2n real numbers x0, ..., x2n-1 are transformed into the n real numbers f0, ..., fn-1 according to the formula:

(The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.)

Inverse Transform

The inverse MDCT is known as the IMDCT. Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible. However, perfect invertibility is achieved by adding the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to cancel and the original data to be retrieved; this technique is known as time-domain aliasing cancellation (TDAC).

The IMDCT transforms n real numbers f0, ..., fn-1 into 2n real numbers y0, ..., y2n-1 according to the formula:

(Like for the DCT-IV, an orthogonal transform, the inverse has the same form as the forward transform.)

In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/n).

Computation

Although the direct application of the MDCT formula would require O(n2) operations, as in the fast Fourier transform (FFT) it is possible to compute the same thing with only O(n log n) complexity by recursively factorizing the computation. One can also compute MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(n) pre- and post-processing steps.

Relationship to DCT-IV and Origin of TDAC

As can be seen by inspection of the definitions, for even n the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by n/2 and two n-blocks of data are transformed at once. By examining this equivalence more carefully, important properties like TDAC can be easily derived.

In order to define the precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around k=-1/2), odd at its right boundary (around k=n-1/2), and so on (instead of periodic boundaries as for a DFT). This follows from the identities cos[(j+1/2)(-k-1+1/2)π/n] = +cos[(j+1/2)(k+1/2)π/n] and cos[(j+1/2)(2n-k-1+1/2)π/n] = -cos[(j+1/2)(k+1/2)π/n]. Thus, if its inputs are an array x of length n, we can imagine extending this array to (x, -xr, -x, xr, ...) and so on, where xr denotes x in reverse order.

Consider an MDCT of size n, where we divide the 2n inputs into four blocks (a, b, c, d) each of size n/2. If we shift these by n/2 (from the +n/2 term in the MDCT definition), then (b, c, d) extend past the end of the n DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above. Thus, the MDCT is exactly equivalent to a DCT-IV of the n inputs: (-cr-d, a-br), where r denotes reversal as above. (In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.)

Similarly, the IMDCT formula above is precisely 1/2 of the (self) inverse DCT-IV, where the output is shifted by n/2 and extended (via the boundary conditions) to a length 2n. The inverse DCT-IV would simply give back the inputs (-cr-d, a-br) from above. When this is shifted and extended via the boundary conditions, one obtains (a-br, b-ar, c+dr, cr+d) / 2. (Half of the IMDCT outputs are thus redundant.)

One can now understand how TDAC works. Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2n block (c, d, e, f). The IMDCT will then yield, analogous to the above: (c-dr, d-cr, e+fr, er+f) / 2. When this is added with the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply (c, d), recovering the original data.

Moreover, the origin of the term "time-domain aliasing cancellation" is now clear. The use of input data that extend beyond the boundaries of the logical DCT-IV causes the data to be aliased in exactly the same way that frequencies beyond the Nyquist frequency are aliased to lower frequencies, except that this aliasing occurs in the time domain instead of the frequency domain. Hence the combinations c-dr and so on, which have precisely the right signs for the combinations to cancel when they are added.

For odd n (which are rarely used in practice), n/2 is not an integer so the MDCT is not simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above.

Window Functions

In typical signal-compression applications, the transform properties are further improved by using a window function wk (k = 0, ..., 2n-1) that is multiplied with xk and yk in the MDCT and IMDCT formulas, above, in order to avoid discontinuities at the k = 0 and 2n boundaries by making the function go smoothly to zero at those points. In principle, x and y could have different window functions, and the window function could also change from one block to the next (especially for the case where data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for equal-sized blocks.

The transform remains invertible, for a symmetric window wk = w2n-1-k, as long as w satisfies the Princen-Bradley condition:

wk2 + wk+n2 = 1.

Various different window functions are common, e.g.

for MP3 and MPEG-2 AAC, and

for Vorbis. AC-3 uses a Kaiser-Bessel derived (KBD) window, and MPEG-4 AAC can also use a KBD window.

Note that windows applied to the MDCT are different from windows used for other types of signal analysis, since they must fulfill the Princen-Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis).

References

Top     

Crosswords: MDCT

Specialty definitions using "MDCT": Adaptive TRansform Acoustic CodingMPEG-1 audio layer 3, MPEG-2 AAC Low Profile, MPEG-4 AAC Main Profile. (references)

Source: compiled by the editor from various references; see credits.

Top     

Frequency of Internet Keywords: MDCT

The following statistics estimate the number of searches per day across the major English-language search engines as identified by various trade publications. Hyperlinks lead to commercial use of the expression at Amazon.com.
 
ExpressionFrequency
per Day

mdct

4
Source: compiled by the editor from various references; see credits.

Top     

Anagrams: MDCT

Scrabble® Enable2K-Verified Anagrams

 Words containing the letters "c-d-m-t"
 

+2 letters: dictum, mudcat, tomcod.

 

+3 letters: coadmit, compted, demotic, dictums, matched, midcult, mudcats, mulcted, tomcods.

 

+4 letters: acetamid, cemented, coadmits, coempted, combated, commuted, competed, computed, costumed, cremated, dalmatic, decimate, democrat, demotics, diatomic, dimetric, document, dogmatic, domestic, dramatic, dumpcart, dutchman, dutchmen, ectoderm, impacted, maledict, medicate, methodic, microdot, midcults, midwatch, misacted, miscited, miticide, morticed, mucidity, muscadet, smutched, talmudic, timecard.

Source: compiled by the editor from various references; see credits.

SCRABBLE® is a registered trademark. All intellectual property rights in and to the game are owned in the U.S.A and Canada by Hasbro Inc., and throughout the rest of the world by J.W. Spear & Sons Limited of Maidenhead, Berkshire, England, a subsidiary of Mattel Inc. Mattel and Spear are not affiliated with Hasbro.

Top     

Alternative Orthography: MDCT


Hexadecimal (or equivalents, 770AD-1900s) (references)

4D 44 43 54

Leonardo da Vinci (1452-1519; backwards) (references)

American Sign Language (origins from 1620-1817 in Italy and, especially, France) (references)

=

Semaphore (1791, in France) (references)

Braille (1829, in France) (references)

Morse Code (1836) (references)

--    -..    -.-.    -

Dancing Men (Sir Arthur Conan Doyle, 1903) (references)

Binary Code (1918-1938, probably earlier) (references)

01001101 01000100 01000011 01010100

HTML Code (1990) (references)

&#77 &#68 &#67 &#84

ISO 10646 (1991-1993) (references)

004D 0044 0043 0054

British Sign Language (Fingerspelling, BSL; 1992, British Deaf Association Dictionary of British Sign Language) (references)

Encryption (beginner's substitution cypher): (references)

47383754

Top     



INDEX

1. Crosswords
2. Expressions: Internet
3. Anagrams
4. Orthography
5. Bibliography


  

Copyright © Philip M. Parker, INSEAD. Terms of Use.