Development of ATRAC2 Encoder/Decoder LSI
Takahiro Watanabe, Masahiro Ohmoto, Miki Abe
Sony Corporation
Dept. 1, Advanced Development Laboratory, Consumer A&V Products
Company
6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo, 141 Japan
March 96, from the digest of technical papers of the 1996 IEEE ICCE: 0-7803-3029
Abstract: The ATRAC2 LSI can record and play back 148 minutes of
stereo digital audio in the MD-Data format. ATRAC2 compresses CD data
10:1 and uses psychoacoustic principles to keep quantization noise
inaudible.
Discussion of Technical Report by Watanabe et al. covering the
ATRAC2 Encoder/Decoder LSI
1. Introduction
Normal ATRAC compresses CD quality [44.1kHz, 16bit] audio about 5:1 (to
146kbps/ch). By using a more efficient coding system, ATRAC2
compresses audio roughly 10:1 to 20:1 (to 73-36kbps/ch). A one-chip
stereo encoder/decoder LSI was developed for ATRAC2. MD-Data was chosen
as the format for ATRAC2, apparently due to the need for greater
compression in MD-Data applications.
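The quoted ratios are nominal. As a quick check, the sketch below recomputes them from the per-channel bit rates given above. The function and constant names are mine, not the paper's.

```python
# Per-channel bit-rate arithmetic for the figures quoted in the Introduction.
# Function and constant names are illustrative, not taken from the paper.

SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16


def pcm_kbps_per_channel(sample_rate_hz=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Uncompressed PCM rate for one channel, in kbps."""
    return sample_rate_hz * bits / 1000.0


def effective_ratio(coded_kbps_per_channel):
    """Compression ratio implied by a coded per-channel bit rate."""
    return pcm_kbps_per_channel() / coded_kbps_per_channel


if __name__ == "__main__":
    print(f"PCM: {pcm_kbps_per_channel():.1f} kbps/ch")
    for name, rate in [("ATRAC", 146), ("ATRAC2 (high rate)", 73), ("ATRAC2 (low rate)", 36)]:
        print(f"{name}: {rate} kbps/ch, about {effective_ratio(rate):.2f}:1")
```

The implied ratios come out slightly under 5:1, 10:1 and 20:1, so the quoted figures are best read as rounded.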
2. System Components
ATRAC2 Encoder
ATRAC2 operates by extracting the psychoacoustically important tonal
components from the input signal spectra and encoding them separately
from the other less important spectral data. A tone component is a
group of consecutive spectral coefficients, described with parameters
such as its location and width. The tone components and the remaining
spectral components are Huffman coded for efficient bit packing.
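To make the idea of a tone component concrete, here is a minimal sketch of separating a spectrum into tone components and a residual. The simple magnitude threshold is my assumption; the summary does not say how the encoder decides which coefficients are psychoacoustically important. The names `ToneComponent`, `split_tones` and `merge` are hypothetical, and the Huffman coding stage is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToneComponent:
    position: int              # index of the first coefficient in the group
    coefficients: List[float]  # the consecutive coefficients themselves

    @property
    def width(self) -> int:
        return len(self.coefficients)


def split_tones(spectrum: List[float], threshold: float) -> Tuple[List[ToneComponent], List[float]]:
    """Split a spectrum into runs of strong coefficients and a residual.

    ASSUMPTION: a fixed magnitude threshold stands in for the encoder's
    real (psychoacoustic) selection rule, which is not described here.
    """
    tones: List[ToneComponent] = []
    residual = list(spectrum)
    start = None
    # Iterate one step past the end so a run reaching the last bin is closed.
    for i in range(len(spectrum) + 1):
        is_tonal = i < len(spectrum) and abs(spectrum[i]) >= threshold
        if is_tonal and start is None:
            start = i
        elif not is_tonal and start is not None:
            tones.append(ToneComponent(start, list(spectrum[start:i])))
            for j in range(start, i):
                residual[j] = 0.0  # tonal bins are removed from the residual
            start = None
    return tones, residual


def merge(tones: List[ToneComponent], residual: List[float]) -> List[float]:
    """Decoder-side view: put the tone components back into the residual."""
    out = list(residual)
    for t in tones:
        out[t.position:t.position + t.width] = t.coefficients
    return out
```

Each component carries only its location, its width and its own coefficients, so the two parts can be coded separately and recombined exactly.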
ATRAC2 Time Frequency Analysis
Input signal analysis is done with a 96-tap Polyphase Quadrature
Filter (PQF) that breaks the incoming signal into 4 frequency bands:
0-5.51kHz, 5.51-11.03kHz, 11.03-16.54kHz, and 16.54-22.05kHz. This
is followed by a fixed-length, 50% overlap Modified Discrete Cosine
Transform (MDCT).
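The band edges listed above are what four equal-width bands up to the Nyquist frequency would give. A small sketch of that arithmetic follows; it assumes uniform bands and does not implement the PQF itself, and the helper names are mine.

```python
def pqf_band_edges_hz(sample_rate_hz=44_100, num_bands=4):
    """Edges of num_bands equal-width bands from 0 Hz to Nyquist."""
    nyquist = sample_rate_hz / 2
    return [nyquist * k / num_bands for k in range(num_bands + 1)]


def band_of(freq_hz, sample_rate_hz=44_100, num_bands=4):
    """0-based index of the band that contains freq_hz."""
    width = sample_rate_hz / 2 / num_bands
    # Clamp so that exactly Nyquist falls in the top band.
    return min(int(freq_hz // width), num_bands - 1)


if __name__ == "__main__":
    print(pqf_band_edges_hz())   # edges in Hz
    print(band_of(6_000))        # band index for a 6 kHz tone
```

The exact edges are 5512.5, 11025 and 16537.5 Hz; the kHz values in the text are these rounded to two decimals.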
ATRAC2 handles the pre-echo problem differently than normal ATRAC.
Instead of adaptively changing the transform window size, it minimizes
pre-echo by adaptively amplifying the signal preceding an attack
before doing the MDCT, and then restoring it to the original level
after the Inverse MDCT in the decoder. This technique, called Gain
Control, simplifies the spectral structure of the system.
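A toy illustration of why Gain Control helps: if the coding path adds roughly the same error to every sample in a block, boosting the quiet part before coding and attenuating it afterwards shrinks the error in that part by the gain factor. The constant-error noise model, the single hard gain step and all names below are my simplifications, not the paper's design.

```python
def apply_gain(block, attack_index, gain):
    """Encoder side: amplify the samples that precede the attack."""
    return [s * gain if i < attack_index else s for i, s in enumerate(block)]


def remove_gain(block, attack_index, gain):
    """Decoder side: restore the pre-attack samples to their original level."""
    return [s / gain if i < attack_index else s for i, s in enumerate(block)]


def add_coding_noise(block, noise):
    """Crude stand-in for MDCT, quantization and inverse MDCT:
    the same error lands on every sample of the block (ASSUMPTION)."""
    return [s + noise for s in block]


def pre_attack_error(original, decoded, attack_index):
    """Largest absolute error in the samples before the attack."""
    pairs = zip(original[:attack_index], decoded[:attack_index])
    return max(abs(o - d) for o, d in pairs)


if __name__ == "__main__":
    block = [0.01] * 8 + [1.0] * 8   # quiet lead-in, then a loud attack
    attack, gain, noise = 8, 8.0, 0.05

    plain = add_coding_noise(block, noise)
    boosted = add_coding_noise(apply_gain(block, attack, gain), noise)
    controlled = remove_gain(boosted, attack, gain)

    print(pre_attack_error(block, plain, attack))
    print(pre_attack_error(block, controlled, attack))
```

In this toy model the pre-attack error drops from the full noise level to the noise level divided by the gain, while the samples after the attack are unaffected.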
ATRAC2 allows 16bit, 44.1kHz (705.6kbps per channel) signals to be
compressed to 73kbps per channel without sacrificing audio quality.
3. ATRAC2 LSI
The LSI is composed only of a DSP core, memory and a serial interface.
All processing is performed by software running on the DSP. Memory
buffer requirements have doubled because ATRAC2 operates on input audio
segments twice the size of those used by ATRAC.
Configuration:
- Data-Ram: 7.6k x 16bits
- Data-Rom: 3k x 16bits
- Program-Rom: 8k x 24bits
- DSP core: horizontal type 16bit fixed-point DSP
The DSP core's 16bit word length has insufficient accuracy for
performing the necessary multiplications. To minimize error, 32bit x
16bit double precision multiplication and block-floating calculations
are used ("with the index part in 256 data transmission units"). The
double precision multiply is used with the PQF and the block floating
primarily with the MDCT. These methods are necessary to ensure a tone
quality equivalent to CD.
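The two techniques can be sketched generically as follows. This is not the LSI's firmware: it shows the usual way a 32bit x 16bit product is built from 16bit halves, and the usual block-floating idea of one shared exponent per block of values. The function names are hypothetical, and I make no claim about how the quoted "256 data" phrase maps onto block size.

```python
def mul_32x16(a32, b16):
    """32bit x 16bit product built from two narrower multiplies.

    a32 is split into a signed upper half and an unsigned lower half,
    so that a32 == (a_hi << 16) + a_lo also holds for negative values.
    """
    a_hi = a32 >> 16        # signed upper 16 bits (arithmetic shift)
    a_lo = a32 & 0xFFFF     # unsigned lower 16 bits
    return ((a_hi * b16) << 16) + a_lo * b16


def block_float_encode(values, mantissa_bits=16):
    """Scale a block of integers to fit signed mantissas with one shared shift."""
    peak = max(abs(v) for v in values)
    # Shift just enough that the largest value fits in mantissa_bits (signed).
    shift = max(0, peak.bit_length() - (mantissa_bits - 1))
    return [v >> shift for v in values], shift


def block_float_decode(mantissas, shift):
    """Undo the shared shift; low-order bits lost in encoding stay lost."""
    return [m << shift for m in mantissas]


if __name__ == "__main__":
    print(mul_32x16(123_456_789, -1234) == 123_456_789 * -1234)
    mantissas, shift = block_float_encode([1_000_000, -250_000, 12, -7])
    print(mantissas, shift)
    print(block_float_decode(mantissas, shift))
```

The multiply is exact, while block floating trades the low-order bits of small values for range: every value in a block is stored with an error smaller than 2 to the power of the shared shift.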
Depending upon the input signal, the amount of data processing
required for encoding and decoding varies on a block by block basis.
After finishing the processing for each block, the LSI switches itself
to sleep mode, thereby minimizing power consumption, which is
typically 100mW or less at 3 volts.
Notes
- Parts of this document appear to have been translated from Japanese to
English with the close assistance of an automatic translation system. In
this summary I make an effort to determine the author's original
intent.
- The name ATRAC2 (no space) is confusing, and Sony could have chosen
a better one; ATRAC-beta (okay, ATRAC-B) or ATRAC type II come to
mind. As it is, it is easily confused with ATRAC generation 2.