Development of ATRAC2 Encoder/Decoder LSI

Takahiro Watanabe, Masahiro Ohmoto, Miki Abe
Sony Corporation
Dept. 1, Advanced Development Laboratory, Consumer A&V Products Company
6-7-35 Kitashinagawa, Shinagawa-ku, Tokyo, 141 Japan

March 1996. From the Digest of Technical Papers, 1996 IEEE International Conference on Consumer Electronics (ICCE): 0-7803-3029

Abstract: The ATRAC2 LSI can record and play back 148 minutes of stereo digital audio in the MD-DATA format. ATRAC2 compresses CD data 10:1 and uses psychoacoustic principles to control quantization noise so that it is inaudible.


Discussion of Technical Report by Watanabe et al. covering the ATRAC2 Encoder/Decoder LSI

1. Introduction

Normal ATRAC compresses CD-quality (44.1kHz, 16-bit) audio about 5:1 (to 146kbps/ch). By using a more efficient coding system, ATRAC2 compresses audio 10:1 to 20:1 (to 73-36kbps/ch). A single-chip stereo encoder/decoder LSI was developed for ATRAC2. MD-DATA was chosen as the format for ATRAC2, apparently because MD-DATA applications need greater compression.
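
The ratios above follow directly from the bit rates. As a quick check, the short Python sketch below recomputes them from the quoted per-channel rates alone. The function names are illustrative. The check shows that the nominal 5:1, 10:1 and 20:1 figures are rounded values of roughly 4.8:1, 9.7:1 and 19.6:1.

```python
# Per-channel bit-rate arithmetic for the rates quoted in the text.
SAMPLE_RATE_HZ = 44_100   # CD sampling rate
BITS_PER_SAMPLE = 16      # CD word length


def pcm_kbps_per_channel() -> float:
    """Uncompressed CD-quality PCM rate for one channel, in kbit/s."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000


def compression_ratio(coded_kbps: float) -> float:
    """Uncompressed rate divided by a coded rate (both per channel)."""
    return pcm_kbps_per_channel() / coded_kbps


if __name__ == "__main__":
    print(f"PCM: {pcm_kbps_per_channel():.1f} kbps/ch")
    for name, kbps in [("ATRAC", 146), ("ATRAC2", 73), ("ATRAC2", 36)]:
        print(f"{name:6s} {kbps:3d} kbps/ch -> {compression_ratio(kbps):.1f}:1")
```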

2. System Components

[Figure: ATRAC2 Encoder]

ATRAC2 extracts the psychoacoustically important tonal components from the input signal spectrum and encodes them separately from the less important remaining spectral data. A tone component is a group of consecutive spectral coefficients, described by parameters such as its location and width. Both the tone components and the remaining spectral components are Huffman coded for efficient bit packing.
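
Here is a minimal Python sketch that makes the idea of a tone component concrete. It is not the ATRAC2 selection algorithm, which the paper does not describe here. Instead, it assumes a simple magnitude threshold. The names `ToneComponent`, `extract_tones`, `threshold` and `max_width` are hypothetical. As in the text, each component records its location and width, and the decoder puts the components back into the residual.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToneComponent:
    location: int        # index of the first coefficient in the group
    values: List[float]  # the consecutive coefficients that form the tone

    @property
    def width(self) -> int:
        return len(self.values)


def extract_tones(spectrum: List[float], threshold: float,
                  max_width: int = 4) -> Tuple[List[ToneComponent], List[float]]:
    """Split a spectrum into tone components and a residual spectrum.

    Hypothetical selection rule: every run of consecutive coefficients with
    |x| >= threshold becomes a tone component (split into pieces of at most
    max_width). Those coefficients are zeroed in the residual.
    """
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    tones: List[ToneComponent] = []
    residual = list(spectrum)
    i = 0
    while i < len(spectrum):
        if abs(spectrum[i]) < threshold:
            i += 1
            continue
        start = i
        while (i < len(spectrum) and abs(spectrum[i]) >= threshold
               and i - start < max_width):
            i += 1
        tones.append(ToneComponent(start, spectrum[start:i]))
        for k in range(start, i):
            residual[k] = 0.0
    return tones, residual


def reconstruct(tones: List[ToneComponent], residual: List[float]) -> List[float]:
    """Decoder side: put the tone components back into the residual."""
    out = list(residual)
    for t in tones:
        out[t.location:t.location + t.width] = t.values
    return out


if __name__ == "__main__":
    spec = [0.1, 0.2, 5.0, 7.5, 4.8, 0.3, -0.1, 0.0, 6.2, 0.2]
    tones, residual = extract_tones(spec, threshold=1.0)
    for t in tones:
        print(t.location, t.width, t.values)
    print(residual)
```

Because the large tonal peaks are removed, the residual typically has a smaller dynamic range and can be coded more cheaply. In the codec, both parts would then be quantized and Huffman coded, steps this sketch leaves out.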

[Figure: ATRAC2 Time Frequency Analysis]

Input signal analysis is done with a 96-tap Polyphase Quadrature Filter (PQF) that splits the incoming signal into 4 equal-width frequency bands: 0-5.51kHz, 5.51-11.03kHz, 11.03-16.54kHz, and 16.54-22.05kHz. The PQF is followed by a fixed-length Modified Discrete Cosine Transform (MDCT) with 50% overlap.
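
The band edges and the 50% overlap can be illustrated with a small Python sketch. It computes the edges of four equal-width bands but does not implement the PQF itself. It then runs a direct, unoptimized MDCT/IMDCT with a sine window on one band signal. The sine window and the tiny block size N = 8 are assumptions chosen for readability, since the paper does not give ATRAC2's window or transform length here. The final check shows the overlap-add cancelling the time-domain aliasing.

```python
import math
from typing import List, Tuple


def pqf_band_edges(sample_rate_hz: float = 44_100, bands: int = 4) -> List[Tuple[float, float]]:
    """Equal-width band edges (Hz) of a uniform `bands`-band filter bank."""
    width = sample_rate_hz / 2 / bands
    return [(b * width, (b + 1) * width) for b in range(bands)]


def sine_window(length: int) -> List[float]:
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block: List[float]) -> List[float]:
    """Direct O(N^2) MDCT: 2N time samples -> N coefficients."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs: List[float]) -> List[float]:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    Scaled by 2/N so that sine-windowed overlap-add cancels the aliasing
    and restores the input.
    """
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]


def analyze(signal: List[float], n: int) -> List[List[float]]:
    """Frames of 2N samples with hop N (50% overlap), windowed, then MDCT."""
    if len(signal) % n:
        raise ValueError("signal length must be a multiple of N")
    w = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    return [mdct([w[t] * padded[start + t] for t in range(2 * n)])
            for start in range(0, len(padded) - 2 * n + 1, n)]


def synthesize(frames: List[List[float]], n: int, length: int) -> List[float]:
    """IMDCT each frame, window again, and overlap-add with hop N."""
    w = sine_window(2 * n)
    out = [0.0] * ((len(frames) + 1) * n)
    for i, coeffs in enumerate(frames):
        for t, y in enumerate(imdct(coeffs)):
            out[i * n + t] += w[t] * y
    return out[n:n + length]


if __name__ == "__main__":
    print(pqf_band_edges())
    x = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(32)]
    y = synthesize(analyze(x, 8), 8, len(x))
    # Close to zero: the aliasing of neighbouring frames cancels.
    print(max(abs(a - b) for a, b in zip(x, y)))
```

Running the example prints the band edges in Hz (0, 5512.5, 11025, 16537.5 and 22050, which round to the kHz values in the text) and a reconstruction error close to zero.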

ATRAC2 handles the pre-echo problem differently from normal ATRAC. Instead of adaptively changing the transform window size, it minimizes pre-echo by adaptively amplifying the signal preceding an attack before the MDCT, then restoring it to its original level after the inverse MDCT in the decoder. Scaling the quiet pre-attack segment back down also attenuates the quantization noise that the transform spreads into it. This technique, called Gain Control, keeps the transform length fixed and so simplifies the spectral structure of the system.
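
The principle of Gain Control can be sketched in a few lines of Python. This is an illustration under loud assumptions, not ATRAC2's implementation. The attack position is given rather than detected, and the gain curve (`gain_curve`, with a hypothetical linear `ramp`) is invented. The decoder is assumed to know the same curve, and uniform random noise stands in for the quantization error that the MDCT spreads across the block. The comparison shows the pre-attack error shrinking by the gain factor once the decoder divides the amplification back out.

```python
import random
from typing import List


def gain_curve(length: int, attack: int, gain: float, ramp: int = 4) -> List[float]:
    """Hypothetical gain curve: `gain` well before the attack, 1.0 from the
    attack onward, with a linear ramp over the `ramp` samples before it."""
    if ramp < 1 or attack < ramp:
        raise ValueError("need 1 <= ramp <= attack")
    curve = []
    for n in range(length):
        if n >= attack:
            curve.append(1.0)
        elif n < attack - ramp:
            curve.append(gain)
        else:
            frac = (attack - n) / ramp  # 1.0 at attack - ramp, falling toward 0
            curve.append(1.0 + (gain - 1.0) * frac)
    return curve


def code_with_gain_control(signal: List[float], curve: List[float],
                           noise_amp: float, seed: int = 0) -> List[float]:
    """Encoder amplifies; a crude noise model stands in for MDCT quantization
    error spread evenly over the block; the decoder divides by the same curve."""
    rng = random.Random(seed)
    amplified = [s * g for s, g in zip(signal, curve)]
    coded = [a + rng.uniform(-noise_amp, noise_amp) for a in amplified]
    return [c / g for c, g in zip(coded, curve)]


if __name__ == "__main__":
    attack, n = 48, 64
    x = [0.01] * attack + [1.0] * (n - attack)   # quiet, then a sharp attack
    flat = [1.0] * n                              # no gain control
    boosted = gain_curve(n, attack, gain=8.0)
    for name, curve in [("without", flat), ("with", boosted)]:
        y = code_with_gain_control(x, curve, noise_amp=0.02)
        pre = max(abs(a - b) for a, b in zip(x[:attack - 4], y[:attack - 4]))
        print(f"{name} gain control: max pre-attack error {pre:.4f}")
```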

ATRAC2 allows 16-bit, 44.1kHz signals (705.6kbps per channel) to be compressed to 73kbps per channel without sacrificing audio quality.

3. ATRAC2 LSI

The LSI consists only of a DSP core, memory, and a serial interface; all processing is performed by software running on the DSP. Memory buffer requirements are double those of ATRAC because ATRAC2 operates on input audio segments twice as long.

Configuration:

The DSP core's 16-bit word length does not provide enough accuracy for the necessary multiplications. To minimize error, 32-bit × 16-bit double-precision multiplication and block floating-point calculations are used ("with the index part in 256 data transmission units"). Double-precision multiplication is used in the PQF, and block floating point primarily in the MDCT. These methods are necessary to ensure sound quality equivalent to CD.
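
The two accuracy measures can be sketched with Python integers. This is a hedged illustration, not the LSI's DSP code. The function `mul32x16` shows how a 16-bit multiplier can form an exact 32-bit × 16-bit product from two 16 × 16-bit partial products. The function `to_block_float` assumes that the quoted "index part" means an exponent shared by each group of 256 values. All function names and the truncating shifts are hypothetical choices.

```python
from typing import List, Tuple

BLOCK = 256      # values sharing one exponent (assumed reading of the quote)
INT16_MAX = 32767


def mul32x16(a: int, b: int) -> int:
    """Exact product of a signed 32-bit a and a signed 16-bit b, built from two
    16 x 16-bit partial products as a 16-bit DSP would compute it."""
    if not (-2**31 <= a < 2**31 and -2**15 <= b < 2**15):
        raise ValueError("operand out of range")
    hi = a >> 16        # signed upper half (arithmetic shift)
    lo = a & 0xFFFF     # unsigned lower half, so a == hi * 65536 + lo
    return ((hi * b) << 16) + lo * b


def mul16x16_top_only(a: int, b: int) -> int:
    """Single-precision shortcut: keep only the upper 16 bits of a."""
    return ((a >> 16) * b) << 16


def to_block_float(values: List[int]) -> Tuple[int, List[int]]:
    """One shared right shift (exponent) for the block, chosen so every
    mantissa fits in a signed 16-bit word."""
    if len(values) > BLOCK:
        raise ValueError("block too long")
    peak = max((abs(v) for v in values), default=0)
    shift = 0
    while (peak >> shift) > INT16_MAX:
        shift += 1
    return shift, [v >> shift for v in values]


def from_block_float(shift: int, mantissas: List[int]) -> List[int]:
    """Undo the shared shift; each value loses at most its low `shift` bits."""
    return [m << shift for m in mantissas]


def encode_blocks(values: List[int]) -> List[Tuple[int, List[int]]]:
    """Split a long sequence into BLOCK-sized groups, each with its own exponent."""
    return [to_block_float(values[i:i + BLOCK]) for i in range(0, len(values), BLOCK)]


if __name__ == "__main__":
    a, b = 123456789, -1234
    print(mul32x16(a, b) == a * b)             # True: the product is exact
    print(a * b - mul16x16_top_only(a, b))     # error of the single-precision shortcut
    shift, mant = to_block_float([100000, -70000, 5, 0])
    print(shift, mant, from_block_float(shift, mant))
```

The single-precision shortcut discards the low 16 bits of the 32-bit operand, which is the kind of error the double-precision multiply avoids. Block floating point gives every value in a group the same exponent, so large values keep their full 16-bit precision while small values in the same group lose their low bits.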

The amount of processing required for encoding and decoding varies from block to block, depending on the input signal. After finishing the processing for each block, the LSI switches itself into sleep mode, which minimizes power consumption; typical consumption is 100mW or less at 3V.
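
The benefit of sleeping between blocks can be estimated with a simple duty-cycle model, sketched below in Python. The model assumes that each block must finish within a fixed real-time period. It also assumes that average power is active power weighted by the fraction of the period spent working, plus sleep power for the remainder. The function and all numbers in the example are hypothetical, not figures from the paper.

```python
from typing import Sequence


def average_power_mw(cycles_per_block: Sequence[int], cycles_per_period: int,
                     active_mw: float, sleep_mw: float) -> float:
    """Average power when the DSP sleeps for the rest of each block period.

    cycles_per_block:  cycles each block actually needs (varies with the input)
    cycles_per_period: cycles available in one real-time block period
    active_mw, sleep_mw: hypothetical power draw while running / asleep
    """
    if not cycles_per_block:
        raise ValueError("need at least one block")
    total = 0.0
    for used in cycles_per_block:
        if used > cycles_per_period:
            raise ValueError("block cannot be processed in real time")
        duty = used / cycles_per_period          # fraction of the period awake
        total += active_mw * duty + sleep_mw * (1.0 - duty)
    return total / len(cycles_per_block)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only (not from the paper).
    load = [400, 600, 500]
    print(average_power_mw(load, 1000, active_mw=200.0, sleep_mw=5.0))
```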

