Lossy Audio Data Compression
Effects
This page demonstrates the distortion that can
be added to a soundfile when it is saved with lossy data
compression. It is nearly impossible to distinguish between the
original and the compressed version of the test file by listening.
However, if you look at the spectrograms, you will see significant
distortions (additional noise) around rapidly modulated sounds.
Also, soft sounds (the constant sine signal at 17 kHz) will be
degraded. Additional background noise seems to have no major effects
on the spectrographic representation because the potential spurious
noise is buried in that original masking noise. It should also be
noted that the occurrence of the spurious noise is very
unpredictable: the original zig-zag shaped signal ranging from 1.13
to 1.56 sec is identical to that from 3.1 to 3.54 sec, yet the
artifacts differ between the two.

In the first example, the MP3 system was used. The various systems
differ in detail (including the ATRAC system employed in MiniDisc
recorders), but the principle is always the same. If the
available bit rate is not sufficient for encoding a given signal,
the data reduction algorithm has to remove those parts of the sound
that are inaudible or less important for human perception. This
is done by reducing the bit depth in some frequency bands, which
may produce additional spurious (quantization) noise in
the decoded signal. Even more sophisticated techniques, such as the bit
reservoir feature of MP3, will lead to loss of information or
distortion as soon as complicated sounds last for more than a few
milliseconds.
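The link between reduced bit depth and quantization noise can be illustrated with a minimal sketch. It is not how MP3 or ATRAC work internally (those codecs quantize per frequency band in a transform domain under a psychoacoustic model); it simply rounds a full-scale sine wave to a coarser grid in the time domain and measures the resulting signal-to-noise ratio. The function names `quantize` and `snr_db` and the test-tone parameters are assumptions made for illustration.

```python
import math

def quantize(x, bits):
    """Round a sample in [-1, 1] to a uniform grid with step 2 / 2**bits."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def snr_db(bits, n=4096, cycles=101):
    """Signal-to-noise ratio (dB) of a full-scale sine after quantization."""
    signal_power = 0.0
    noise_power = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * cycles * i / n)
        err = quantize(x, bits) - x
        signal_power += x * x
        noise_power += err * err
    return 10 * math.log10(signal_power / noise_power)

# Each bit that is removed raises the noise floor by roughly 6 dB.
for bits in (16, 12, 8, 4):
    print(f"{bits:2d} bits: SNR ~ {snr_db(bits):5.1f} dB")
```

A band that the encoder decides to represent with fewer bits therefore carries a correspondingly higher noise floor, which is what shows up as spurious noise in the spectrograms.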

This is the spectrogram of the original sound file. Listen to the
original soundfile.

This is the spectrogram of the compressed MP3 sound file. Listen to
the compressed soundfile (decoded back into .wav file format) and the
compressed .mp3 soundfile.

This is a single spectrum taken from the spectrogram at t=3.34 sec
(uncompressed file).

This is a single spectrum taken from the spectrogram at t=3.34 sec
(compressed MP3 file). The spurious signal components ranging from 5
to 12 kHz have a maximum amplitude of -28 dB (relative to the peak
amplitude of the original signal). Theoretically, a bit depth of only
about 5 bits (28 dB / 6 dB per bit ≈ 4.7) would be sufficient to
represent that worst-case situation.

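The estimate above rests on the rule of thumb of about 6 dB of dynamic range per bit. A minimal sketch of that arithmetic follows; the helper name `bits_for_range` is hypothetical. It shows that 28 dB falls between 4 and 5 bits, so 5 whole bits are needed to cover it.

```python
import math

# One extra bit doubles the number of amplitude steps: 20*log10(2) dB.
DB_PER_BIT = 20 * math.log10(2)

def bits_for_range(range_db):
    """Smallest whole number of bits whose nominal range covers range_db."""
    return math.ceil(range_db / DB_PER_BIT)

print(f"dB per bit: {DB_PER_BIT:.2f}")
print(f"28 dB corresponds to {28 / DB_PER_BIT:.2f} bits")
print(f"whole bits needed: {bits_for_range(28)}")
```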
This is the spectrogram of the test signal after passing through a
MiniDisc recorder employing ATRAC 4.5. Listen to the compressed
soundfile (decoded back into .wav file format). The spectrogram
reveals more dramatic artifacts than the MP3 example. The constant
sine signals at 16 and 17 kHz temporarily disappear completely. The
spurious noise surrounding the rapidly frequency-modulated sound
structures is considerably stronger at some locations.

This is a single spectrum taken from the original test signal
(uncompressed file).

This is a single spectrum taken from the test file at the same
location, but after passing through MiniDisc (worst-case situation at
t=1.374 sec).

See also the spectrogram of the 8 bit version of the original
soundfile. Even with a very poor resolution of 8 bit (dynamic range
of 42 dB), we get a much more precise spectrogram than from the
compressed 16 bit version!
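Dynamic-range figures for a given bit depth depend on the convention used. The sketch below compares two common ones; the function names are hypothetical, and the source does not say which convention it applies. The 42 dB quoted for 8 bit matches the second one (peak amplitude of a signed signal relative to one quantization step), whereas the first gives the frequently cited 48 dB.

```python
import math

def range_db_full_scale(bits):
    """Full peak-to-peak range (2**bits steps) relative to one step, in dB."""
    return 20 * math.log10(2 ** bits)

def range_db_signed_peak(bits):
    """Peak amplitude of a signed signal (2**(bits-1) steps) relative to one step, in dB."""
    return 20 * math.log10(2 ** (bits - 1))

for bits in (8, 16):
    print(f"{bits:2d} bit: {range_db_full_scale(bits):.1f} dB (full scale), "
          f"{range_db_signed_peak(bits):.1f} dB (signed peak)")
```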
The following two spectrograms represent natural bird sounds that
were recorded with both DAT (Tascam DA-P1) and MiniDisc (HHB
PortaDisc with ATRAC 4.5) recorders. The microphone signal was fed
simultaneously into both recorders. This test was conducted by Jeremy
Minns, Brazil. Listen to the underlying soundfile. The artifacts
visible on the spectrogram at t=6.3 sec can also be recognized when
listening carefully via headphones.

DAT recording

MiniDisc recording

The amount of distortion added by the encoder depends heavily on the
frequency content of the recorded signals. Band-limited signals carry
less information and are therefore easier to encode. To illustrate
this effect, the original artificial sound file was low-pass filtered
with a cut-off frequency of 8.5 kHz (spectrogram and soundfile of the
uncompressed file). The remaining signals below 8.5 kHz are
reproduced much better (spectrogram and soundfile of the compressed
file). So, if the sounds to be recorded have a limited bandwidth
(which is the case for most bird sounds recorded at larger
distances), there will be no significant artifacts.
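The source does not say how the 8.5 kHz low-pass filter was implemented. As one possible way to band-limit a 44.1 kHz signal, here is a minimal sketch of a windowed-sinc FIR design with a Hamming window. The tap count of 101 and the function names `lowpass_taps` and `gain_db` are assumptions chosen for illustration.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=101):
    """Windowed-sinc (Hamming) FIR low-pass taps, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response (sinc), with the k == 0 limit handled.
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def gain_db(taps, freq_hz, sample_rate_hz):
    """Magnitude response of the FIR filter at one frequency, in dB."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(math.hypot(re, im))

taps = lowpass_taps(8500, 44100)
for f in (4000, 8500, 12000, 17000):
    print(f"{f:5d} Hz: {gain_db(taps, f, 44100):7.1f} dB")
```

With such a filter the 16 and 17 kHz test tones and the upper parts of the modulated sweeps are removed before encoding, leaving the encoder fewer components to spend its bits on.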
Care should be taken when transferring even digital data between a
digital recorder and the PC. This example shows the spectrogram of a
resampled and compressed soundfile that was digitally transferred
onto a MiniDisc recorder and transferred back to the PC via S/PDIF.
Listen to both the original resampled signal on the left channel and
the compressed soundfile on the right channel. Evidently, the
real-time sample-rate conversion from 44.1 to 48 kHz at the S/PDIF
interface introduced the additional artifacts at the higher
frequencies (especially those above 22 kHz). Under these
circumstances it would be better to transfer the data via the analog
path, even if some (negligible) additional quantization noise were
introduced.
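A short calculation shows why components above 22 kHz point to the sample-rate conversion. A 44.1 kHz signal cannot contain anything above its Nyquist frequency of 22.05 kHz, so any energy between 22.05 and 24 kHz in the 48 kHz file must have been introduced during conversion. The sketch below also shows that the conversion ratio is not a simple one, which makes real-time conversion demanding. The helper names are hypothetical.

```python
from fractions import Fraction

def resample_ratio(src_hz, dst_hz):
    """Exact output/input sample-rate ratio as a reduced fraction."""
    return Fraction(dst_hz, src_hz)

def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2

ratio = resample_ratio(44100, 48000)
print(f"conversion ratio: {ratio.numerator}/{ratio.denominator}")
print(f"source Nyquist: {nyquist_hz(44100)} Hz")
print(f"target Nyquist: {nyquist_hz(48000)} Hz")
```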
Methods
The synthetic test signal was generated using Avisoft-SASLab Pro,
version 4.21, and saved as an uncompressed mono 16 bit/44.1 kHz .wav
file. The data rate of this uncompressed file is 705 kBit/s. This
file was then converted into an .mp3 file using LAME V3.93.1
(MP3DEV.ORG). The default settings were used (bitrate of 128 kBit/s,
bit reservoir not disabled, ...). The bitrate of 128 kBit/s for mono
files (256 kBit/s in stereo) corresponds to the 5:1 compression of
comparable ATRAC systems (the compression ratio in the MP3 sample is
5.38:1). The .mp3 file was then converted back to a .wav file using
the LAME decoder.
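The data-rate figures can be checked with a few lines of arithmetic. This sketch assumes that "kBit" means 1000 bits and uses the hypothetical helper `pcm_bitrate`. It gives the nominal ratio between the uncompressed PCM rate and the 128 kBit/s MP3 rate; the 5.38:1 quoted for the sample may have been derived differently, for example from actual file sizes, which the source does not state.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

pcm_bps = pcm_bitrate(44100, 16, channels=1)
mp3_bps = 128_000  # nominal MP3 bit rate, assuming 1 kBit = 1000 bits

print(f"uncompressed: {pcm_bps / 1000:.1f} kBit/s")
print(f"nominal compression ratio: {pcm_bps / mp3_bps:.2f} : 1")
```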
In the MiniDisc (ATRAC) example, the original test file was
digitally recorded onto a Sony MDS-JE520 MiniDisc deck (ATRAC 4.5).
That recorded test signal was then played back to the PC using
the same equipment.
Please note that this is not an evaluation of the equipment
used. It is only meant to demonstrate the horrific effects that may
occur in extreme situations. The spectrograms of both the
compressed and original files were made in Avisoft-SASLab Pro. The
spectrogram parameters were: FFT-Length = 256, Frame size = 100%,
Window = FlatTop, Overlap = 87.5%. This configuration provides an
analysis bandwidth of 650 Hz and a temporal resolution of 1.5
milliseconds.
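The quoted bandwidth and temporal resolution can be reproduced from the spectrogram parameters under some assumptions. The sketch below assumes the common five-term flat-top window coefficients (Avisoft-SASLab Pro may use a different set) and takes the analysis bandwidth to be the window's equivalent noise bandwidth. This is one reading that matches the numbers, not a description of how the software computes them.

```python
# Assumed five-term flat-top coefficients; the software's actual window may differ.
FLATTOP = (0.21557895, 0.41663158, 0.277263158, 0.083578947, 0.006947368)

def enbw_bins(coeffs):
    """Equivalent noise bandwidth, in FFT bins, of a long cosine-sum window."""
    a0 = coeffs[0]
    return (a0 * a0 + 0.5 * sum(a * a for a in coeffs[1:])) / (a0 * a0)

sample_rate_hz = 44100
fft_length = 256
overlap = 0.875

bin_spacing_hz = sample_rate_hz / fft_length          # distance between FFT bins
bandwidth_hz = enbw_bins(FLATTOP) * bin_spacing_hz    # analysis bandwidth
hop_samples = fft_length * (1 - overlap)              # step between frames
hop_ms = 1000 * hop_samples / sample_rate_hz
inverse_bandwidth_ms = 1000 / bandwidth_hz

print(f"bin spacing: {bin_spacing_hz:.1f} Hz")
print(f"bandwidth:   {bandwidth_hz:.0f} Hz")
print(f"frame step:  {hop_samples:.0f} samples = {hop_ms:.2f} ms")
print(f"1/bandwidth: {inverse_bandwidth_ms:.2f} ms")
```

Under these assumptions the bandwidth comes out close to 650 Hz and its reciprocal close to 1.5 ms, while successive spectra are spaced about 0.73 ms apart because of the 87.5% overlap.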
Links
MP3 Tech, goto Overview of the MP3 techniques
Fraunhofer Institution developed the MPEG audio technology
ATRAC: Adaptive Transform Acoustic Coding for MiniDisc
Tutorial CD-ROM on coding artifacts, published by the Audio Engineering Society
Debate about the usefulness of minidisc (MD) recordings on the email list BIOACOUSTICS-L