Advances in Speech and Audio Compression
ALLEN GERSHO, FELLOW, IEEE
Invited Paper
Speech and audio compression has advanced rapidly in recent years, spurred on by cost-effective digital technology and diverse commercial applications. Recent activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding. These algorithms exploit models of speech production and auditory perception and offer a quality versus bit rate tradeoff that significantly exceeds most prior compression techniques for rates in the range of 4 to 16 kb/s. Techniques have also been emerging in recent years that offer enhanced quality in the neighborhood of 2.4 kb/s over traditional vocoder methods. Wideband audio compression is generally aimed at a quality that is nearly indistinguishable from consumer compact-disc audio. Subband and transform coding methods combined with sophisticated perceptual coding techniques dominate in this arena, with nearly transparent quality achieved at bit rates in the neighborhood of 128 kb/s per channel.
I. INTRODUCTION
Compression of telephone-bandwidth speech has been an ongoing area of research for several decades. Nevertheless, in the last several years, there has been an explosion of interest and activity in this area, with numerous applications in telecommunications and storage, and several national and international standards have been adopted. High-fidelity audio compression has also advanced rapidly in recent years, accelerated by the commercial success of consumer and professional digital audio products. The surprising growth of activity in the relatively old subject of speech compression is driven by the insatiable demand for voice communication, by the new generation of technology for cost-effective implementation of digital signal processing algorithms, by the need to conserve bandwidth in both wired and wireless telecommunication networks, and by the need to conserve disk space in voice storage systems. Most of this effort is focused on the usual telephone bandwidth of roughly 3.2 kHz (200 Hz to 3.4 kHz).
Manuscript received November 1, 1993; revised January 15, 1994. This work was supported in part by the National Science Foundation, Fujitsu Laboratories, Ltd., the UC Micro program, Rockwell International Corporation, Hughes Aircraft Company, Echo Speech Corporation, Signal Technology, Inc., and Qualcomm, Inc.
The author is with the Center for Information Processing Research, Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106 USA.
IEEE Log Number 9401177.
There has also been a very large increase in research and development in the coding of audio signals, particularly wideband audio (typically 20-kHz bandwidth) for transmission and storage of CD-quality music. Interest in wideband (7-kHz) speech for audio in video teleconferencing has also increased in recent years.
Since standards are essential for compatibility of terminals in voice and audio communication systems, standardization of speech and audio coding algorithms has lately become a major activity of central importance to industry and government. As a result, the driving force for much of the research in speech and audio coding has been the challenge of meeting the objectives of standards committees. The most important organization involved in speech coding standardization is the Telecommunication Standardization Sector of the International Telecommunication Union, referred to by the acronym ITU-T (the successor of the International Telegraph and Telephone Consultative Committee, CCITT). Other standards organizations will be mentioned later in this paper.
This paper highlights the state of the art for digital compression of speech and audio signals. The scope is limited to surveying the most important and prevailing methods, approaches, and activities of current interest without attempting to give a tutorial presentation of specific algorithms or a historical perspective of the evolution of speech coding methods. No attempt is made to offer a complete review of the numerous contributions that have been made in recent years, and inevitably some important papers and methods will be overlooked. Nevertheless, the major ideas and trends are covered here and attention is focused on those contributions which have had the most impact on the current state of the art. Many algorithms that are no longer of current importance are not covered at all or only briefly mentioned here, even though they may have been widely studied in the past. We do not attempt to describe the quantitative performance of different coding algorithms as determined from the many subjective evaluations that have taken place in recent years.
For reviews, tutorials, or collections of papers on earlier
work in speech compression, see [205],
[83], [204],
[110],
[67], [115], [85], [46], [79], [253]. A recent survey of audio compression is given in [77]. For a cross section of recent work in speech compression, see [8], [9]. A general perspective of issues, techniques, targets, and standards in signal compression is given in [112]. A comprehensive review of the methods and procedures involved in speech standardization and some recent activity in this area is given in [65].
Virtually all work in speech and audio compression involves lossy compression, where the numerical representation of the signal samples is never recovered exactly after decoding (decompression). There is a wide range of tradeoffs between bit rate and recovered speech quality that are of practical interest in the coding of telephone speech, where users are accustomed to tolerating various degrees of degradation. On the other hand, for wideband audio compression, consumers have higher expectations today, and quality close to that of the compact disc (CD) is generally needed. Thus research in speech compression includes concurrent studies for different distortion-rate tradeoffs motivated by various applications with different quality objectives. For wideband audio compression, most research aims at the same or similar standard of quality as offered by the CD.
Although the term compression is commonly used in the lay press and in the computer science literature, researchers working in speech or audio generally prefer the term coding. This avoids ambiguity with the alternative use of speech compression that refers to time-scale modification of speech, as in the speeding-up of the speech signal, e.g., in learning aids for the blind. Information theorists refer to signal compression as source coding. Henceforth, we shall use the term coding.
The ease of real-time implementation of speech-coding algorithms with single-chip digital signal processors has led to widespread implementations of speech algorithms in the laboratory as well as an extension of applications to communication and voice storage systems. The largest potential market for speech coding is in the emerging area of personal communication systems (PCS), where volumes of hundreds of millions are expected in the U.S. alone, and comparable numbers in Western Europe and Japan. In the next decade or so, a significant number (perhaps more than 50%) of telephones are expected to become wireless. Another new area of application is multimedia in personal computing, where voice storage is becoming a standard feature. With so many applications already emerging or expected to emerge in the next few years, it is not surprising that speech coding has become such an active field of research in recent years.
Wideband audio coding for high-fidelity reproduction of voice and music has emerged as an important activity in the past decade. Applications of audio coding lie largely with the broadcasting industry, motion picture industry, and consumer audio and multimedia products. A key international standard developed by the Motion Picture Experts Group (MPEG) of the International Standards Organization (ISO) includes an audio coding algorithm [21].
Speech-coding algorithms can be divided into two main categories: waveform coders and vocoders. The term vocoder historically originated as a contraction of voice coder.
In waveform coders, the data transmitted
from encoder to decoder specify a representation of the
original speech as a waveform of amplitude versus time,
so
that the reproduced signal approximates the original
waveform and, consequently, provides an approximate
recreation of the original sound. In contrast, vocoders do
not reproduce an approximation to the original waveform;
instead, parameters that characterize individual sound
segments are specified and transmitted to the decoder,
which then reconstructs a new and different waveform
that will have a similar sound. Vocoders are sometimes
called
parametric coders
for obvious reasons. Often these
parameters characterize the short-term spectrum of a sound.
Alternatively, the parameters specify a mathematical model
of human speech production suited to a particular sound.
In either case, the parameters do not provide sufficient
information to regenerate a close approximation to the
original waveform but the information is sufficient for the
decoder to synthesize a perceptually similar speech sound.
Vocoders operate at lower bit rates than waveform coders
but the reproduced speech quality, while intelligible, usually
suffers from a loss of naturalness and some of the unique
characteristics of an individual speaker are often lost.
Most work on speech coding today is based on telephone-
bandwidth speech, nominally limited
to
about
3.2
kHz and
sampled at the rate of
8
kHz. Wideband speech coding is
of increasing interest today and is intended for speech or
audio signals of
7
kHz, sampled at
16
kHz. High-fidelity
audio signals of bandwidth
20
kHz are generally sampled at
rates of
44.1
or
48
kHz although there is also some interest
in 15-kHz bandwidth signals with a 32-kHz sampling rate.
Audio coding schemes of interest today include joint coding
of multiple audio channels.
Much of the work in waveform speech coding is dominated by a handful of different algorithmic approaches, and most of the developments in recent years have focused on modifications and enhancements of these generic methods.
Most notable and most popular for speech coding is code-excited linear prediction (CELP). Other methods in commercial use today that continue to receive some attention include adaptive delta modulation (ADM), adaptive differential pulse code modulation (ADPCM), adaptive predictive coding (APC), multipulse linear predictive coding (MP-LPC), and regular pulse excitation (RPE). MP-LPC, RPE, and CELP belong to a common family of analysis-by-synthesis algorithms to be described later. These algorithms are sometimes viewed as “hybrid” algorithms because they borrow some features of vocoders, but they basically belong to the class of waveform coders.
Although many vocoders were studied several decades
ago, the most important survivor is the
linear predictive
coding
(LPC) vocoder, which is extensively used in secure
voice telephony today and is the starting point of some
current vocoder research. Another vocoding approach that
has emerged as an effective new direction in the past decade
is
sinusoidal coding.
In particular,
sinusoidal transform
coding
(STC) and
multiband excitation
(MBE) coding are
both very actively studied versions of sinusoidal coding.
Many waveform coders with other names are closely
related to those listed here. Of diminishing interest are RPE,
MP-LPC, ADPCM, and ADM although versions of these
have become standardized for specific application areas.
Perhaps the oldest algorithm to be used in practice is ADM,
one well-known version of which
is
continuously variable
slope delta modulation
(CVSD). Although the performance
of ADPCM at 32 kb/s can today be achieved at much lower
rates by more “modern” algorithms, ADPCM remains of interest for some commercial applications because of its relatively low complexity.
Subband and transform coding methods were extensively studied for speech coding a decade ago. Today, they serve as the basis for most wideband audio-coding algorithms and for many image- and video-coding schemes, but they are generally not regarded today as competitive techniques for speech coding. Nevertheless, many researchers continue to study subband and transform techniques for speech coding, and a few very interesting and effective coding schemes of current interest make use of filter banks or some form of linear transformation. These techniques generally function as building blocks that contribute to an overall algorithm for some effective coding schemes such as IMBE and CELP. One ITU-T standard, Recommendation G.722, for wideband (7-kHz) speech at 64, 56, and 48 kb/s, uses a two-band subband coder [225], [174].
Compression algorithms of current interest for wideband audio are based on signal decompositions via linear transformations or subband filter banks (including wavelet methods), which allow explicit and separate control of the coding of different frequency regions in the auditory spectrum. Efficient coding is achieved with the aid of sophisticated perceptual masking models for dynamically allocating bits to different frequency bands. The quality objectives for audio coding are generally much more demanding than for speech coding. The usual goal is to attain a quality that is nearly indistinguishable from that of the compact disc (CD). In contrast, most speech coding is applied to signals already limited by the telephone bandwidth, so that users are not accustomed to high-fidelity reproduction.
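As a rough illustration of how such masking-driven bit allocation can work (a minimal sketch only, not the psychoacoustic model of any particular standard; the 6-dB-per-bit rule and the function name are assumptions), bits can be handed out greedily to whichever band currently has the most audible quantization noise:

import numpy as np

def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    # smr_db[k]: signal-to-mask ratio of band k from a perceptual model.
    # Greedily give one bit at a time to the band whose quantization noise
    # is currently most audible, assuming each added bit of quantizer
    # resolution buys roughly 6 dB of signal-to-noise ratio.
    bits = np.zeros(len(smr_db), dtype=int)
    for _ in range(total_bits):
        audibility = np.asarray(smr_db, dtype=float) - 6.0 * bits
        audibility[bits >= max_bits_per_band] = -np.inf  # band saturated
        k = int(np.argmax(audibility))
        if np.isinf(audibility[k]):
            break
        bits[k] += 1
    return bits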
This paper is organized as follows. In Section II, we give a brief overview of the most important family of speech coding algorithms, that which includes CELP, and in Section III we review the recent activity in CELP coding, the most widely studied algorithmic approach of current interest. Section IV examines the advances in low-delay speech coding and Section V reviews the area of variable-rate speech coding. In Section VI, we examine recent developments in vocoders. Section VII looks at wideband speech and audio coding. Section VIII summarizes the current performance achievable today in speech and audio coding at various bit rates. Finally, in Section IX, some concluding remarks are offered.
II. LPAS SPEECH CODING
The approach to speech coding most widely studied and implemented today is linear-prediction-based analysis-by-synthesis (LPAS) coding. An LPAS coder has three basic features:
Basic decoder structure: The decoder receives data which specify an excitation signal and a synthesis filter; the reproduced speech is generated as the response of the synthesis filter to the excitation signal (a brief code sketch of this structure is given after this list).
Synthesis filter: The time-varying linear-prediction-based synthesis filter is periodically updated and is determined by linear prediction (LP) analysis of the current segment or frame of the speech waveform; the filter functions as a shaping filter which maps a relatively flat spectral-magnitude signal into a signal with an autocorrelation and spectral envelope that are similar to those of the original speech.
Analysis-by-synthesis excitation coding: The encoder determines the excitation signal one segment at a time, by feeding candidate excitation segments into a replica of the synthesis filter and selecting the one that minimizes a perceptually weighted measure of distortion between the original and reproduced speech segments.
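The following minimal sketch (Python/NumPy; the function and variable names are illustrative assumptions, not taken from any standard) shows the decoder structure described above: one frame of speech is reproduced by passing the received excitation through the all-pole LP synthesis filter, with the filter memory carried across frames.

import numpy as np
from scipy.signal import lfilter

def lpas_decode_frame(excitation, lp_coeffs, state):
    # lp_coeffs holds the predictor coefficients a_1..a_p of the LP
    # analysis filter A(z) = 1 - sum_k a_k z^{-k} (sign convention assumed),
    # so the synthesis filter is the all-pole filter 1/A(z).
    denom = np.concatenate(([1.0], -np.asarray(lp_coeffs)))
    speech, state = lfilter([1.0], denom, excitation, zi=state)
    return speech, state

# Example: a 10th-order synthesis filter needs an initial state of 10 zeros,
# e.g. state = np.zeros(10).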
The earliest proposals for LPAS coder configurations appeared in 1981. Schroeder and Atal described a tree-code excitation generator [206] and Stewart proposed a codebook excitation source [222]. The first effective and practical form of LPAS coder to be introduced was multipulse LPC (MP-LPC), due to Atal and Remde [11], where in each frame of speech a multipulse excitation is computed as a sparse sequence of amplitudes (pulses) separated by zeros. The locations and amplitudes of the pulses in the frame are transmitted to the decoder. An MP-LPC algorithm at 9.6 kb/s was recently adopted as a standard for aviation satellite communications by the Airlines Electronic Engineering Committee (AEEC).
In 1986, regular pulse excitation (RPE) coding was introduced by Kroon, Deprettere, and Sluyter [145]. Also an LPAS technique, RPE uses regularly spaced pulse patterns for the excitation, with the position of the first pulse and the pulse amplitudes determined in the encoding process. Although inspired by MP-LPC, it is also close in spirit to CELP. A modified version of RPE, called regular pulse excitation with long-term prediction (RPE-LTP), was selected as part of the first standard for time-division multiple-access (TDMA) digital cellular telephony by the global system for mobile telecommunications (GSM) subcommittee of the European Telecommunications Standards Institute (ETSI) [93].
Most early LPAS methods were based on a synthesis filter which is a cascade of a short-term or formant filter and a long-term or pitch filter. The short-delay filter is typically a 10th-order all-pole filter with parameters obtained by conventional LP analysis. The long-term filter is typically
based on a single-tap or three-tap pitch prediction. The properties of these pitch filters were extensively studied by Ramachandran and Kabal [193], [194].
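In generic notation (the symbols below are ours, not those of any specific coder), the cascade consists of the short-term all-pole filter and a pitch predictor of the form

$$
\frac{1}{A(z)} = \frac{1}{1 - \sum_{k=1}^{10} a_k z^{-k}}, \qquad
\frac{1}{P(z)} = \frac{1}{1 - \sum_{j=-1}^{1} \beta_j z^{-(T+j)}},
$$

where T is the pitch lag; for a single-tap predictor only the beta_0 term is present.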
A key element of LPAS coding is the use of perceptual weighting of the error signal for selecting the best excitation via analysis-by-synthesis. The error between original and synthesized speech is passed through a time-varying perceptual weighting filter which emphasizes the error in frequency bands where the input speech has valleys and de-emphasizes the error near spectral peaks. The effect is to reduce the resulting quantization noise in the valleys and increase it near the peaks. This is generally done by an all-pole filter obtained from the LP synthesis filter by scaling down the magnitude of the poles by a constant factor. This technique exploits the masking feature of the human hearing system to reduce the audibility of the noise. It is based on the classic work of Atal and Schroeder in 1979 on subjective error criteria [12].
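One widely used form of this weighting (stated here for concreteness; it is a common choice, not the only one in practice) derives the weighting filter from the LP analysis filter A(z) by bandwidth expansion:

$$
W(z) = \frac{A(z)}{A(z/\gamma)}, \qquad 0 < \gamma < 1,
$$

so that the poles of the weighted synthesis filter 1/A(z/gamma) are those of the synthesis filter scaled toward the origin by the factor gamma; values of gamma in the range of roughly 0.8 to 0.9 are typical.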
The most important form of LPAS coding today is commonly known as code-excited linear prediction (CELP) coding, but has also been called stochastic coding, vector excitation coding (VXC), or stochastically excited linear prediction (SELP). CELP improves on MP-LPC by using vector quantization (VQ) [76], where a predesigned set of excitation vectors is stored in a codebook, and for each time segment the encoder searches for that code vector whose set of samples best serves as the excitation signal for the current time segment. The address of the selected code vector is transmitted to the receiver, which has a copy of the codebook, so that the receiver can regenerate the selected excitation segment. For example, a codebook containing 1024 code vectors, each of dimension 40, would require a 10-b word to specify each successive 40 samples of the excitation signal. The superior performance capability of CELP compared to MP-LPC and earlier coding methods for bit rates ranging from 4.8 to 16 kb/s has become generally recognized. Today, the terminology “CELP” refers to a family of coding algorithms rather than to one specific technique; all algorithms in this family are based on LPAS with VQ for coding the excitation.
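To make the codebook example above concrete, at the 8-kHz sampling rate the excitation index alone then accounts for

$$
\frac{10\ \text{bits}}{40\ \text{samples}} \times 8000\ \text{samples/s} = 2\ \text{kb/s},
$$

with the remainder of the coder's bit budget going to the gain, pitch, and LP filter parameters.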
The invention of CELP is generally attributed to Atal and Schroeder [13], [207]. A somewhat similar coding technique was also introduced by Copperi and Sereno [42]. At least one earlier research study contained the key element of CELP, namely, LPAS coding with VQ [222]. In fact, MP-LPC is sometimes viewed as a special form of CELP, in which a multistage VQ structure with a particular set of deterministic codebooks is used [135]. RPE can even more easily be seen as a form of CELP coding. Another coding method, vector-adaptive predictive coding (VAPC), has many features of CELP, including the use of VQ and analysis-by-synthesis, but differs in the encoder search structure and in the ordering of short-term and long-term synthesis filtering [36], [37]. Rose and Barnwell introduced the self-excited coder [196], which used prior excitation segments as code vectors for the current excitation. Although MP-LPC perhaps represents a conceptually more fundamental advance in speech coding,
CELP has had a much greater impact in the field. While
newer coding techniques have since been developed, none
clearly overtakes CELP in the range of bit rates
4-16
kb/s.
III. CELP ALGORITHMS
A. History
Initially viewed as an algorithm of extraordinary complexity, CELP served only as an existence proof (with the help of supercomputers) that it is possible to get very high speech quality at bit rates far below what was previously considered feasible. The first papers on CELP coding by Atal and Schroeder [13], [207] attracted great attention, intrigued researchers, and continue to be widely cited today. In 1986, soon after CELP’s introduction, several reduced-complexity methods for implementation of the basic CELP algorithm were reported [239], [52], [94], [157]. By circumventing the initial complexity barrier of CELP, these papers indicated that CELP is more than a theoretical curiosity, but rather an algorithm of potential practical importance. It was quickly recognized that real-time implementation of CELP was indeed feasible. The number of studies of CELP coding algorithms has grown steadily since 1986. Numerous techniques for reducing complexity and enhancing the performance of CELP coders emerged in the next seven years, and CELP has found its way into national and international standards for speech coding. Some current speech coding algorithms are hybrids of CELP and other coding approaches. Our definition of CELP encompasses any coding algorithm that combines the features of LPAS with some form of VQ for representing the excitation signal.
Significant landmarks in the history of CELP are the adoptions of several telecommunications standards for speech coding based on the CELP approach. The first of these was the development and adoption of the U.S. Federal Standard 1016, a CELP algorithm operating at 4.8 kb/s, intended primarily for secure voice transmission and incorporating various modifications and refinements of the initial CELP concept. For a description of this standard, see Campbell et al. [26]. Another important landmark is the development of a particular CELP algorithm called vector-sum excited linear prediction (VSELP) by Gerson and Jasiuk [81], which has been adopted as a standard for North American TDMA digital cellular telephony and, in a modified form, for the Japanese Digital Cellular (JDC) TDMA standard. Very recently, the JDC has adopted a half-rate standard for the Japanese TDMA digital cellular system called pitch synchronous innovation CELP (PSI-CELP) [176]. In 1992, the CCITT (now ITU-T) adopted the low-delay CELP (LD-CELP) algorithm, developed by Chen et al. [30], [34], as an international standard for 16-kb/s speech coding. Currently, the GSM is establishing a standard for half-rate TDMA digital cellular systems in Europe and the two remaining candidates for the speech-coding component are both CELP algorithms. Also, the Telecommunications Industry Association (TIA) is now evaluating
eight candidate algorithms for a North American half-rate
TDMA digital cellular standard, and most of the candidates
are CELP algorithms.
Numerous advances to CELP coding have been developed to reduce complexity, increase robustness to channel errors, and improve quality. Much of this effort is oriented to improving the excitation signal while controlling or reducing the excitation search complexity. Some advances have been made to improve the modeling of the short-term synthesis filter or the quantization of the linear predictor parameters. Below we highlight some of the more important improvements to CELP coding.
B. Closed-Loop Search
In the initial description of CELP [13], [207], only the basic conceptual idea was reported, without regard to a practical mechanism for performing the encoder’s search operation. Subsequently, some essential details were reported in 1986 for efficiently handling the search operation. In particular, it is efficient to separately compute the zero-input response (the ringing) of the synthesis filter after the previously selected optimal excitation vector has passed through it. After accounting for the effect of this ringing, the search for the next excitation vector can be conducted based on a zero initial condition assumption; thus the zero-state response of the synthesis filter is computed for each candidate code vector [239], [52]. This use of superposition greatly simplifies the codebook search process.
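A minimal sketch of this search (Python/NumPy; the weighted synthesis filter is taken to be all-pole and the helper names are assumptions, not drawn from [239] or [52]) makes the superposition explicit: the ringing is computed once per subframe, subtracted from the weighted target, and each candidate then needs only its zero-state response.

import numpy as np
from scipy.signal import lfilter

def search_codebook(target_w, codebook, w_den, ringing_state):
    # target_w: perceptually weighted input speech for this subframe.
    # codebook: candidate excitation vectors, shape (N, L).
    # w_den:    denominator coefficients of the weighted synthesis filter.
    # ringing_state: filter memory left by the previously chosen excitation.
    ringing, _ = lfilter([1.0], w_den, np.zeros(len(target_w)), zi=ringing_state)
    x = target_w - ringing                     # target with ringing removed
    best = (None, None, np.inf)
    for i, c in enumerate(codebook):
        y = lfilter([1.0], w_den, c)           # zero-state response
        g = np.dot(x, y) / np.dot(y, y)        # jointly optimal gain (see below)
        err = np.dot(x, x) - g * np.dot(x, y)  # weighted squared error
        if err < best[2]:
            best = (i, g, err)
    return best                                # (index, gain, error)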
In [13], the gain scaling factor of the excitation vector was determined from the energy of the original speech prediction error signal (called the residual). The residual is obtained after both short-term prediction and pitch prediction are performed. Subsequently, it was recognized that a closed-loop gain computation is easily done so that, in effect, the selection of both gain and code vector is jointly optimized in the analysis-by-synthesis process [7]. This leads to an important quality improvement.
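In the notation of the sketch above (our symbols, not those of [7]), with x the weighted target after removing the ringing and y the zero-state response of a candidate code vector, the jointly optimal gain and the resulting error are

$$
g = \frac{x^{T}y}{y^{T}y}, \qquad
\min_{g}\,\|x - g\,y\|^{2} = \|x\|^{2} - \frac{(x^{T}y)^{2}}{y^{T}y},
$$

so the closed-loop search simply selects the code vector that maximizes (x^T y)^2 / y^T y, and the gain follows at no extra cost.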
C. Excitation Codebooks
In the stochastic excitation codebook initially proposed for CELP, each element of each code vector was an independently generated Gaussian random number. The resulting unstructured character of the codebook is not amenable to efficient search methods, and exhaustive search requires a very high complexity. A variety of structural constraints on the excitation codebook have been introduced to achieve one or more of the following features: reduced search complexity, reduced storage space, reduced sensitivity to channel errors, and increased speech quality. Some of the key innovations are summarized here.
An overlapped codebook technique, due to Lin, substantially reduces computation as well as codebook storage [157]. In this method, each code vector of the excitation codebook is a block of samples taken from a larger sequence of random samples by performing a cyclical shift of one or more samples on the sequence. Thus if a one-sample shift is used, a sequence of 1024 Gaussian samples can
generate 1024 distinct code vectors of dimension 40. The effect of filtering each such excitation vector through the synthesis filter is achieved by a single convolution operation on the sequence. The search for the optimal code vector in an overlapped codebook is further accelerated by the use of a modified error weighting criterion introduced by Kleijn et al., allowing a fast recursive computation [136].
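A toy version of the overlapped codebook (Python/NumPy; a one-sample cyclic shift is assumed, matching the example above):

import numpy as np

L, N = 40, 1024
rng = np.random.default_rng(0)
base = rng.standard_normal(N)        # one long Gaussian sequence

def code_vector(i):
    # Code vector i is the block of L samples starting at position i,
    # taken cyclically from the shared base sequence, so 1024 stored
    # samples yield 1024 distinct vectors of dimension 40.
    return np.take(base, np.arange(i, i + L), mode='wrap')

# As noted in the text, one convolution of the base sequence with the
# synthesis filter's impulse response effectively filters all of these
# overlapping code vectors at once.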
A widely used approach to reduce search complexity and storage space is the use of sparse excitation codebooks, where most of the code vector elements have the value zero. This is usually done in combination with other constraints on the magnitude or location of the nonzero elements. Sparse codebooks for CELP were proposed by Davidson and Gersho [52] and Lin [157]. Sparse codebooks can also be combined with overlapped codebooks. In ternary codebooks, proposed by Lin [157], and later Xydeas [254], the nonzero entries of a sparse codebook are forced to be +1 or -1. This can be achieved by hard-limiting the nonzero values of a stochastic codebook or by directly designing specific ternary codebook structures. Salami [200] proposed fixed regularly spaced positions for the nonzero entries so that a short binary word can directly specify the nonzero polarities, eliminating the need for a stored excitation codebook. This technique, called BCELP (for binary CELP), reduces complexity and sensitivity to channel errors while reportedly maintaining good quality. Sparse excitation signals were, of course, central to the technique of MP-LPC and preceded CELP. Attempts to improve MP-LPC by using a codebook of sparse excitation vectors may also be viewed as complexity-reduction methods for CELP. (See in particular Kroon et al. [145] and Hernandez-Gomez [71].) Many other sparse codebook schemes have been proposed, for example, Kipper et al. [128] and Akamine and Miseki [3].
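As a toy illustration of a sparse ternary excitation vector (illustrative only; the pulse count and dimension below are arbitrary choices, not those of any cited scheme):

import numpy as np

def ternary_sparse_vector(dim=40, n_pulses=4, rng=None):
    # Sparse excitation: only n_pulses entries are nonzero, and each
    # nonzero entry is hard-limited to +1 or -1 (the ternary constraint).
    if rng is None:
        rng = np.random.default_rng()
    v = np.zeros(dim)
    positions = rng.choice(dim, size=n_pulses, replace=False)
    v[positions] = rng.choice([-1.0, 1.0], size=n_pulses)
    return v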
Another family of excitation codebook methods is based on lattices, regularly spaced arrays of points in multiple dimensions. Lattice VQ was proposed in [74] and [75] and extensively studied by various researchers. (See in particular Gibson and Sayood [84] and Jeong and Gibson [116].) Codebook storage is eliminated since lattices are easily generated and suitable mappings between lattice points (code vectors) and binary words are known. The use of lattice structures for excitation codebooks in CELP has been proposed by Adoul et al., who coined the phrase algebraic codebooks [1]. In their work, lattice codebooks with all code vectors having the same energy are generated from standard error-correcting codes by replacing the binary symbols 1 and 0 with +1 and -1, respectively. For additional examples of algebraic codebooks for CELP, see [150], [104], [148], and [63]. Le Guyader et al. [155] use binary-valued code vectors of unit magnitude so that binary words directly map into excitations without any codebook storage.
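The construction described by Adoul et al. can be illustrated with a small linear block code (a sketch only; the (7,4) systematic generator matrix below is just one convenient example, not the codebook of [1]): every codeword maps to a +1/-1 vector, so all code vectors have equal energy and none need be stored explicitly.

import numpy as np
from itertools import product

# Systematic generator matrix of a small (7,4) binary block code.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

messages = np.array(list(product([0, 1], repeat=4)))   # all 16 messages
codewords = messages @ G % 2                           # binary codewords
codebook = 2.0 * codewords - 1.0   # map {0,1} -> {-1,+1}; every row has energy 7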
An alternative way of generating excitation codebooks
is by designing them directly from actual speech files
with a suitable
training
algorithm. This is the standard
approach to codebook generation in vector quantization
(VQ)
[76]. However, a closed-loop design method is needed