Dr. Jeffrey Rodriguez
Dr. Brad H. Story
Articulatory speech synthesis is a method of generating speech by
simulating the acoustics of the airway inside the human vocal tract.
Different sounds are produced by moving articulators such as the jaw,
the tongue, and the lips. The Kelly-Lochbaum (KL) model of speech synthesis
uses cascaded equal-length tubes of different cross-sectional areas to
approximate the shape and length of the airway. Discretized wave
equations are used to model propagation and scattering effects, and
generate a speech signal. Since the length of individual tube segments
is fixed, the total length of the tract is always quantized to an
integer multiple of the basic unit. However, for generation of a
certain class of speech sounds, the model should be capable of
simulating continuous, rather than quantized vocal tract lengths.
Special signal processing techniques are thus needed in order to extend
the KL model.
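The length quantization described above can be illustrated with a minimal sketch. It assumes the common half-sample-delay convention in which each segment has length c / (2 * fs), and a junction reflection coefficient of (A_k - A_{k+1}) / (A_k + A_{k+1}); sign conventions differ between texts. The function names and the values for the speed of sound (35000 cm/s) and sampling rate (44100 Hz) are illustrative assumptions, not taken from this work.

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tube segments.

    Assumed convention: r_k = (A_k - A_{k+1}) / (A_k + A_{k+1}).
    """
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def segment_length(c, fs):
    """Length of one tube segment when each segment delays the wave by
    half a sample in each direction (assumed half-sample-delay model)."""
    return c / (2.0 * fs)


def quantized_tract_length(target_length, c, fs):
    """Nearest tract length the fixed-segment model can represent.

    Returns (number_of_segments, realized_length).
    """
    dx = segment_length(c, fs)
    n = max(1, round(target_length / dx))
    return n, n * dx


if __name__ == "__main__":
    c = 35000.0   # speed of sound in cm/s (assumed value)
    fs = 44100    # sampling rate in Hz (assumed value)
    n, length = quantized_tract_length(17.5, c, fs)
    print(n, length)  # the requested 17.5 cm is rounded to a whole number of segments
```

Under these assumed values, a requested length of 17.5 cm is rounded to 44 segments, so the difference between the requested and realized lengths is the kind of error that a continuous-length extension has to remove.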
The aim of this work is to outline efficient algorithms to model static
vowel shapes so that their total tract length can be any continuous
value, not just an integer multiple of the segment length. The novel
contribution is a set of algorithms that are extensions to the
half-sample delay KL model. A realistic model of a neutral vowel with
44 segments is used to validate the techniques described. The results
obtained after modeling an elongation of the lips, lowering of the
larynx and a length change in an intermediate segment of the vocal
tract are illustrated. It is also shown that with a spatially
over-sampled vocal tract model, the performance of a simple linear
interpolator is adequate for accurate modeling of fractional segment
lengths.
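As a rough illustration of the linear interpolator mentioned above, the sketch below delays a sampled signal by a fraction d of one sample using y[n] = (1 - d) * x[n] + d * x[n-1]. This is the generic first-order fractional-delay filter, not necessarily the exact formulation used in this work; the function name and the zero initial condition are assumptions.

```python
def fractional_delay_linear(x, d):
    """Delay signal x by a fraction d of one sample (0 <= d < 1)
    using first-order linear interpolation.

    Assumes the sample before the start of x is zero.
    """
    if not 0.0 <= d < 1.0:
        raise ValueError("d must satisfy 0 <= d < 1")
    y = []
    previous = 0.0  # assumed initial condition x[-1] = 0
    for sample in x:
        y.append((1.0 - d) * sample + d * previous)
        previous = sample
    return y


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0]
    print(fractional_delay_linear(impulse, 0.25))
```

A linear interpolator attenuates high frequencies when d is not zero, and this error shrinks as the signal bandwidth becomes small relative to the sampling rate, which is consistent with the observation that it is adequate for a spatially over-sampled tract model.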
All files below are in PDF format, created from GSview using its
v 1.4 compatibility level.
Last update: December 29, 2003
Please send any corrections or comments to smathur [at] ece.arizona.edu