Variable-length vocal tract modeling for speech synthesis
by
Siddharth Mathur
[smathur at ece dot arizona dot edu]

Online version of Master's Thesis submitted to the
Department of Electrical and Computer Engineering, University of Arizona
December 2003

Co-advisors:
Dr. Jeffrey Rodriguez
Dr. Brad H. Story

This work was supported by NIH grant NIDCD R01-DC04789, PI: Brad H. Story

Abstract

Articulatory speech synthesis is a method of generating speech by simulating the acoustics of the airway inside a human vocal tract. Different sounds can be produced by the movement of articulators such as the jaw, the tongue and the lips. The Kelly-Lochbaum (KL) model of speech synthesis uses cascaded equal-length tubes of different cross-sectional areas to approximate the shape and length of the airway. Discretized wave equations are used to model propagation and scattering effects, and to generate a speech signal. Since the length of each tube segment is fixed, the total length of the tract is always quantized to an integer multiple of the segment length. However, for generation of a certain class of speech sounds, the model should be capable of simulating continuous, rather than quantized, vocal tract lengths. Special signal processing techniques are thus needed to extend the KL model.
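
The quantization described above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the speed of sound (350 m/s), the sampling rate (44.1 kHz) and the sign convention of the reflection coefficient are all assumptions chosen for illustration. It assumes each segment acts as a half-sample one-way delay, so the segment length is c / (2 * fs), and it uses the textbook junction formula r = (A_k - A_k+1) / (A_k + A_k+1).

```python
def segment_length(c, fs):
    """Length of one tube segment, assuming a half-sample one-way delay.

    c  : speed of sound in m/s (assumed value supplied by the caller)
    fs : sampling rate in Hz (assumed value supplied by the caller)
    """
    return c / (2.0 * fs)


def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tubes.

    Uses r = (A_k - A_k+1) / (A_k + A_k+1); other sign conventions exist.
    """
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]


def quantized_length(target_length, dx):
    """Nearest tract length the fixed-segment model can represent.

    Returns the segment count and the resulting total length.
    """
    n = round(target_length / dx)
    return n, n * dx


if __name__ == "__main__":
    dx = segment_length(350.0, 44100.0)   # assumed c and fs
    n, length = quantized_length(0.175, dx)  # hypothetical 17.5 cm target
    print(n, length, dx)
    print(reflection_coefficients([1.0, 1.0, 3.0]))  # hypothetical areas
```

With these assumed values, a requested length that is not an integer multiple of the segment length is rounded to the nearest representable one, which is the limitation the extensions in this work address.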

The aim of this work is to outline efficient algorithms to model static vowel shapes so that their total tract length can be any continuous value, not just an integer multiple of the segment length. The novel contribution is a set of algorithms that are extensions to the half-sample delay KL model. A realistic model of a neutral vowel with 44 segments is used to validate the techniques described. The results obtained after modeling an elongation of the lips, lowering of the larynx and a length change in an intermediate segment of the vocal tract are illustrated. It is also shown that with a spatially over-sampled vocal tract model, the performance of a simple linear interpolator is adequate for accurate modeling of fractional segment lengths.
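
As a rough illustration of the linear interpolator mentioned above, the following Python sketch delays a signal by a fraction of a sample using a first-order FIR filter, y[n] = (1 - d) * x[n] + d * x[n-1]. This is a generic fractional-delay technique, not the thesis's algorithm; the function name, the zero initial state and the restriction of d to the range 0 to 1 are assumptions made for the example.

```python
def linear_fractional_delay(x, d):
    """Delay the sequence x by d samples (0 <= d <= 1) by linear interpolation.

    Each output is a weighted average of the current and previous input
    samples. The sample before the start of x is assumed to be zero.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie between 0 and 1")
    y = []
    prev = 0.0  # assumed zero initial state
    for s in x:
        y.append((1.0 - d) * s + d * prev)
        prev = s
    return y


if __name__ == "__main__":
    # A ramp delayed by half a sample (hypothetical test signal).
    print(linear_fractional_delay([0.0, 1.0, 2.0, 3.0], 0.5))
```

Linear interpolation attenuates high frequencies as d approaches 0.5, which is consistent with the observation above that it performs adequately when the vocal tract model is spatially over-sampled.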

All files below are in PDF format, created with GSview using its v1.4 compatibility level.



Last update: December 29, 2003
Please send any corrections or comments to smathur [at] ece.arizona.edu