Speech production for multi-modal speech synthesis

Main responsibility for the research area:
Prof Björn Granström

Fully flexible text-to-speech systems of near-human quality must be based on a deeper understanding of the speech production process. In current models, the description of speech production is restricted to the acoustic level, with little regard for articulation; consequently, many of these models lack a firm basis in the physical reality of speaking. Their advantages, on the other hand, are the relative ease with which they can be verified by direct acoustic comparison with natural speech, and their computational efficiency. Nevertheless, there is great potential for increasing realism by using models of higher complexity.

Moreover, to make full use of the speech databases currently under construction, automatic parameter estimation methods will be of great importance.
In this work, an articulatory approach will be taken, and the choice of synthesis units will be based on optimizations over the speech database. In addition, the current 3-D modelling of the face and speech organs will continue, focusing on increased flexibility, naturalness, and perceptual quality, including interactive, non-articulatory facial gestures.
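Choosing synthesis units through optimization over a speech database is commonly cast as a dynamic-programming search that balances how well each candidate unit matches its target against how smoothly adjacent units join. The sketch below illustrates that general idea only; the function names, cost definitions, and data layout are illustrative assumptions, not the method actually used in this work.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate unit per target position so that the summed
    target and concatenation (join) costs are minimal (Viterbi search).

    targets    -- list of target specifications, one per position
    candidates -- list of candidate-unit lists, one list per position
    target_cost(t, c) -- cost of using candidate c for target t
    join_cost(a, b)   -- cost of concatenating units a and b
    """
    n = len(targets)
    # best[i][j] = (cumulative cost, backpointer into candidates[i-1])
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, n):
        row = []
        for c in candidates[i]:
            tc = target_cost(targets[i], c)
            # cheapest way to reach this candidate from the previous column
            cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, c), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + tc, back))
        best.append(row)
    # trace back the cheapest path through the candidate lattice
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    path.reverse()
    return [candidates[i][path[i]] for i in range(n)]
```

With toy scalar "units" (e.g. pitch values), a target cost of absolute distance, and a small join penalty, the search prefers candidates that are both close to their targets and consistent with their neighbours.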

Important Research Topics for Stage 3

  • improve acoustic speech synthesis so that it can integrate information from dialogue systems and from expanded linguistic analysis
  • complete the 3-D model of the face and speech organs so that it supports articulatory synthesis for use in animated speaking agents
  • model non-articulatory facial gestures typical of interactive speech

Published by: TMH, Speech, Music and Hearing

Last updated: 2006-12-05