Speech production for multi-modal speech synthesis
Main responsibility for the research area:
Prof Björn Granström
Fully flexible text-to-speech systems of near-human quality must be based on a deeper understanding of the speech
production process. Current models describe speech production only at the acoustic level, with little regard for
articulation, and consequently many of them lack a firm grounding in physical reality. Their advantages, on the
other hand, are computational efficiency and the relative ease with which they can be verified through direct
acoustic comparison with natural speech. There is, however, great potential for improving realism with models
of higher complexity.
Moreover, to make full use of the speech databases currently under construction, automatic parameter estimation
methods will be essential.
In this work, an articulatory approach will be taken, and the choice of synthesis units will be based on optimization
over the speech database. Additionally, the current 3-D modeling of the face and speech organs will continue, focusing
on increased flexibility, naturalness, and perceptual quality, including interactive, non-articulatory facial gestures.
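Choosing synthesis units by optimization over a speech database is commonly framed as minimizing a combined target cost (how well a candidate unit matches the specification) and join cost (how smoothly adjacent units concatenate), solved by dynamic programming. The sketch below illustrates this general scheme; the function names and cost functions are illustrative assumptions, not details from this research plan.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Pick one candidate unit per target position, minimizing the
    total of target and join costs with a Viterbi-style search.

    targets:    list of unit specifications
    candidates: candidates[i] is the list of database units for targets[i]
    """
    n = len(targets)
    # best[i][j] = (cheapest cumulative cost ending in candidates[i][j],
    #               backpointer to the chosen candidate of position i-1)
    best = [[(target_cost(targets[0], c), None) for c in candidates[0]]]
    for i in range(1, n):
        row = []
        for c in candidates[i]:
            tc = target_cost(targets[i], c)
            cost, back = min(
                (best[i - 1][k][0] + join_cost(candidates[i - 1][k], c) + tc, k)
                for k in range(len(candidates[i - 1]))
            )
            row.append((cost, back))
        best.append(row)
    # Trace back the cheapest path from the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))
```

For example, with numeric stand-ins for units, `target_cost = lambda t, c: abs(t - c)` and `join_cost = lambda a, b: abs(a - b)`, the search picks the globally cheapest sequence rather than the greedy per-position best.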
Important Research Topics for Stage 3
- improve acoustic speech synthesis in order to integrate information from dialogue systems and expanded linguistic analysis
- complete the 3-D model of the face and speech organs that generates articulatory synthesis for use in animated agents
- model non-articulatory facial gestures typical of interactive speech