Audiovisual to Articulatory Speech Inversion

Audio-visual-to-articulatory inversion consists of recovering the dynamics of the vocal tract shape (from the vocal folds to the lips) from the acoustic speech signal, supplemented by image analysis of the speaker’s face. Being able to recover this information automatically would be a major breakthrough in speech research and technology: a vocal tract representation of a speech signal would be beneficial from a theoretical point of view and practically useful in many speech processing applications (language learning, automatic speech processing, speech coding, speech therapy, the film industry, …). The project involves two kinds of tasks: the development of inversion methods and the construction of an articulatory database comprising vocal tract images together with the speech signal for several speakers.

Group: Speech Communication and Technology

Björn Granström (Project leader)
Anne-Marie Öster
Gopal Ananthakrishnan
Daniel Neiberg
Olov Engwall

Funding: EU

Duration: 2005 - 2008


Related publications:

Published by: TMH, Speech, Music and Hearing

Last updated: 2012-11-09