Multimodal speech synthesis, or audio-visual
speech synthesis, deals with automatic generation of voice and facial animation
from arbitrary text. Applications range from research on human communication
and perception, through tools for the hearing impaired, to spoken and multimodal
agent-based user interfaces. A view of the face can improve intelligibility
of both natural and synthetic speech significantly, especially under degraded
acoustic conditions. Moreover, facial expressions can signal emotion, add
emphasis to the speech and support the interaction in a dialogue situation.

3D TALKING HEADS

Our approach to audio-visual speech synthesis is based on parametric descriptions of both the acoustic and visual speech modalities, in a text-to-speech framework. The visual speech synthesis uses 3D polygon models that are parametrically articulated and deformed. We have a flexible architecture that allows us to create new characters either by adopting a static wireframe model and specifying the required deformation parameters for that model, or by sculpting and reshaping an already parameterized model. A few of the talking heads we have created to date can be seen at the left. We are currently working on improving dynamic articulation modelling, using movement data recorded with an optical tracking system.

AIDS FOR HEARING IMPAIRED

We are investigating possible uses of audio-visual speech synthesis as a tool for the hard of hearing. This is done in the Synface and Teleface projects. Within the Teleface project we have carried out extensive audio-visual intelligibility tests, in which synthetic and natural voices and faces are tested at varying signal-to-noise ratios, with hearing impaired as well as normal hearing subjects. These tests demonstrate the potential value of a communication aid based on multimodal synthesis technology.

COMMUNICATIVE CHARACTERS & DIALOGUE SYSTEMS

Our research is also concerned with interactive aspects of visual speech communication, such as generation of believable facial expressions and gestures for animated talking agents in multimodal spoken dialogue systems. This research is currently being carried out in the framework of the AdApt system. The agent has an associated library of gestures representing communicative functions that can be used in the dialogue. Actions are triggered by the state of the agent in such a way that appropriate gestures are automatically selected when the agent enters, exits or remains in a particular state (examples of states are speaking and attending).
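
The state-triggered gesture selection described above can be sketched roughly as follows. This is a minimal illustration and not the AdApt implementation: the Agent class, the update method, the event labels "enter", "exit" and "remain", and all gesture names are hypothetical, chosen only to show how a gesture library keyed by state and event might be consulted.

```python
from dataclasses import dataclass, field

# Hypothetical gesture library: (state, event) -> gestures to play.
# The states "speaking" and "attending" come from the text above;
# the gesture names are invented for illustration only.
GESTURE_LIBRARY = {
    ("speaking", "enter"): ["raise_eyebrows"],
    ("speaking", "remain"): ["nod_on_emphasis"],
    ("speaking", "exit"): ["gaze_at_user"],
    ("attending", "enter"): ["tilt_head"],
    ("attending", "remain"): ["blink"],
    ("attending", "exit"): ["gaze_away"],
}


@dataclass
class Agent:
    """Toy agent that records which gestures its state changes trigger."""
    state: str = "attending"
    log: list = field(default_factory=list)

    def _trigger(self, state, event):
        # Look up the gestures registered for this state and event, if any.
        for gesture in GESTURE_LIBRARY.get((state, event), []):
            self.log.append((state, event, gesture))

    def update(self, new_state):
        # Same state: "remain" gestures. Different state: "exit" gestures
        # for the old state, then "enter" gestures for the new one.
        if new_state == self.state:
            self._trigger(self.state, "remain")
        else:
            self._trigger(self.state, "exit")
            self.state = new_state
            self._trigger(new_state, "enter")


agent = Agent()
agent.update("speaking")   # attending -> speaking
agent.update("speaking")   # stays in speaking
agent.update("attending")  # speaking -> attending
for entry in agent.log:
    print(entry)
```

In this sketch the caller only reports the agent's next state, and the lookup table decides which gestures follow, so new gestures can be added without changing the control logic.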
State transitions and gestures are controlled by the dialogue manager and are communicated using XML markup.

PROJECTS

Synface. Synthesized talking face derived from speech for hearing disabled users of voice channels.
Teleface. Multimodal Speech Communication for the Hearing Impaired.
The 3D Vocal Tract Project. A three-dimensional vocal tract model for articulatory and visual speech synthesis.
PER. The doorkeeper Per is the user interface for a speaker verification demonstrator at CTT.
AdApt. The continuation of the August project.
WaveSurfer/SpeechSurfer. Speech toolkit development.

PUBLICATIONS

List of publications from Centre for Speech Technology (CTT)

PEOPLE

Jonas Beskow
Magnus Nordstrand
David House
Björn Granström

DEMONSTRATIONS

Computer conversation? Check out this video from the August dialogue system. This is a video playback of a real interaction between August and a user (mpg-format 1.44M) or (mpg-format 12M) (both are Swedish only). More August related videos can be found on the August Homepage.

For more video clips featuring our talking heads, please see our video page.

LINKS

Facial animation page at UC Santa Cruz, USA
Talking Heads website hosted by Haskins Laboratories, New Haven, CT, USA

Back up to Department of Speech, Music and Hearing, KTH

(While we're at it, here's a link to the legendary music group Talking Heads)