InkSynt

Incremental Text-to-Speech Conversion

We will develop an incremental text-to-speech converter (TTS) that can be used in dynamically changing situations. In the project, we will collect speech databases of people reading incrementally displayed text aloud. These databases will serve as the basis for developing incremental TTS methods that produce correct prosody. We will also develop methods for situation-sensitive speech generation. The result will be an incremental TTS with the human-like ability to read text on the fly with correct prosodic realization.

Today's speech-based services let people use their voices to search for information and perform basic tasks. With the launch of mobile dialogue systems such as Apple's mobile assistant Siri, expectations of spoken dialogue systems have increased significantly. Unfortunately, these systems still use speech synthesizers that are trained on read speech.

In the proposed project, we will develop an incremental text-to-speech converter (TTS) that can be used in dynamically changing situations. We will collect speech databases of people reading incrementally displayed text aloud, which will serve as the basis for developing incremental TTS methods with correct prosody. These methods will include predicting how incrementally incoming text will continue. For the incremental TTS to be useful in mediated human-human communication, it must also handle slow text input and corrections. For this purpose, we will develop methods for situation-sensitive speech generation.

The result will be an incremental TTS with the human-like ability to read text on the fly with correct prosodic realization. This will be vital for future human-like conversational interfaces and social robots, as well as for computer-mediated human-human communication, real-time speech-to-speech translation, speech prostheses, second-language learning systems, and voices for interactive animated characters in virtual environments and computer games.

Group: Speech Communication and Technology

Staff:
Joakim Gustafson (Project leader)
Jonas Beskow
Bajibabu Bollepalli
Jens Edlund

Funding: Swedish Research Council (VR), 2013-4935

Duration: 2014-01-01 - 2018-01-01

Published by: TMH, Speech, Music and Hearing
Webmaster, webmaster@speech.kth.se

Last updated: 2012-11-09