Introducing interactional phenomena in speech synthesis
The project will develop and verify ways of including interactional phenomena in speech synthesis, resulting in well-described and tested methods for synthesizing these phenomena in such a way that they can be employed to recreate human interactional behaviour.
Most of today's speech synthesizers are designed to be efficient reading machines, yet they are frequently used as talking machines. This mismatch will limit the success of spoken dialogue systems in future domains such as social and collaborative applications, education and entertainment. We aim to lay the foundation for synthesizers to be used by tomorrow's conversational systems by implementing and evaluating interactional cues that are missing in current state-of-the-art systems. The outcome of the project is a set of specifications describing how to synthesize these events using different methods, together with an evaluation of the limitations and benefits of each synthesis method with regard to interactional phenomena. To evaluate the interactional effect of the synthesized cues, we will use two scenarios: the attentive speaker, where the synthesis is used as a virtual narrator in a dynamic environment, and the active listener, where the synthesis is used in an information-gathering system. In particular, the project will advance the state of the art in three ways: we will identify interactional tokens that are missing in present systems and implement them using two synthesis methods; we will implement several strategies for transitions between speech and silence, which are expected to facilitate smooth speaker shifts at appropriate places; and we will verify the benefits of these interactional phenomena in human-computer dialogue settings.
Group: Speech Communication and Technology
Joakim Gustafson (Project leader)
Funding: VR (2009-4291)
Duration: 2010-07-01 - 2013-12-31
Keywords: Conversation, Dialogue, Turn-taking, Feedback, Interaction control, Prosody
Publications:
(2013). Non-Linear Pitch Modification in Voice Conversion using Artificial Neural Networks. In ISCA Workshop on Non-Linear Speech Processing 2013.
(2013). Semi-supervised methods for exploring the acoustics of simple productive feedback. Speech Communication, 55(3), 451-469.
(2012). HMM based speech synthesis system for Swedish Language. In The Fourth Swedish Language Technology Conference. Lund, Sweden.
(2012). Unconventional methods in perception experiments. In Proc. of Nordic Prosody XI. Tartu, Estonia.
(2012). Towards letting machines humming in the right way - prosodic analysis of six functions of short feedback tokens in English. In Fonetik 2012. Göteborg, Sweden.
(2012). Modelling Paralinguistic Conversational Interaction: Towards social awareness in spoken human-machine dialogue. Doctoral dissertation, KTH School of Computer Science and Communication.
(2012). Cues to perceived functions of acted and spontaneous feedback expressions. In The Interdisciplinary Workshop on Feedback Behaviors in Dialog.
(2012). Exploring the implications for feedback of a neurocognitive theory of overlapped speech. In The Interdisciplinary Workshop on Feedback Behaviors in Dialog.
(2012). Gaze Patterns in Turn-Taking. In Proc. of Interspeech 2012. Portland, Oregon, US.
(2011). In search of the conversational homunculus - serving to understand spoken human face-to-face interaction. Doctoral dissertation, KTH.
(2011). Visualizing prosodic densities and contours: Forming one from many. TMH-QPSR, 51(1), 57-60.
(2011). Tracking pitch contours using minimum jerk trajectories. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. Florence, Italy.
(2011). A Dual Channel Coupled Decoder for Fillers and Feedback. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. Florence, Italy.
(2011). Predicting Speaker Changes and Listener Responses With And Without Eye-contact. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. Florence, Italy.
(2010). Modelling humanlike conversational behaviour. In Proceedings of SLTC 2010. Linköping, Sweden.
(2010). Research focus: Interactional aspects of spoken face-to-face communication. In Proc. of Fonetik 2010 (pp. 7-10). Lund, Sweden.
(2010). Directing conversation using the prosody of mm and mhm. In Proceedings of SLTC 2010. Linköping, Sweden.
(2010). Prosodic cues to engagement in non-lexical response tokens in Swedish. In Proceedings of DiSS-LPSS Joint Workshop 2010. Tokyo, Japan.
(2010). Modeling Conversational Interaction Using Coupled Markov Chains. In Proceedings of DiSS-LPSS Joint Workshop 2010. Tokyo, Japan.
(2010). Prosodic Characterization and Automatic Classification of Conversational Grunts in Swedish. In Fonetik 2010. Lund.
(2010). The Prosody of Swedish Conversational Grunts. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association (pp. 2562-2565). Makuhari, Chiba, Japan.