
SAMSYNT

Introducing interactional phenomena in speech synthesis

The project will develop and verify ways of including interactional phenomena in speech synthesis, resulting in well-described and tested methods for synthesizing these phenomena so that they can be employed to recreate human interactional behaviour.

Even though most of today's speech synthesizers are designed to be efficient reading machines, they are frequently used as talking machines. This limits the success of spoken dialogue systems in future domains such as social and collaborative applications, education and entertainment. We aim to lay the foundation for synthesizers to be used by tomorrow's conversational systems by implementing and evaluating interactional cues that are missing in current state-of-the-art systems. The outcome of the project is a set of specifications describing how to synthesize these events using different methods, as well as an evaluation of the limitations and benefits of the synthesis methods with regard to interactional phenomena. In order to evaluate the interactional effect of the synthesized cues, we will use two scenarios: the attentive speaker, where the synthesis is used as a virtual narrator in a dynamic environment, and the active listener, where the synthesis is used in an information-gathering system. In particular, the project will advance the state of the art in the following ways: we will identify interactional tokens that are missing in present systems and implement them using two synthesis methods; we will implement several strategies for transitions between speech and silence, which are expected to facilitate smooth speaker shifts at appropriate places; and we will verify the benefits of these interactional phenomena in human-computer dialogue settings.

Group: Speech Communication and Technology

Staff:
Joakim Gustafson (Project leader)
Jonas Beskow
Jens Edlund
Daniel Neiberg

Funding: VR (2009-4291)

Duration: 2010-07-01 - 2013-12-31

Website: http://www.speech.kth.se/samsynt

Keywords: Conversation, Dialogue, Turn-taking, Feedback, Interaction control, Prosody

Related publications:

2013

Bollepalli, B., Beskow, J., & Gustafson, J. (2013). Non-Linear Pitch Modification in Voice Conversion using Artificial Neural Networks. In ISCA Workshop on Non-Linear Speech Processing 2013. [pdf]

Neiberg, D., Salvi, G., & Gustafson, J. (2013). Semi-supervised methods for exploring the acoustics of simple productive feedback. Speech Communication, 55(3), 451-469. [link]

2012

Bollepalli, B., Beskow, J., & Gustafson, J. (2012). HMM based speech synthesis system for Swedish Language. In The Fourth Swedish Language Technology Conference. Lund, Sweden.

Edlund, J., Hjalmarsson, A., & Tånnander, C. (2012). Unconventional methods in perception experiments. In Proc. of Nordic Prosody XI. Tartu, Estonia.

Neiberg, D., & Gustafson, J. (2012). Towards letting machines humming in the right way - prosodic analysis of six functions of short feedback tokens in English. In Fonetik 2012. Göteborg, Sweden. [pdf]

Neiberg, D. (2012). Modelling Paralinguistic Conversational Interaction: Towards social awareness in spoken human-machine dialogue. Doctoral dissertation, KTH School of Computer Science and Communication. [link]

Neiberg, D., & Gustafson, J. (2012). Cues to perceived functions of acted and spontaneous feedback expressions. In The Interdisciplinary Workshop on Feedback Behaviors in Dialog. [abstract] [pdf]

Neiberg, D., & Gustafson, J. (2012). Exploring the implications for feedback of a neurocognitive theory of overlapped speech. In The Interdisciplinary Workshop on Feedback Behaviors in Dialog. [abstract] [pdf]

Oertel, C., Wlodarczak, M., Edlund, J., Wagner, P., & Gustafson, J. (2012). Gaze Patterns in Turn-Taking. In Proc. of Interspeech 2012. Portland, Oregon, US. [abstract] [pdf]

2011

Edlund, J. (2011). In search of the conversational homunculus - serving to understand spoken human face-to-face interaction. Doctoral dissertation, KTH. [abstract] [pdf]

Neiberg, D. (2011). Visualizing prosodic densities and contours: Forming one from many. TMH-QPSR, 51(1), 57-60. [abstract] [pdf]

Neiberg, D., Ananthakrishnan, G., & Gustafson, J. (2011). Tracking pitch contours using minimum jerk trajectories. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. Florence, Italy. [abstract] [pdf]

Neiberg, D., & Gustafson, J. (2011). A Dual Channel Coupled Decoder for Fillers and Feedback. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. Florence, Italy. [abstract] [pdf]

Neiberg, D., & Gustafson, J. (2011). Predicting Speaker Changes and Listener Responses With And Without Eye-contact. In INTERSPEECH 2011, 12th Annual Conference of the International Speech Communication Association. Florence, Italy. [abstract] [pdf]

2010

Beskow, J., Edlund, J., Gustafson, J., Heldner, M., Hjalmarsson, A., & House, D. (2010). Modelling humanlike conversational behaviour. In Proceedings of SLTC 2010. Linköping, Sweden. [pdf]

Beskow, J., Edlund, J., Gustafson, J., Heldner, M., Hjalmarsson, A., & House, D. (2010). Research focus: Interactional aspects of spoken face-to-face communication. In Proc. of Fonetik 2010 (pp. 7-10). Lund, Sweden. [abstract] [pdf]

Gustafson, J., & Neiberg, D. (2010). Directing conversation using the prosody of mm and mhm. In Proceedings of SLTC 2010. Linköping, Sweden.

Gustafson, J., & Neiberg, D. (2010). Prosodic cues to engagement in non-lexical response tokens in Swedish. In Proceedings of DiSS-LPSS Joint Workshop 2010. Tokyo, Japan. [pdf]

Neiberg, D., & Gustafson, J. (2010). Modeling Conversational Interaction Using Coupled Markov Chains. In Proceedings of DiSS-LPSS Joint Workshop 2010. Tokyo, Japan. [pdf]

Neiberg, D., & Gustafson, J. (2010). Prosodic Characterization and Automatic Classification of Conversational Grunts in Swedish. In Fonetik 2010. Lund.

Neiberg, D., & Gustafson, J. (2010). The Prosody of Swedish Conversational Grunts. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association (pp. 2562-2565). Makuhari, Chiba, Japan. [pdf]







Published by: TMH, Speech, Music and Hearing

Last updated: 2012-11-09