VATS

What turns speech into conversation?

The project What turns speech into conversation? (Vad gör tal till samtal?) investigates features that are specific to conversation among humans, the very features that turn speech into conversation, such as how speakers know when to speak and when not to.

The project takes as its starting point that while conversation must be considered the primary kind of speech, we are still far better at modelling monologue than dialogue, in theory as well as in speech technology applications. There are also good reasons to assume that conversation contains a number of features that are not found in other kinds of speech, including the active cooperation among interlocutors to control the interaction and to establish common ground. Through this project, we hope to improve the situation by investigating features that are specific to human-human conversation, the features that turn speech into conversation. We will focus on acoustic and prosodic aspects of such features.

Group: Speech Communication and Technology

Staff:
Mattias Heldner (Project leader)
Jens Edlund

Funding: VR (2006-2172)

Duration: 2007-01-01 - 2009-12-31

Website: http://www.speech.kth.se/vats

KTH research database: http://researchprojects.kth.se/index.php/kb_1/io_10048/io.html

Keywords: Conversation, Dialogue, Turn-taking, Feedback, Interaction control, Prosody

Related publications:

2010

Heldner, M., & Edlund, J. (2010). Pauses, gaps and overlaps in conversations. Journal of Phonetics, 38, 555-568. [abstract] [pdf]

Laskowski, K., & Edlund, J. (2010). A Snack implementation and Tcl/Tk interface to the fundamental frequency variation spectrum algorithm. In Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., & Tapias, D. (Eds.), Proc. of the Seventh conference on International Language Resources and Evaluation (LREC'10) (pp. 3742 - 3749). Valletta, Malta. [abstract] [pdf]

2009

Beskow, J., Carlson, R., Edlund, J., Granström, B., Heldner, M., Hjalmarsson, A., & Skantze, G. (2009). Multimodal Interaction Control. In Waibel, A., & Stiefelhagen, R. (Eds.), Computers in the Human Interaction Loop (pp. 143-158). Berlin/Heidelberg: Springer. [pdf]

Edlund, J., & Beskow, J. (2009). MushyPeek - a framework for online investigation of audiovisual dialogue phenomena. Language and Speech, 52(2-3), 351-367. [abstract]

Edlund, J., Heldner, M., & Hirschberg, J. (2009). Pause and gap length in face-to-face interaction. In Proc. of Interspeech 2009. Brighton, UK. [abstract] [pdf]

Edlund, J., Heldner, M., & Pelcé, A. (2009). Prosodic features of very short utterances in dialogue. In Vainio, M., Aulanko, R., & Aaltonen, O. (Eds.), Nordic Prosody - Proceedings of the Xth Conference (pp. 57 - 68). Frankfurt am Main: Peter Lang. [pdf]

Heldner, M., Edlund, J., Laskowski, K., & Pelcé, A. (2009). Prosodic features in the vicinity of pauses, gaps and overlaps. In Vainio, M., Aulanko, R., & Aaltonen, O. (Eds.), Nordic Prosody - Proceedings of the Xth Conference (pp. 95 - 106). Frankfurt am Main: Peter Lang. [abstract] [pdf]

Hincks, R., & Edlund, J. (2009). Using speech technology to promote increased pitch variation in oral presentations. In Proc. of SLaTE Workshop on Speech and Language Technology in Education. Wroxall, UK. [abstract] [pdf]

Hincks, R., & Edlund, J. (2009). Promoting increased pitch variation in oral presentations with transient visual feedback. Language Learning & Technology, 13(3), 32-50. [abstract] [pdf]

Laskowski, K., Heldner, M., & Edlund, J. (2009). A general-purpose 32 ms prosodic vector for Hidden Markov Modeling. In Proc. of Interspeech 2009. Brighton, UK. [abstract] [pdf]

Laskowski, K., Heldner, M., & Edlund, J. (2009). Exploring the prosody of floor mechanisms in English using the fundamental frequency variation spectrum. In Proceedings of the 2009 European Signal Processing Conference (EUSIPCO-2009). Glasgow, Scotland. [abstract] [pdf]

2008

Edlund, J., Gustafson, J., Heldner, M., & Hjalmarsson, A. (2008). Towards human-like spoken dialogue systems. Speech Communication, 50(8-9), 630-645. [abstract] [pdf]

Gustafson, J., & Edlund, J. (2008). expros: a toolkit for exploratory experimentation with prosody in customized diphone voices. In Proceedings of Perception and Interactive Technologies for Speech-Based Systems (PIT 2008) (pp. 293-296). Berlin/Heidelberg: Springer. [abstract] [pdf]

Gustafson, J., & Edlund, J. (2008). EXPROS: Tools for exploratory experimentation with prosody. In Proceedings of FONETIK 2008 (pp. 17-20). Gothenburg, Sweden. [abstract] [pdf]

Gustafson, J., Heldner, M., & Edlund, J. (2008). Potential benefits of human-like dialogue behaviour in the call routing domain. In Proceedings of Perception and Interactive Technologies for Speech-Based Systems (PIT 2008) (pp. 240-251). Berlin/Heidelberg: Springer. [abstract] [pdf]

Hjalmarsson, A., & Edlund, J. (2008). Human-likeness in utterance generation: effects of variability. In Perception in Multimodal Dialogue Systems - Proceedings of the 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems, PIT 2008, Kloster Irsee, Germany, June 16-18, 2008. (pp. 252-255). Berlin/Heidelberg: Springer. [abstract] [pdf]

Laskowski, K., Edlund, J., & Heldner, M. (2008). An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems. In Proceedings ICASSP 2008 (pp. 5041-5044). Las Vegas, Nevada, US. [abstract] [pdf]

Laskowski, K., Edlund, J., & Heldner, M. (2008). Learning prosodic sequences using the fundamental frequency variation spectrum. In Proceedings of the Speech Prosody 2008 Conference (pp. 151-154). Campinas, Brazil: Editora RG/CNPq. [abstract] [pdf]

Laskowski, K., Heldner, M., & Edlund, J. (2008). The fundamental frequency variation spectrum. In Proceedings of FONETIK 2008 (pp. 29-32). Gothenburg, Sweden: Department of Linguistics, University of Gothenburg. [abstract] [pdf]

Laskowski, K., Wölfel, M., Heldner, M., & Edlund, J. (2008). Computing the fundamental frequency variation spectrum in conversational spoken dialogue systems. In Proceedings of Acoustics'08 (pp. 3305-3310). Paris, France. [abstract] [pdf]

2007

Edlund, J., & Beskow, J. (2007). Pushy versus meek – using avatars to influence turn-taking behaviour. In Proceedings of Interspeech 2007. Antwerp, Belgium. [abstract] [pdf]

Edlund, J., Beskow, J., & Heldner, M. (2007). MushyPeek – an experiment framework for controlled investigation of human-human interaction control behaviour. Proceedings of Fonetik, TMH-QPSR, 50(1), 61-64. [abstract] [pdf]

Edlund, J., & Heldner, M. (2007). Underpinning /nailon/ - automatic estimation of pitch range and speaker relative pitch. In Müller, C. (Ed.), Speaker Classification II, Selected Projects (pp. 229-242). Springer. [abstract] [pdf]

Heldner, M., & Edlund, J. (2007). What turns speech into conversation? A project description. Proceedings of Fonetik, TMH-QPSR, 50(1), 45-48. [abstract] [pdf]







Published by: TMH, Speech, Music and Hearing
Webmaster, webmaster@speech.kth.se

Last updated: 2012-11-09