KTH Speech Synthesis demo page

Publications

2024

Székely, É., Higginbotham, J. and Possemato, F. (2024) "Voice and Choice, Investigating the Role of Prosodic Variation in Request Compliance and Perceived Politeness Using Conversational TTS", Proceedings of SIGDial 2024
- audio samples

Tånnander, C., Mehta, S., Beskow, J. and Edlund, J. (2024) "Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis", Proceedings of Interspeech 2024
- audio samples

Mehta, S., Tu, R., Beskow, J., Székely, É. and Eje Henter, G. (2024) "Matcha-TTS: A fast TTS architecture with conditional flow matching", Proceedings of ICASSP 2024
- pdf
- code and audio samples

Mehta, S., Tu, R., Alexanderson, S., Beskow, J., Székely, É. and Eje Henter, G. (2024) "Unified speech and gesture synthesis using flow matching", Proceedings of ICASSP 2024
- pdf
- audio samples

Lameris, H., Székely, É. and Gustafson, J. (2024) "The Role of Creaky Voice in Turn Taking and the Perception of Speaker Stance: Experiments Using Controllable TTS", Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Turin, Italy
- audio samples

Székely, É. and Hope, M. (2024) "An inclusive approach to creating a palette of synthetic voices for gender diversity", Proceedings of Interspeech 2024
- audio samples

Wang, S., Székely, É. and Gustafson, J. (2024) "Contextual Interactive Evaluation of TTS Models in Dialogue Systems", Proceedings of Interspeech 2024, Kos, Greece
- audio samples

2023

Székely, É., Wang, S. and Gustafson, J. (2023) "So-to-Speak: an exploratory platform for investigating the interplay between style and prosody in TTS", Proceedings of Interspeech 2023, Dublin, Ireland
- pdf
- video demo

Gustafson, J., Székely, É. and Beskow, J. (2023) "Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters", Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents (IVA 2023), Würzburg, Germany
- pdf
- audio and video samples

Mehta, S., Wang, S., Alexanderson, S., Beskow, J., Székely, É. and Henter, G. (2023) "Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis", Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, France
- pdf
- code, audio and video samples

Miniotaite, J., Wang, S., Beskow, J., Gustafson, J., Székely, É. and Pereira, A. (2023) "Hey robot, it's not what you say, it's how you say it", Proceedings of the 32nd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN 2023), Busan, South Korea
- pdf
- audio and video samples

Lameris, H., Mehta, S., Eje Henter, G., Gustafson, J. and Székely, É. (2023) "Prosody-controllable spontaneous TTS with neural HMMs", Proceedings of ICASSP 2023, Rhodes, Greece
- pdf
- audio samples

Lameris, H., Gustafson, J. and Székely, É. (2023) "Beyond Style: Synthesizing Speech with Pragmatic Functions", Proceedings of Interspeech 2023, Dublin, Ireland
- pdf
- audio samples

Mehta, S., Kirkland, A., Lameris, H., Beskow, J., Székely, É. and Henter, G. (2023) "OverFlow: Putting flows on top of neural transducers for better TTS", Proceedings of ICASSP 2023, Rhodes, Greece
- pdf
- audio samples

Kirkland, A., Gustafson, J. and Székely, É. (2023) "Pardon my disfluency: The impact of disfluency effects on the perception of speaker competence and confidence", Proceedings of Interspeech 2023, Dublin, Ireland
- pdf
- audio samples

Székely, É., Gustafson, J. and Torre, I. (2023) "Prosody-controllable gender-ambiguous speech synthesis: a tool for investigating implicit bias in speech perception", Proceedings of Interspeech 2023, Dublin, Ireland
- pdf
- audio samples

Wang, S., Henter, G., Gustafson, J. and Székely, É. (2023) "A comparative study of self-supervised speech representations in read and spontaneous TTS", Proceedings of the ICASSP 2023 Satellite Workshop SASB 2023: Self-Supervision in Audio, Speech and Beyond
- audio samples

Wang, S., Henter, G., Gustafson, J. and Székely, É. (2023) "On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis", Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, France
- pdf
- audio samples

2022

Mehta, S., Kirkland, A., Lameris, H., Beskow, J., Székely, É. and Henter, G. E. (2022) "OverFlow: Putting flows on top of neural transducers for better TTS", preprint, arxiv.org/abs/2211.06892
- pdf
- audio samples

Kirkland, A., Lameris, H., Gustafson, J., and Székely, É. (2022) "Where's the uh, hesitation? The interplay between filled pause location, speech rate and fundamental frequency", Proceedings of Interspeech 2022, Incheon, Korea (nominated for best paper award).
- pdf
- audio samples

Wang, S., Gustafson, J. and Székely, É. (2022) "Evaluating Sampling-based Filler Insertion with Spontaneous TTS", Proceedings of the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022), Marseille, France
- pdf
- audio samples

Mehta, S., Székely, É., Beskow, J. and Henter, G. (2022) "Neural HMMs are all you need (for high-quality attention-free TTS)", Proceedings of the 47th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022), Singapore
- pdf
- audio samples

Beck, G. T. D., Wennberg, U., Malisz, Z. and Henter, G. E. (2022) "Wavebender GAN: An architecture for phonetically meaningful speech manipulation", Proceedings of the 47th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022), Singapore
- pdf
- audio samples

2021

Wang, S., Alexanderson, S., Gustafson, J., Beskow, J., Henter, G. and Székely, É. (2021) "Integrated Speech and Gesture Synthesis", Proceedings of the 23rd ACM International Conference on Multimodal Interaction (ICMI 2021), Montreal
- pdf
- video samples

Gustafson, J., Beskow, J. and Székely, É. (2021) "Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis", Proceedings of the 11th ISCA Speech Synthesis Workshop (SSW11), Budapest
- pdf
- audio samples

Kirkland, A., Wlodarczak, M., Gustafson, J. and Székely, É. (2021) "Perception of smiling voice in spontaneous speech synthesis", Proceedings of the 11th ISCA Speech Synthesis Workshop (SSW11), Budapest
- pdf
- audio samples

2020

Székely, É., Edlund, J. and Gustafson, J. (2020) "Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis", Proceedings of the 12th Edition of the Language Resources and Evaluation Conference (LREC 2020), Marseille, France
- pdf
- audio samples

Székely, É., Henter, G., Beskow, J. and Gustafson, J. (2020) "Breathing and speech planning in spontaneous speech synthesis", Proceedings of the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), Barcelona, Spain
- pdf
- audio samples

Alexanderson, S., Székely, É., Henter, G., Kucherenko, T. and Beskow, J. (2020) "Generating coherent spontaneous speech and gesture from text", Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (IVA 2020)
- paper

2019

Székely, É., Henter, G., Beskow, J. and Gustafson, J. (2019) "How to train your fillers: uh and um in spontaneous speech synthesis", Proceedings of the 10th ISCA Speech Synthesis Workshop (SSW10), September 20-22, Vienna, Austria
- pdf
- audio samples

Székely, É., Henter, G., Beskow, J. and Gustafson, J. (2019) "Spontaneous Conversational Speech Synthesis from Found Data", Proceedings of the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), September 15-19, Graz, Austria
- pdf
- audio samples

Székely, É., Henter, G., Beskow, J. and Gustafson, J. (2019) "Off the Cuff: Exploring Extemporaneous Speech Delivery with TTS", Proceedings of the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019), Graz, Austria (recipient of the Best Show and Tell Demo Award)
- pdf
- video demo

2018

Székely, É., Wagner, P. and Gustafson, J. (2018) "The wrylie-board: mapping acoustic space of expressive feedback to attitude markers", demo at the IEEE Spoken Language Technology Workshop (SLT 2018)
- pdf
- video demo

Last modified: Mon Oct 25 16:29:33 CEST 2021