Perception of Speaker Stance
How do listeners infer meaning from prosody? STANCE investigates the acoustic and contextual cues through which speakers signal attitudes, intentions, and social meanings in spontaneous conversation — using neural speech synthesis as a scientific instrument.
Human speakers constantly signal their attitudes, certainty, and social positioning through prosody — the way they vary pitch, timing, and voice quality. Yet systematic study of these signals has been hampered by a fundamental challenge: in natural speech, it is impossible to vary a single acoustic cue while holding everything else constant.
STANCE addresses this by developing neural text-to-speech systems built on ecologically valid spontaneous conversational speech, with independent, fine-grained control over prosodic features. These systems serve as a scientific instrument in an analysis-by-synthesis methodology, enabling controlled perception experiments with stimuli that mirror the richness of natural conversation.
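A minimal sketch of that single-feature manipulation, assuming a hypothetical prosody-controllable interface: the Prosody fields, the value grids, and the synthesize placeholder below are illustrative assumptions, not the actual STANCE systems' API.

```python
# Hypothetical one-feature-at-a-time stimulus grid for a perception experiment.
# Parameter names and value ranges are illustrative assumptions.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Prosody:
    speech_rate: float = 1.0   # relative duration scaling
    pitch_shift: float = 0.0   # semitones relative to the speaker's baseline
    creak: float = 0.0         # 0 = modal voice, 1 = maximally creaky

BASELINE = Prosody()

# Values probed for each feature while all other features stay at baseline.
MANIPULATIONS = {
    "speech_rate": [0.8, 1.0, 1.2],
    "pitch_shift": [-2.0, 0.0, 2.0],
    "creak": [0.0, 0.5, 1.0],
}

def stimulus_conditions():
    """Yield (label, Prosody) pairs that differ from BASELINE in exactly one feature."""
    for feature, values in MANIPULATIONS.items():
        for value in values:
            yield f"{feature}={value}", replace(BASELINE, **{feature: value})

def synthesize(text: str, prosody: Prosody) -> bytes:
    """Placeholder for the prosody-controllable TTS system."""
    raise NotImplementedError

if __name__ == "__main__":
    for label, prosody in stimulus_conditions():
        print(label, prosody)  # in an experiment: synthesize("Well, maybe.", prosody)
```

Because every condition shares the same text and baseline settings, any difference in listener judgements can be attributed to the one manipulated feature.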
The project's outputs span TTS methodology, speech perception, pragmatics, gender studies, and assistive communication, united by the question of how prosody conveys meaning beyond the words themselves.
STANCE pursues a unified scientific agenda across prosody, perception, and speech technology, grounded in three complementary research goals.
Develop neural TTS systems with independent, fine-grained prosodic control — enabling controlled perception experiments with spontaneous-sounding stimuli that vary a single feature at a time.
Investigate how timing, pitch, and voice quality interact to signal speaker attitudes — confidence, hesitation, stance on topic, and pragmatic functions such as discourse markers and politeness.
Examine how discourse context, speaker gender, voice identity, and listener background shape the perception of stance — with applications in inclusive voice design and assistive communication.
Six cross-cutting themes drawn from the conclusions and discussions of STANCE publications.
Prosodic choices directly and measurably affect how speakers are judged. Politeness conveyed through voice increases request compliance. Mid-utterance filled pauses lower confidence ratings more than initial pauses. The duration of a discourse marker like "well" shifts its perceived polarity — functioning as hedging or reluctant agreement depending on context.
Voice quality features signal specific interactional functions beyond style. Non-positional creaky voice makes speakers sound less certain and more turn-final. Breathy voice maps reliably onto two distinct constructions: self-directed musings and grounding attempts. Synthesized smiling voice, created by training on near-laughter data, is perceived as smiling without degrading naturalness.
Disfluencies do not uniformly make speakers sound worse. Repetitions harm perceived competence and sincerity most, but explicit listener forewarning significantly reduces these effects — a finding relevant for L2 speakers and speakers with ASD. Pause-internal phonetic particles (including tongue clicks) reduce perceived certainty and can now be synthesized for controlled experiments.
Gender-ambiguous TTS can serve as a research instrument for exposing implicit bias in speech perception. A voice palette built on non-binary speakers is positively received by AAC users seeking gender-expansive options. Speech-LLMs encode gender bias at the semantic token level during text encoding — making speaker assignment a direct diagnostic window into training data bias.
Widespread interaction with synthetic voices constitutes an emergent sociolinguistic pressure. The influence is complex, mediated by identity, engagement, and context — and demands a transdisciplinary response. The legal, ethical, and societal dimensions of voice-based AI shaping human speech norms remain largely unaddressed.
Mean opinion scores (MOS) do not correlate with self-supervised (SSL) vocoding loss, and MOS predictors trained on read speech fail on spontaneous speech without fine-tuning. Discrete-token models like Bark produce natural prosody and spontaneous behaviours but lack robustness and speaker consistency, dimensions MOS cannot capture. Evaluation must be contextual to reflect what matters in real conversational use.
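One way to make the first claim concrete is a rank-correlation check between listener MOS and an objective metric; the sketch below uses made-up placeholder numbers, not STANCE data.

```python
# Illustrative check of whether an objective metric tracks listener MOS.
# The scores are placeholders, not project results.
import numpy as np
from scipy.stats import spearmanr

mos_scores = np.array([3.8, 4.1, 3.2, 4.4, 3.6])          # mean listener ratings per system
vocoding_loss = np.array([0.42, 0.38, 0.41, 0.45, 0.36])  # objective metric (lower = better)

rho, p_value = spearmanr(mos_scores, vocoding_loss)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A weak or non-significant rho is the pattern reported here for MOS versus
# SSL vocoding loss: the objective number does not predict perceived quality.
```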
27 peer-reviewed publications from the STANCE project, spanning speech synthesis, speech perception, pragmatics, and inclusive voice design.