Voice and Choice: Investigating the Role of Prosodic Variation in Request Compliance and Perceived Politeness Using Conversational TTS

SIGDIAL 2024

Éva Székely1, Jeff Higginbotham2, Francesco Possemato3

1KTH Royal Institute of Technology, 2University at Buffalo, NY, USA, 3University of Groningen, NL

szekely@kth.se, cdsjeff@buffalo.edu, f.possemato@rug.nl

Abstract

As conversational Text-to-Speech (TTS) technologies become increasingly realistic and expressive, understanding
the impact of prosodic variation on speech perception and social dynamics is crucial for enhancing conversational systems.
This study explores the influence of prosodic features on listener responses to indirect requests using a
purpose-built conversational TTS engine that can control prosody and generate speech across three speaker
profiles: female, male, and gender-ambiguous.
We conducted two experiments to analyse how naturalistic variations in speech rate and vocal effort impact the likelihood
of request compliance and perceived politeness.
In the first experiment, we examined how prosodic modifications affect the perception of politeness in permission- and
action requests.
In the second experiment, participants compared pairs of spoken requests, each rendered with different prosodic features,
and chose which they were more likely to grant.
Results indicate that both faster speech rate and higher vocal effort increased the willingness to comply, though the
extent of this influence varied by speaker gender.
Higher vocal effort increased the chance of a request being granted more in action requests than in permission requests.
Politeness had a demonstrable positive impact on the likelihood of requests being granted, and this effect was stronger
for the male voice than for the female and gender-ambiguous voices.

Samples

Male voice.

Speech Rate: Fast, Vocal Effort: High
Speech Rate: Fast, Vocal Effort: Modal
Speech Rate: Fast, Vocal Effort: Low
Speech Rate: Normal, Vocal Effort: High
Speech Rate: Normal, Vocal Effort: Modal
Speech Rate: Normal, Vocal Effort: Low

Ambiguous voice.

Speech Rate: Fast, Vocal Effort: High
Speech Rate: Fast, Vocal Effort: Modal
Speech Rate: Fast, Vocal Effort: Low
Speech Rate: Normal, Vocal Effort: High
Speech Rate: Normal, Vocal Effort: Modal
Speech Rate: Normal, Vocal Effort: Low

Female voice.

Speech Rate: Fast, Vocal Effort: High
Speech Rate: Fast, Vocal Effort: Modal
Speech Rate: Fast, Vocal Effort: Low
Speech Rate: Normal, Vocal Effort: High
Speech Rate: Normal, Vocal Effort: Modal
Speech Rate: Normal, Vocal Effort: Low