Székely, É., Gustafson, J. and Torre, I.
(2023)
Prosody-controllable gender-ambiguous speech synthesis: a tool for investigating implicit bias in speech perception
Proceedings of Interspeech 2023, Dublin, Ireland

Abstract

This paper proposes a novel method to develop gender-ambiguous TTS, which can be used to investigate hidden gender bias in speech perception. Our aim is to provide a tool for researchers to conduct experiments on language use associated with specific genders. Ambiguous voices can also be beneficial for virtual assistants, to help reduce stereotypes and increase acceptance. Our approach uses a multi-speaker embedding in a neural TTS engine, combining two corpora recorded by a male and a female speaker to achieve a gender-ambiguous timbre. We also propose speaker-disentangled prosody control to ensure that the timbre is robust across a range of prosodies and enable more expressive speech. We optimised the output using an SSL-based network trained on hundreds of speakers. We conducted perceptual evaluations on the settings that were judged most ambiguous by the network, which showed that listeners perceived the speech samples as gender-ambiguous, also in prosody-controlled conditions

Evaluation samples