COMPARING SELF-SUPERVISED SPEECH REPRESENTATIONS FOR READ AND SPONTANEOUS TTS (in submission)

Siyang Wang, Gustav Eje Henter, Joakim Gustafson and Éva Székely

Audio samples from listening tests are presented below.


--------------------------------------------------------------------------

Comparison on LJS (paper Table 2, LJS rows)


mel-spectrogram

HuBERT

wav2vec2.0 L12

wav2vec2.0 L9

--------------------------------------------------------------------------

Comparison on TSGD (spontaneous speech; paper Table 2, TSGD rows)


mel-spectrogram

HuBERT

wav2vec2.0 L9

--------------------------------------------------------------------------

All rights reserved by authors of the paper "COMPARING SELF-SUPERVISED SPEECH REPRESENTATIONS FOR READ AND SPONTANEOUS TTS" (in submission).

All audio samples were created by the authors.

These are for academic research purposes only.

Redistribution or reuse of any material shown on this website or in the paper is prohibited.