

Seminar at Speech, Music and Hearing:

Thesis defense:

From Acoustics to Articulation

Ananthakrishnan Gopal

Opponent: Gérard Bailly, Université Stendhal, Grenoble, France


The focus of this thesis is the relationship between the articulation of speech and the acoustics of the produced speech. Understanding this relationship is complicated by the non-linearity, variability and non-uniqueness of the mapping, as well as by differences between speakers in the size and shape of the articulators, and consequently in the acoustics. The thesis mainly covers four topics pertaining to the articulation and acoustics of speech.

The first part of the thesis deals with variations among speakers in the articulation of phonemes. While speakers differ physically in the shape of their articulators and vocal tracts, the study tries to extract articulation strategies that are common to different speakers. Using multi-way linear analysis methods, it extracts articulatory parameters that can be used to estimate unknown articulations of phonemes by one speaker, given other articulations by the same speaker and the corresponding articulations by other speakers of the language. In addition, a novel method is suggested for selecting the number of articulatory model parameters, as well as the articulations that are representative of a speaker's articulatory repertoire.
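The idea of shared articulatory parameters can be illustrated with a simplified sketch. The snippet below uses a plain two-way factorisation (SVD) on synthetic data, not the multi-way linear analysis of the thesis, and all dimensions and data are made-up stand-ins: parameters extracted across speakers let a missing articulation for one speaker be estimated from how the other speakers articulate that phoneme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: articulator coordinates for 5 speakers x 20 phonemes
# x 12 features. (The thesis uses real measured data and multi-way linear
# analysis rather than this plain two-way factorisation.)
n_speakers, n_phonemes, n_feats = 5, 20, 12
data = rng.normal(size=(n_speakers, n_phonemes, n_feats))

# Stack speakers and factorise: rows = (speaker, phoneme), columns = features.
X = data.reshape(n_speakers * n_phonemes, n_feats)
X = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keep k shared "articulatory parameters" (principal directions).
k = 3
basis = Vt[:k]                      # directions common to all speakers
scores = X @ basis.T                # per-(speaker, phoneme) weights

# Estimate an unknown articulation for speaker 0, phoneme 7 by averaging the
# weights the other speakers give to that phoneme, then projecting back.
scores_3d = scores.reshape(n_speakers, n_phonemes, k)
est = scores_3d[1:, 7].mean(axis=0) @ basis   # exclude speaker 0
print(est.shape)   # (12,) reconstructed feature vector
```

A real system would also need the speaker-specific scaling that the multi-way decomposition provides; this sketch only shows the shared-basis part.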

The second part is devoted to the study of uncertainty in the acoustic-to-articulatory mapping, specifically non-uniqueness in the mapping. Several studies in the past have shown that human beings are capable of producing a given phoneme using non-unique articulatory configurations when the articulators are constrained. This has also been demonstrated by synthesizing sounds using theoretical articulatory models. The studies in this part of the thesis investigate the existence of non-uniqueness in unconstrained read speech. This is carried out using a database of acoustic signals recorded synchronously with the positions of electromagnetic coils placed on selected points on the lips, jaw, tongue and velum. This part thus largely devotes itself to describing techniques that can be used to study non-uniqueness in the statistical sense, using such a database. The results indicate that the acoustic vectors corresponding to some frames in all the phonemes in the database can be mapped onto non-unique articulatory distributions. The predictability of these non-unique frames is investigated, along with whether applying continuity constraints can resolve the non-uniqueness.
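One way to probe non-uniqueness statistically, sketched below on synthetic data rather than the actual EMA corpus or the thesis's own procedure, is to collect the articulatory vectors of acoustically similar frames and test whether they form a multimodal distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a corpus of paired frames: two groups share the same
# acoustic region but come from two distinct articulatory configurations
# (simulated non-uniqueness).
acoustic = np.concatenate([rng.normal(0.0, 0.05, size=(200, 2)),
                           rng.normal(0.0, 0.05, size=(200, 2))])
artic = np.concatenate([rng.normal(-1.0, 0.05, size=(200, 1)),
                        rng.normal(+1.0, 0.05, size=(200, 1))])

# For a query acoustic frame, gather the articulatory values of its acoustic
# neighbours and split them with a tiny 1-D two-means procedure.
query = np.zeros(2)
dists = np.linalg.norm(acoustic - query, axis=1)
neighbours = artic[np.argsort(dists)[:100], 0]

lo, hi = neighbours.min(), neighbours.max()
for _ in range(20):
    assign = np.abs(neighbours - lo) < np.abs(neighbours - hi)
    lo, hi = neighbours[assign].mean(), neighbours[~assign].mean()

# A large gap between the two articulatory modes, relative to the overall
# spread, flags this acoustic frame as mapping to a non-unique articulatory
# distribution.
separation = abs(hi - lo) / neighbours.std()
print(separation > 1.0)   # True: the distribution is clearly bimodal
```

Continuity constraints would then correspond to asking whether the surrounding frames of the utterance disambiguate which mode the speaker actually used.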

The third part proposes several novel methods of looking at acoustic-articulatory relationships in the context of acoustic-to-articulatory inversion. The proposed methods include explicit modeling of non-uniqueness using cross-modal Gaussian mixture modeling, as well as modeling the mapping as local regressions. Another innovative approach to the mapping problem is described in the form of relating articulatory and acoustic gestures. Definitions and methods to obtain such gestures are presented, along with an analysis of the gestures for different phoneme types, and the relationship between the acoustic and articulatory gestures is outlined. A method to conduct acoustic-to-articulatory inverse mapping is also suggested, along with a method to evaluate it. An application of acoustic-to-articulatory inversion to improving speech recognition is also described in this part of the thesis.

The final part of the thesis deals with problems related to modeling infants acquiring the ability to speak, using an articulatory synthesizer adapted to infant vocal tract sizes. The main problem addressed is modeling how infants acquire acoustic correlates that are normalized between infants and adults. A second problem, how infants decipher the number of degrees of articulatory freedom, is also partially addressed. The main contribution is a realistic model which shows how an infant can learn the mapping between the acoustics produced during the babbling phase and the acoustics heard from adults. The knowledge required to map corresponding adult-infant speech sounds is shown to be learnt without the total number of categories or the one-to-one correspondences being specified explicitly. Instead, the model learns these features indirectly, based on an overall approval rating provided by a simulation of adult perception of the infant model's imitations of adult utterances.
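The appeal of a cross-modal Gaussian mixture for inversion is that conditioning the joint model on an acoustic observation yields a full articulatory distribution, so non-uniqueness appears naturally as multiple posterior modes. The toy below uses hand-set one-dimensional parameters with diagonal covariances (a real system would fit the joint model with EM on a corpus and exploit the cross-covariances for regression); it only illustrates the conditioning step:

```python
import numpy as np

# Toy cross-modal mixture over (acoustic a, articulatory x) pairs.
# Two components share the same acoustic mean but differ in articulation,
# so conditioning on acoustics yields a bimodal articulatory posterior --
# an explicit representation of non-uniqueness.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, -1.0],     # [acoustic mean, articulatory mean]
                  [0.0, +1.0]])
var_a = 0.1                        # acoustic variance per component

def articulatory_posterior(a):
    """Condition the joint mixture on an acoustic observation a.

    Returns the posterior component weights and the articulatory mean of
    each component (unchanged here because the covariances are diagonal;
    with full covariances the means would shift with a).
    """
    lik = weights * np.exp(-0.5 * (a - means[:, 0]) ** 2 / var_a)
    return lik / lik.sum(), means[:, 1]

w, mu = articulatory_posterior(0.0)
print(w, mu)   # equal weight on two articulatory modes, at -1 and +1
```

A single-point estimate (e.g. the posterior mean, 0.0 here) would fall between the two modes and correspond to no valid articulation, which is why modeling the full conditional distribution matters.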

Thus, the thesis covers different aspects of the relationship between the articulation and acoustics of speech, in the context of variation across speakers and ages. While not providing complete solutions, it proposes novel directions for approaching the problem, with pointers to solutions in some contexts.

10:00 - 13:00
Friday January 27, 2012

The seminar is held in Sal F3, Lindstedtsvägen 26.


Published by: TMH, Speech, Music and Hearing
