Seminar at Speech, Music and Hearing:
From Acoustics to Articulation
Opponent: Gérard Bailly, Université Stendhal, Grenoble, France
Abstract

The focus of this thesis is the relationship between the articulation of speech and the acoustics of the speech produced. Several problems are encountered in understanding this relationship, given the non-linearity, variance and non-uniqueness of the mapping, as well as the differences in the size and shape of the articulators, and consequently in the acoustics, between speakers. The thesis covers four main topics pertaining to the articulation and acoustics of speech.
The first part of the thesis deals with variations among different speakers in the articulation of phonemes. While the speakers differ physically in the shape of their articulators and vocal tracts, the study tries to extract articulation strategies that are common to different speakers. Using multi-way linear analysis methods, the study extracts articulatory parameters that can be used to estimate a speaker's unknown articulations of phonemes, given other articulations made by the same speaker and the corresponding articulations made by other speakers of the language. In addition, a novel method is presented for selecting the number of articulatory model parameters, as well as the articulations that are representative of a speaker's articulatory repertoire.
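As a rough illustration of such factor-based speaker modeling (a toy stand-in, not the thesis's actual multi-way method; the data, rank and dimensions below are invented), a speaker-independent articulatory basis can be learned from other speakers' data and combined with a new speaker's known articulations to predict an unknown one:

```python
import numpy as np

rng = np.random.default_rng(0)
n_speakers, n_phonemes, n_coords, rank = 4, 5, 3, 2

# Toy data generated from a shared bilinear model:
# articulation(speaker, phoneme) = speaker_weights @ shared_basis (+ noise).
true_w = rng.normal(size=(n_speakers, rank))
true_basis = rng.normal(size=(rank, n_phonemes * n_coords))
data = true_w @ true_basis + 0.01 * rng.normal(size=(n_speakers, n_phonemes * n_coords))

# Learn a speaker-independent basis from the *other* speakers (1..3) via SVD,
# a simple two-way stand-in for multi-way analyses such as PARAFAC/Tucker.
_, _, Vt = np.linalg.svd(data[1:], full_matrices=False)
basis = Vt[:rank]                                   # (rank, phonemes*coords)

# Speaker 0: phoneme 4 is "unknown". Estimate this speaker's weights by
# least squares from phonemes 0..3 only, then predict phoneme 4.
known = np.arange(4 * n_coords)                     # columns for phonemes 0..3
w0, *_ = np.linalg.lstsq(basis[:, known].T, data[0, known], rcond=None)
pred = (w0 @ basis)[4 * n_coords:]                  # predicted phoneme-4 coords
true = data[0, 4 * n_coords:]
print("max abs prediction error:", float(np.max(np.abs(pred - true))))
```

Because the toy data really is low-rank, the prediction error stays at the noise level; with real articulatory data the choice of rank becomes the model-selection problem the abstract mentions.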
The second part is devoted to the study of uncertainty in the acoustic-to-articulatory mapping, specifically non-uniqueness in the mapping. Several past studies have shown that human beings can produce a given phoneme using non-unique articulatory configurations when the articulators are constrained; this has also been demonstrated by synthesizing sounds using theoretical articulatory models. The studies in this part of the thesis investigate whether non-uniqueness also exists in unconstrained read speech. This is carried out using a database of acoustic signals recorded synchronously with the positions of electromagnetic coils placed at selected points on the lips, jaw, tongue and velum. This part is thus largely devoted to techniques for studying non-uniqueness in the statistical sense using such a database. The results indicate that the acoustic vectors corresponding to some frames of every phoneme in the database map onto non-unique articulatory distributions. The predictability of these non-unique frames is investigated, along with whether applying continuity constraints can resolve the non-uniqueness.
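One way such non-uniqueness can be probed statistically may be sketched as follows (a one-dimensional toy, not the thesis's procedure; the forward model and thresholds are invented): condition the articulatory samples on a narrow acoustic neighbourhood, then test the conditional distribution for multimodality, here by comparing 1- and 2-component Gaussian mixtures with BIC:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Toy forward model: the acoustic feature a = x^2 is produced equally well
# by articulations x and -x, so the inverse mapping is non-unique.
x = rng.uniform(-1.0, 1.0, size=5000)               # articulatory samples
a = x ** 2 + 0.01 * rng.normal(size=x.size)         # noisy "acoustics"

# Condition on one acoustic frame: keep articulations whose acoustics fall
# near the observed value, then compare 1- vs 2-component mixture fits.
a_obs = 0.5
cond = x[np.abs(a - a_obs) < 0.02].reshape(-1, 1)
bic1 = GaussianMixture(1, random_state=0).fit(cond).bic(cond)
bic2 = GaussianMixture(2, random_state=0).fit(cond).bic(cond)
print("frames kept:", cond.shape[0])
print("bimodal conditional distribution:", bic2 < bic1)
```

The conditional samples cluster near the two articulations (about ±0.71) that yield the same acoustics, so the two-component model wins the BIC comparison; with real data the same test would be applied per acoustic frame.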
The third part proposes several novel ways of looking at acoustic-articulatory relationships in the context of acoustic-to-articulatory inversion. The proposed methods include explicit modeling of non-uniqueness using cross-modal Gaussian mixture modeling, as well as modeling the mapping as local regressions. Another innovative approach to the mapping problem is described in the form of relating articulatory and acoustic gestures. Definitions and methods for obtaining such gestures are presented, along with an analysis of the gestures for different phoneme types, and the relationship between the acoustic and articulatory gestures is outlined. A method for conducting the acoustic-to-articulatory inverse mapping is suggested, together with a method for evaluating it. An application of acoustic-to-articulatory inversion to improving speech recognition is also described in this part of the thesis.
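The cross-modal Gaussian mixture idea can be sketched in one dimension (a minimal illustration assuming scikit-learn; the forward model and all parameters are invented, and this is not the thesis's implementation): fit a joint GMM over paired acoustic-articulatory vectors, then invert by taking the conditional expectation of the articulation given the acoustics:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)

# Toy paired data: articulation x and acoustics a = sin(pi*x/2) + noise.
x = rng.uniform(-1.0, 1.0, size=4000)
a = np.sin(np.pi * x / 2) + 0.05 * rng.normal(size=x.size)

# Fit a joint ("cross-modal") GMM over stacked [acoustic, articulatory]
# vectors; inversion is a mixture of per-component linear regressions
# weighted by each component's responsibility for the acoustic observation.
joint = np.column_stack([a, x])
gmm = GaussianMixture(n_components=8, covariance_type="full",
                      random_state=0).fit(joint)

def invert(a_new):
    """MMSE estimate of the articulation given an acoustic observation."""
    a_new = np.atleast_1d(a_new)
    mu_a = gmm.means_[:, 0]                         # acoustic means
    mu_x = gmm.means_[:, 1]                         # articulatory means
    s_aa = gmm.covariances_[:, 0, 0]
    s_xa = gmm.covariances_[:, 1, 0]
    # responsibilities of each component for the acoustic observation
    log_p = (-0.5 * (a_new[:, None] - mu_a) ** 2 / s_aa
             - 0.5 * np.log(2 * np.pi * s_aa) + np.log(gmm.weights_))
    r = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    cond_mean = mu_x + s_xa / s_aa * (a_new[:, None] - mu_a)
    return (r * cond_mean).sum(axis=1)

a_test = np.sin(np.pi * 0.5 / 2)                    # acoustics of x = 0.5
print("estimated articulation:", float(invert(a_test)[0]))
```

Taking the conditional mean collapses any non-unique posterior into one estimate; the explicit mixture form is what allows non-uniqueness to be modeled rather than averaged away.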
The final part of the thesis deals with problems related to modeling infants acquiring the ability to speak, using a model built around an articulatory synthesizer adapted to infant vocal tract sizes. The main problem addressed is modeling how infants acquire acoustic correlates that are normalized between infants and adults. A second problem, how infants discover the number of degrees of articulatory freedom, is also partially addressed. The main contribution is a realistic model of how an infant can learn the mapping between the acoustics produced during the babbling phase and the acoustics heard from adults. The knowledge required to map corresponding adult and infant speech sounds is shown to be learnt without the total number of categories or the one-to-one correspondences being specified explicitly. Instead, the model learns these features indirectly from an overall approval rating, provided by a simulation of adult perception, based on the infant model's imitations of adult utterances.
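A toy sketch of such approval-driven learning (everything here, including the vocal-tract scaling, the adult approval function and all constants, is invented for illustration and is not the thesis's model): the infant refines an articulatory guess for each adult target purely from a scalar approval signal, never seeing the adult's normalization directly:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setup: an articulation x yields infant acoustics 0.6*x + 0.2 (a
# shorter vocal tract shifts and scales the acoustics relative to adults).
def infant_acoustics(x):
    return 0.6 * x + 0.2

adult_targets = np.array([-0.5, 0.0, 0.7])          # adult "vowel" acoustics

# Simulated adult listener: approval is higher the closer the (perceptually
# normalized) imitation is to the target; the infant only sees the rating.
def approval(a_infant, target):
    normalized = (a_infant - 0.2) / 0.6             # adult's normalization
    return np.exp(-8.0 * (normalized - target) ** 2)

# The infant babbles around its current guess for each target and keeps
# a babble whenever the adult approves of it more (stochastic hill climbing).
best = np.zeros(len(adult_targets))
for _ in range(6000):
    k = rng.integers(len(adult_targets))            # which target to imitate
    babble = best[k] + 0.3 * rng.normal()           # exploratory babble
    if (approval(infant_acoustics(babble), adult_targets[k])
            > approval(infant_acoustics(best[k]), adult_targets[k])):
        best[k] = babble

print("learned articulations:", np.round(best, 2))
```

The learned articulations end up producing infant acoustics that normalize onto the adult targets, even though the scaling between the two acoustic spaces was never given to the learner, which is the spirit of the babbling model described above.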
The thesis thus covers several aspects of the relationship between the articulation and acoustics of speech, in the context of variation across speakers and ages. Although it does not provide complete solutions, the thesis proposes novel directions for approaching the problem, with pointers to solutions in some contexts.
10:00 - 13:00
Friday January 27, 2012
The seminar is held in Sal F3, Lindstedtsvägen 26.