Vacant PhD position at the Department of Speech, Music and Hearing
Description of work
Objectives
The main objective of the work in which the PhD student will take part is to create the mechanisms that control
the geometrical configuration of the vocal tract, which serves as input to the finite element modelling,
and the laryngeal posturing. The work consists of four tasks, T7.1 to T7.4, described below.
The recruited PhD student will primarily be working with
T7.1 - Vocal Tract,
T7.2 - Vocal Tract, and
the parts of T7.3 and T7.4 that relate to vocal tract control.
T7.1 Definition of control parameters
(KTH-TMH + KTH-CB)
Larynx: We aim to create a deformable three-dimensional model of laryngeal posturing, controlled by a
simplified set of motor parameters, which may be set manually, by neuromuscular activation or
by specification of phonetic targets. First, however, a basic model of the laryngeal mechanics
that support the vocal folds will be adapted from the literature. Its control parameters will initially
correspond approximately to the subglottal pressure, adductive force, vocal fold position, vocal fold
length and vocal fold tension. In a second stage, we will map these entities to degrees of activation in
the primary laryngeal muscles: cricothyroid, thyroarytenoid, interarytenoid and possibly others.
Vocal tract: Similarly, we need to create a deformable three-dimensional vocal tract geometry, controlled by a
limited set of intuitive articulatory parameters, which may be set manually, by neuromuscular
activation or by specification of phonetic targets. The set of articulatory parameters
(typically positions of the jaw, tongue dorsum, tongue blade, tongue tip, lip rounding, larynx and
velum) will be defined through factor analysis of the variation in 3D Magnetic Resonance Imaging
(MRI) data of at least two speakers producing a corpus of phonetically relevant sounds. Initial attempts
will determine the most suitable factor analysis method; candidates include guided PCA,
PARAFAC, and a factor regression model. The factor analysis, and hence the generated control
parameters, will be based on the Euclidean distribution of the data, but it will be supervised so as to
generate control parameters that are firstly articulatorily relevant (as listed above) and secondly
possible to control through neuromuscular activation.
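As an illustration of how a small set of control parameters can be derived from shape data, the following is a minimal sketch using plain (unsupervised) PCA. It is not the guided PCA, PARAFAC or factor regression model named above, and the data are synthetic stand-ins rather than MRI measurements. The function names `fit_pca`, `encode` and `decode` and all array sizes are assumptions made for this example.

```python
import numpy as np

def fit_pca(shapes, n_components):
    """Plain PCA via SVD. shapes has one row per articulation (n_samples, n_features)."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, components, explained[:n_components]

def encode(shape, mean, components):
    """Project one shape onto the components to get its control parameter values."""
    return (shape - mean) @ components.T

def decode(params, mean, components):
    """Rebuild a shape from control parameter values."""
    return mean + params @ components

# Synthetic stand-in for vocal tract shapes (hypothetical sizes, not real data):
# 40 articulations, 30 geometry coordinates, generated from 3 hidden factors.
rng = np.random.default_rng(0)
basis = rng.normal(size=(3, 30))
weights = rng.normal(size=(40, 3))
shapes = weights @ basis + 5.0  # the constant offset plays the role of a mean shape

mean, components, explained = fit_pca(shapes, n_components=3)
params = encode(shapes[0], mean, components)
reconstruction = decode(params, mean, components)
```

In this sketch the components are chosen purely by variance. A supervised method would instead constrain them so that each one corresponds to an articulator such as the jaw or the tongue tip.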
T7.2 Temporal variation of control parameters. Neuromotor activation
(KTH-TMH + KTH-CB)
To be able to generate time-varying vocal tract geometries, a temporal control model will be created,
which specifies how the articulatory control parameters vary when moving from one articulation to
another, in isolated diphones and in connected speech. The temporal control model will be based on
variations observed in real-time MRI data and/or Electromagnetic Articulography (EMA)
measurements. The temporal control model will specify the time variation of each geometric parameter
value (i.e. the activation of that parameter) in a given phonetic sequence. As a first simplified
temporal control, the parameter
change from one articulation to the next will be defined through cubic interpolation from the onset
value to the target. Analysis of speech production data will then define time constants for this
variation.
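The simplified temporal control can be sketched as follows. This assumes a cubic Hermite blend with zero velocity at both the onset and the target, which is only one possible reading of "cubic interpolation". The function name `cubic_transition` and the `duration` argument (standing in for the time constants to be estimated from speech production data) are hypothetical.

```python
def cubic_transition(onset, target, t, duration):
    """Value of one control parameter at time t during a transition.

    Assumes a cubic with zero slope at both ends; times outside
    [0, duration] are clamped to the onset or target value.
    """
    u = min(max(t / duration, 0.0), 1.0)   # normalised time in [0, 1]
    blend = 3 * u ** 2 - 2 * u ** 3        # cubic blend from 0 to 1
    return onset + (target - onset) * blend

# Example: a parameter moving from 0.2 to 0.8 over 100 ms, sampled at 5 instants.
times = [0.0, 0.025, 0.05, 0.075, 0.1]
trajectory = [cubic_transition(0.2, 0.8, t, 0.1) for t in times]
```

The same function would be applied independently to each control parameter, with per-parameter durations once the time constants are known.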
This task will also explore how the geometric control parameters can be mapped onto
muscle group models, to be activated with a representation of neural signals. The aim is to be able to
use a neuromotor representation to generate the required laryngeal positioning, as well as the vocal
tract geometries observed in the MRI data.
T7.3 Phonetic interface design
(KTH-TMH, CNRS)
The third input mode for generating the underlying vocal tract geometries is a simpler phonetic specification. The aim is to build an easily
explorable interface, accessible to the lay public. In this interface, the user should be able to use
menus, buttons and sliders to select a target phoneme, a phoneme sequence or phonetic features along
different dimensions (e.g., place of articulation: front/back; type of articulation: plosive/fricative/vowel;
openness: open/close vowel; voicing) and observe the impact of the setting, both visually in the
resulting vocal tract geometry and acoustically in the resulting sound. The role of the phonetic
interface is thus to allow for an easy, higher level control of the vocal tract model, which may be
beneficial in education and for demonstration. Under the hood, the phonetic interface will control the
geometric control parameters by sending a set of parameter values (static phonemes) or a sequence of
target values (for diphones and connected speech) to the temporal parameter controller. The prototype
is expected to produce only a basic repertoire of phonemes, which may not be complete for any given
language.
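The step from a phonetic specification to target values for the temporal parameter controller might look like the sketch below. The parameter names follow the articulators listed under T7.1, but the table `PHONEME_TARGETS`, its numeric values and the function `targets_for` are placeholders invented for illustration; they have no phonetic validity and are not part of the project.

```python
# Articulatory parameters, in a fixed order shared with the controller (assumed).
PARAMETERS = ("jaw", "tongue_dorsum", "tongue_blade", "tongue_tip",
              "lip_rounding", "larynx", "velum")

# Placeholder target values per phoneme; unspecified parameters stay neutral.
PHONEME_TARGETS = {
    "a": {"jaw": 1.0, "tongue_dorsum": -0.5},
    "i": {"jaw": -0.5, "tongue_dorsum": 1.0},
    "u": {"tongue_dorsum": 0.8, "lip_rounding": 1.0},
}

def targets_for(sequence):
    """Return one full parameter vector per phoneme in the sequence."""
    missing = [p for p in sequence if p not in PHONEME_TARGETS]
    if missing:
        # The prototype repertoire is expected to be incomplete.
        raise KeyError(f"phonemes not in repertoire: {missing}")
    return [[PHONEME_TARGETS[p].get(name, 0.0) for name in PARAMETERS]
            for p in sequence]

# A static phoneme yields one vector; a diphone yields a sequence of two.
static = targets_for(["a"])
diphone = targets_for(["a", "i"])
```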
T7.4 Interface for remote operation
(KTH)
This task is to develop a public programming interface that will allow users to request EUNISON
simulations using input vectors of control values. Input can be in the geometrical representation using
PCA factors, in a simplified muscle activation representation, or in a phonetic representation; the latter
possibly complemented with extraphonemic information such as emphasis and/or intonation. Outputs
will include visualised renderings of geometries, flow/pressure fields as well as audio output of the
acoustic wave.
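A request to such an interface could be structured roughly as in the sketch below. The class `SimulationRequest`, its field names, the JSON encoding and the output labels are all assumptions made for illustration; the text above does not specify the actual interface.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical labels for the three input representations and the outputs.
REPRESENTATIONS = ("geometric", "muscle", "phonetic")
OUTPUTS = ("geometry_rendering", "flow_pressure_fields", "audio")

@dataclass
class SimulationRequest:
    representation: str   # one of REPRESENTATIONS
    frames: list          # control vectors, or phoneme symbols if phonetic
    outputs: list = field(default_factory=lambda: ["audio"])

    def validate(self):
        if self.representation not in REPRESENTATIONS:
            raise ValueError(f"unknown representation: {self.representation!r}")
        if not self.frames:
            raise ValueError("at least one frame is required")
        unknown = [o for o in self.outputs if o not in OUTPUTS]
        if unknown:
            raise ValueError(f"unknown outputs: {unknown}")
        if self.representation != "phonetic":
            # Vector inputs: every frame needs the same number of control values.
            if len({len(frame) for frame in self.frames}) != 1:
                raise ValueError("all frames must have the same length")

    def to_json(self):
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)

# Two frames of three PCA factor values each, requesting audio and a rendering.
request = SimulationRequest(
    "geometric",
    [[0.1, -0.3, 0.0], [0.4, 0.2, -0.1]],
    ["audio", "geometry_rendering"],
)
payload = request.to_json()
```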
Page responsible: Olov Engwall, engwall@kth.se, +468-790 75 35