Vacant PhD position at the Department of Speech, Music and Hearing

Call for applications

Query form for PhD applicants

Description of work

Objectives

The main objective of the work in which the PhD student will take part is to create the mechanisms that control two things: the geometrical configuration of the vocal tract, which is to be used as input for the finite element modelling, and the laryngeal posturing. The work consists of four tasks, T7.1 to T7.4, which are described below.

The recruited PhD student will primarily be working with
T7.1 - Vocal Tract,
T7.2 - Vocal Tract, and
the parts of T7.3 and T7.4 that relate to the vocal tract control.

T7.1 Definition of control parameters

(KTH-TMH + KTH-CB)

Larynx: We aim to create a deformable three-dimensional model of laryngeal posturing, controlled by a simplified set of motor parameters, which may be set manually, by neuromuscular activation or by specification of phonetic targets. First, however, a basic model of the laryngeal mechanics that support the vocal folds will be adapted from the literature. Its control parameters will initially correspond approximately to the subglottal pressure, adductive force, vocal fold position, vocal fold length and vocal fold tension. In a second stage, we will map these entities to degrees of activation in the primary laryngeal muscles: cricothyroid, thyroarytenoid, interarytenoid and possibly others.

Vocal tract: Similarly, we need to create a deformable three-dimensional vocal tract geometry, controlled by a limited set of intuitive articulatory parameters, which may be set manually, by neuromuscular activation or by specification of phonetic targets. The set of articulatory parameters (typically positions of the jaw, tongue dorsum, tongue blade, tongue tip, lip rounding, larynx and velum) will be defined through factor analysis of the variation in 3D Magnetic Resonance Imaging (MRI) data from at least two speakers producing a corpus of phonetically relevant sounds. Initial attempts will determine the most suitable factor analysis method; candidates include guided PCA, PARAFAC and a factor regression model. The factor analysis, and hence the generated control parameters, will be based on the Euclidean distribution of the data, but it will be supervised so as to generate control parameters that are firstly articulatorily relevant (as listed above) and secondly possible to control through neuromuscular activation.
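As a rough illustration of how a small set of articulatory parameters can drive a deformable geometry, the sketch below assumes a linear factor model in which each factor contributes a fixed displacement pattern to a mean shape. The function name, the toy vertices and the two "factors" are hypothetical and are not taken from the project; a real model would obtain the mean shape and the components from the factor analysis of MRI data.

```python
from typing import List, Sequence


def deform_geometry(
    mean_shape: Sequence[float],
    components: Sequence[Sequence[float]],
    parameters: Sequence[float],
) -> List[float]:
    """Return mean_shape + sum_k parameters[k] * components[k].

    All shapes are flattened coordinate lists, e.g. [x0, y0, x1, y1, ...].
    """
    if len(components) != len(parameters):
        raise ValueError("one parameter value is needed per component")
    shape = list(mean_shape)
    for weight, component in zip(parameters, components):
        if len(component) != len(shape):
            raise ValueError("component and mean shape differ in length")
        for i, delta in enumerate(component):
            shape[i] += weight * delta
    return shape


# Toy data (assumption): two 2D vertices flattened as [x0, y0, x1, y1].
mean_shape = [0.0, 0.0, 1.0, 0.0]
components = [
    [0.0, -1.0, 0.0, -1.0],  # hypothetical "jaw" factor: lowers both vertices
    [0.5, 0.0, 0.0, 0.0],    # hypothetical "tongue tip" factor: moves vertex 0
]

neutral = deform_geometry(mean_shape, components, [0.0, 0.0])
jaw_lowered = deform_geometry(mean_shape, components, [0.5, 0.0])
```

With all parameters at zero the mean shape is returned unchanged, which is the property that makes such parameters convenient to set manually.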

T7.2 Temporal variation of control parameters. Neuromotor activation

(KTH-TMH + KTH-CB)

To be able to generate time-varying vocal tract geometries, a temporal control model will be created, which specifies how the articulatory control parameters vary when moving from one articulation to another, in isolated diphones and in connected speech. The temporal control model will be based on variations observed in real-time MRI data and/or Electromagnetic Articulography (EMA) measurements. It will specify the time variation of each of the geometric parameter values (i.e. the activation of that parameter) in a given phonetic sequence. As a first simplified temporal control, the parameter change from one articulation to the next will be defined through cubic interpolation from the onset value to the target. Analysis of speech production data will then define time constants for this variation.
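The first simplified temporal control could look roughly like the sketch below. It assumes a cubic Hermite blend with zero velocity at both the onset and the target, which is one common reading of "cubic interpolation" but is not specified in the task description; the function names and the `duration` argument (standing in for a time constant still to be estimated from data) are likewise hypothetical.

```python
from typing import List


def cubic_transition(onset: float, target: float, t: float, duration: float) -> float:
    """Value of one control parameter at time t during a transition.

    Uses the cubic blend 3s^2 - 2s^3 with s = t / duration, so the
    parameter starts at `onset`, ends at `target`, and has zero slope
    at both ends (an assumption, see the text above).
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    s = min(max(t / duration, 0.0), 1.0)  # clamp to the transition interval
    blend = 3 * s**2 - 2 * s**3
    return onset + (target - onset) * blend


def trajectory(onset: float, target: float, duration: float, steps: int) -> List[float]:
    """Sample the transition at steps + 1 evenly spaced time points."""
    return [
        cubic_transition(onset, target, duration * i / steps, duration)
        for i in range(steps + 1)
    ]


# Example: one parameter moving from 0.0 to 2.0 over one time unit.
samples = trajectory(0.0, 2.0, 1.0, 4)
```

Applying the same function to every articulatory parameter, each with its own duration, would give a time-varying parameter vector for a diphone.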

This task will also explore how the geometric control parameters can be mapped onto muscle group models, to be activated with a representation of neural signals. The aim is to be able to use a neuromotor representation to generate the required laryngeal positioning, as well as the vocal tract geometries observed in the MRI data.

T7.3 Phonetic interface design

(KTH-TMH, CNRS)

The third input mode for generating the underlying vocal tract geometries is through simpler phonetic specifications. The aim is to build an easily explorable interface that is accessible to the lay public. In this interface, the user should be able to use menus, buttons and sliders to select a target phoneme, a phoneme sequence or phonetic features along different dimensions (e.g., place of articulation: front/back; type of articulation: plosive/fricative/vowel; openness: open/close; vowel; voicing) and observe the impact of the setting, both visually in the resulting vocal tract geometry and acoustically in the resulting sound. The role of the phonetic interface is thus to allow for easy, higher-level control of the vocal tract model, which may be beneficial in education and for demonstration. Under the hood, the phonetic interface will control the geometric control parameters by sending a set of parameter values (static phonemes) or a sequence of target values (for diphones and connected speech) to the temporal parameter controller. The prototype is expected to produce only a basic repertoire of phonemes, which may not be complete for any given language.
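A minimal sketch of the "under the hood" step might be a lookup from phoneme symbols to parameter targets, as below. Everything in it is hypothetical: the parameter names are a subset of those listed under T7.1, and the numeric targets are invented placeholders, not measured articulations. Unknown phonemes are rejected, reflecting the limited repertoire expected of the prototype.

```python
from typing import Dict, List, Sequence

# Order in which parameter values are sent to the temporal controller (assumption).
PARAMETER_NAMES = ("jaw", "tongue_dorsum", "tongue_tip", "lip_rounding")

# Placeholder targets, invented purely for illustration.
PHONEME_TARGETS: Dict[str, Dict[str, float]] = {
    "a": {"jaw": 1.0, "tongue_dorsum": -0.5, "tongue_tip": 0.0, "lip_rounding": 0.0},
    "i": {"jaw": 0.0, "tongue_dorsum": 1.0, "tongue_tip": 0.5, "lip_rounding": -0.5},
    "u": {"jaw": 0.0, "tongue_dorsum": 0.5, "tongue_tip": -0.5, "lip_rounding": 1.0},
}


def static_target(phoneme: str) -> List[float]:
    """Parameter values for one static phoneme, in PARAMETER_NAMES order."""
    if phoneme not in PHONEME_TARGETS:
        raise ValueError(f"phoneme {phoneme!r} is not in the prototype repertoire")
    values = PHONEME_TARGETS[phoneme]
    return [values[name] for name in PARAMETER_NAMES]


def target_sequence(phonemes: Sequence[str]) -> List[List[float]]:
    """Sequence of target vectors for a diphone or connected speech."""
    return [static_target(p) for p in phonemes]


diphone = target_sequence(["a", "i"])
```

The resulting list of target vectors is what a temporal parameter controller would then interpolate between.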

T7.4 Interface for remote operation

(KTH)

This task is to develop a public programming interface that will allow users to request EUNISON simulations using input vectors of control values. Input can be in the geometrical representation using PCA factors, in a simplified muscle activation representation, or in a phonetic representation; the latter possibly complemented with extraphonemic information such as emphasis and/or intonation. Outputs will include visualised renderings of geometries, flow/pressure fields as well as audio output of the acoustic wave.
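To make the three input representations concrete, the sketch below builds a simulation request as a JSON string. The field names, the representation labels and the use of JSON are all assumptions made for illustration; the task description does not define a wire format, and no actual EUNISON interface is being quoted here.

```python
import json
from typing import Mapping, Optional, Sequence

# Hypothetical labels for the three input representations named in the text.
REPRESENTATIONS = ("geometric", "muscle", "phonetic")


def build_request(
    representation: str,
    values: Sequence,
    extras: Optional[Mapping[str, object]] = None,
) -> str:
    """Serialise an input vector of control values as a JSON request."""
    if representation not in REPRESENTATIONS:
        raise ValueError(f"unknown representation: {representation!r}")
    request = {"representation": representation, "values": list(values)}
    if extras:
        # Extraphonemic information (e.g. emphasis, intonation) is only
        # described as a complement to the phonetic representation.
        if representation != "phonetic":
            raise ValueError("extras are only accepted with phonetic input")
        request["extras"] = dict(extras)
    return json.dumps(request, sort_keys=True)


geometric_request = build_request("geometric", [0.1, -0.3])
phonetic_request = build_request("phonetic", ["a", "i"], {"emphasis": "i"})
```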


Page responsible: Olov Engwall, engwall@kth.se, +468-790 75 35