Project summary

Everyone needs a voice! Speech sets us apart from other mammals; it is our primary connection to the world, and it has a pivotal function in modern society. However, the physics of the vocal apparatus is surprisingly complex, and also hidden from normal view. After decades of research, speech technologists are still looking for ways to make artificial voices sound more natural. For clinicians who need to understand how voice problems arise, and how they should be diagnosed and treated, an accurate, dynamic visualisation of the biomechanics would lead quickly to significant progress. Students of languages and of the vocal arts would be immensely helped by animated renderings of what actually goes on inside the body during speech and song. A detailed simulation of the human voice is needed both for basic science and for numerous applications.

Targeted breakthrough

To achieve a computational simulation of human voice production, based throughout on first principles of classical physics.

Most current modelling strategies seek only to generate the acoustic signal of the voice, and adopt numerous approximations and simplifications to achieve that goal. For instance, current commercial speech synthesis systems are not physics-based, but rather resort to elaborate concatenations of pre-recorded audio segments. This is like saying, "the live speech signal contains features that we prize highly, but whose origin we do not understand." Here, we seek to simulate more directly the physics by which the voice produces its signal.

Long-term vision

A detailed computational model of the human voice that can be controlled with input signals at different levels of representation (topological, neuromotor, phonetic) and that will render as outputs the system's physics, including sound and 3D visualisations.

The EUNISON model would serve as a reference for the state of the art of our knowledge of the voice production process, and would contribute to the atlas of the human body. As a simulation engine, it would find numerous applications in medicine, man-machine communication, robotics, pedagogy and the arts. For example, if driven by a text-to-speech synthesis system, it would be able to generate natural-sounding acoustic output for any type of speaker (gender, age, size, voice quality) or speaking style (whispering, shouting, singing), thus avoiding the need for separate database recordings for different voices, although this would require a sophisticated text-to-phonetics-to-neuromotor "front-end". Only a rudimentary prototype of such a front-end will be devised within the project; the main focus will be on the voice simulation engine itself. Long-term medical applications include personalized voice disorder therapy, simulations of phono-surgery, and the development of voice prostheses for patients who have lost their vocal function for whatever reason.

Problem formulation

This has not been done earlier for two reasons: the sheer scope of the task, which is too large for individual research groups, and a lack of appropriate simulation tools to handle the challenging multi-physics models involved.

The physics of voice production is an intricate combination of processes that involve non-linear vibrations of elastic tissues, oscillatory flow, and turbulent flow, all of which combine to generate an acoustic wave. The acoustic wave, by resonating in the vocal tract, can also become so strong that it "kicks back" on its own generation, and this effect is very difficult to capture in a non-physical model. Conventionally, solid and fluid mechanics are dealt with using separate computational approaches and tools, which makes the treatment of this compound problem very intricate. Here, we will deploy modern multi-physics simulation techniques, in conjunction with experimental validation against physical replicas.

Method

A unified computational domain, in which the dynamics of solids, fluids/gases and acoustic waves are modelled simultaneously, is potentially an enabling breakthrough that can make this problem tractable.

The central computational challenge is to extend the current methodology such that it can deal with solid mechanics, elastic collisions, fluid-structure interactions, aerodynamics and acoustics all at once. The logistic challenge is to combine knowledge from several fields, so as to formulate and simulate the detailed set of structures and control signals that the model will require, in order to behave realistically. The experimental challenge is to perform lab experiments with physical replicas so as to continually refine our understanding of the physics of the phonatory and articulatory processes, while cross-checking and validating the numerical simulations.


Page responsible: Olov Engwall, engwall@kth.se, +468-790 75 35