Everyone needs a voice! Speech sets us apart from other mammals; it is our primary
connection to the world, and it has a pivotal function in modern society. However, the physics
of the vocal apparatus is surprisingly complex, and also hidden from normal view. After
decades of research, speech technologists are still looking for ways to make artificial voices
sound more natural. For clinicians needing to understand how voice problems arise, and how
they should be diagnosed and treated, an accurate, dynamic visualisation of the biomechanics
would lead quickly to significant progress. Students of languages and of vocal arts would be
immensely helped by animated renderings of what actually goes on inside the body during
speech and song. A detailed simulation of the human voice is needed both for basic science
and for numerous applications.
The objective is to achieve a computational simulation of human voice
production, based throughout on first principles of classical physics.
Most current modelling
strategies seek primarily to generate only the acoustic signal of the voice, and adopt numerous
approximations and simplifications to achieve that goal. For instance, current commercial
speech synthesis systems are not physics-based, but rather rely on elaborate concatenations of
pre-recorded audio segments. This is like saying, "the live speech signal contains features that we
prize highly, but whose origin we do not understand." Here, we seek to simulate more directly
the physics by which the voice produces its signal.
The intended outcome is a detailed computational model of the human voice that can be controlled with input
signals at different levels of representation (topological, neuromotoric, phonetic), and that
will render as outputs the system's physics, including sound and 3D visualisations.
The EUNISON model would serve as a reference for the state-of-the-art of our knowledge
of the voice production process, and would contribute to the atlas of the human body. As a
simulation engine, it would find numerous applications in medicine, man-machine
communication, robotics, pedagogy and the arts. For example, if driven by a text-to-speech
synthesis system, it would be able to generate natural-sounding acoustic output for any type of
speaker (gender, age, size, voice quality) or speaking style (whispering, shouting, singing),
thus avoiding the need for separate database recordings for different voices, although this
would require a sophisticated text-to-phonetics-to-neuromotor "front-end". Only a
rudimentary prototype of such a front-end will be devised within the project; the main focus
will be on the voice simulation engine itself. Long-term medical applications include
personalized voice disorder therapy, simulations of phono-surgery, and the development of
voice prostheses for patients who have lost their vocal function for whatever reason.
This has not been done earlier for two reasons: the sheer scope of the task, which is too large for
individual research groups, and a lack of appropriate simulation tools to handle the
challenging multi-physics models involved.
The physics of voice production is an intricate
combination of processes involving elastic bodies with non-linear tissue
vibrations, oscillatory flow, and turbulent flow, all of which combine to generate an
acoustic wave. The acoustic wave, by resonating in the vocal tract, can also become so strong
that it "kicks back" on its own generation, an effect that is very difficult to capture in a
non-physical model. Conventionally, solid and fluid mechanics are dealt with using separate
computational approaches and tools, which makes the treatment of this compound problem
very intricate. Here, we will deploy modern multi-physics simulation techniques, in
conjunction with experimental validation against physical replicas.
A unified computational domain, in which the dynamics of solids, fluids/gases and acoustic waves are modelled
simultaneously, is potentially an enabling breakthrough that can make this problem tractable.
The central computational challenge is to extend the current methodology such that
it can deal with solid mechanics, elastic collisions, fluid-structure interactions, aerodynamics
and acoustics all at once. The logistic challenge is to combine knowledge from several fields,
so as to formulate and simulate the detailed set of structures and control signals that the model
will require, in order to behave realistically. The experimental challenge is to perform lab
experiments with physical replicas so as to continually refine our understanding of the physics
of the phonatory and articulatory processes, while cross-checking and validating the simulations.
Page responsible: Olov Engwall, email@example.com, +468-790 75 35