Annual Report 1999
The research within the music acoustics group runs in four main streams:
solo and choral singing, stringed instruments, music performance, and electro-acoustic music composition. While
running in parallel for periods, they often merge and interact. An experimental approach has long been a signature
of the research. In particular, experimental work with "analysis-by-synthesis" has proved to be useful.
In the area of music acoustics, the trend continues toward more sophisticated
synthesis strategies, based on a direct mathematical description of the instrument/voice. This strategy, commonly
referred to as numerical or "physical modelling", is likely to be generally adopted within the next five years
or so at most major synthesiser laboratories. We have started physical modelling of the singing voice in terms
of articulatory synthesis, controlled by physiologic control parameters such as jaw opening and tongue shape.
Interest in numerical modelling has revealed a lack of reliable
and detailed experimental data in many areas. Our research group is one of very few in the world that works specifically
in the area of scientific research on musical instruments and sound. Hence, basic research is an important responsibility
of our group. Examples include the function of control systems used in actual performances, such as breathing behaviour
in singers and wind instrumentalists, or bow movement in the playing of string instruments. Lack of basic knowledge
of such control systems forces the experimenter to explore an overwhelmingly rich parameter space, and the chance
of real success in reasonable time is small.
Our research on musical performance by computer modelling is a long-term
project. Its focus is presently to compare characteristics of locomotion in running with timing in music
performance. Our generative music performance grammar and music examples
illustrating the effects on synthesised performances have been made accessible on the Internet.
Our overall aim is to continue the established experimental tradition
in the study of the voice and the stringed instruments (now needed more than ever, it seems), while striving to
take part in the development of the new synthesis methods. As before, the parallel work on musical performance
bridges several of these tasks.
Breathing and phonation
Breathing and phonation is one of the core themes. Lung volumes used
by professional classical singers have been measured under realistic performance conditions. Earlier, Monica Thomasson
analysed breathing behaviour during singing in professional operatic singers, using inductive plethysmography to
record rib cage and abdominal wall movements. Using the same method, she has now completed a study of their inhalatory
behaviour. As with the phonatory breathing patterns, the results showed a high degree of reproducibility, when
the singers repeated the same phrase. A high degree of consistency was found even when patterns for quick inhalations
in different contexts were compared. Again, she found that while some singers rely mainly on rib cage movements
to increase lung volume, others consistently use a combination of rib cage and abdominal wall movements.
The shape of the body associated with different inhalatory behaviours
is sometimes considered a factor of relevance to voice production in clinical voice therapy as well as in singing
pedagogy. Jenny Iwarsson has compared two different inhalatory behaviours, 1) with a "paradoxical" inward
movement of the abdominal wall, and 2) with an expansion of the abdominal wall, and studied their effects on vertical
laryngeal position. Seventeen male and 17 female healthy, vocally untrained subjects took part. No instructions
were given regarding movements of the rib cage. The subjects inhaled up to 70% of inspiratory capacity as measured
by respiratory inductive plethysmography. Vertical laryngeal position was recorded by two-channel electroglottography
during the subsequent vowel production. A significant effect was found; the abdomen-out inhalation was associated
with a higher larynx position than the abdomen-in inhalation. This result apparently contradicted a theory based
on the assumption that an expansion of the abdominal wall would allow the diaphragm to descend deeper in the torso,
thus increasing the tracheal pull, which would result in a lower laryngeal position. In a post-hoc experiment including six of the subjects, the
body posture was studied by digital video recording. The results revealed that the two inhalatory modes were clearly
associated with postural changes that are likely to affect the vertical position of the larynx. The "paradoxical"
inward movement of the abdominal wall was associated with a recession of the chin towards the neck, such that the
larynx appeared in a lower position in the neck because of a postural change. The results thus suggest that
the larynx position tends to be affected by inhalatory behaviour, if no attention is paid to posture. This implies
that clinicians’ and pedagogues’ instructions regarding breathing behaviour must be carefully formulated and adjusted
so as to ensure that the intended goals are reached.
Sundberg re-examined material in which professional baritone singer Håkan
Hagegård performed a set of musical excerpts in two ways: as in a concert and as void of musical expression
as possible. A number of performance aspects were examined. In general, the singer increased the differences between
peaceful and agitated examples when he sang as in a concert. Certain emphasis markers were observed, e.g.,
delayed arrival: the singer lengthened the consonant preceding the stressed vowel of an emphasised syllable, even
if it appeared on the upbeat of the note carrying the syllable. As a result, the upbeat was lengthened. The means
used by the singer for the purpose of emphasis were also found in emphatic speech as produced by a professional
actor and voice coach.
Sundberg’s co-operation with doctors Tom Cleveland and Ed Stone of the
Vanderbilt Voice Centre, Vanderbilt University, Nashville, Tennessee, USA, has been devoted to the long-time-average
spectrum characteristics in country singers’ speech and singing. Again, the results show that these singers sing
in much the same way as they speak.
The properties of the human voice source are another core theme in our
voice research. One investigation concerns vocal registers, which are generally agreed to reflect phonatory phenomena.
Voice source differences between the male modal (chest) and falsetto registers have been studied in co-operation
between Johan Sundberg and singing teacher Carl Högset, Norway. Subjects were professional baritone, tenor
and counter tenor singers. The results confirmed observations in earlier studies. The falsetto register contains
a stronger fundamental than the modal register. This can be explained as a consequence of a difference in vocal
fold thickness. Counter tenor singers were observed to use lower subglottal pressures than tenors and baritones.
Sundberg returned to questions related to the singer’s formant. This is
an unusually prominent peak near 3 kHz in the spectra of professional classical bass, baritone, tenor, and alto
singers. A revised method, based on the predictability of formant levels, was elaborated for measuring the amplitude
of the singer’s formant. The method calculates an expected level of formant 3 from the frequencies of formants
1 and 2. The difference between observed and expected values proved to be useful as an indicator of a singer’s
formant. Also, the centre frequency of the singer’s formant was investigated in commercial recordings of 20 singers’
renderings of various excerpts of opera arias. The centre frequency was determined from long-term-average spectra.
The mean values across singer classifications were 2.4, 2.6, 2.8 and 3.0 kHz for basses, baritones, tenors, and altos,
respectively. The peak was much higher for the male voices than for the altos. The sopranos showed a split peak,
presumably simply reflecting the mean of formants 3 and 4.
Sundberg is engaged in a co-operation with the Royal Academy of Music,
Aarhus, The Royal Danish Academy of Music, Copenhagen, and the Bispebjergs Hospital, Copenhagen. The aim is to
find out to what extent professional opera singers use their nasal cavity as a resonator when they sing vowels.
Using a flow mask, nasal and oral airflow were recorded separately in 16 opera singers of different classifications,
who sang sustained tones on different pitches. In addition, the singers’ velopharyngeal openings were recorded
by means of nasofiberscopy. Some singers were found to sing with a slight velopharyngeal opening. As a next step,
the acoustic significance of such openings to the spectral characteristics will be analysed.
Sundberg continued his co-operation with Björn Lindblom and co-workers
at the Linguistics department, Stockholm University, on APEX, the articulatory model of the vocal tract. A version
of the computer program is presently on exhibition at the Naturhistoriska Riksmuseet, Stockholm, where visitors
can explore the vowel quality obtained by different articulatory constellations. Thus, the program allows various
deformations of the tongue, varying of the lip and jaw openings and of the larynx position. The synthesis quality
obtained was greatly improved by implementing a realistic voice source, derived by inverse filtering of a real
voice.
Sundberg supervised the thesis work of logopeds Maria Andersson and Clara Hultqvist, Karolinska Institute. The investigation
concerned the dependence of the voice source on subglottal pressure in five premier baritone singers. The results
showed various voice source parameters as functions of subglottal pressure, sampled at ten equidistantly spaced
values between the lowest and highest pressures. This study was published in the March issue of JASA. The study
has now been continued in terms of another thesis work of two logopeds, Anja Morelli and Ellinor Fahlstedt, Lund
University, supervised by Sundberg. A group of 17 male and 17 female untrained voices volunteered as subjects.
Their task was to vary vocal loudness while keeping pitch constant. As an increase in vocal loudness is normally
accompanied by an increase in pitch in untrained voices, the subjects had to be taught how to tease out the vocal
loudness variable from the pitch. This task turned out to be quite simple when the subjects were provided with
visual real-time feedback on a display, showing fundamental frequency and sound level on the axes. The results
are under analysis. It is already clear that the data show a much greater scatter than in the case of the baritone singers.
This reflects the fact that professional singers use their voice much more consistently than untrained subjects.
Analysis and simulation of the glottal oscillator
Sten Ternström has worked half time on Analysis and simulation of the glottal oscillator,
a project funded by the Swedish Research Council for Engineering Sciences (TFR). One goal is to devise a model
of the glottal source that allows for perceptually relevant control of the voice source spectrum, and that has
high precision in fundamental frequency and negligible aliasing distortion even at low sampling rates. One solution,
presented in August at PEVOC III (3rd Pan European Voice Conference) in Utrecht, is to generate a flat but unaliased
spectrum using time-domain sinc pulses, and to tailor this spectrum with parameter-controlled filters. The next
steps are to refine this source, and to implement control rules for it based on measurements of real speakers and
singers. An alternative source model, which proved especially appropriate for child voices, was demonstrated in
a synthesis of child singing voice, presented at the Voice Foundation symposium in Philadelphia.
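The sinc-pulse idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the Hann window and the pulse half-width are assumptions chosen for illustration only. Each source pulse is a windowed sinc centred on its exact, generally fractional, sample position, so the harmonics come out at nearly equal level and no partials are generated above the Nyquist frequency. The parameter-controlled filters would then shape this flat spectrum.

```python
import cmath
import math


def sinc(x):
    """Normalised sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_pulse_train(f0, fs, duration, half_width=32):
    """Band-limited pulse train: one Hann-windowed sinc per period.

    f0 and fs are in Hz, duration in seconds. half_width (an assumed
    value) is the number of samples kept on each side of a pulse.
    """
    n = int(round(duration * fs))
    out = [0.0] * n
    period = fs / f0  # period in samples, usually not an integer
    i = 0
    while i * period < n:
        t = i * period  # exact (fractional) pulse position
        centre = math.floor(t)
        for idx in range(centre - half_width, centre + half_width + 1):
            if 0 <= idx < n:
                x = idx - t  # distance from the true pulse instant
                window = 0.5 * (1.0 + math.cos(math.pi * x / (half_width + 1)))
                out[idx] += sinc(x) * window
        i += 1
    return out


def dft_magnitude(signal, k):
    """Magnitude of DFT bin k, computed directly (slow but dependency-free)."""
    n = len(signal)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * m / n)
                   for m, s in enumerate(signal)))
```

For example, `sinc_pulse_train(220.0, 8000.0, 1.0)` yields one second of a 220 Hz source at an 8 kHz sampling rate. Comparing `dft_magnitude` at bins 220 and 2200 gives a quick check that the first and tenth harmonics have nearly the same level.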
Peta White, University of Surrey, Roehampton, received a two-year post-doc
fellowship from the European Commission, starting 1999. She has been preparing manuscripts for publication from
her dissertation and has edited the proceedings of the 1998 symposium on Child Voice, arranged by the KTH Voice Research
Centre.
KTH Voice Research Centre
The KTH Voice Research Centre arranged an international symposium on
Real-time Biofeedback in Voice Therapy and Training. The basic idea was to present different methods now available
to provide such feedback, such as real-time spectrum and spectrogram analysis, flow glottography by inverse filtering,
nasality indicator, real-time phonetography, EGG etc. Each method was presented by one representative of the company
selling the device and one clinician with solid experience in using it. More than 200 participants from Europe
and the USA attended the meeting.
The Centre published Rösten i vårt samhälle, the proceedings of the first of the
four symposia it arranged at KTH in 1998. This book is a compilation of scientific facts documenting the frequency
of voice disturbances in various professions and the associated costs to society.
Female school teachers often develop voice problems. In co-operation
with Maria Södersten of the Huddinge Hospital, Karolinska Institute, Svante Granqvist has developed an automatic
voice accumulator that has been used for measuring speaking time and mean sound level of speaker and environment.
His Correlogram program, which analyses periodicity, appears more useful for distinguishing different voice
qualities than traditional programs. It has also been found extremely successful for measuring fundamental
frequency in piano accompanied
Ternström’s long-term commitment to the acoustics of choir singing
has taken a back seat for the time being, after the publication in June of a final article in JASA. His participation
in the RJ project Music and Motion was reflected in his docent lecture "How to invent a musical instrument",
which will appear in adapted form
in the Journal of New Music Research. He continues to participate in the development of a new Media Technology
programme at KTH.
"From Air to Music"
Leonardo Fuks, doctoral student from Rio de Janeiro, Brazil, completed
his doctoral thesis "From Air to Music" in December 1998 and defended his dissertation in January. It contained a series
of papers on blowing pressures and breathing behaviours in eight professional woodwind players. The results showed
that both lung pressures and breathing strategies differed from those observed in professional classically trained
singers. An investigation of the relation between the player’s perceived loudness and his/her lung pressures revealed
an almost linear relationship in these subjects.
The final article of the dissertation presented an investigation of a
type of chanting similar to that used by Tibetan monks. Fuks studied the voice source by inverse filtering and
the vocal fold vibration characteristics by high-speed photography, performed by Per Åke Lindestad at the
Huddinge Hospital. The extremely low fundamental frequency used in this type of singing was found to originate
from a combined effect of vocal fold and ventricular fold vibration. The vibration frequency of the latter was half
that of the former. As a result, every second flow pulse produced by the vocal fold vibration was attenuated by the
closing of the ventricular folds. Hence, the fundamental frequency dropped one octave below that of the vocal fold
vibration. The vibrations of the two pairs of folds were coupled, and frequency ratios other than 1:2 could also be achieved.
After successfully defending his dissertation, Fuks returned to Rio de Janeiro, Brazil.
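The octave drop described for this chanting technique can be reproduced with a toy signal. The following Python sketch is only an illustration under stated assumptions: the pulse train, the attenuation factor of 0.3 and the autocorrelation-based period estimate are hypothetical stand-ins, not the inverse-filtering analysis used in the thesis. When every second pulse is weakened, the signal only repeats exactly after two pulses, so the strongest periodicity moves from one pulse period to two.

```python
def pulse_train(period, n_pulses, every_second_gain=1.0):
    """Unit pulses every `period` samples; odd-numbered pulses are scaled.

    every_second_gain < 1 mimics (very roughly) the attenuation of every
    second flow pulse by a second pair of folds closing at half the rate.
    """
    signal = [0.0] * (period * n_pulses)
    for i in range(n_pulses):
        signal[i * period] = every_second_gain if i % 2 else 1.0
    return signal


def strongest_period(signal, min_lag, max_lag):
    """Lag (in samples) with the largest autocorrelation in the given range."""
    def autocorr(lag):
        return sum(a * b for a, b in zip(signal, signal[lag:]))
    return max(range(min_lag, max_lag + 1), key=autocorr)


normal = pulse_train(100, 40)
chanted = pulse_train(100, 40, every_second_gain=0.3)
print(strongest_period(normal, 50, 250))   # period of the plain pulse train
print(strongest_period(chanted, 50, 250))  # period with every second pulse attenuated
```

With a sampling rate of, say, 10 kHz (an assumed value), a 100-sample period corresponds to 100 Hz and a 200-sample period to 50 Hz, one octave lower.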
Our research on musical performance by computer modelling has focused
on fine analysis of fundamental frequency during violin playing. Four professional violinists played a set of compositions
for violin and piano in a concert-like setting together with professional pianists playing on a regular grand piano.
The sound of the violin, picked up by a special microphone mounted on the top plate, was successfully separated
from the sound of the accompaniment, such that the violin performance could be analysed. Fundamental frequency
was measured by means of Granqvist’s autocorrelation function program. Guest researcher Julieta Gleiser, Argentina,
collected data on vibrato rate and extent as well as on sound level. The results reveal that different vibrato
parameters, such as extent and frequency, vary substantially depending on musical context and performer.
Sofia Dahl has continued her analyses of timing of percussion players.
Recordings of a simple pattern of single strokes with interleaved accents were studied. The recorded sequences
showed examples of both short- and long-term variations. A tendency to prolong the interval beginning with
the accented stroke could be seen. The perceptual significance of these systematic variations was demonstrated
by a listening test. The test showed that the listeners correctly identified the grouping of strokes in sequences
with large cyclic variations in timing data.
Roberto Bresin has continued his research on the synthesis of performances
of the same piece of music that differ with respect to the emotional ambience of the performance, using Director
Musices, our generative grammar of music performance. By selecting a subset of performance rules and by tuning
the magnitude of their effects on the performance, clearly differing versions were obtained. A forced-choice listening
test revealed that listeners were able to classify the versions in accordance with the intended emotional ambience.
Analysis of the rule set-ups showed that the rule "Duration Contrast" played a particularly important role in
characterising different emotions. This rule enhances the contrast between long and short note values, short notes
being played shorter and long notes longer than nominally specified in the score. The highest degree of duration
contrast was successfully
applied in the "fear" set-up and the lowest degree in the "tenderness" set-up. The other emotions
considered in this research were happiness, sadness, anger and solemnity. The results were exhibited during January-April
at the Stockholm house of culture (Kulturhuset) at an exhibition called "Know yourself".
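The effect of a duration contrast rule can be sketched in a few lines of Python. This is a hypothetical simplification, not the rule as implemented in Director Musices: the function name, the use of the mean duration as reference point and the single `amount` parameter are all assumptions made for illustration. Each note's deviation from the reference is scaled, so that a positive amount makes short notes shorter and long notes longer, while a negative amount reduces the contrast.

```python
def duration_contrast(durations_ms, amount=0.1):
    """Scale each duration's deviation from the mean by (1 + amount).

    amount > 0 increases the contrast between short and long notes,
    amount < 0 decreases it, and amount = 0 leaves the score untouched.
    The total duration of the sequence is preserved.
    """
    mean = sum(durations_ms) / len(durations_ms)
    return [mean + (1.0 + amount) * (d - mean) for d in durations_ms]


score = [250.0, 500.0, 1000.0]            # nominal note durations in ms
print(duration_contrast(score, 0.2))      # larger contrast
print(duration_contrast(score, -0.2))     # smaller contrast
```

Anchoring the scaling at the mean is one of several possible design choices; it keeps the overall tempo of the phrase unchanged, which makes versions with different amounts easy to compare.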
Bresin and Anders Friberg were invited to present their results at the
IEEE 1999 Systems, Man and Cybernetics Conference (SMC'99), October, Tokyo, Japan. Results were also presented
at the International Symposium on the Neuroscience of Music 1999, October, Brain Research Institute, University of
Niigata, Niigata, Japan. Examples are available at the Internet address ~roberto/emotion
Results from some listening tests made during two seminars are available
at the following addresses:
Music and Motion
Our four-year project Music and Motion, jointly financed by The Bank of Sweden Tercentenary
Foundation and KTH, terminated at the end of the year. The event was celebrated by a public symposium, hosted by
the Italian Institute of Culture in Stockholm. The program contained presentations by all researchers who had participated
in the project and was crowned by a public concert, presenting music emerging from the project. The Director Musices
program was combined with the dance animation system developed within the Kacor group. A suite of dances was played,
composed for computer animation by Peter Rajka to three sonatas by Domenico Scarlatti, and one specially composed
piece by Julieta Gleiser, in which the
musical interpretation was produced by Director Musices. Tamas Ungvary performed a piece on his new instrument
Sensorg, developed within the project, and controlled by finger gestures.
Color of keys
In collaboration with two KTH students, Lysiane Salzmann and Richard
Donselius, Bresin investigated how transposition of a recorded music performance can influence listeners’ perception.
A composition in C major starting at the pitch of C4 was performed on a synthesiser and recorded in computer MIDI
format. The performance was then transposed, such that it started at pitches C3, C5, E3, A♭4, A♭3, E4, B3 or B4,
i.e. after transposition up and down by an octave, a minor sixth, and a major third, or by a minor second down and a major
seventh up. In a listening test, subjects were asked to characterise the different performances with respect to
a set of adjectives: dark/bright, heavy/light, hard/soft, warm/cold. Preliminary results show that performances
in tonalities lower than C4 were classified as dark and higher than C4 as bright, with C3 as the darkest one and
C5 as the brightest one.
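The transpositions can be written out as semitone offsets applied to MIDI note numbers. The Python sketch below is an illustration only: it assumes the common convention that C4 is MIDI note 60, uses English note names (B for the note written H in Scandinavian notation, and a trailing "b" for a flat), and the helper names are hypothetical.

```python
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Semitone offsets for the eight transpositions of a piece starting on C4:
# octave, minor sixth and major third down and up, minor second down, major seventh up.
OFFSETS = [-12, 12, -8, 8, -4, 4, -1, 11]


def note_name(midi_note):
    """Name of a MIDI note, assuming note 60 is C4."""
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"


def transpose(notes, semitones):
    """Shift every MIDI note number in a performance by the same interval."""
    return [n + semitones for n in notes]


melody = [60, 64, 67]  # hypothetical opening C4, E4, G4
starts = [note_name(transpose(melody, off)[0]) for off in OFFSETS]
print(starts)
```

Because only the note numbers change, timing and key velocities of the recorded performance stay identical across versions, which is what makes the listening comparison meaningful.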
Temporal aspects of perception
Eric Prame has continued his investigations of temporal aspects of perception
by studying the perception of 100% AM of broad band noise and of AM and FM of complex tones and their dependencies
on the modulation frequency. This has been done for modulation frequencies both above 8 Hz, the region of "roughness",
and below that frequency. Previous studies of the roughness region have shown many similarities between AM
and FM; this is not the case, however, in the region below 8 Hz. The phenomena in the lower region are also less
well known. At 4 Hz, for instance, the perception of the increasing phase of an AM modulation period is clearly
dominating, whereas both the increasing and decreasing phases in FM are equally well perceived. This observation
is important for understanding the vibrato phenomenon. One explanation is neurophysiological. The dividing frequency
of about 8 Hz between the two regions is based upon the research of Prof. Robert Efron, who discovered that the
minimum duration of a perception is about 130 ms; when interonset intervals between repetitive stimuli are clearly
shorter than 130 ms, perception of roughness emerges.
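The link between the 130 ms figure and the dividing frequency of about 8 Hz is a simple reciprocal. As a worked step (the symbols T_min and f_div are introduced here only for illustration):

```latex
f_{\mathrm{div}} \approx \frac{1}{T_{\min}} = \frac{1}{0.130\ \mathrm{s}} \approx 7.7\ \mathrm{Hz}
```

Modulation periods shorter than T_min thus correspond to modulation frequencies above roughly 8 Hz.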
The top and back plates of the violin are arched. Technically they are
shells rather than plates. For a shell, out-of-plane vibrations are coupled to in-plane vibrations. Using TV-holography
at the Luleå University of Technology, Erik Jansson in collaboration with Nils-Erik Molin and Anna Runnemalm
investigated this phenomenon in a high-quality violin, built by Leon Bernardel in 1909. The instrument was mounted
in a cage such that its vibrations could be investigated from all directions without changing the driving or the
holding of the violin. The experiments verified the expected vibration properties of the shell and confirmed that
this coupling is important. Furthermore, they showed that the neck has a large influence on the body resonances and that the
violin, in its bridge plane, acts as an elliptical tube.
An investigation was carried out of old Polish violins from Poznan and Krakow,
Poland, with support from the Polish Science Department at the Music Museum in Poznan and Ms Alicja Knast. Our acoustical
method for measuring bridge mobility was found to be the best. Jansson was invited to perform the acoustical measurements
together with the violin builder Benedykt Niewczyk, Poznan. Excitation signals and responses were recorded on a
DAT-recorder and were later evaluated in terms of frequency responses at KTH. Calibration recordings showed that
the measurements were accurately repeatable and yielded absolute calibrations. The old violins showed some of the
properties typically found in violins built a hundred years earlier, during the golden age of Stradivarius and
the Cremona school. Some violins were part of the Swedish-Polish history in the sense that they were built by the
violinmaker at Sigismund Wasa’s court.
The "bridge hill"
The curve showing the mobility of the bridge of good violins contains
a maximum near 2.5 kHz, called the bridge hill. Traditionally this maximum is ascribed to the first in-plane resonance
of the bridge. However, experiments carried out by Jansson together with Niewczyk have shown that this maximum
rather derives from the local springiness of the top plate at the bridge feet and the bridge moving as a rigid
mass. Introducing a springy support (rubber) between bridge feet and top plate induced a filter action of this
resonance. This is especially interesting as Stradivarius violins as well as old Polish violins have a smooth bridge
hill maximum, which most likely derives from the aged wood.
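A rigid mass on a spring is the simplest model consistent with this description, and its resonance frequency follows the textbook relation f = sqrt(k/m) / (2π). The Python sketch below is a rough illustration, not the model used in the experiments: the bridge mass of 2 g is an assumed round figure, the stiffness is simply back-calculated from the 2.5 kHz maximum, and a real bridge on a top plate is not a one-degree-of-freedom system.

```python
import math


def resonance_frequency(stiffness, mass):
    """Resonance frequency in Hz of a mass (kg) on a spring (N/m)."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)


def stiffness_for(frequency, mass):
    """Spring stiffness in N/m that puts the resonance at `frequency` Hz."""
    return mass * (2.0 * math.pi * frequency) ** 2


bridge_mass = 0.002                       # kg, assumed illustrative value
k = stiffness_for(2500.0, bridge_mass)    # stiffness implied by a 2.5 kHz hill
print(round(k))                           # N/m
print(resonance_frequency(k / 2.0, bridge_mass))  # softer support lowers the peak
```

In this simplified picture, a softer support under the bridge feet (such as the rubber layer mentioned above) lowers the stiffness and therefore moves the resonance downwards.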
Violin bows and bow-string interaction
The aim of the bow project is to identify and measure mechanical properties
of violin bows that are important to the quality of a bow, in particular with respect to playing properties and
the influence on timbre. Experimental and commercial bows of fibre glass and carbon fibre composites are included
as comparisons to normal wooden bows. Current work has focused on measurements of the dynamic compliance of the
bow under realistic loading. The bow compliance seems to be a key parameter in string players’
characterisation of bows of various quality. As before, the investigations are carried out in co-operation with
Prof. Knut Guettler, Norwegian State Academy of Music, Oslo.