Annual Report 1999

Johan Sundberg
Professor of Music Acoustics


The research within the music acoustics group runs in four main streams: solo and choral singing, stringed instruments, music performance, and electro-acoustic music composition. While running in parallel for periods, they often merge and interact. An experimental approach has long been a signature of the research. In particular, experimental work with "analysis-by-synthesis" has proved to be useful.

In the area of music acoustics, the trend continues toward more sophisticated synthesis strategies, based on a direct mathematical description of the instrument/voice. This strategy, commonly referred to as numerical or "physical" modelling, is expected to be generally adopted within the next five years or so at most major synthesiser laboratories. We have started physical modelling of the singing voice in terms of articulatory synthesis, controlled by physiological control parameters such as jaw opening and tongue shape.

Interest in numerical modelling has revealed a lack of reliable and detailed experimental data in many areas. Our research group is one of very few in the world that works specifically on scientific research into musical instruments and sound. Hence, basic research is an important responsibility of our group. Examples are the function of the control systems used in actual performances, such as breathing behaviour in singers and wind instrumentalists, or bow movement in the playing of string instruments. Without basic knowledge of such control systems, the experimenter is forced to explore an overwhelmingly rich parameter space, and the chance of real success in reasonable time is small.

Our research on musical performance by computer modelling is a long-term project. Its focus is presently to compare characteristics of locomotion in running with timing in music performance. Our generative music performance grammar and music examples illustrating the effects on synthesised performances have been made accessible on the Internet.

Our overall aim is to continue the established experimental tradition in the study of the voice and the stringed instruments (now needed more than ever, it seems), while striving to take part in the development of the new synthesis methods. As before, the parallel work on musical performance bridges several of these tasks.


Breathing and phonation

Breathing and phonation is one of the core themes. Lung volumes used by professional classical singers have been measured under realistic performance conditions. Earlier, Monica Thomasson analysed breathing behaviour during singing in professional operatic singers, using inductive plethysmography to record rib cage and abdominal wall movements. Using the same method, she has now completed a study of their inhalatory behaviour. As with the phonatory breathing patterns, the results showed a high degree of reproducibility when the singers repeated the same phrase. A high degree of consistency was found even when patterns for quick inhalations in different contexts were compared. Again, she found that while some singers rely mainly on rib cage movements to increase lung volume, others consistently use a combination of rib cage and abdominal wall movements.

The shape of the body associated with different inhalatory behaviours is sometimes considered a factor of relevance to voice production in clinical voice therapy as well as in singing pedagogy. Jenny Iwarsson has compared two different inhalatory behaviours, 1) with a "paradoxical" inward movement of the abdominal wall, and 2) with an expansion of the abdominal wall, and studied their effects on vertical laryngeal position. The subjects were 17 male and 17 female healthy, vocally untrained volunteers. No instructions were given regarding movements of the rib cage. The subjects inhaled up to 70% of inspiratory capacity as measured by respiratory inductive plethysmography. Vertical laryngeal position was recorded by two-channel electroglottography during the subsequent vowel production. A significant effect was found: the abdomen-out inhalation was associated with a higher larynx position than the abdomen-in inhalation. This result apparently contradicted a theory based on the assumption that an expansion of the abdominal wall would allow the diaphragm to descend deeper in the torso, thus increasing the tracheal pull, which would result in a lower laryngeal position. In a post-hoc experiment including six of the subjects, the body posture was studied by digital video recording. The results revealed that the two inhalatory modes were clearly associated with postural changes that are likely to affect the vertical position of the larynx. The "paradoxical" inward movement of the abdominal wall was associated with a recession of the chin towards the neck, such that the larynx appeared in a lower position in the neck because of the postural change. The results thus suggest that the larynx position tends to be affected by inhalatory behaviour if no attention is paid to posture. This implies that clinicians’ and pedagogues’ instructions regarding breathing behaviour must be carefully formulated and adjusted so as to ensure that the intended goals are reached.


Sundberg re-examined material in which professional baritone singer Håkan Hagegård performed a set of musical excerpts in two ways: as in a concert, and as void of musical expression as possible. A number of performance aspects were examined. In general, the singer increased the differences between peaceful and agitated examples when he sang as in a concert. Certain emphasis markers were observed, e.g., delayed arrival; the singer lengthened the consonant preceding the stressed vowel of an emphasised syllable, even if it appeared on the upbeat of the note carrying the syllable. As a result, the upbeat was lengthened. The means used by the singer for emphasis were also found in emphatic speech as produced by a professional actor and voice coach.

Country singing

Sundberg’s co-operation with doctors Tom Cleveland and Ed Stone of the Vanderbilt Voice Centre, Vanderbilt University, Nashville, Tennessee, USA, has been devoted to the long-term-average spectrum characteristics of country singers’ speech and singing. Again, the results show that these singers sing in much the same way as they speak.

Vocal registers

The properties of the human voice source are another core theme in our voice research. One investigation concerns vocal registers, which are generally agreed to reflect phonatory phenomena. Voice source differences between the male modal (chest) and falsetto registers have been studied in co-operation between Johan Sundberg and singing teacher Carl Högset, Norway. The subjects were professional baritone, tenor and counter tenor singers. The results confirmed observations in earlier studies. The falsetto register contains a stronger fundamental than the modal register. This can be explained as a consequence of a difference in vocal fold thickness. Counter tenor singers were observed to use lower subglottal pressures than tenors and baritones.

Singer’s formant

Sundberg returned to questions related to the singer’s formant. This is an unusually prominent peak near 3 kHz in the spectra of professional classical bass, baritone, tenor, and alto singers. A revised method, based on the predictability of formant levels, was elaborated for measuring the amplitude of the singer’s formant. The method calculates an expected level of formant 3 from the frequencies of formants 1 and 2. The difference between observed and expected values proved to be useful as an indicator of a singer’s formant. Also, the centre frequency of the singer’s formant was investigated in commercial recordings of 20 singers’ renderings of various excerpts of opera arias. The centre frequency was determined from long-term-average spectra. The mean values across singer classifications were 2.4, 2.6, 2.8 and 3.0 kHz for basses, baritones, tenors, and altos, respectively. The peak was much higher for the male voices than for the altos. The sopranos showed a split peak, presumably simply reflecting the mean of formants 3 and 4.
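The report does not give the formula behind the expected level of formant 3. As a rough illustration of why formant levels are predictable from formant frequencies at all, the sketch below uses the textbook cascade model of the vocal tract, in which each formant is a second-order resonance with unit gain at 0 Hz. The function names, the 100 Hz default bandwidth and the example frequencies are assumptions made for this illustration, not the method used in the study.

```python
import math

def resonance_gain(f_hz, formant_hz, bandwidth_hz):
    """Magnitude at f_hz of one second-order formant resonance (gain 1 at 0 Hz)."""
    half_bw = bandwidth_hz / 2.0
    numerator = formant_hz ** 2 + half_bw ** 2
    denominator = (math.sqrt((f_hz - formant_hz) ** 2 + half_bw ** 2)
                   * math.sqrt((f_hz + formant_hz) ** 2 + half_bw ** 2))
    return numerator / denominator

def predicted_level_at_f3_db(f1_hz, f2_hz, f3_hz, bandwidth_hz=100.0):
    """Contribution in dB of the F1 and F2 resonances to the level at F3.

    Assumption: cascade model with the same (hypothetical) bandwidth for
    both formants; source spectrum and higher formants are left out.
    """
    gain = (resonance_gain(f3_hz, f1_hz, bandwidth_hz)
            * resonance_gain(f3_hz, f2_hz, bandwidth_hz))
    return 20.0 * math.log10(gain)

# Hypothetical vowels: higher F1 and F2 lift the level expected near F3.
open_vowel = predicted_level_at_f3_db(700.0, 1200.0, 2800.0)
close_vowel = predicted_level_at_f3_db(300.0, 800.0, 2800.0)
```

In this idealisation, an observed level near 3 kHz that exceeds such a prediction by a clear margin would point to extra spectral energy of the kind the singer’s formant provides, provided that observed and predicted levels share the same reference.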

Nasal resonance

Sundberg is engaged in a co-operation with the Royal Academy of Music, Aarhus, The Royal Danish Academy of Music, Copenhagen, and the Bispebjergs Hospital, Copenhagen. The aim is to find out to what extent professional opera singers use their nasal cavity as a resonator when they sing vowels. Using a flow mask, nasal and oral airflow were recorded separately in 16 opera singers of different classifications, who sang sustained tones on different pitches. In addition, the singers’ velopharyngeal openings were recorded by means of nasofiberscopy. Some singers were found to sing with a slight velopharyngeal opening. As a next step, the acoustic significance of such openings to the spectral characteristics will be analysed.

Articulatory modelling

Sundberg continued his co-operation with Björn Lindblom and co-workers at the Linguistics department, Stockholm University, on APEX, the articulatory model of the vocal tract. A version of the computer program is presently on exhibition at the Naturhistoriska Riksmuseet, Stockholm, where visitors can explore the vowel quality obtained by different articulatory constellations. The program allows various deformations of the tongue, as well as variation of the lip opening, the jaw opening and the larynx position. The synthesis quality obtained was greatly improved by implementing a realistic voice source, derived by inverse filtering of a real voice.

Voice source

Sundberg supervised the thesis work of logopeds Maria Andersson and Clara Hultqvist, Karolinska Institute. The investigation concerned the dependence of the voice source on subglottal pressure in five premier baritone singers. The results showed various voice source parameters as functions of subglottal pressure, sampled at ten equidistantly spaced values between the lowest and highest pressures. This study was published in the March issue of JASA. The study has now been continued in another thesis project by two logopeds, Anja Morelli and Ellinor Fahlstedt, Lund University, supervised by Sundberg. A group of 17 male and 17 female subjects with untrained voices volunteered. Their task was to vary vocal loudness while keeping pitch constant. As an increase in vocal loudness is normally accompanied by an increase in pitch in untrained voices, the subjects had to be taught how to separate the vocal loudness variable from pitch. This task turned out to be quite simple when the subjects were provided with visual real-time feedback on a display, showing fundamental frequency and sound level on the axes. The results are under analysis, but it is already clear that the data show a much greater scatter than in the case of the baritone singers. This reflects the fact that professional singers use their voices much more consistently than untrained subjects do.

Analysis and simulation of the glottal oscillator

Sten Ternström has worked half time on the Analysis and simulation of the glottal oscillator, a project funded by the Swedish Research Council for Engineering Sciences (TFR). One goal is to devise a model of the glottal source that allows for perceptually relevant control of the voice source spectrum, and that has high precision in fundamental frequency and negligible aliasing distortion even at low sampling rates. One solution, presented in August at the PEVOC III (3rd Pan European Voice Conference) in Utrecht, is to generate a flat but unaliased spectrum using time-domain sinc pulses, and to tailor this spectrum with parameter-controlled filters. The next steps are to refine this source, and to implement control rules for it based on measurements of real speakers and singers. An alternative source model, which proved especially appropriate for child voices, was demonstrated in a synthesis of a child’s singing voice, presented at the Voice Foundation symposium in Philadelphia.
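The idea of a flat but unaliased source spectrum can be sketched in a few lines. The example below places one Hann-windowed sinc pulse at every (generally fractional) period position, so the fundamental frequency is not quantised to whole samples and the spectrum stays flat up to just below the Nyquist frequency. The function names, the window choice and the pulse half-width are assumptions made for this illustration; the spectrum-shaping filters of the actual model are not included.

```python
import math

def sinc(x):
    """Normalised sinc function, sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_pulse_train(f0_hz, fs_hz, n_samples, half_width=32):
    """Sum of Hann-windowed sinc pulses centred at fractional period positions."""
    period = fs_hz / f0_hz            # period in samples, usually not an integer
    out = [0.0] * n_samples
    k = 0
    while k * period - half_width < n_samples:
        centre = k * period
        first = max(0, math.ceil(centre - half_width))
        last = min(n_samples - 1, math.floor(centre + half_width))
        for n in range(first, last + 1):
            d = n - centre
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            out[n] += sinc(d) * window
        k += 1
    return out

def magnitude_at(signal, freq_hz, fs_hz):
    """Magnitude of a single DFT component, for inspecting the spectrum."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return math.hypot(re, im)

# One second at a low sampling rate; 210 Hz gives a non-integer period.
train = sinc_pulse_train(210.0, 8000.0, 8000)
first_harmonic = magnitude_at(train, 210.0, 8000.0)
tenth_harmonic = magnitude_at(train, 2100.0, 8000.0)
between_harmonics = magnitude_at(train, 315.0, 8000.0)
```

In this sketch the first and tenth harmonics come out at nearly the same magnitude, while the component halfway between two harmonics is close to zero. A naive train of single-sample impulses would have to round the period of about 38.1 samples to a whole number, which shifts the pitch.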

Child voice

Peta White, University of Surrey, Roehampton, received a two-year post-doctoral fellowship from the European Commission, starting in 1999. She has been preparing manuscripts for publication from her dissertation and has edited the proceedings of the 1998 symposium on Child Voice, arranged by the KTH Voice Research Centre.

KTH Voice Research Centre

The KTH Voice Research Centre arranged an international symposium on Real-time Biofeedback in Voice Therapy and Training. The basic idea was to present different methods now available to provide such feedback, such as real-time spectrum and spectrogram analysis, flow glottography by inverse filtering, nasality indicator, real-time phonetography, EGG etc. Each method was presented by one representative of the company selling the device and one clinician with solid experience in using it. More than 200 participants from Europe and the USA attended the meeting.

The Centre published Rösten i vårt samhälle, the proceedings of the first of the four symposia it arranged at KTH in 1998. This book is a compilation of scientific facts documenting the frequency of voice disturbances in various professions and the associated costs they cause society.

Voice accumulator

Female school teachers often develop voice problems. In co-operation with Maria Södersten of the Huddinge Hospital, Karolinska Institute, Svante Granqvist has developed an automatic voice accumulator that has been used for measuring speaking time and mean sound level of speaker and environment. His Correlogram program, which analyses periodicity, appears more useful than traditional programs for distinguishing different voice qualities. It has also been found extremely successful for measuring fundamental frequency in piano-accompanied violin playing.

Choir singing

Ternström’s long-term commitment to the acoustics of choir singing has taken a back seat for the time being, after the publication in June of a final article in JASA. His participation in the RJ project Music and Motion was reflected in his docent lecture "How to invent a musical instrument", which will appear in adapted form in the Journal of New Music Research. He continues to participate in the development of a new Media Technology programme at KTH.

Music performance

"From Air to Music"

Leonardo Fuks, doctoral student from Rio de Janeiro, Brazil, completed his doctoral thesis "From Air to Music" in December 1998 and defended it in January. It contained a series of papers on blowing pressures and breathing behaviours in eight professional woodwind players. The results showed that both lung pressures and breathing strategies differed from those observed in professional classically trained singers. An investigation of the relation between the players’ perceived loudness and their lung pressures revealed an almost linear relationship in these subjects.

Tibetan chant

The final article of the dissertation presented an investigation of a type of chanting similar to that used by Tibetan monks. Fuks studied the voice source by inverse filtering and the vocal fold vibration characteristics by high-speed photography, performed by Per Åke Lindestad at the Huddinge Hospital. The extremely low fundamental frequency used in this type of singing was found to originate from a combined effect of vocal fold and ventricular fold vibration. The vibration frequency of the latter was half that of the former. As a result, every second flow pulse produced by the vocal fold vibration was attenuated by the closing of the ventricular folds. Hence, the fundamental frequency dropped one octave below that of the vocal fold vibration. The vibrations of the two pairs of folds were coupled, and frequency ratios other than 1:2 could also be achieved. After successfully defending his dissertation, Fuks returned to Rio de Janeiro, Brazil.
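The octave drop follows directly from the periodicity of the pulse train: once every second pulse is weaker, the waveform only repeats after two vocal fold cycles. A minimal sketch of this argument is given below, with idealised single-sample pulses and a circular autocorrelation as the period detector. The function names and the numbers are hypothetical and are not taken from the measurements.

```python
def pulse_train(period, n_pulses, attenuation):
    """Idealised flow pulses; every second pulse is scaled by `attenuation`."""
    signal = [0.0] * (period * n_pulses)
    for k in range(n_pulses):
        signal[k * period] = 1.0 if k % 2 == 0 else attenuation
    return signal

def fundamental_period(signal, max_lag):
    """Smallest lag that maximises the circular autocorrelation.

    The circular form assumes the signal holds a whole number of its
    true periods (here: an even number of pulses).
    """
    n = len(signal)
    best_lag, best_value = 0, float("-inf")
    for lag in range(1, max_lag + 1):
        value = sum(signal[i] * signal[(i + lag) % n] for i in range(n))
        if value > best_value + 1e-12:   # keep the smallest lag on ties
            best_lag, best_value = lag, value
    return best_lag

# Equal pulses repeat every 40 samples; with every second pulse halved,
# the waveform repeats only every 80 samples, i.e. one octave lower.
equal_pulses = fundamental_period(pulse_train(40, 20, 1.0), 100)
every_second_weaker = fundamental_period(pulse_train(40, 20, 0.5), 100)
```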


Our research on musical performance by computer modelling has focused on fine analysis of fundamental frequency during violin playing. Four professional violinists played a set of compositions for violin and piano in a concert-like setting together with professional pianists playing on a regular grand piano. The sound of the violin, picked up by a special microphone mounted on the top plate, was successfully separated from the sound of the accompaniment, such that the violin performance could be analysed. Fundamental frequency was measured by means of Granqvist’s autocorrelation functions program. Guest researcher Julieta Gleiser, Argentina, collected data on vibrato rate and extent as well as on sound level. The results reveal that different vibrato parameters, such as extent and rate, vary substantially depending on musical context and performer.
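How vibrato rate and extent can be read off a fundamental frequency contour is sketched below. This is a simple illustration under stated assumptions, not the procedure used in the study: the contour is converted to cents, the rate is taken from the spacing of upward zero crossings around the mean, and the extent is half the peak-to-peak excursion. The function name, the frame rate and the synthetic contour are hypothetical.

```python
import math

def vibrato_rate_and_extent(f0_hz, frame_rate_hz):
    """Return (rate in Hz, extent in cents) for one sustained note.

    Assumes a clean contour with at least two full vibrato cycles.
    """
    cents = [1200.0 * math.log2(f / f0_hz[0]) for f in f0_hz]
    mean = sum(cents) / len(cents)
    deviation = [c - mean for c in cents]
    extent = (max(deviation) - min(deviation)) / 2.0
    # Frames at which the contour crosses its mean in the upward direction.
    crossings = [i for i in range(1, len(deviation))
                 if deviation[i - 1] < 0.0 <= deviation[i]]
    if len(crossings) < 2:
        raise ValueError("need at least two upward crossings")
    cycles = len(crossings) - 1
    duration_s = (crossings[-1] - crossings[0]) / frame_rate_hz
    return cycles / duration_s, extent

# Synthetic test contour: 6 Hz vibrato, +/- 50 cents around 440 Hz,
# two seconds at 100 frames per second.
contour = [440.0 * 2.0 ** (50.0 * math.cos(2.0 * math.pi * 6.0 * n / 100.0) / 1200.0)
           for n in range(200)]
rate_hz, extent_cents = vibrato_rate_and_extent(contour, 100.0)
```

Real contours would need smoothing and note segmentation first, since both measures vary within and between notes.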


Sofia Dahl has continued her analyses of timing of percussion players. Recordings of a simple pattern of single strokes with interleaved accents were studied. The recorded sequences showed examples of both short and long term variations. A tendency of prolongation of the interval beginning with the accented stroke could be seen. The perceptual significance of these systematic variations was demonstrated by a listening test. The test showed that the listeners correctly identified the grouping of strokes in sequences with large cyclic variations in timing data.

Emotional coloring

Roberto Bresin has continued his research on the synthesis of performances of the same piece of music that differ with respect to the emotional ambience of the performance, using Director Musices, our generative grammar of music performance. By selecting a subset of performance rules and by tuning the magnitude of their effects on the performance, clearly differing versions were obtained. A forced-choice listening test revealed that listeners were able to classify the versions in accordance with the intended emotional ambience. Analysis of the rule set-ups showed that the rule "Duration Contrast" played a particularly important role in characterising different emotions. This rule enhances the contrast between long and short note values, short notes being played shorter and long notes longer than nominally specified in the score. The highest degree of duration contrast was successfully applied in the "fear" set-up and the lowest degree in the "tenderness" set-up. The other emotions considered in this research were happiness, sadness, fear and solemnity. The results were exhibited during January-April at the Stockholm house of culture (Kulturhuset) in an exhibition called "Know yourself".
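The effect of a contrast rule of this kind can be illustrated with a toy function. The sketch below is not the Director Musices definition of "Duration Contrast"; it is a hypothetical stand-in that moves each duration away from the mean of the phrase by a tunable quantity, so that short notes become shorter, long notes longer, and the total duration is preserved.

```python
def apply_duration_contrast(durations_ms, quantity):
    """Toy contrast rule (hypothetical, not the Director Musices rule).

    Each note is moved away from the mean duration by `quantity` times
    its distance from that mean; quantity = 0 leaves the score unchanged.
    """
    mean = sum(durations_ms) / len(durations_ms)
    return [d + quantity * (d - mean) for d in durations_ms]

# Hypothetical phrase: eighth, quarter, eighth, half note at 120 beats per minute.
nominal = [250.0, 500.0, 250.0, 1000.0]
strong_contrast = apply_duration_contrast(nominal, 0.2)   # larger rule quantity
weak_contrast = apply_duration_contrast(nominal, 0.05)    # smaller rule quantity
```

With these numbers the strong setting plays the eighth notes at about 200 ms and the half note at about 1100 ms. Tuning a single quantity per rule, as here, mirrors the way the rule set-ups are described above.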

Bresin and Anders Friberg were invited to present their results at the IEEE 1999 Systems, Man and Cybernetics Conference (SMC'99), October, Tokyo, Japan. Results were also presented at the International Symposium on the Neuroscience of Music 1999, October, Brain Research Institute, University of Niigata, Niigata, Japan. Examples are available at the Internet address ~roberto/emotion

Results from some listening tests made during two seminars are available at the following addresses:



Music and Motion

Our four-year project Music and Motion, jointly financed by The Bank of Sweden Tercentenary Foundation and KTH, terminated at the end of the year. The event was celebrated by a public symposium, hosted by the Italian Institute of Culture in Stockholm. The program contained presentations by all researchers who had participated in the project and was crowned by a public concert, presenting music emerging from the project. The Director Musices program was combined with the dance animation system developed within the Kacor group. A suite of dances was played, composed for computer animation by Peter Rajka to three sonatas by Domenico Scarlatti, and one specially composed piece by Julieta Gleiser, in which the musical interpretation was produced by Director Musices. Tamas Ungvary performed a piece on his new instrument Sensorg, developed within the project, and controlled by finger gestures.

Color of keys

In collaboration with two KTH students, Lysiane Salzmann and Richard Donselius, Bresin investigated how transposition of a recorded music performance can influence listeners’ perception. A composition in C major starting at the pitch of C4 was performed on a synthesiser and recorded in computer MIDI format. The performance was then transposed, such that it started at pitches C3, C5, E3, Ab4, Ab3, E4, B3 or B4 (the last two written H3 and H4 in German note naming), i.e. after transposition down and up by an octave, a minor sixth, and a major third, or by a minor second down and a major seventh up. In a listening test, subjects were asked to characterise the different performances with respect to a set of adjective pairs: dark/bright, heavy/light, hard/soft, warm/cold. Preliminary results show that performances in tonalities lower than C4 were classified as dark and higher than C4 as bright, with C3 as the darkest one and C5 as the brightest one.
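The transposition intervals can be checked by converting the starting pitches to MIDI note numbers and subtracting the number for C4. The sketch below assumes the common convention that C4 is MIDI note 60; the function names are hypothetical. It accepts a trailing "b" for a flat (as in "A4b") and the German letter H for B.

```python
NOTE_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11, "H": 11}

def midi_number(name):
    """Convert a name such as 'C4', 'A4b' or 'H3' to a MIDI note number (C4 = 60)."""
    flat = name.endswith("b")
    core = name[:-1] if flat else name
    letter, octave = core[0], int(core[1:])
    return 12 * (octave + 1) + NOTE_TO_SEMITONE[letter] - (1 if flat else 0)

def transposition_semitones(start, reference="C4"):
    """Signed distance in semitones from the reference starting pitch."""
    return midi_number(start) - midi_number(reference)

starts = ["C3", "C5", "E3", "A4b", "A3b", "E4", "H3", "H4"]
shifts = [transposition_semitones(s) for s in starts]
# shifts == [-12, 12, -8, 8, -4, 4, -1, 11]
```

The values correspond to an octave (12 semitones), a minor sixth (8) and a major third (4) in both directions, plus a minor second down (1) and a major seventh up (11).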

Temporal aspects of perception

Eric Prame has continued his investigations of temporal aspects of perception by studying the perception of 100% AM of broad band noise and of AM and FM of complex tones and their dependencies on the modulation frequency. This has been done both for modulation frequencies above 8 Hz, the region of "roughness", and for frequencies below it. Previous studies of the roughness region have shown many similarities between AM and FM, which, however, is not the case in the region below 8 Hz. The phenomena in the lower region are also less well known. At 4 Hz, for instance, the perception of the increasing phase of an AM modulation period is clearly dominating, whereas both the increasing and decreasing phases in FM are equally well perceived. This observation is important for understanding the vibrato phenomenon. One explanation is neurophysiological. The dividing frequency of about 8 Hz between the two regions is based upon the research of Prof. Robert Efron, who discovered that the minimum duration of a perception is about 130 ms; when interonset intervals between repetitive stimuli are clearly shorter than 130 ms, perception of roughness emerges.
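The link between the 130 ms figure and the dividing frequency of about 8 Hz is a matter of simple arithmetic: a modulation period of 130 ms corresponds to 1/0.130 s, or roughly 7.7 Hz. The short sketch below makes this explicit. The function names and the hard threshold are simplifications introduced here, since the text only says that roughness emerges when the intervals are clearly shorter than 130 ms.

```python
MIN_PERCEPT_DURATION_S = 0.130   # minimum duration of a perception, from the text

def dividing_frequency_hz(min_duration_s=MIN_PERCEPT_DURATION_S):
    """Modulation frequency whose period equals the minimum percept duration."""
    return 1.0 / min_duration_s

def modulation_region(modulation_hz):
    """Rough classification; the real boundary is gradual, not a sharp step."""
    period_s = 1.0 / modulation_hz
    return "roughness" if period_s < MIN_PERCEPT_DURATION_S else "below roughness"

boundary_hz = dividing_frequency_hz()   # about 7.7 Hz
```

A 4 Hz modulation has a period of 250 ms and falls in the lower region, whereas a 20 Hz modulation has a period of 50 ms and falls well inside the roughness region.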



The top and back plates of the violin are arched. Technically they are shells rather than plates. For a shell, out-of-plane vibrations are coupled to in-plane vibrations. Using TV-holography at the Luleå University of Technology, Erik Jansson in collaboration with Nils-Erik Molin and Anna Runnemalm investigated this phenomenon in a high-quality violin, built by Leon Bernardel in 1909. The instrument was mounted in a cage such that its vibrations could be investigated from all directions without changing the driving or the holding of the violin. The experiments verified the expected vibration properties of the shell and confirmed that the coupling between out-of-plane and in-plane vibrations is important. Furthermore, they showed that the neck has a large influence on the body resonances and that the violin, in its bridge plane, acts as an elliptical tube.

An investigation of old Polish violins from Poznan and Krakow, Poland, was carried out with support from the Polish Science Department at the Music Museum in Poznan and Ms Alicja Knast. Our acoustical method for measuring bridge mobility was found to be the best. Jansson was invited to perform the acoustical measurements together with the violin builder Benedykt Niewczyk, Poznan. Excitation signals and responses were recorded on a DAT-recorder and were later evaluated in terms of frequency responses at KTH. Calibration recordings showed that the measurements were accurately repeatable and yielded absolute calibrations. The old violins showed some of the properties typically found in violins built a hundred years earlier, during the golden age of Stradivarius and the Cremona school. Some violins were part of the Swedish-Polish history in the sense that they were built by the violinmaker at Sigismund Wasa’s court.

The "bridge hill"

The curve showing the mobility of the bridge of good violins contains a maximum near 2.5 kHz, called the bridge hill. Traditionally this maximum is ascribed to the first in-plane resonance of the bridge. However, experiments carried out by Jansson together with Niewczyk have shown that this maximum rather derives from the local springiness of the top plate at the bridge feet combined with the bridge moving as a rigid mass. Introducing a springy support (rubber) between bridge feet and top plate induced a filter action of this resonance. This is especially interesting as Stradivarius violins as well as old Polish violins have a smooth bridge hill maximum, which most likely derives from the aged wood.
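Read as a simple mass-spring system, this explanation implies a resonance frequency of f = sqrt(k/m) / (2 pi), where m is the mass of the bridge moving as a rigid body and k is the local stiffness of the top plate under the bridge feet. The sketch below works through the arithmetic. The bridge mass of 2 g is a hypothetical round number chosen for illustration, not a measured value from the experiments, and damping is ignored.

```python
import math

def resonance_hz(stiffness_n_per_m, mass_kg):
    """Resonance frequency of an undamped mass on a spring."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)

def stiffness_for(frequency_hz, mass_kg):
    """Spring stiffness that puts the resonance at the given frequency."""
    return mass_kg * (2.0 * math.pi * frequency_hz) ** 2

BRIDGE_MASS_KG = 0.002   # hypothetical bridge mass (2 g), an assumption

# Support stiffness that would place the resonance at 2.5 kHz for this mass.
plate_stiffness = stiffness_for(2500.0, BRIDGE_MASS_KG)   # roughly 4.9e5 N/m

# A support four times softer halves the resonance frequency in this model.
softer_support_hz = resonance_hz(plate_stiffness / 4.0, BRIDGE_MASS_KG)
```

In this idealisation, a softer support under the bridge feet moves the resonance downward, which gives a feel for how sensitive the position of the maximum is to the springiness at the feet.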

Violin bows and bow-string interaction

The aim of the bow project is to identify and measure mechanical properties of violin bows that are important to the quality of a bow, in particular with respect to playing properties and the influence on timbre. Experimental and commercial bows of fibre glass and carbon fibre composites are included for comparison with normal wooden bows. Current work has focused on measurements of the dynamical compliance of the bow under realistic loading. The bow compliance seems to be a key parameter in string players’ characterisation of bows of varying quality. As before, the investigations are carried out in co-operation with Prof. Knut Guettler, Norwegian State Academy of Music, Oslo.

Published by: TMH, Speech, Music and Hearing

Last updated: 2005-11-25