Session 09 - Singing

Singing voice modeling - as we know it today
M Kob
Aachen University, Dep. of Phoniatrics, Pedaudiology, and Communication Disorders, Aachen, Germany

The human voice can be described as the joint product of the functional components breathing, phonation, articulation, and radiation. Most approaches for singing voice modeling follow the beaten path of source-filter separation of the complex system: the air flow modulation at the glottis is described as a harmonic-rich source signal that is subsequently filtered by a transfer function modeling the vocal tract passage and radiation. Such models allow easy parametrisation of the voice signal, but most of them do not include the interaction between the components, which has been found important for singing voice modeling. Still, there is a gap in both accuracy and feasibility between source-filter approaches and complex time-domain models that can take this interaction into account and often consist of a multiple-mass vocal fold model, aerodynamically motivated noise generation, wave propagation through the vocal tract, and radiation at the mouth. As an example of a time-domain model, the MATLAB® program "VOX" is demonstrated. It includes a graphical user interface allowing the modification of parameters such as lung pressure, glottis configuration, and vocal tract shape during calculation. The application of the model to singing voice synthesis (e.g. overtone singing) and generation of pathologic vocal fold movement (singer's nodules) is demonstrated.
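
The source-filter separation described in the abstract above can be illustrated with a minimal sketch. It is not the VOX program and deliberately omits the source-tract interaction that the abstract identifies as important. The impulse-train source, the two-pole resonators, and the formant frequencies and bandwidths are all illustrative assumptions.

```python
import math

def impulse_train(f0, fs, n):
    """Crude glottal source: one unit impulse per (rounded) fundamental period."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(x, freq, bandwidth, fs):
    """Two-pole resonator standing in for a single vocal tract formant."""
    r = math.exp(-math.pi * bandwidth / fs)           # pole radius (< 1, so stable)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s + a1 * y1 + a2 * y2                      # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize(f0, formants, fs=16000, n=1600):
    """Filter the source through a cascade of formant resonators."""
    signal = impulse_train(f0, fs, n)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs)
    return signal

# Hypothetical formant values (Hz, Hz), chosen only for illustration.
vowel = synthesize(110.0, [(700.0, 80.0), (1100.0, 90.0), (2500.0, 120.0)])
```

Because source and filter are independent here, changing `f0` leaves the formants untouched, which is exactly the simplification that time-domain models with interaction avoid.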

Using imaging and modeling techniques to understand the relationship of vocal tract shape to acoustic characteristics
B Story
University of Arizona, Speech and Hearing Sciences, Tucson, AZ, United States

The respiratory, laryngeal, and vocal tract systems are precisely orchestrated by humans as an instrument of speech and song. This presentation will focus primarily on the role of the vocal tract as a versatile acoustic device capable of producing a wide range of possible vowel and vowel-like sounds. First, it will be shown how recent imaging experiments (MRI) have demonstrated some of the differences and similarities of vocal tract shape across speakers and singers as well as across various voice qualities. The image data collected in these experiments has also facilitated development of a technique for modeling the static and time-varying shape of the vocal tract area function. The model is based on decomposing a specific person's image data into a set of orthogonal basis functions and neutral vowel area functions. It will be used to demonstrate some relations between the area function and resulting acoustic characteristics pertaining to voice quality, formant clustering and tuning, and vowel modification.

Sample-based singing voice synthesizer by spectral concatenation
J Bonada, A Loscos
Pompeu Fabra University, IUA, Barcelona, Spain

The singing synthesis system we present generates a performance of an artificial singer out of the musical score and the phonetic transcription of a song using a frame-based frequency domain technique. This performance mimics the real singing of a singer that has been previously recorded, analyzed and stored in a database, in which we store his voice characteristics (phonetics) and his low-level expressivity (attacks, releases, note transitions and vibratos). To synthesize such a performance, the system concatenates a set of elemental synthesis units (phonetic articulations and stationaries). These units are obtained by transposing and time-scaling the database samples. The concatenation of these transformed samples is performed by spreading out the spectral shape and phase discontinuities of the boundaries along a set of transition frames that surround the joint frames. The expression of the singing is applied through a Voice Model built on top of a Spectral Peak Processing (SPP) technique. SPP considers the spectrum as a set of regions. Each region is made up of one spectral peak and its surroundings and can be shifted both in frequency and amplitude. The Voice Model is based on an improved version of the traditional excitation/filter approach. The system will be demonstrated with several performances.

Finite element model of supraglottal spaces, cleft palate and listener's judgement of velopharyngeal insufficiency
L Cerny¹, J Vokral¹, K Dedouch²
¹Charles University, 1st Faculty of Medicine, Phoniatric Dep., Prague, Czech Republic; ²Czech Technical University, Faculty of Mechanical Engineering, Institute of Mechanics, Prague, Czech Republic

Approximate finite element (FE) models of acoustic spaces corresponding to the human adult male vocal tract and nasal cavity were created according to geometrical data obtained by MRI. The FE models for the Czech vowels /u/, /a/, /i/ were analysed for increasing size of a cleft that joins both acoustic spaces, and the resulting changes of formants were evaluated. The FE models were designed using the FE system ANSYS and composed of finite elements FLUID 30. The boundary areas of the model were defined as acoustically hard. The results are related to data from adult patients with hypernasality problems as well as to data from normal adults. The palatophony in speech samples of 90 children was evaluated by 17 listeners, seven of them professionals and ten non-professionals in the field of speech pathology. The palatophony of 54 speech samples before and after the operation of the palate was also evaluated. The subjective evaluation by professionals reflected their greater experience in the evaluation of speech problems. Acoustical analysis of the speech samples was performed with the programme Multi-Speech (Kay Elemetrics Corp.).

Controlled use of nonlinear phenomena in contemporary vocal music
M E Edgerton¹, J Neubauer², H Herzel²
¹Freelance Composer, Music, Berlin, Germany; ²Humboldt University, Institute for Theoretical Biology, Berlin, Germany

Extra-normal, complex and multiphonic voice signals of vocal performers were analyzed within the framework of nonlinear dynamics. We give evidence that nonlinear phenomena are extensively and intentionally used by performers associated with contemporary music. Narrow-band spectrograms were used as empirical bifurcation diagrams in order to lend insight into the dynamic parameters of complex vocalizations. Estimates of the changing vocal tract configurations (formants) were extracted from these visualizations. Out of hundreds of samples of extra-normal sonorities we chose examples of period doubling cascades, biphonation and deterministic chaos. Furthermore, instances of glottal whistle production featuring biphonation and triphonation are presented. These acoustic analyses, when combined with previous research, personal performance and pedagogical experience, provide insight into possible production mechanisms. Instances of formant-induced bifurcations to subharmonics and toroidal oscillation are provided which show the recurrent use of nonlinear phenomena by performers. It is argued that mechanisms such as source-tract coupling or desynchronization due to asymmetry are intentionally used by performers to reproduce extra-normal sonorities for musical tasks.

Acoustical effects of the core principles of the 'Bel Canto' method on choral singing
L Fagnan¹, X Rodet²
¹University of Alberta, Faculté Saint-Jean, Edmonton, Canada; ²Ircam, Analysis/synthesis, Paris, France

The vocal principles of the bel canto method of singing have been endorsed for centuries for solo singing but rarely for ensembles. The study undertaken at Ircam has shed light on the benefits they hold for choral singing.
A variety of choral ensembles was chosen. Each ensemble's own conductor initially led the ensemble in a series of exercises. The choir then worked on learning the bel canto principles and applying them to the same exercises. Finally, the acoustical comparison of these two "before and after" versions was undertaken.
The validity of Manuel Garcia's theory on le coup de glotte and how it affects vocal vibration and spectral energy was examined. Substantial increases in energy in the 2.5 to 4 kHz band have been consistently observed. This principle's influence on precision of attack and release was also studied. It has been observed that allowing the vocal folds to begin their vibratory cycle from the closed (adducted) position improves the exactitude of pitch at onset. The paper also explores how the maxim of chiaroscuro (light-dark) resonance acts upon the intrinsic pitch of vowels and the effect on typical articulatory F0 perturbation patterns. Finally, we tested how the maintenance of bel canto vibratory and resonance energy affected the acoustic energy and pitch of ensembles during diminuendos and soft singing. Surprising improvements in both upper harmonic energy and intonation were observed. Sound examples from choral experiments will be provided.

The influence of the practice of basso continuo on the intonation of a professional singer in the time of Monteverdi
G Faraone¹, S Johansson², P Polotti³
¹International Academy of Music, IRMUS, Milano, Italy; ²International Academy of Music, Department of Early Music, Milano, Italy; ³Politecnico di Milano, Dipartimento di Elettronica e Informazione, Milano, Italy

The goal of this work is to investigate the influence of the practice of basso continuo, realized on the harpsichord according to the temperaments in use in the period of Monteverdi, on the intonation of a professional singer. Our investigation will focus on the tuning relationship between the voice and the accompaniment. In selected pieces, several cadences habitually adopted by the basso continuo players of the late 1500's and the early 1600's, according to the indications in the treatises, are analyzed. Each cadence is realized in several different ways by the musicians, and a spectral analysis is made to evaluate the tonal consonance between the voice and the accompaniment. The same material will be put before expert listeners for their subjective evaluations. Finally, we compare the spectral measurements to both the subjective test results and the rules of the early treatises. Our idea is that spectral analysis can be a good measurement tool for the effectiveness of a cadence. The goals of such work are to see if the rules and suggestions of the treatises, aside from relating to musical esthetics, can also have a foundation in acoustics and therefore furnish additional advice to the singer regarding correct intonation in singing.

Acoustic pulse reflectometry for vocal tract measurement
C D Gray, D M Campbell, C A Greated
University of Edinburgh, School of Physics, Edinburgh, United Kingdom

As a musical instrument, the human vocal tract can be reconfigured in a variety of shapes through the use of articulators which filter the complex glottal waveform into what we perceive as voice sounds.
In voice synthesis, modern models of the vocal tract are often based on geometric data obtained using MRI (Magnetic Resonance Imaging) scans, from which a piecewise cylindrical section model can be devised. The disadvantages associated with using MRI for this purpose include the extended period for which the subject must remain immobile during scanning, the post-processing of images, and the need for access to such expensive equipment.
As an alternative to MRI, an experimental attempt using acoustic pulse reflectometry for vocal tract measurement is described.
The time domain technique of acoustic pulse reflectometry is used to measure the input impulse response of a tubular object, such as the vocal tract, from which both the bore profile and input impedance can be calculated. Recent developments have allowed a combined measurement and reconstruction time typically in the range of several seconds to real-time.
Results from modelling the bore profile of the vocal tract during vowel production are presented. Applications for this technique in voice analysis and synthesis systems and in clinical usage are discussed.
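
The link between a piecewise cylindrical section model and reflection data mentioned in the abstract above can be sketched as follows. This shows only the final, lossless step of turning junction reflection coefficients into section areas. The deconvolution and layer-peeling stages of a real reflectometer, and any loss compensation, are not shown, and the function names and sign convention are assumptions of this sketch.

```python
def reflection_from_areas(areas):
    """Pressure reflection coefficient at each junction between adjacent
    cylindrical sections, for a wave travelling from section k to k+1."""
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]

def areas_from_reflection(first_area, reflection_coeffs):
    """Rebuild the bore profile section by section from the known first area."""
    areas = [first_area]
    for r in reflection_coeffs:
        if not -1.0 < r < 1.0:
            raise ValueError("reflection coefficient must lie strictly between -1 and 1")
        # Inverting r = (S_k - S_k+1) / (S_k + S_k+1) for S_k+1.
        areas.append(areas[-1] * (1.0 - r) / (1.0 + r))
    return areas

# Hypothetical three-section profile (areas in cm^2), used only as a round-trip check.
profile = [1.0, 2.0, 0.5]
rebuilt = areas_from_reflection(profile[0], reflection_from_areas(profile))
```

A widening bore gives a negative coefficient and a constriction a positive one under this convention.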

On the use of electroglottography for characterisation of the laryngeal mechanisms
N Henrich¹, B Roubeau², M Castellengo¹
¹CNRS, Laboratoire d'Acoustique Musicale, Paris, France; ²Hôpital Tenon, Service ORL, Paris, France

In order to produce a wide range of pitches, human vocal production is characterised by the use of four laryngeal mechanisms, which differ with regard to the vocal fold vibratory length, thickness, tension and adduction. We will first briefly discuss the terminology used to describe these laryngeal mechanisms and the confusion linked to the notion of register in singing. To add more clarity, the following terminology is proposed: mechanism 0 ("vocal fry"), mechanism I ("modal" or "chest" register), mechanism II ("falsetto" for male voice and "head" register for female voice) and mechanism III ("whistle" register). We will then provide a shape description of the electroglottographic signal (EGG) and its derivative (DEGG) in the case of the four laryngeal mechanisms. At the same time, the relation between electroglottography-based open quotient measurements and laryngeal mechanisms will be discussed. We will show how the transition between laryngeal mechanisms can be detected on the EGG and DEGG signals by an amplitude change. Finally, we will discuss the possibilities and limitations of using electroglottography to detect a given laryngeal mechanism.

Vocal fold resonances at low and high pitch tuning
S Hertegård¹, S Granqvist², H Larsson¹
¹Karolinska Institute, Logopedics and Phoniatrics, Stockholm, Sweden; ²Royal Inst of Technology, Speech Music Hearing, Stockholm, Sweden

Stroboscopic examinations indicate that the mechanical properties of the vocal folds differ between high- and low-pitch singing. There is also interaction with the vocal tract resonances. The aim of the study was to examine in detail the vocal fold resonances at low and high pitch tuning. Method: A technique was previously developed for studying the mechanical properties of the vocal fold by means of external acoustic stimulation through the pharynx. The vocal fold vibrations were recorded with a high-speed camera (Weinberger system) at 2000 and 3300 frames per second. Trained male singers were examined while they kept the vocal fold in a phonatory position (so-called Kaneko manoeuvre) and simulated singing at low and high pitch. The vocal folds were excited by means of a frequency sweep, and the resulting vocal fold vibrations were analysed with custom-made software in order to find resonance peaks. We suggest that the resonances are related to the mechanical vocal fold properties.

A capella SATB quartet in-tune singing: evidence of intonation shift
D M Howard
Electronics, York, United Kingdom

This paper hypothesises that if an a capella SATB quartet tends towards mean intonation, then their absolute pitch reference should drift away from its starting point with modulation. In order to test this hypothesis, special SATB exercises were written that consisted of a number of modulations between the starting and finishing chords, which were either identical or exactly an octave apart. The intonation difference between these two chords (after allowing for the octave difference where appropriate) was measured relative to mean tone and equal tempered tuning, and pitch shifts have been observed. Four electrolaryngographs were employed, one per singer, set up so their radio frequency outputs did not interfere with each other. Mean fundamental frequency measurements were made on a note-by-note basis to an accuracy of 1 microsecond. The results suggest that conductors who worry about overall intonation drift in a capella performance of musical material that has significant modulation may be misguided, since this can be the only way to remain in-tune. The paper further suggests that achieving consistent equal tempered tuning when singing a capella is an even greater challenge which has even less chance of being achieved.
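
The kind of pitch drift discussed in the abstract above can be illustrated with a short calculation in cents. The interval sequence below (four pure fifths up, two octaves down, one pure major third down) is a textbook example chosen for this sketch and is not taken from the exercises used in the study. The period-based helper only illustrates what a 1 microsecond timing resolution means in cents.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(float(ratio))

def pitch_error_cents(f0_hz, timing_error_us=1.0):
    """Pitch error caused by a given error in the measured period."""
    period_us = 1e6 / f0_hz
    return abs(cents((period_us + timing_error_us) / period_us))

# Hypothetical progression tuned with pure (beat-free) intervals.
pure_steps = [Fraction(3, 2)] * 4 + [Fraction(1, 4), Fraction(4, 5)]
drift_ratio = Fraction(1)
for step in pure_steps:
    drift_ratio *= step

# The same progression in equal temperament: 700-cent fifths, 400-cent third.
equal_tempered_drift = 4 * 700 - 2 * 1200 - 400

print(drift_ratio, round(cents(drift_ratio), 2), equal_tempered_drift)
```

With pure intervals the final chord ends up a ratio of 81/80 (about 21.5 cents) above the starting reference, whereas the equal-tempered version returns exactly to it.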

Larynx closed quotients in a capella SATB quartet singing
D M Howard
Electronics, York, United Kingdom

The percentage of each vocal fold cycle for which the folds remain in contact, or larynx closed quotient (CQ), has been documented for trained and untrained adults and children. In all cases, patterned variations have been observed with singing training and experience. Each member of an SATB a capella group was recorded with an electrolaryngograph when singing carols and other items. The individual plots of CQ against the logarithm of fundamental frequency (f0) for each singer were combined into one. Whilst the plots for each of the four singers were distinct in both their CQ value as well as their pitch range (f0), the outputs for the four singers lay in a line in SATB part order. Any f0 separation is a function of the scoring, but ordering in terms of CQ values is not a result that would suggest itself from knowledge of the patterned variation of CQ and f0 with singing experience and training. One conjecture might be that this provides evidence of a strategy being employed for choral blend, differentiating the parts evenly in terms of their area of operation in terms of f0 and CQ.
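
The closed quotient defined in the abstract above can be sketched for a single cycle of an electrolaryngograph-like waveform. The threshold criterion (a fixed fraction of the cycle's peak-to-peak amplitude), the assumption that larger sample values mean more vocal fold contact, and the function name are all choices made for this illustration rather than the measurement procedure used in the study.

```python
def closed_quotient(cycle, threshold=0.5):
    """Percentage of one vocal fold cycle spent above a contact threshold.

    cycle: samples covering exactly one period, larger values = more contact
           (an assumption of this sketch).
    threshold: fraction of the peak-to-peak amplitude used as the boundary
               between the open and closed phases (also an assumption).
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("cycle has no amplitude variation")
    level = lo + threshold * (hi - lo)
    closed_samples = sum(1 for s in cycle if s > level)
    return 100.0 * closed_samples / len(cycle)

# Idealised ten-sample cycle with four samples of contact.
example_cq = closed_quotient([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
```

Plotting such values against the logarithm of f0 for each singer would give the kind of combined CQ plot the abstract describes.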

The effect of the hypopharyngeal and supra-glottic shapes for the singing voice
H Imagawa¹, K-I Sakakibara², N Tayama³, S Niimi⁴
¹The University of Tokyo, Department of Speech Physiology, Tokyo, Japan; ²NTT, Communication Science Laboratories, Atsugi-shi, Japan; ³International Medical Center of Japan, Tokyo, Japan; ⁴International University of Health and Welfare, Otawara, Japan

The timbre of the singing voice is strongly affected by the shapes of the laryngeal tube and hypopharynx. In this paper, we propose a physical model of the vocal tract and larynx, including the vocal and ventricular folds. We study the effect of the shapes of the laryngeal tube and hypopharynx on the synthesized voice by using our proposed model. The model is realized by acoustical coupling of the two-mass model and the vocal tract with the piriform fossa and infraglottic cavity. The vocal tract consists of twenty-two cylindrical sections, and each cross-sectional area is determined by referring to actual human tract shapes obtained by MRI. We assume the sections from the 1st to the 4th to be the laryngeal tube, with the 1st section being the laryngeal ventricle and the 2nd and 3rd sections being the supra-glottic structure, which includes the ventricular folds. Furthermore, the ventricular folds are permitted to vibrate. We synthesize the five Japanese vowels with various dimensions of the hypopharynx and approximations of the supra-glottic structure by changing the area of the sections from the 1st to the 6th, and evaluate the acoustical effects of the various shapes of the hypopharynx and laryngeal tube.

The effect of vocal vibrato on blending individual singer characteristic with hall acoustics
K Kato¹, D Noson², Y Ando³
¹Kumamoto University, Science and Technology, Kumamoto, Japan; ²BRC Acoustics Inc., Seattle, United States; ³Kobe University, Science and Technology, Kobe, Japan

New approaches are being made in the study of concert and opera singers and the response of the listener and performer to the acoustics of the performing environment. Significant progress in characterizing the optimum performing environment has followed from singer studies which utilize the effective duration (te) of the running autocorrelation function (ACF), i.e. the 10-percentile decay of the ACF envelope, as a key parameter for evaluating the source sound signal (that is, the musical performance). A close correlation of the minimum value of the running te, (te)min, with the subjective response of both listeners and performers to time-varying sound fields has been confirmed [Y. Ando Architectural Acoustics, AIP/Springer-Verlag, New York (1998)]. The investigation of the relation between the values of te and musical performing styles provides a basis for mutual understanding between singers and acousticians. For example, Kato and Ando reported that the values of (te)min for falsetto and medium falsetto are longer than those for the operatic singing style in vocal performances [Journal of Sound and Vibration 258-(3), 463-472 (2002)]. The present study attempts to evaluate changes in vibrato by analyzing the running te of the singing voice of twelve singers (3 sopranos, 3 altos, 3 tenors, and 3 basses). The results show that the values of log10(te) have a negative linear correlation with a frequency vibrato parameter.
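
The effective duration described in the abstract above can be sketched as a straight-line fit to the initial decay of the ACF envelope in decibels, extrapolated to the delay at which the envelope reaches one tenth of its maximum (-10 dB). The sketch assumes the envelope values of the normalised ACF have already been extracted. The function name, the least-squares fit, and the synthetic test data are assumptions of this illustration and not the authors' analysis code.

```python
import math

def effective_duration(delays_ms, acf_envelope, target_db=-10.0):
    """Delay (ms) at which a line fitted to the ACF envelope in dB reaches target_db.

    delays_ms: delays of the envelope points, in milliseconds.
    acf_envelope: positive envelope values of the normalised ACF at those delays.
    """
    levels = [10.0 * math.log10(v) for v in acf_envelope]
    n = len(delays_ms)
    mean_t = sum(delays_ms) / n
    mean_l = sum(levels) / n
    sxx = sum((t - mean_t) ** 2 for t in delays_ms)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(delays_ms, levels))
    slope = sxy / sxx                      # decay rate in dB per ms
    if slope >= 0.0:
        raise ValueError("envelope does not decay")
    intercept = mean_l - slope * mean_t
    return (target_db - intercept) / slope

# Synthetic envelope decaying by 0.2 dB per millisecond.
delays = [0.0, 5.0, 10.0, 15.0, 20.0]
envelope = [10.0 ** (-0.02 * t) for t in delays]
tau_e = effective_duration(delays, envelope)
```

Repeating this over successive short frames of a recording gives the running te, whose minimum is the (te)min referred to in the abstract.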

Singer identification through dynamic modeling of vocal fold and vocal tract parameters
Y Kim
MIT, Media Lab, Cambridge, United States

Oftentimes when we listen to a familiar singer, the unique qualities of that performer's voice allow us to establish the singer's identity with relative ease. It is believed that the unique acoustic qualities of an individual singer's voice arise from a combination of innate physical factors (e.g. vocal tract and vocal fold physiology) and individual characteristics of performance and expression (e.g. pronunciation and accent). In this research, we jointly estimate pole-zero filter and LF glottal waveform model parameters to model the shape of the vocal tract and the glottal excitation, respectively, over short time periods. These time-varying parameters, corresponding to the physical characteristics of the singer, are used to train a Hidden Markov Model (HMM). The HMM is used to model the dynamic behavior of the source-filter parameters, corresponding to some of the expressive characteristics of the singer. We propose a system that is able to identify singers based upon the model of greatest likelihood among the individually trained HMMs. The data used in this analysis was recorded from five conservatory-level classically trained singers. It is our hope that this analysis will also yield results informative to the dynamic modeling of source-filter parameters for singing voice synthesis.

Spectral modeling of the singing voice using asymmetric generalized Gaussian functions
M Lee¹, M J T Smith²
¹Georgia Institute of Technology, Center for Signal and Image Processing, School of Electrical and Computer Engineering, Atlanta, GA, United States; ²Purdue University, School of Electrical and Computer Engineering, West Lafayette, IN, United States

Analysis and synthesis of the human singing voice has been a focal point of much research within both the music and speech processing communities. In many applications, it is desirable to modify spectral characteristics prior to the synthesis process to enhance pleasing qualities, suppress unwanted ones, or to impart qualities that are specific to a particular style of singing. Existing formant models such as all-pole models offer little flexibility with respect to a formant's shape and form. Furthermore, methods for formant modification can be difficult to control. We propose a new spectral model for formant modification in which formants in the spectral envelope are represented as asymmetric generalized Gaussian functions. These functions do not restrict formants to be symmetric and characterize each formant's shape in addition to its amplitude and bandwidth. An investigation into the ability of this model to characterize spectral differences between classical and pop singing styles is also presented. We show that the new spectral model is capable of identifying certain formant characteristics of each of the singing styles. In addition, by combining these parameters with a sinusoidal synthesis procedure, vocal quality enhancements can be made that can improve style-specific qualities as well as accommodate pitch corrections.
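
The idea of an asymmetric generalized Gaussian formant can be sketched with one generic parameterisation: a generalized Gaussian whose width and shape exponent differ below and above the centre frequency. The exact functional form used by the authors is not given in the abstract, so the formula, the parameter names, and the example values below are assumptions of this illustration.

```python
import math

def asym_gen_gaussian(f, center, amplitude, width_lo, width_hi, shape_lo, shape_hi):
    """Formant magnitude at frequency f (Hz).

    Separate width and shape parameters are used on the low and high side
    of the centre frequency, so the peak need not be symmetric.
    A shape exponent of 2 gives an ordinary Gaussian flank.
    """
    if f < center:
        width, shape = width_lo, shape_lo
    else:
        width, shape = width_hi, shape_hi
    return amplitude * math.exp(-((abs(f - center) / width) ** shape))

def envelope(f, formants):
    """Spectral envelope as a sum of formant functions (one tuple per formant)."""
    return sum(asym_gen_gaussian(f, *params) for params in formants)

# Hypothetical formant: steeper below 500 Hz than above it.
formant = (500.0, 1.0, 100.0, 200.0, 2.0, 2.0)
below = asym_gen_gaussian(400.0, *formant)
above = asym_gen_gaussian(600.0, *formant)
```

Here `below` is smaller than `above` even though both frequencies are 100 Hz from the centre, which a symmetric all-pole resonance shape could not express directly.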

Change of voice characteristics in a human centrifuge
D Mürbe¹, G Hofmann¹, P Lindner¹, P Zöllner¹, J Sundberg²
¹Technical University Dresden, Dept of Otorhinolaryngology, Dresden, Germany; ²KTH Stockholm, Dept. of Speech, Music and Hearing, Stockholm, Sweden

Phonation control relies on auditory and kinaesthetic feedback. The kinaesthetic feedback includes a laryngeal reflex system, depending on discharges of mechanoreceptors in the laryngeal muscles and joints and in the subglottic mucosa. Whereas the influence of auditory feedback might be analysed by masking experiments, the characteristics of the kinaesthetic feedback circuit could be investigated by a change of the laryngeal elastic properties. This change can be simulated by application of a defined acceleration in a human centrifuge.
The voice signal of 6 pilots and a singer was investigated in a centrifuge for flight simulation. After reading a standard text in rest position, the same sample was recorded under centrifugal acceleration of 3g. Both test runs were repeated with masking of the auditory feedback, which prevented the subjects from hearing their own voices. Further, sustained vowel tasks were performed. The microphone signal and a signal reflecting the acceleration produced by the centrifuge were recorded on DAT.
Different voice parameters, such as voice fundamental frequency, formant frequencies and vibrato rate, were extracted from the microphone signal for the different test runs. A dependence of some voice parameters on the acceleration profile was found. It was concluded that investigations in a human centrifuge are suitable to assess the kinaesthetic control of laryngeal vibration.

Comparison of directional sources in simulating a soprano voice
L Parati¹, F Otondo²
¹Università degli Studi di Ferrara, Dipartimento di Ingegneria, Ferrara, Italy; ²Technical University of Denmark, Ørsted-DTU, Acoustic Technology, Kgs. Lyngby, Denmark

The study of the acoustical balance between the singer and the orchestra by means of room acoustical measurements has shown that the directional characteristics of the source are important. This investigation compares the directional characteristics of two loudspeakers used for room acoustic measurements in a historical opera house when simulating a soprano. Directivity measurements of a soprano singer were done in an anechoic chamber and used as a basis for comparison in room acoustic simulations of the Royal Theatre of Copenhagen. The influence of the directional sources on the distribution of the simulated acoustical parameters in the room was compared and evaluated with the distribution obtained from the soprano.

Neck muscle activation in professional classical singing
V Pettersen¹, R H Westgaard²
¹Stavanger University College, School of Art Education, Stavanger, Norway; ²Norwegian University of Science and Technology, I.Ø.T.L, Trondheim, Norway

The study aimed to examine electromyographic (EMG) activity of muscles in the neck and shoulder, to determine the contribution of these muscles to respiratory control during classical singing. Five professional opera singers, 2 males and 4 females, participated. Neck EMG activity was recorded from sternocleidomastoideus (STM), the scalenus muscles (SC), and from the posterior part of the neck (PN). Shoulder EMG activity was recorded from upper trapezius (TR). Upper thorax (TX) movement was traced by a strain gauge sensor. The level and pattern of muscle activity were determined in relation to respiratory phasing. Three singing tasks were performed: aria, sustained tones and extreme tones. The EMG activity of TR and STM was thereafter lowered by EMG biofeedback (BF), to examine whether changes in the activity pattern of these muscles have consequences for the activation of SC and PN. Muscle activity was then recorded in a repeat performance of the same tasks.
Results: Substantial muscle activity was observed in the posterior neck muscles during inhalation and phonation (i.e., exhalation). STM and SC showed a marked, transient component in their activity pattern, coinciding with peak inspiration, while this component was weaker in PN. The activity pattern of the three muscles was substantially correlated, showing synergistic activation in forceful respiration. EMG biofeedback to reduce TR and STM activity had the secondary effect of lowering EMG activity of SC and PN.

The laryngeal flow model for pressed-type singing voices
K-I Sakakibara¹, H Imagawa², S Niimi³, N Osaka¹
¹NTT, Communication Science Laboratories, Atsugi-shi, Japan; ²The University of Tokyo, Department of Speech Physiology, Tokyo, Japan; ³International University of Health and Welfare, Otawara, Japan

The traditional Asian pressed-type singing voices differ from the traditional European singing voice in their timbre and voice production mechanism. In throat singing, the ventricular folds vibrate as well as the true vocal folds, resulting in the generation of special laryngeal voice sources. On the other hand, in some other pressed-type singing voices, such as Japanese Minyou, the ventricular folds only approximate but do not vibrate. In this paper, we propose a new laryngeal flow model incorporating the effect of the ventricular fold vibration and laryngeal ventricle resonance. The model is a combination of the known glottal airflow model (R model), the laryngeal ventricle resonance (Helmholtz resonator), and the modulation of ventricular fold vibration. We will also demonstrate the relation between model parameters and voice quality with the use of synthesized sustained vowels.
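
The Helmholtz resonator named in the abstract above has a standard textbook resonance frequency, f = (c / 2π) · sqrt(A / (V · L)), for a cavity of volume V with a neck of cross-sectional area A and effective length L. The sketch below evaluates that formula only. How the authors map the laryngeal ventricle onto these quantities is not stated in the abstract, and the dimensions used here are hypothetical placeholders.

```python
import math

def helmholtz_frequency(neck_area_m2, neck_length_m, cavity_volume_m3, c=343.0):
    """Resonance frequency (Hz) of an ideal Helmholtz resonator.

    c is the speed of sound in m/s; neck_length_m should be the effective
    length (end corrections are not added here).
    """
    return (c / (2.0 * math.pi)) * math.sqrt(
        neck_area_m2 / (cavity_volume_m3 * neck_length_m)
    )

# Hypothetical dimensions, for illustration only.
f_small = helmholtz_frequency(neck_area_m2=1e-5, neck_length_m=2e-3, cavity_volume_m3=1e-6)
f_large = helmholtz_frequency(neck_area_m2=1e-5, neck_length_m=2e-3, cavity_volume_m3=2e-6)
```

Doubling the cavity volume lowers the resonance by a factor of sqrt(2), which is the kind of dependence a model parameter for ventricle size would control.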

Acoustic and aerodynamic analyses of support and resonance demonstrated by an elite baritone singing teacher
R C Scherer¹, N Radhakrishnan¹, A Poulimenos²
¹Bowling Green State University, Department of Communication Disorders, Bowling Green OH, United States; ²Indiana University, School of Music, Bloomington IN, United States

An elite singer/singing teacher (baritone) used a pedagogical approach to demonstrate levels of support and resonance: (1) the location of support - glottis (poor), chest (better), abdomen (best), and (2) locations of resonance - hard palate/straight tone (poor), mouth (better), sinus/head (best), using a single frequency (196 Hz), mezzo-forte loudness, and the smooth /pae pae pae/ technique. The support sequence was characterized by formant frequency lowering, suggestive of vocal tract lengthening. The resonance sequence was characterized by AC flow and mean flow increases and inferred adduction decreases (the vocal processes were always together, however). Flexible video fiberoptics suggested lowering of the larynx and a narrower opening at the top of the larynx tube from poor to best productions. The best locations had the widest F2 bandwidths. The better and best locations had the largest intensity difference between F2 and F3. Although acoustic power increased from the poor to best productions, acoustic efficiency was not a discriminating factor. Open and speed quotients were also not differentiating. The flow resistance was highest and aerodynamic power the lowest for the poor productions. The maximum flow declination rate correlated highly with the AC flow and SPL.

Perceptual Relevance of the 5kHz spectral region to sex identification in children's singing voices
P J Sjölander
KTH, TMH, Stockholm, Sweden

In a previous investigation, the recorded voices of fifty-nine children (30 boys and 29 girls) aged between 3 and 12 years were evaluated with respect to perceived gender and actual sexual identity by a group of experienced listeners. The audio recordings were later acoustically analysed using long-term average spectrum (LTAS) analysis and the results were compared to those of the perceptual evaluation. The results revealed a peak in the average spectrum at 5 kHz for children perceived with confidence as boys (whether male or female in actuality), and a flat spectrum at 5 kHz for children perceived as girls. These findings suggested that an acoustically measurable long-term, and therefore persistent, difference may exist between boys' and girls' voices. However, it was unclear if the peak itself carried any perceptual information. In the present experiment, the recordings were re-submitted for perceptual analysis. This time each voice was presented twice, quasi-randomly, the second sample having been filtered to manipulate the effect of the higher frequencies on perception. A panel of listeners was asked to judge the sex of each subject as presented via audio headphones in order to shed light on the significance of the 5 kHz peak.
