Computer methods for voice analysis

Svante Granqvist


Doctoral dissertation, Stockholm, March 28, 2003


Department of Speech, Music and Hearing


This thesis consists of five articles and a summary. The thesis deals with methods for measuring properties of the voice. The methods are all computer-based, but utilise different approaches for measuring different aspects of the voice.

Paper I introduces the Visual Sort and Rate (VSR) method for perceptual rating of voice quality. The method is based on the Visual Analogue Scale (VAS), but simultaneously shows all stimuli as icons along the VAS on the computer screen. As the listener places similar-sounding stimuli close to each other during the rating process, comparing stimuli becomes easier.

Paper II introduces the correlogram. Fundamental frequency (F0) sometimes cannot be strictly defined, particularly for perturbed voice signals. The correlogram displays multiple consecutive correlation functions as a grey-scale image. It thus avoids selecting a single F0 value; rather, it presents an unbiased image of periodicity, allowing the investigator to select among several candidates, if appropriate.
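As an illustration only, the idea of stacking consecutive correlation functions into an image can be sketched as follows. The frame length, hop size, lag range, the choice of a normalised autocorrelation and the mapping to grey levels are all assumptions made for this sketch; the abstract does not specify them, and the function names are hypothetical.

```python
import math

def normalised_autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag, scaled so lag 0 is 1."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]            # remove any DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:                        # silent frame: no periodicity to show
        return [0.0] * (max_lag + 1)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
            for lag in range(max_lag + 1)]

def correlogram(signal, frame_len, hop, max_lag):
    """One correlation function per frame; rows are time, columns are lag."""
    return [normalised_autocorrelation(signal[start:start + frame_len], max_lag)
            for start in range(0, len(signal) - frame_len + 1, hop)]

def to_grey(rows):
    """Map correlation values to 0..255 (assumed mapping: negatives clipped to 0)."""
    return [[int(round(255 * min(1.0, max(0.0, r)))) for r in row] for row in rows]

# Usage: a test tone with a period of 20 samples (hypothetical data).
signal = [math.sin(2 * math.pi * i / 20) for i in range(400)]
rows = correlogram(signal, frame_len=200, hop=100, max_lag=50)
image = to_grey(rows)
```

No F0 value is extracted here. For a perturbed voice, several lags may show comparable brightness in the same row, and the choice among them is left to the investigator.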

Paper III introduces a method for detection of phonation to be utilised in voice accumulators. The method uses two microphones attached near the subject’s ears. Phase and amplitude relations of the microphone signals are used to form a phonation detector. The output of the method can be used to measure phonation time, speaking time and fundamental frequency of the subject, as well as sound pressure level of both the subject’s voicing and the ambient sounds.
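The abstract does not describe how the phase and amplitude relations are combined. The sketch below is one possible reading, not the detector from Paper III: it assumes that the subject's own voice reaches the two ear-mounted microphones nearly in phase and at similar level, whereas sounds from other directions do not. The frame length, both thresholds and all function names are hypothetical.

```python
import math

def frame_features(left, right):
    """Return (zero-lag correlation, absolute level difference in dB) for one frame."""
    e_left = sum(v * v for v in left)
    e_right = sum(v * v for v in right)
    if e_left == 0.0 or e_right == 0.0:      # a silent channel cannot be phonation
        return 0.0, float("inf")
    corr = sum(a * b for a, b in zip(left, right)) / math.sqrt(e_left * e_right)
    level_diff_db = abs(10.0 * math.log10(e_left / e_right))
    return corr, level_diff_db

def detect_phonation(left, right, frame_len=256, min_corr=0.9, max_level_diff_db=3.0):
    """Flag each non-overlapping frame as phonation (True) or not (False)."""
    n = min(len(left), len(right))
    flags = []
    for start in range(0, n - frame_len + 1, frame_len):
        corr, diff = frame_features(left[start:start + frame_len],
                                    right[start:start + frame_len])
        flags.append(corr >= min_corr and diff <= max_level_diff_db)
    return flags

def phonation_time(flags, frame_len, sample_rate):
    """Accumulated phonation time in seconds."""
    return sum(flags) * frame_len / sample_rate

# Usage with synthetic signals (hypothetical data).
tone = [math.sin(2 * math.pi * i / 32) for i in range(256)]
# Own voice: identical at both microphones.
own = detect_phonation(tone, tone)
# Ambient sound: quarter-period delay and half the amplitude at the right microphone.
delayed = [0.5 * math.sin(2 * math.pi * (i - 8) / 32) for i in range(256)]
ambient = detect_phonation(tone, delayed)
```

Once frames are flagged, measures such as phonation time follow by accumulation, and the sound pressure level can be computed separately for flagged and unflagged frames.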

Paper IV introduces a method for Fourier analysis of high-speed laryngoscopic imaging. The data from the consecutive images are re-arranged to form time series that reflect the time-variation of light intensity in each pixel. Each of these time series is then analysed by means of Fourier transformation, such that a spectrum for each pixel is obtained. Several ways of displaying these spectra are demonstrated.
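The re-arrangement and per-pixel transformation described above can be sketched as follows. This is a minimal illustration that uses a direct discrete Fourier transform from the Python standard library; the image representation (a list of frames, each a list of rows of light intensities) and the function names are assumptions, and a real implementation would normally use an FFT routine.

```python
import cmath
import math

def pixel_time_series(frames):
    """Re-arrange frames[t][row][col] into series[row][col][t]."""
    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    return [[[frame[r][c] for frame in frames] for c in range(n_cols)]
            for r in range(n_rows)]

def magnitude_spectrum(series):
    """Magnitudes of DFT bins 0..n//2 of one pixel's intensity series."""
    n = len(series)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series)))
            for k in range(n // 2 + 1)]

def pixel_spectra(frames):
    """One magnitude spectrum per pixel."""
    return [[magnitude_spectrum(s) for s in row] for row in pixel_time_series(frames)]

def dominant_bin(spectrum):
    """Index of the strongest non-DC bin."""
    return max(range(1, len(spectrum)), key=lambda k: spectrum[k])

# Usage: 16 frames of a 1 x 2 image (hypothetical data). The left pixel
# oscillates with 3 cycles over the recording; the right pixel is static.
frames = [[[100 + 50 * math.cos(2 * math.pi * 3 * t / 16), 80]] for t in range(16)]
spectra = pixel_spectra(frames)
```

Bin k corresponds to a frequency of k times the frame rate divided by the number of frames. One simple display, assumed here only as an example, is an image of the dominant bin or its magnitude for every pixel.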

Paper V examines a test set-up for simultaneous recording of airflow, intra-oral pressure, electro-glottography, audio and high-speed imaging. Data are analysed with particular focus on synchronisation between glottal area and inverse filtered airflow. Several methodological aspects are also examined, such as the difficulties in synchronising high-speed imaging data with the other signals.


Key words: voice analysis, perceptual analysis, fundamental frequency, correlogram, aperiodicity, Fourier analysis, high-speed imaging, laryngoscopy, vocal fold vibration, voice accumulation.


