Annual Report 1999

Inger Karlsson

Director of
undergraduate studies

Undergraduate Level

The undergraduate courses given by the department are all aimed at the last two years of the Master of Science (civilingenjör) programme. All courses belong to the pool of optional courses that undergraduate students choose among for their specialisation. The students who choose the department's courses usually follow the M.Sc. programmes Electrical engineering, Computer science and engineering or Technical physics.

The department has had very little teaching activity at the undergraduate level, but that is changing. Two of the courses are new this year, namely Pattern recognition and Source coding theory (which explains their low number of students). In the spring of 2000, a further course in Digital speech coding will start. The department will also be involved in the new M.Sc. programme Media Technology.

To complete their M.Sc. degree, the undergraduates are required to do a thesis that corresponds to five months of full-time work. During the last year, 17 students chose to do their thesis at the department.

2F1111 Speech technology (4 cr)

27 students

Inger Karlsson

Elements of information theory, linguistics and phonetics applied to speech transmission and communication. Speech signal analysis and processing. Acoustic theory of speech production, implementation in speech synthesis. Auditory functions and perception of speech. Automatic speech recognition and text-to-speech conversion. Speech quality and intelligibility assessments. Automatic speaker verification. Text-to-speech and speech-to-text in systems for human-computer interaction, especially multi-modal dialogue systems. Effects of system distortion including room acoustics and hearing loss. Technical aids for people with auditory, visual, speech, and language handicaps.

2F1212 Music acoustics (4 cr)

26 students

Anders Askenfelt

Elements of acoustics including wave propagation, electro-mechanical analogies, strings, membranes, tubes. Fourier transform and elements of auditory theory. Design and function of musical instruments. Musical scales. Music generation and performance rules. Elements of room acoustics. Synthesis of instrument and voice.

2F1400 Electro acoustics (4 cr)

261 students

Johan Liljencrants

General theory of sound waves. Auditory functions and perceptual limitations. Analogies between electrical, mechanical, and acoustic systems. Acoustical impedance. Basic equations for electro-mechanical four-terminal systems. Loudspeakers, microphones. Sound-recording techniques. Ultrasound. Measurement techniques.

2F1510 Pattern recognition (4 cr)

6 students

Arne Leijon

Classical Bayesian decision theory, signal classification in additive white Gaussian noise, matched filter design, parameter estimation and supervised learning, non-parametric classification techniques, linear discriminant functions, and an introduction to neural network design. These concepts and methods are extended to the classification of pattern sequences using the Hidden Markov Model. Human communication by speech and hearing in the framework of signal classification theory.

2F1520 Digital speech compression (4 cr)

10 students

Arne Leijon, Bastiaan Kleijn

From last year

With respect to source coding, the principles of rate distortion theory, lossless (entropy) coding (Huffman, Ziv-Lempel), scalar and vector quantization, and predictive, transform, and subband coding are presented. Applications of these methods to the coding of speech, audio, images, and image sequences, including the procedures found in the GSM, JPEG, and MPEG standards, are also described.

With respect to pattern recognition, the course covers Bayes decision theory, signal classification in additive white Gaussian noise, signal decomposition and signal spaces, matched filter design, maximum likelihood estimation, and supervised and unsupervised learning. These concepts and methods are applied to the Hidden Markov Model commonly used in automatic speech recognition. Some aspects of human communication by speech and hearing are also discussed in the framework of signal classification theory.


The course treats current speech-coding technology by means of laboratory exercises, projects, and lectures. It provides hands-on experience with the application of signal processing methods.


To provide a good understanding of current speech coding technology and to provide practical experience in signal processing.


Classification and overview of speech coders, overview of speech-coding standards. Uniform, nonuniform, and adaptive quantization. Pulse code modulation and adaptive pulse code modulation. Vocoders which use models of the vocal tract and its excitation. Waveform coding techniques including differential PCM, adaptive DPCM and delta modulation. Analysis-by-synthesis coding methods including multi-pulse LPC, RPLPC, and CELP. Subband and transform coding. Sinusoidal coding and waveform interpolation. Detailed description of several current speech-coding standards including the GSM speech-coding algorithm.

2F1530 Source coding theory (4 cr)

6 students

Bastiaan Kleijn


The course treats the principles of the encoding of speech, audio, video, and images at low bit rates. Source coding techniques such as scalar and vector quantization, orthogonal transforms, and linear prediction are introduced and their performance is analyzed theoretically. The theoretical bounds on the performance of source coders are discussed.


To provide an understanding of the theoretical basis for source coding.


Basic information theory of discrete and continuous variables: entropy and differential entropy, mutual information, asymptotic equipartition.

Lossless coding: Shannon, Huffman, Ziv-Lempel codes.

Rate-distortion theory: the rate-distortion function, Shannon lower bound, rate distribution over independent variables, Blahut algorithm.

Scalar and vector quantization: entropy-constrained quantization, resolution-constrained quantization, high-resolution theory, Lloyd training algorithm, structured and lattice vector quantization.

Orthogonal transforms: relation to source coding, Karhunen-Loeve transform and its approximations, energy concentration, best-basis search, filterbanks.

Linear prediction: relation to source coding, computation, Kolmogorov's formula, spectral flatness, noise shaping and closed-loop prediction.

Rolf Carlson

Director of
Graduate Studies

Graduate Level

Graduate students comprise about one third of the personnel at the department. A few students are financed through external resources. Graduate studies towards the Doctor of Science degree require a minimum of four years after the M.Sc. graduation. Since most students are financed by research projects, this minimum is generally exceeded.

The requirements include theoretical studies and a thesis. The thesis may be composed of several articles published in scientific journals of international standard including our Speech, Music and Hearing, Quarterly Progress and Status Report.

The theoretical studies are individually tailored within the domain of graduate courses. Requirements include participation in research seminars and attendance at occasional lectures which supplement literature assignments. Credits are also given for certain undergraduate courses on top of the undergraduate requirements and for courses taken at Stockholm University, e.g. linguistics and phonetics. In addition to the teaching arranged by the staff at the department, special "bullet" courses are organised every year. At such an event a well-known researcher is invited to give a course during a limited period of time, typically a week. A special doctoral course, which the department organised in 1999, was the 7th European Summer School on Language and Speech Communication (MiLaSS). Several students at the department have earlier participated in similar summer schools in Europe.

In the fall of 1999, the graduate studies were reorganised according to the KTH goal of reducing the number of doctoral programmes. Two main programmes were defined at the department.

Speech and Music Communication

The Speech and Music Communication programme includes studies of human communication primarily with the help of acoustic signals such as speech and music. Communication with visual signals such as facial gestures during speech production is also included. The programme contains descriptions, theories, models and applications covering all parts of the communication chain from production to acoustic transmission to perception and understanding or impression.

The programme has two subtopics: Speech Communication and Music Acoustics.

Acoustic Signal Processing

The Acoustic Signal Processing programme covers theory and application in the field of acoustic signal processing, signal coding and information transmission, related to human sound production and signal processing by the human senses.

The programme has two subtopics: Hearing Technology and Speech Signal Processing.

Courses, graduate level

2F5307 Psycho-acoustics and speech perception (3-6 cr)

13 students

Arne Leijon

Fundamental auditory physiology and psycho-acoustics, including demonstration experiments. Psycho-acoustic test methods: criterion-independent test methods, measurement accuracy, and models for interpretation of test results. Exercises with application of signal-detection theory and estimation theory for analysis of psycho-acoustic experiments. Optional individual project.

MiLaSS -
7th European Summer School on Language and Speech Communication

July 12-23

Dominic W. Massaro: Multimodal Speech Perception: A Paradigm for Speech Science (Plenary)

David McNeill: Multimodality of meaning in speech and gesture (Parallel)

Niels Ole Bernsen: Multimodality in language and speech systems - from theory to design support tool (Parallel)

Elisabeth André and Björn Granström: Intelligent Multimedia Presentation Systems (Parallel)

Kristinn Thórisson: Architectures for integrated multimodal input-output systems and the humanoid interface (Parallel)

Jens Allwood: Face-to-face communication including different modalities (Plenary)

Alistair Edwards: Multimodal interaction and people with disabilities (Parallel)

Paul Mc Kevitt: Developing intelligent multimedia applications (Parallel)

Hands-on sessions, the KTH group: Multimodal dialogue systems & audio-visual synthesis (Parallel)

Bullet course, April 26 – 28

Chin-Hui Lee, Lucent Technologies, USA

Three lectures on automatic speech and speaker recognition:

    1. Robust speech recognition -- overview of statistical pattern recognition approach and the implied robustness problems and solutions.

    2. Speaker and speech verification -- overview of statistical pattern verification approach and applications to speaker and speech verification.

    3. A detection approach to speech recognition and understanding -- a paradigm to combine feature-based detection and state-of-the-art recognition to open up new possibilities.

Bullet course, May 17 – 19

Anne Cutler, Max Planck Institute for Psycholinguistics, Nijmegen, Holland

Three lectures on human recognition of spoken words:

1. The psycholinguistic approach

2. From input to lexicon

3. Language-specificity

Bullet course, June 14 – 16

Gerard Chollet, ENST, Paris, France

    1. Speech analysis, coding, synthesis and recognition: Time-frequency representations, wavelets, Temporal decomposition, Time-dependent models, Analysis by synthesis, H+N, HMMs, Markov fields,…

    2. ALISP (Automatic Language Independent Speech Processing): Learning from examples, Segmental models, Speaker normalisation, Very Low bit rate coding, Multilingual speech recognition…

    3. Applications: Identity verification, decision fusion, Interactive Voice Servers, 'Majordome', Multicasting…

Pre-Speech-Coding-Workshop, June 18

The adaptive multi-rate speech coder

Roar Hagen (Ericsson)

Reverse water-filling in predictive encoding of speech

Soren V Andersen (KTH)

Auditory modeling for speech coding

Gernot Kubin, Vienna Technical University

Making waveform coded WI possible

Ian Burnett, University of Wollongong

Spectrum Analysis: Alternative Linear Prediction Techniques

Nicola Chong, University of Wollongong

Optimized error correction of MELP speech parameters via maximum a posteriori (MAP) techniques

Douglas Rahikka (NSA)

Speech services in third generation wireless systems

Eric Ekudden (Ericsson)

Interoperable global communications, a government perspective

John Collura (NSA)

Discussions and KTH demos

Bullet course, October 11-13

Sharon Oviatt, Professor and Co-Director, Center for Human-Computer Communication, Dept of Computer Science, Oregon Graduate Institute of Science & Technology

    1. Modeling hyperarticulate speech to interactive systems

    2. Mutual disambiguation of recognition errors in a multimodal architecture

    3. Designing and evaluating conversational interfaces with animated characters

Further Education Courses

2F4215 The functioning of the singing voice (4 cr)

15 students

Johan Sundberg

The course was arranged again, this time in a four-weekend format. Fifteen students from different parts of Sweden and some from Norway participated. The lectures concerned physiology, breathing, voice source, formants/articulation, proprioception, perception and hygiene. Participants spent about half of the course in workshops, where they analysed their own voice production with various real-time biofeedback devices, such as plethysmography for tracking breathing behaviour and inverse filtering for visualizing the voice source.

Published by: TMH, Speech, Music and Hearing

Last updated: 2004-10-25