Annual Report 1999
The undergraduate courses given by the department are all aimed at the
last two years of the Master of Science (civilingenjör). All courses belong to the pool of optional courses
that undergraduate students choose among for their specialisation. Students who choose the department's courses
usually follow the M.Sc. programmes in Electrical Engineering, Computer Science and Engineering, or Technical Physics.
The department has had very little teaching activity at the undergraduate
level, but that is changing. Two of the courses are new this year, namely the courses in
Pattern recognition and
Source coding theory (this
explains their low number of students). In the spring of 2000, a further course in
Digital speech coding will start. The department will also be
involved in the new M.Sc. programme Media Technology.
To complete their M.Sc. degree, the undergraduates are required to write
a thesis that corresponds to five months of full-time work. During the last year, 17 students chose to do their thesis
at the department.
2F1111 Speech technology (4 cr)
Elements of information theory, linguistics and phonetics applied to
speech transmission and communication. Speech signal analysis and processing. Acoustic theory of speech production,
implementation in speech synthesis. Auditory functions and perception of speech. Automatic speech recognition and
text-to-speech conversion. Speech quality and intelligibility assessments. Automatic speaker verification. Text-to-speech
and speech-to-text in systems for human-computer interaction, especially multi-modal dialogue systems. Effects of
system distortion including room acoustics and hearing loss. Technical aids for people with auditory, visual, speech,
and language handicaps.
2F1212 Music acoustics (4 cr)
Elements of acoustics including wave propagation, electro-mechanical
analogies, string membranes, tubes. Fourier transform and elements of auditory theory. Design and function of musical
instruments. Musical scales. Music generation and performance rules. Elements of room acoustics. Synthesis of instrument
2F1400 Electro acoustics (4 cr)
General theory of sound waves. Auditory functions and perceptual limitations.
Analogies between electrical, mechanical, and acoustic systems. Acoustical impedance. Basic equations for electro-mechanical
four-terminal systems. Loudspeakers, microphones. Sound-recording techniques. Ultrasound. Measurement techniques.
2F1510 Pattern recognition (4 cr)
Classical Bayesian decision theory, signal classification in additive
white Gaussian noise, matched filter design, parameter estimation and supervised learning, non-parametric classification
techniques, linear discriminant functions, and an introduction to neural network design. These concepts and methods
are extended to the classification of pattern sequences using the Hidden Markov Model. Human communication by speech
and hearing in the framework of signal classification theory.
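As an illustration of the Bayesian decision theory listed above, the following is a minimal sketch of a maximum a posteriori (MAP) classifier for scalar observations with Gaussian class-conditional densities. The function names, the tuple layout and the example classes are assumptions made for this illustration and are not taken from the course material.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log of a one-dimensional Gaussian density."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def map_classify(x, classes):
    """Return the label with the largest posterior probability.

    classes: list of (label, prior, mean, var) tuples (hypothetical layout).
    The evidence p(x) is common to all classes, so it is omitted.
    """
    def log_posterior(c):
        _, prior, mean, var = c
        return math.log(prior) + gaussian_log_pdf(x, mean, var)
    return max(classes, key=log_posterior)[0]

# Two equally likely classes with unit variance: the boundary lies midway.
equal = [("a", 0.5, 0.0, 1.0), ("b", 0.5, 2.0, 1.0)]
print(map_classify(0.4, equal), map_classify(1.6, equal))
```

With equal priors and variances the rule reduces to choosing the nearest class mean; unequal priors shift the decision boundary towards the less likely class.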
2F1520 Digital speech compression (4 cr)
Arne Leijon, Bastiaan Kleijn
From last year
With respect to source coding, the principles of rate distortion theory,
lossless (entropy) coding (Huffman, Ziv-Lempel), scalar and vector quantization, and predictive, transform, and
subband coding are presented. Applications of these methods to the coding of speech, audio, images, and image sequences,
including the procedures found in the GSM, JPEG, and MPEG standards, are also described.
With respect to pattern recognition, the course covers Bayes decision
theory, signal classification in additive white Gaussian noise, signal decomposition and signal spaces, matched
filter design, maximum likelihood estimation, and supervised and unsupervised learning. These concepts and methods
are applied to the Hidden Markov Model commonly used in automatic speech recognition. Some aspects of human communication
by speech and hearing are also discussed in the framework of signal classification theory.
The course treats current speech-coding technology by means of laboratory
exercises, projects, and lectures. It provides hands-on experience with the application of signal processing methods.
To provide a good understanding of current speech coding technology and
to provide practical experience in signal processing.
Classification and overview of speech coders, overview of speech-coding
standards. Uniform, nonuniform, and adaptive quantization.
Pulse code modulation and adaptive pulse code modulation. Vocoders which
use models of the vocal tract and its excitation.
Waveform coding techniques including differential PCM, adaptive DPCM
and delta modulation. Analysis-by-synthesis coding
methods including multi-pulse LPC, RPLPC, and CELP. Subband and transform
coding. Sinusoidal coding and waveform
interpolation. Detailed description of several current speech-coding
standards including the GSM speech-coding algorithm.
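As an illustration of the differential PCM technique named in the course contents above, the following is a minimal sketch of a closed-loop DPCM coder with a fixed first-order predictor and a uniform quantizer. The step size, the predictor coefficient and the function names are assumptions chosen for this example; they do not describe any particular standard.

```python
def dpcm_encode(samples, step=0.1, a=0.9):
    """Quantize the prediction residual of a first-order predictor."""
    codes = []
    prediction = 0.0
    for x in samples:
        residual = x - prediction
        code = round(residual / step)          # uniform quantizer index
        codes.append(code)
        reconstructed = prediction + code * step
        prediction = a * reconstructed         # predict from decoded signal
    return codes

def dpcm_decode(codes, step=0.1, a=0.9):
    """Mirror the encoder's prediction loop to rebuild the signal."""
    out = []
    prediction = 0.0
    for code in codes:
        reconstructed = prediction + code * step
        out.append(reconstructed)
        prediction = a * reconstructed
    return out

samples = [0.0, 0.12, 0.31, 0.47, 0.52, 0.4, 0.18, -0.07]
decoded = dpcm_decode(dpcm_encode(samples))
print(max(abs(x - y) for x, y in zip(samples, decoded)))
```

Because the encoder predicts from the reconstructed signal rather than from the original, quantization errors do not accumulate, and each decoded sample stays within half a step of its input.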
2F1530 Source coding theory (4 cr)
The course treats the principles of the encoding of speech, audio, video,
and images at low bit rates. Source coding techniques such as scalar and vector quantization, orthogonal transforms,
and linear prediction are introduced and their performance is analyzed theoretically. The theoretical bounds on
the performance of source coders are discussed.
To provide an understanding of the theoretical basis for source coding.
Basic information theory of discrete and continuous variables: entropy
and differential entropy, mutual information, asymptotic equipartition.
Lossless coding: Shannon, Huffman, Ziv-Lempel codes.
Rate-distortion Theory: the rate-distortion function, Shannon lower bound,
rate distribution over independent variables, Blahut algorithm.
Scalar and vector quantization: entropy-constrained quantization, resolution-constrained
quantization, high-resolution theory, Lloyd training algorithm, structured and lattice vector quantization.
Orthogonal transforms: relation to source coding, Karhunen-Loeve transform
and its approximations, energy concentration, best-basis search, filterbanks.
Linear prediction: relation to source coding, computation, Kolmogorov's
formula, spectral flatness, noise shaping and closed-loop prediction.
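As an illustration of the entropy and Huffman coding topics listed above, the following is a minimal sketch that computes the entropy of a discrete source and the codeword lengths of a binary Huffman code for it. The function names and the example distribution are assumptions made for this illustration.

```python
import heapq
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_code_lengths(probs):
    """Codeword length per symbol for a binary Huffman code."""
    # Heap items: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1        # merged symbols move one level deeper
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_code_lengths(probs)
average = sum(p * n for p, n in zip(probs, lengths))
print(lengths, average, entropy(probs))  # [1, 2, 3, 3] 1.75 1.75
```

For this dyadic distribution the average codeword length equals the entropy; in general the average length of a Huffman code lies between the entropy and the entropy plus one bit.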
Graduate students comprise about one third of the personnel at the department.
A few students are financed through external resources. Graduate studies towards the Doctor of Science degree require
a minimum of four years after the M.Sc. graduation. Since most students are financed by research projects, this
limit is generally exceeded.
The requirements include theoretical studies and a thesis. The thesis
may be composed of several articles published in scientific journals of international standard including our
Speech, Music and Hearing, Quarterly Progress and Status Report.
The theoretical studies are individually tailored within the domain of
graduate courses. Requirements include participation in research seminars and attending occasional lectures which
supplement literature assignments. Credits are also given for certain undergraduate courses on top of the undergraduate
requirements and for courses taken at Stockholm University, e.g. linguistics and phonetics. In addition to the
teaching arranged by the staff at the department, special "bullet" courses are organised every year.
At such an event a well-known researcher is invited to give a course during a limited period of time, typically
a week. A special doctoral course, which the department organised in 1999, was the
7th European Summer School on Language
and Speech Communication (MiLaSS). Several students at the department have earlier participated in similar summer
schools in Europe.
In the fall of 1999, the graduate studies were reorganised according
to the KTH goal of reducing the number of doctoral programmes. Two main programmes were defined at the department.
Speech and Music Communication
The Speech and Music Communication programme includes studies of human
communication primarily with the help of acoustic signals such as speech and music. Communication with visual signals
such as facial gestures during speech production is also included. The programme contains descriptions, theories,
models and applications covering all parts of the communication chain from production to acoustic transmission
to perception and understanding or impression.
The programme has two subtopics:
and Music Acoustics.
Acoustic Signal Processing
The Acoustic Signal Processing programme covers theory and application
in the field of acoustic signal processing, signal coding and information transmission, related to human sound
production and signal processing by the human senses.
The programme has two subtopics: Hearing
Technology and Speech
Courses, graduate level
2F5307 Psycho-acoustics and speech perception (3-6 cr)
Fundamental auditory physiology and psycho-acoustics, including demonstration
experiments. Psycho-acoustic test methods: criterion-independent test methods, measurement accuracy, and models
for interpretation of test results. Exercises with application of signal-detection theory and estimation theory
for analysis of psycho-acoustic experiments. Optional individual project.
7th European Summer School on Language and Speech Communication
Dominic W. Massaro:
Multimodal Speech Perception: A Paradigm
for Speech Science (Plenary)
David McNeill: Multimodality of meaning in speech and gesture
Niels Ole Bernsen:
Multimodality in language and speech
systems - from theory to design support tool (Parallel)
Elisabeth André and Björn Granström
Multimedia Presentation Systems (Parallel)
Architectures for integrated multimodal
input-output systems and the humanoid interface (Parallel)
Jens Allwood: Face-to-face communication including different
Multimodal interaction and people with
Paul Mc Kevitt:
Developing intelligent multimedia applications (Parallel)
Hands-on sessions, the KTH group:
Multimodal dialogue systems & audio-visual synthesis (Parallel)
Bullet course, April 26 – 28
Chin-Hui Lee, Lucent Technologies, USA
Three lectures on automatic speech and speaker recognition:
1. Robust speech recognition -- overview of statistical pattern recognition
approach and the implied robustness problems and solutions.
2. Speaker and speech verification -- overview of statistical pattern
verification approach and applications to speaker and speech verification.
3. A detection approach to speech recognition and understanding -- a
paradigm to combine feature-based detection and state-of-the-art recognition to open up new possibilities.
Bullet course, May 17 – 19
Anne Cutler, Max Planck Institute for Psycholinguistics, Nijmegen,
Three lectures on human recognition of spoken words:
1. The psycholinguistic approach
2. From input to lexicon
Bullet course, June 14 – 16
Gerard Chollet, ENST, Paris, France
1. Speech analysis, coding, synthesis and recognition: Time-frequency
representations, wavelets, Temporal decomposition, Time-dependent models, Analysis by synthesis, H+N, HMMs, Markov
2. ALISP (Automatic Language Independent Speech Processing): Learning
from examples, Segmental models, Speaker normalisation, Very Low bit rate coding, Multilingual speech recognition…
3. Applications: Identity verification, decision fusion, Interactive
Voice Servers, 'Majordome', Multicasting….
Pre-Speech-Coding-Workshop, June 18
The adaptive multi-rate speech coder
Roar Hagen (Ericsson)
Reverse water-filling in predictive encoding of speech
Soren V Andersen (KTH)
Auditory modeling for speech coding
Gernot Kubin, Vienna Technical University
Making waveform coded WI possible
Ian Burnett, University of Wollongong
Spectrum Analysis: Alternative Linear Prediction Techniques
Nicola Chong, University of Wollongong
Optimized error correction of MELP speech parameters via maximum a posteriori
Douglas Rahikka (NSA)
Speech services in third generation wireless systems
Eric Ekudden (Ericsson)
Interoperable global communications, a government perspective
John Collura (NSA)
Discussions and KTH demos
Bullet course, October 11-13
Sharon Oviatt, Professor and Co-Director, Center for Human-Computer
Communication, Dept of Computer Science, Oregon Graduate Institute of Science & Technology
1. Modeling hyperarticulate speech to interactive systems
2. Mutual disambiguation of recognition errors in a multimodal architecture
3. Designing and evaluating conversational interfaces with animated characters
Further Education Courses
2F4215 The functioning of the singing voice (4 cr)
The course was arranged again, this time in a four-weekends format. Fifteen
students from different parts of Sweden and some from Norway participated. The lectures concerned physiology,
breathing, voice source, formants/articulation, proprioception, perception and hygiene. Participants spent about
half of the course in workshops, where they analysed their own voice production by various real-time biofeedback
devices, such as plethysmography for tracking breathing behaviour, inverse filtering for visualizing voice source