Course in Speech and Speaker Recognition

Spring Semester 2007

Purpose

The purpose of this 5 p doctoral course is to give students with basic knowledge of speech technology a deeper understanding of techniques for speech and speaker recognition. 

Contents

The course consists of lectures, practical assignments, exercises and the writing of a term paper on an individually selected topic.

The following topics are treated in the course:

Prerequisites

The course is aimed at students with a basic knowledge of speech technology (the equivalent of a GSLT level 1 course in Speech technology). Basic programming skills are useful, as is knowledge of basic statistics and probability theory.

Schedule

The lecture dates are preliminary and may be adjusted slightly according to the participants' preferences. Fridays are tentatively selected. Please inform Mats if you think other lecture days would fit better. A third lecture day can be added in order to give the topics a more detailed presentation. This decision will be taken in consultation with the students during the first lecture day. For example, the second occasion might be extended to two days.

Lectures

Date: Thursday, March 29
Time: 10-12, 13-15; 15-18
Room: Fantum, TMH; office rooms
Contents: Introduction; Probability, Statistics and Information Theory; Pattern Recognition; Speech Signal Representations; Hidden Markov Models; then HTK tutorial and HTK lab

Date: Friday, March 30
Time: 9-12, 13-15; 15-16
Room: Fantum, TMH; seminar room/office rooms
Contents: HMM Training and Adaptation; Acoustic Modeling; Environmental Robustness; then computational exercises

Date: Friday, May 11 (preliminary program)
Time: 10-12, 13-15; 15-17
Room: Fantum, TMH
Contents: Speaker Recognition; Language Modeling; Basic and Large Vocabulary Search; then Finite State Transducers and a short demonstration of dictation

Date: Friday, June 08
Time: 10-11; 11-12, 13-15; 15-16
Room: Fantum, TMH
Contents: Presentation of solutions to exercises; presentations of term papers; discussion

The final lecture notes will be posted after each lecture.

Deadlines

Date       Content
April 18   Select topic for term paper
May 2      Select two papers to review
May 7      Mail exercise solutions to teacher
May 20     Mail draft paper to reviewers
May 25     Reviewers return comments to author
June 1     Mail final paper to teacher and the reviewers



Reading material

The main course book is Huang, Acero and Hon (2001): Spoken Language Processing (Prentice Hall, ISBN 0-13-022616-5).
The course will mainly cover chapters 3, 4, 5 (partly), 6, 8, 9, 10, 11, 12, and 13 (if time permits). Since the book does not cover speaker recognition, the literature on that topic will consist of selected papers.
A selection of papers will be used as additional reading material for topics not covered in the book.


Requirements

In order to pass the course the students must:

Practical assignments and exercises

A practical assignment will use the recognition software package HTK and will consist of building a simple recognition task and performing training and evaluation.
Exercises on speech recognition problems will be presented during the first lecture. These will also be downloadable from Exercises. Solutions will be presented during the closing seminar.

Term paper

During the course, each student shall prepare a term paper and present it during the closing seminar. The paper shall be reviewed by two fellow students. Choose a topic after discussion with the teachers. The topic can be an idea of your own, related to your own work, or selected from the list below.

Topic suggestions:
Perform recognition experiments with HTK and report results
Limitations in standard HMM and alternative approaches
Pronunciation variation and its importance for speech recognition
Language models for speech recognition
Search methods
Techniques for robust recognition of speech
Confidence measures in speech recognition
The role of prosody for speech recognition
Speaker recognition
More topics can be added during the course


Chosen term paper topics and assigned reviewers


Author: Ansis Berzins
Preliminary title: (none listed)
Reviewers: (none listed)

Author: Maria Eskevich
Preliminary title: Pronunciation variation and its importance for speech recognition
Reviewers: Lisa, Harald

Author: Vera Evdokimova
Preliminary title: Automatic recognition of emotions and physical state of the speaker
Reviewers: Maria, Harald

Author: Lisa Gustavsson
Preliminary title: Creating an automatic model of speech imitation
Reviewers: Daniil, Vera

Author: Harald Hammarström
Preliminary title: Machine Learning Experiments on Speech-to-Phoneme Classification using Cepstrum Coefficients
Reviewers: Jonas, Andrejs

Author: Daniil Kocharov
Preliminary title: The use of articulatory features for speech recognition
Reviewers: Lisa, Anton

Author: Jonas Lindh
Preliminary title: Automatic Aligning of Swedish in Praat using HTK HVite Function
Reviewers: Valentin, Vera

Author: Anton Ragni
Preliminary title: Subword Language Modelling Using Morphological Units Induced from Lexicon Automata (oral presentation on May 11)
Reviewers: Daniil, Valentin

Author: Valentin Smirnov
Preliminary title: Phonetic Modelling in ASR (Russian speech) - the impact on performance
Reviewers: Anton, Jonas

Author: Andrejs Vasiljevs
Preliminary title: First experiments on Latvian ASR with HTK toolkit
Reviewers: Maria, ?


Closing seminar

The closing seminar includes:



Teachers

Mats Blomberg matsb@speech.kth.se      http://www.speech.kth.se/~matsb
Kjell Elenius kjell@speech.kth.se      http://www.speech.kth.se/~kjell
Dept. of Speech, Music and Hearing, School of Computer Science and Communication, KTH (Royal Institute of Technology)
Lindstedtsvägen 24
SE-100 44 Stockholm, Sweden

How to get to TMH and some travel information
http://www.speech.kth.se/info/location.html

Accommodation
Hotel Arcadia is close to KTH, a 10-minute walk from TMH, and offers a lower price for KTH guests. The current single room price is 773 SEK. Address: Körsbärsvägen 1, http://www.elite.se/eng/hotell/stockholm/arcadia/
A few other hotels in the vicinity of KTH are
Hotel Oden, Karlbergsv. 24, www.hoteloden.se
Hotel Brunnen, Surbrunnsg. 38, www.hotelbrunnen.se

Some low cost hotel and hostel alternatives are
Hostel Bed and Breakfast, Rehnsg. 21, www.hostelbedandbreakfast.com
Hostel Fridhemsplan Vandrarhem STF, S:t Eriksg. 20, www.fridhemsplan.se
Vanadis Hotell o. Bad, www.vanadishotel.com