Course in Speech Recognition

Fall Semester 2003


Purpose

The purpose of this 5 p course is to give students with basic knowledge of speech technology a deeper understanding of techniques for speech recognition. The course includes practical assignments and a term paper on an individually selected topic.

Contents

The course consists of lectures, practical assignments and the writing of a term paper on an individually selected topic.

The following topics are treated in the course:

Prerequisites

The course is aimed at students with a basic knowledge of speech technology (the equivalent of a GSLT level 1 course in Speech technology). Basic programming skills are useful, as is a knowledge of basic statistics and probability theory.

Schedule

Lectures

Date | Time | Room | Contents | Lecture Notes
September 19 | 10-12, 13-15, 15-16 | Utställningen, TMH | 1st Lecture | Slides_lecture_1
November 7 | 10-12, 13-15, 15-16 | Utställningen, TMH | 2nd Lecture | Slides_lecture_2, HTK Tutorial
November 28 | 10-12, 13-15 | Utställningen, TMH | 3rd Lecture | Slides_lecture_3_Ch_11, Slides_lecture_3_Ch_12
February 6 | 10-12, 13-15, 15-16 | Utställningen, TMH | Closing seminar


Deadlines

Date | Content
November 21 | Select topic for term paper
December 5 | Mail solutions to exercises to Mats
December 12 | Mail selection of 2 papers for review
December 26 | Author mails draft to reviewer
January 17 | Mail review to author
January 24 | Mail final paper to Mats

Reading material

The main textbook for the course is Huang, Acero and Hon (2001), Spoken Language Processing.
A selection of papers will be used as additional reading material for topics not covered in the book.


Requirements

In order to pass the course the students must:

Practical assignments

Exercises on speech recognition problems will be presented during the second lecture, November 7. The exercises will also be downloadable from Exercises.

Send solutions no later than December 5 to Mats Blomberg (matsb@speech.kth.se).

Solutions will be presented during the closing seminar.

Term paper

Each student shall prepare a term paper during the course and present it at the closing seminar. Each paper shall be reviewed by two fellow students.

Topic suggestions:
Choose a topic after discussion with the teacher (e.g. your own work and experiments)
Limitations in standard HMM and alternative approaches
Pronunciation variation and its importance for speech recognition
Language models for speech recognition
Search methods
Techniques for robust recognition of speech
Confidence measures in speech recognition
The role of prosody for speech recognition

Report your choice of topic and preliminary title to Mats by November 21.
Report your choice of 2 titles for review before December 12.

Chosen topics and assigned reviewers

Author | Title/Topic | Reviewer | Reviewer
Daniel Elenius | Adaptation techniques for children's speech recognition | Anna | Susanne
Genevieve Gorell | Language Modelling for Spoken Dialogue Systems; Grammar-Based and Robust Approaches Compared and Contrasted | Gustav | Magnus
Leif Grönqvist | Robust Methods for Automatic Transcription and Alignment of Speech Signals | Per-Anders | Daniel
Per-Anders Jande | Automatic Detailed Transcription of Speech Using Forced Alignment and Naïve Pronunciation Rules | Leif | Genevieve
Magnus Nordstrand | "The role of prosody for speech recognition" | Anna | Susanne
Anna Sandell | An exploration of hidden Markov models | Leif | Magnus
Susanne Schötz | Automatic prediction of speaker age using CART | Per-Anders | Gustav
Gustav Öquist | Speech Recognition for Context Awareness | Daniel | Genevieve


Closing seminar

The closing seminar includes:

The closing seminar will take place at TMH, KTH, on 6 February 2004.

Seminar program

Time | Name | Title
10.15 | Mats | Presentation of solutions to the exercises
11.00 | | Break
11.10 | Anna | An exploration of hidden Markov models
11.35 | Daniel | Adaptation techniques for children's speech recognition
12.00 | | Lunch
13.10 | Genevieve | Language Modelling for Spoken Dialogue Systems; Grammar-Based and Robust Approaches Compared and Contrasted
13.35 | Leif | Robust Methods for Automatic Transcription and Alignment of Speech Signals
14.00 | Per-Anders | Automatic Detailed Transcription of Speech Using Forced Alignment and Naïve Pronunciation Rules
14.25 | Susanne | Automatic prediction of speaker age using CART
14.50 | | Coffee break
15.10 | Gustav | Speech Recognition for Context Awareness
15.35 | | Discussion
16.00 | | End



Teacher

Mats Blomberg, matsb@speech.kth.se, http://www.speech.kth.se/~matsb
Dept. of Speech, Music and Hearing, KTH (Royal Institute of Technology)
SE-100 44 Stockholm, Sweden

How to get to TMH and some travel information

http://www.speech.kth.se/info/location.html