The purpose of this 5 p doctoral course is to give students with
basic knowledge of speech technology a deeper understanding of
techniques for speech and speaker recognition.
The course consists of lectures, practical assignments, exercises and the writing of a term paper on an individually selected topic.
The following topics are treated in the
course:
The course is aimed at students with a basic knowledge of speech technology (equivalent to a GSLT level 1 course in Speech technology). Basic programming skills are useful, as is knowledge of basic statistics and probability theory.
The lecture dates are preliminary and may be changed slightly according to the participants' preferences. Fridays are tentatively selected. Please inform Mats if you think other lecture days would fit better. A third lecture day can be added in order to present the topics in more detail. This decision will be taken in consultation with the students during the first lecture day. For example, the second occasion might be extended to two days.
Date | Time | Room | Contents |
Thursday, March 29 | 10-12, 13-15 | Fantum, TMH; Office rooms | Introduction |
Friday, March 30 | | Fantum, TMH | HMM Training and Adaptation |
Friday, | 10-12, 13-15, 15-17 | Fantum, TMH | Speaker Recognition; Finite State Transducers |
Friday, | 10-11 | Fantum, TMH | Presentation solution to exercises; Discussion |
The final lecture notes will be posted after each lecture.
Date | Content |
April 18 | Select topic for term paper |
May 2 | Select two papers to review |
May 7 | Mail exercise solutions to teacher |
May 20 | Mail draft paper to reviewers |
May 25 | Reviewers return comments to author |
June 1 | Mail final paper to teacher and the reviewers |
The main course book is Huang, Acero and Hon
(2001): Spoken Language Processing (Prentice Hall, ISBN 0-13-022616-5).
The course will mainly cover chapters 3, 4, 5 (partly), 6, 8, 9, 10,
11, 12, and 13 (if time permits). Since the book doesn't cover speaker
recognition, the literature for that topic will be in the form of selected papers.
A selection of papers will also be used as additional reading material for
other topics not covered in the book.
Requirements
In order to pass the course the students must:
Practical assignments and exercises
A practical assignment will use the recognition software package HTK
and will consist of building a simple recognition task and performing
training and evaluation.
Exercises on speech recognition problems will be presented during
the first lecture. These will also be downloadable from Exercises.
Solutions will be presented during the closing seminar.
During the course a term paper shall be prepared by each student
and be presented during the closing seminar. The paper shall be
reviewed by two fellow students. Choose a topic after discussion with
the teachers. This can be an idea of your own, related to your own work
or selected from the list below.
Topic suggestions:
Perform recognition experiments with HTK and report results
Limitations in standard HMM and alternative approaches
Pronunciation variation and its importance for speech recognition
Language models for speech recognition
Search methods
Techniques for robust recognition of speech
Confidence measures in speech recognition
The role of prosody for speech recognition
Speaker recognition
More topics can be added during the course
Author | Preliminary Title | Reviewer | Reviewer |
Ansis Berzins | | | |
Maria Eskevich | Pronunciation variation and its importance for speech recognition | Lisa | Harald |
Vera Evdokimova | Automatic recognition of emotions and physical state of the speaker | Maria | Harald |
Lisa Gustavsson | Creating an automatic model of speech imitation | Daniil | Vera |
Harald Hammarström | Machine Learning Experiments on Speech-to-Phoneme Classification using Cepstrum Coefficients | Jonas | Andrejs |
Daniil Kocharov | The use of articulatory features for speech recognition | Lisa | Anton |
Jonas Lindh | Automatic Aligning of Swedish in Praat using HTK HVite Function | Valentin | Vera |
Anton Ragni | Subword Language Modelling Using Morphological Units Induced from Lexicon Automata (Oral presentation on May 11) | Daniil | Valentin |
Valentin Smirnov | Phonetic Modelling in ASR (Russian speech) - the impact on performance | Anton | Jonas |
Andrejs Vasiljevs | First experiments on Latvian ASR with HTK toolkit | Maria | ? |
Closing seminar
The closing seminar includes:
Mats Blomberg matsb@speech.kth.se
http://www.speech.kth.se/~matsb
Kjell Elenius kjell@speech.kth.se
http://www.speech.kth.se/~kjell
Dept. Speech, Music and Hearing, School of Computer Science and
Communication, KTH (Royal Institute of Technology)
Lindstedtsvägen 24
SE-100 44 Stockholm, Sweden
How to get to TMH and some travel information
http://www.speech.kth.se/info/location.html
Accommodation
Hotel Arcadia is close to KTH, a 10-minute walk from TMH, and offers a
lower price for KTH guests. The current single room price is 773 SEK.
Address: Körsbärsvägen 1, http://www.elite.se/eng/hotell/stockholm/arcadia/
A few other hotels in the vicinity of KTH are
Hotel Oden, Karlbergsv. 24, www.hoteloden.se
Hotel Brunnen, Surbrunnsg. 38, www.hotelbrunnen.se
Some low-cost hotel and hostel alternatives are
Hostel Bed and Breakfast, Rehnsg. 21, www.hostelbedandbreakfast.com
Hostel Fridhemsplan Vandrarhem STF, S:t Eriksg. 20, www.fridhemsplan.se
Vanadis Hotell o. Bad www.vanadishotel.com