The class will be jointly organized by
Rolf Carlson rolf-at-speech.kth.se http://www.speech.kth.se/~rolf/
Dept. of Speech, Music and Hearing, KTH, Stockholm, Sweden
Torbjørn Svendsen torbjorn-at-iet.ntnu.no http://www.tele.ntnu.no/users/svendsen/
Dept. of Electronics and Telecommunications, NTNU, Trondheim, Norway
Teachers:
Björn Granström bjorn-at-speech.kth.se http://www.speech.kth.se/~bjorn
David House davidh-at-speech.kth.se http://www.speech.kth.se/~davidh
Rolf Carlson rolf-at-speech.kth.se http://www.speech.kth.se/~rolf/
Torbjørn Svendsen torbjorn-at-iet.ntnu.no http://www.tele.ntnu.no/users/svendsen/
The class starts in Göteborg on September 14, 2004, followed by one meeting in Göteborg on November 1-2 and one meeting in Stockholm on January 19-21, 2005.
The aim of this course is to give an overview of speech technology, some of the underlying theories and models and how these are integrated into applications, such as multimodal dialog systems.
The course is intended both for students with limited knowledge of the field and for students with a more extensive background in speech technology, who will be expected to take a more active part in the discussion of current research. In this way, the course is meant to contribute to the common platform for students with different backgrounds in the Nordic Graduate School of Language Technology, supported by NorFA.
The course is divided into 5 parts:
Introductory lectures; Reading the listed material; Individual practical exercises; Preparing a term paper; and a Closing seminar including discussions, practical exercises and presentation of the term papers.
Introductory lectures will be held in September and November in Göteborg. These lectures will give an overview of the field with an emphasis on basic concepts and standard methods.
Individual practical exercises will include speech analysis and some other specific tasks related to speech technology. The results should be reported and discussed during the fall period.
During the course, each student should prepare a term paper and present it at the closing seminar in January (Stockholm). The closing seminar includes exercises, presentation of term papers, and discussion of the reading material and the term papers.
Introductory lecture slides will be linked to each topic.
Date | Time | Content | Teacher
14/9 | 8.00-10.00 | Introduction | Rolf Carlson, David House
14/9 | 10.00-12.00 | | Björn Granström
1/11 | 10.00-12.00 | Speech Recognition | Torbjørn Svendsen
2/11 | 09.00-11.00 | Dialog systems | Rolf Carlson
January, Wednesday 19 | 13.00 | Closing seminar | All teachers
Phonetic analysis: Each
student should carry out an acoustic investigation of their own speech. This
exercise will make the student familiar with speech analysis and the basic
structure of speech sounds. The results should be summarised and discussed by all
students. More information can be found on http://www.speech.kth.se/~rolf/NGSLT/SpeechTech1phonlab.html
Synthesis: Each student
should attempt to build a synthesis system based on waveform
concatenation. The results should be summarised and discussed by all students.
More information can be found on
http://www.speech.kth.se/~rolf/NGSLT/SpeechTech1synthlab.html
Link to the OVE 1
synthesizer: http://www.speech.kth.se/wavesurfer/formant/
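As a rough illustration of what waveform concatenation involves, the sketch below joins units with a short linear crossfade at each boundary. It is not the method prescribed in the lab instructions. The sine tones stand in for recorded speech units, and the names `sine_unit` and `concatenate`, the sample rate and the overlap length are all assumptions made for this example.

```python
import math


def sine_unit(freq_hz, duration_s, sample_rate=16000):
    """Hypothetical stand-in for a recorded unit: a sine tone as float samples."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def concatenate(units, overlap):
    """Join units in order, crossfading `overlap` samples at each boundary."""
    out = list(units[0])
    for unit in units[1:]:
        if overlap > len(out) or overlap > len(unit):
            raise ValueError("overlap is longer than one of the units")
        start = len(out) - overlap
        for i in range(overlap):
            # Weight moves from the outgoing unit towards the incoming unit.
            w = (i + 1) / (overlap + 1)
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[overlap:])
    return out


# Two 10 ms units joined with a 1 ms (16-sample) crossfade.
signal = concatenate([sine_unit(200, 0.01), sine_unit(300, 0.01)], overlap=16)
```

In a real system the units would be cut from recordings, and the join points would normally be chosen with care (for example at pitch marks) to avoid audible discontinuities.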
Speech recognition: Each student should attempt to
construct a simple speech recognizer based on hidden Markov models using
available software and data. The results should be summarised and discussed by
all students.
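The exercise itself relies on existing software and data. The sketch below only illustrates the scoring idea behind an HMM-based isolated-word recognizer: each word has its own model, the forward algorithm computes the likelihood of the observation sequence under each model, and the most likely word is chosen. The two word models, their probabilities and the two-symbol observation alphabet are invented for this example, and a real recognizer would work on acoustic feature vectors and in the log domain.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Hypothetical two-state left-to-right models over symbols 0 and 1.
# Each entry is (start probabilities, transition matrix, emission matrix).
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "yes": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    "no": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}

best = recognize([0, 0, 1, 1], MODELS)
```

Here the sequence 0, 0, 1, 1 matches the emission pattern of the invented "yes" model, so that model scores highest.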
Some
exercises may be too easy for students experienced in the specific area. In
this case a more advanced subject will be specified together with the teacher.
During the closing seminar additional obligatory exercises will be included.
During the course a term paper should be prepared by each student and reviewed by two other students. The paper should be presented during the closing seminar.
Papers from previous classes can be found in the old Closing seminar links below.
The closing seminar includes:
· Exercises
· Presentation of term papers
· Discussion of the reading material and the term papers.
Acoustic and Auditory Phonetics, Keith
Johnson, ISBN# 0-631-20094-0 (a second edition is also available)
An Introduction to Text-To-Speech Synthesis,
Thierry Dutoit, ISBN# 0-7923-4498-7
Holmes, John and Wendy Holmes (2001, 2nd ed.): Speech Synthesis and Recognition, London: Taylor & Francis, ISBN 0-7484-0856-8 (hardback), ISBN 0-7484-0857-6 (paperback)
Michael F. McTear (2002): Spoken dialogue technology: enabling the conversational interface. ACM
Computing Surveys, Volume 34, Issue 1 (March 2002), pp. 90-169.
http://www.infj.ulst.ac.uk/~cbdg23/interests.html
A selection of papers and other publications will be used as additional reading material for each subtopic.
In order to pass the course the students must: Complete the practical exercises; Prepare and present the term paper; Review two term papers; Participate actively in the discussions in the closing seminar.
Information on how to apply and course requirements can be found on http://ngslt.org/application/ or follow the links from http://ngslt.org/ . Deadline for application is one month before the start of the course.
Last updated: June 21, 2005