Seminar at Speech, Music and Hearing:

Master's thesis seminar:

Audiovisual detection of pronunciation errors

Sebastian Picard

Opponent: Chris Koniaris


As part of the project Computer-Animated LAnguage TEAchers, this thesis contributes to the creation of a framework for automatic detection of mispronunciation to be used in a Computer Assisted Language Learning (CALL) system. The aim is to provide users with informative feedback about their pronunciation.

This work uses time-normalized Discrete Cosine Transforms (DCTs) to extract audiovisual features, which makes it possible to process vowels of different durations and generate visual feature vectors of constant length. A combination of filter and wrapper feature selection methods was employed to combine both modalities and proved effective at reducing the features to a subset suitable for classification. Support Vector Machines (SVMs) were used as classifiers and made it possible to work with a sparse dataset. We conclude that adding visual cues improved the performance of the classifiers. Each pairwise classifier achieved a correct recognition rate of 95 to 100%.
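
As a rough illustration of how a time-normalized DCT can map vowels of different durations to feature vectors of the same length, here is a minimal sketch. It is not the thesis implementation: the function name time_normalized_dct, the 1/N scaling, the choice of four coefficients, and the synthetic lip-opening trajectories are all assumptions made for this example.

```python
import math


def time_normalized_dct(trajectory, num_coeffs=4):
    """Return the first `num_coeffs` DCT-II coefficients of a 1-D trajectory.

    Each coefficient is divided by the number of frames N, so the values
    describe the shape of the trajectory rather than its duration.
    (The 1/N scaling is an assumption of this sketch.)
    """
    n = len(trajectory)
    if n == 0:
        raise ValueError("trajectory must contain at least one frame")
    coeffs = []
    for k in range(num_coeffs):
        # DCT-II basis function k, sampled at the centre of each frame.
        total = sum(
            x * math.cos(math.pi * k * (2 * t + 1) / (2 * n))
            for t, x in enumerate(trajectory)
        )
        coeffs.append(total / n)
    return coeffs


# Hypothetical lip-opening trajectories: the same half-sine shape,
# sampled over 10 frames (short vowel) and 25 frames (long vowel).
short_vowel = [math.sin(math.pi * (t + 0.5) / 10) for t in range(10)]
long_vowel = [math.sin(math.pi * (t + 0.5) / 25) for t in range(25)]

short_features = time_normalized_dct(short_vowel)
long_features = time_normalized_dct(long_vowel)

# Both vectors have num_coeffs entries, regardless of vowel duration.
print(len(short_features), len(long_features))
```

Because only a fixed number of low-order coefficients is kept, every vowel yields a vector of the same size, which is what a classifier such as an SVM expects as input.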

15:15 - 17:00
Tuesday March 9, 2010

The seminar is held in Fantum.

