Predicting visual intelligibility gain in the SynFace application
Opponent: Kristian Ronge
When the acoustic signal is weak or disturbed, people with normal hearing as well as people with hearing impairments make use of visual information to support their interpretation of speech. A talking head could therefore be of great support to hearing-impaired people in speech situations where no visual information is available, such as a telephone conversation or listening to an audio book. SynFace, a talking head developed at the Royal Institute of Technology (KTH), is an application in which the lip movements are driven by the acoustic signal and synchronized with the speech, and it could thus provide valuable visual support.

The purpose of this thesis project is to find an error metric that can be used to predict the visual intelligibility gain obtained with the SynFace application. The idea is to base the metric on confusion matrices for Swedish phonemes. An experiment, including both a synthetic and a natural face, in which the subjects watched silent video clips containing nonsense words, was carried out to compute confusion matrices for consonants and vowels respectively. The error metric was calculated using frame-by-frame comparison, with the data from the confusion matrices used as weights for the phoneme recognition errors. The calculated error metrics were then mapped to intelligibility in terms of SRT (speech reception threshold) reduction. A linear relationship between the error metrics and the SRT levels was found.
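To illustrate the idea of a confusion-matrix-weighted frame-by-frame error metric, a minimal sketch follows. This is not the thesis implementation: the phoneme labels, confusion values, and weighting scheme below are invented for demonstration only. The sketch assumes that each frame of the recognizer output is compared with the reference transcription, and that a substitution of one phoneme for a visually similar one is penalized less, since the viewer could not distinguish the two from lip movements anyway.

```python
def weighted_frame_error(reference, recognized, confusion):
    """Confusion-weighted frame-by-frame error between two equal-length
    phoneme label sequences.

    confusion[a][b] is assumed to be the probability (from a visual
    confusion matrix) that phoneme a is perceived as phoneme b from
    visual information alone. A high value means the substitution is
    cheap, because the two phonemes look alike on the face.
    """
    assert len(reference) == len(recognized)
    total = 0.0
    for ref, rec in zip(reference, recognized):
        # Cost 0 for a correct frame; otherwise 1 minus the visual
        # confusability of the substituted phoneme pair.
        total += 0.0 if ref == rec else 1.0 - confusion[ref].get(rec, 0.0)
    return total / len(reference)

# Toy visual confusion data for three phonemes (invented numbers).
confusion = {
    "p": {"b": 0.8, "s": 0.1},   # /p/ and /b/ look nearly identical
    "b": {"p": 0.8, "s": 0.1},
    "s": {"p": 0.1, "b": 0.1},
}

reference  = ["p", "p", "s", "b", "b", "s"]
recognized = ["p", "b", "s", "p", "b", "p"]

e = weighted_frame_error(reference, recognized, confusion)

# The thesis reports a linear relationship between such error metrics and
# SRT reduction; with fitted coefficients a and b (not given here), the
# predicted intelligibility gain would take the form: srt_reduction = a - b*e
```

In this toy example the substitutions p→b and b→p cost only 0.2 each, while s→p costs 0.9, reflecting that a plain (unweighted) frame error rate would overestimate the perceptual impact of visually invisible recognition errors.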