Auditory models and speech recognition: a path to build robust front-ends
What is the impact of a small change in a speech signal on the perceptually relevant space? How can models of the human auditory periphery improve the robustness of speech recognition without increasing computational complexity? This talk discusses a general method that has been used in front-ends for speech recognition and offers possible answers to these questions. The method selects or optimizes acoustic features by maximizing the similarity between the Euclidean geometry of the feature set and that of the human auditory representation of the signal. Two different psychoacoustic auditory masking models are considered, although neither is directly embedded in the front-end. Results show that this approach increases the robustness of a speech recognizer in environmental noise.
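The geometry-matching idea can be illustrated with a minimal sketch. The abstract does not specify the exact similarity measure, so the example below assumes one common choice: correlating the pairwise Euclidean distances of two representations of the same utterance (e.g., a candidate feature set and an auditory-model output), so that a score of 1.0 means the two spaces share the same geometry up to a linear rescaling. The function names and the correlation-based measure are illustrative assumptions, not the authors' published method.

```python
import math

def pairwise_distances(frames):
    """Euclidean distance between every pair of feature frames."""
    n = len(frames)
    return [math.dist(frames[i], frames[j])
            for i in range(n) for j in range(i + 1, n)]

def geometry_similarity(feats_a, feats_b):
    """Pearson correlation of the two pairwise-distance vectors.

    feats_a, feats_b: lists of equal length, one feature vector per frame
    (e.g., candidate acoustic features vs. an auditory-model representation
    of the same frames). Returns 1.0 when the two representations have the
    same Euclidean geometry up to a positive linear rescaling.
    """
    da, db = pairwise_distances(feats_a), pairwise_distances(feats_b)
    ma, mb = sum(da) / len(da), sum(db) / len(db)
    cov = sum((x - ma) * (y - mb) for x, y in zip(da, db))
    sa = math.sqrt(sum((x - ma) ** 2 for x in da))
    sb = math.sqrt(sum((y - mb) ** 2 for y in db))
    return cov / (sa * sb)

# A representation that is a uniform rescaling of another preserves the
# geometry exactly, so the similarity score is 1.0.
feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
scaled = [[2.0 * x for x in f] for f in feats]
score = geometry_similarity(feats, scaled)
```

In a feature-selection loop, one would score each candidate feature set this way against the auditory representation and keep the candidates with the highest similarity.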