Methods for automatic speech understanding

Main responsibility for the research area:
TeknD Kjell Elenius

Automatic speech understanding has attracted substantial resources within international speech technology research. The focus has shifted increasingly from automatic recognition to the general problem of understanding what was said. However, fundamental problems remain unsolved, even in the primary acoustic analysis of the speech signal. For instance, hearing-based models allow us to exploit knowledge about the biological system, but this knowledge is incomplete, and the complexity of such models demands significant computing power.

A number of alternative methods have been used over the years, from knowledge-based systems to Markov models and artificial neural networks. The latter two, both statistically oriented techniques, have been the most successful and remain the focus of speech technology research. Speech recognition may also be seen as a very challenging search problem: finding the most probable utterances, given the initial analysis, is a fundamental difficulty. In practice, it is impossible to evaluate all possibilities, so search methods constitute an important subarea of research. Besides the signal-governed processes, the linguistic analysis must be integrated optimally.
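To illustrate why the search cannot be exhaustive, the following is a minimal sketch of a pruned (beam) search over word hypotheses. It is not a description of any particular CTT system. The toy scores in ACOUSTIC and BIGRAM, the function names and the floor value for unseen word pairs are all assumptions made for the example. With V candidate words per step and T steps, an exhaustive search scores V**T sequences, whereas the beam search keeps only the beam_width best partial hypotheses at each step.

```python
import heapq
import itertools
import math

# Hypothetical toy scores: one dict per time step, word -> log-likelihood.
ACOUSTIC = [
    {"recognize": math.log(0.55), "wreck": math.log(0.45)},
    {"speech": math.log(0.40), "a": math.log(0.35), "beach": math.log(0.25)},
    {"now": math.log(0.60), "nice": math.log(0.40)},
]

# Hypothetical bigram language model: (previous word, word) -> log-probability.
BIGRAM = {
    ("<s>", "recognize"): math.log(0.30),
    ("<s>", "wreck"): math.log(0.10),
    ("recognize", "speech"): math.log(0.50),
    ("wreck", "a"): math.log(0.40),
    ("a", "nice"): math.log(0.30),
    ("speech", "now"): math.log(0.20),
}
UNSEEN = math.log(1e-3)  # assumed floor for word pairs missing from BIGRAM


def beam_search(acoustic, bigram, beam_width):
    """Return the best (log_score, words) found while pruning at every step."""
    beam = [(0.0, ())]  # a single empty hypothesis to start from
    for step_scores in acoustic:
        candidates = []
        for score, words in beam:
            prev = words[-1] if words else "<s>"
            for word, ac in step_scores.items():
                lm = bigram.get((prev, word), UNSEEN)
                candidates.append((score + ac + lm, words + (word,)))
        # Pruning: keep only the beam_width best partial hypotheses.
        beam = heapq.nlargest(beam_width, candidates)
    return max(beam)


def exhaustive_search(acoustic, bigram):
    """Score every possible word sequence; feasible only for toy problems."""
    results = []
    for words in itertools.product(*(step.keys() for step in acoustic)):
        score, prev = 0.0, "<s>"
        for step_scores, word in zip(acoustic, words):
            score = score + step_scores[word] + bigram.get((prev, word), UNSEEN)
            prev = word
        results.append((score, words))
    return max(results)


if __name__ == "__main__":
    print("beam (width 2):", beam_search(ACOUSTIC, BIGRAM, beam_width=2))
    print("exhaustive:    ", exhaustive_search(ACOUSTIC, BIGRAM))
```

A narrow beam is cheaper but may discard a hypothesis that would have won later; a beam wide enough to hold every hypothesis reduces to the exhaustive search. Choosing the width is therefore a trade-off between computation and search errors.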

Increased robustness to environmental and channel noise, interruptions and atypical speech is of great importance for real applications and will be prioritized at CTT. Results from speaker characterization will be used for quick adaptation to new speakers. Moreover, lexicon representation and search methods will be streamlined to handle large vocabularies.

Important Research Topics for Stage 3

  • development of robust speech recognition that uses a relatively restricted vocabulary and works in the noisy environments expected within the chosen domain
  • speaker-independent recognition with large vocabularies (more than 10,000 words)

Published by: TMH, Speech, Music and Hearing

Last updated: 2006-12-05