A multi-tier theoretical framework for understanding spoken language
Steven Greenberg, ICSI, Berkeley, now at Technical University of Denmark
Spoken language is often viewed merely as a sequence of words and phonemes. On this view, the listener's task is to decode the speech signal into its constituent elements, derived from spectral decomposition of the acoustic signal. Under acoustic interference, however, spectral decomposition is particularly challenging. Future-generation speech separation methods are therefore likely to draw on a more comprehensive set of representations than words and phonemes alone. This presentation outlines a multi-tier theory of spoken language in which utterances are composed not only of words and phones, but also of syllables, articulatory-acoustic features and (most importantly) prosemes, which encapsulate the prosodic pattern in terms of prominence and accent. This multi-tier framework portrays pronunciation variation and the phonetic micro-structure of the utterance with far greater precision than the conventional lexico-phonetic approach, and thereby offers the prospect of improving machine-based recognition and separation systems in the future.