Multimodality in language and speech systems - from theory to design support tool.
Niels Ole Bernsen, Odense, Denmark
Increasingly, speech input and speech output are being used in combination
with other modalities for the representation and exchange of information
with, or mediated by, computer systems. Therefore, a growing number of
developers of systems and interfaces are faced with the question of whether
or not to use speech input and/or speech output in multimodal combinations
for the applications they are about to build. What the developers are
facing is the speech functionality problem.
The speech functionality problem is the question of what speech is good or
bad for, or under which conditions to use, or not to use, speech for
information representation and exchange - either speech alone or in
combination with other modalities. This is a hard problem because of the
complexity involved. There are several speech modalities, such as keywords
and full-blown, unrestricted discourse; there is speech as input and speech
as output which are not necessarily useful for the same purposes; there are
scores of non-speech modalities with which speech might conceivably be
combined; and the success of a particular modality choice is subject to an
unlimited number of design context variables, including task type (e.g.
text annotation), communicative act (e.g. alarm), user group (e.g. expert
typists), work environment (e.g. noisy), system type (e.g. digital
roadmap), performance parameters (e.g. more efficient), learning parameters
(e.g. learning overhead), and cognitive properties (e.g. attention load).
It seems unlikely that empirical studies alone will suffice to tell
system developers what they need to know in a timely fashion in order to
avoid user dissatisfaction or poor system performance due to erroneous
choices of modality combinations.
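The dependence of modality choice on a design context can be illustrated with a minimal sketch. This is purely hypothetical: the class fields, the single toy rule, and all names below are illustrative assumptions, not the actual rule set of Modality Theory or of the tool discussed in the lectures.

```python
# Hypothetical sketch: encoding a few design context variables and one
# toy modality-property rule of the kind a speech functionality tool
# might apply. Names and the rule are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class DesignContext:
    task_type: str      # e.g. "text annotation"
    user_group: str     # e.g. "expert typists"
    environment: str    # e.g. "noisy"
    hands_busy: bool    # whether the user's hands are occupied

def speech_input_advisable(ctx: DesignContext) -> bool:
    """Toy rule: speech input is questionable in noisy environments,
    but attractive when the user's hands are occupied."""
    if ctx.environment == "noisy":
        return False
    return ctx.hands_busy

ctx = DesignContext("digital roadmap query", "drivers", "quiet", hands_busy=True)
print(speech_input_advisable(ctx))  # True under these toy rules
```

A realistic tool would of course weigh many more variables and draw its rules from validated modality properties rather than a single ad hoc condition.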
Modality Theory is a general theory of the properties of unimodal
modalities in the media of graphics, acoustics and haptics. Two large-scale
data studies have demonstrated that a limited set of modality properties
have the power to justify or support most claims about speech functionality
made in the literature. The lectures will (1) present Modality Theory; (2)
show how the theory is applied to the speech functionality problem through
'information mapping'; and (3) discuss and demonstrate a web-based speech
functionality tool aimed at supporting developers faced with issues of
speech functionality.

References
Bernsen, N.O.: Defining a taxonomy of output modalities from an HCI
perspective. Computer Standards and Interfaces, Special Double Issue, 18,
6-7, 1997, 537-553.
Bernsen, N. O.: Towards a tool for predicting speech functionality. Speech
Communication 23, 1997, 181-210.
Bernsen, N. O. and Dybkjær, L.: Is speech the right thing for your
application? Proceedings of ICSLP'98. Sydney: Australian Speech Science and
Technology Association 1998, 3209-3212.
Bernsen, N. O., Dybkjær, H. and Dybkjær, L.: Designing Interactive
Systems. From First Ideas to User Testing. Springer Verlag 1998.
Niels Ole Bernsen, Prof., Dr.
The Maersk Mc-Kinney Moller Institute for Production Technology
Science Park 10
5230 Odense M
Tel. (+45) 65 57 35 44 (direct)
Tel. (+45) 66 15 86 00 (switchboard)
Fax (+45) 63 15 72 24