A multimodal spoken dialogue system for browsing apartments on the Stockholm real estate market
Introduction
The AdApt project, which ran from 1998 to 2002, aimed to provide a foundation for developing and evaluating advanced multimodal spoken dialogue systems. Within the project, we developed a spoken dialogue system in which a user cooperates with an animated talking agent to solve more complex problems than our earlier systems could handle.
AdApt was a joint project between TMH and Telia Research within the framework of CTT (the Centre for Speech Technology) and involved, among others, the following researchers:
- TMH/CTT
  - Linda Bell (currently TeliaSonera), Jonas Beskow, Rolf Carlson, Jens Edlund, Joakim Gustafson (currently TeliaSonera), Anna Hjalmarsson, Magnus Nordstrand
- Telia Research
  - Johan Boye, Mats Wirén
The chosen domain is one where multimodal communication matters and one that appeals to a broad audience: the real estate market in downtown Stockholm.
In the AdApt system, the computer plays a role close to that of a real estate broker: it helps people find apartments, describes apartments, answers questions, and offers support by finding information in apartment ads.
The information used by the system comes from authentic apartment ads published on the internet.
In addition to speaking, the user can provide information by clicking on or marking areas of an interactive map of downtown Stockholm. The system output consists of an animated talking head and animated graphical icons on the map. The system can also present information as text in tables, although this is rarely used. More information about the animated talking head is available on our multimodal speech synthesis pages.
A screen dump from an early version of the system interface, showing Urban (the animated agent), the interactive map, icons, and a table.
The AdApt project was featured in the exhibition Fritt Fram, which ran from 2000 to 2001 at Tekniska Museet in Stockholm, together with several of the TMH/CTT talking heads.
The picture shows the Talking Heads we displayed at Fritt Fram.
Background
AdApt was a natural continuation of earlier dialogue projects at TMH and CTT, while placing higher and different demands on the technology. It used sophisticated live information extraction (interpreting free-text descriptions of apartments in internet ads); storage and structuring of information (all highly generalised, with seamless conversion between formats such as Prolog facts and XML structures); semantic representation of information and utterances; and generation of utterances in which facial expressions, stress, etc. are crucial to the interpretation. The system also includes a dialogue manager that handles a range of misunderstandings sensibly. Notable predecessors include the August and Waxholm systems. For an overview, see Joakim Gustafson's doctoral thesis.
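To make the idea of converting between Prolog facts and XML structures concrete, here is a minimal sketch of such a round trip. It is an illustration only, not AdApt's actual implementation. It assumes a hypothetical Python representation in which a compound Prolog term is a (functor, args) tuple, an atom is a lowercase string and a number is an integer, and it uses a made-up XML vocabulary (`term`, `atom`, `int`). The function names and the example apartment fact are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical term representation (not AdApt's): a compound Prolog term is a
# (functor, [args]) tuple, an atom is a str, and a number is an int.

def term_to_prolog(term):
    """Render a term as Prolog source text (assumes atoms need no quoting)."""
    if isinstance(term, tuple):
        functor, args = term
        return f"{functor}({', '.join(term_to_prolog(a) for a in args)})"
    return str(term)

def term_to_xml(term):
    """Convert a term to an XML element using a made-up term/atom/int vocabulary."""
    if isinstance(term, tuple):
        functor, args = term
        element = ET.Element("term", functor=functor)
        for arg in args:
            element.append(term_to_xml(arg))
        return element
    element = ET.Element("int" if isinstance(term, int) else "atom")
    element.text = str(term)
    return element

def xml_to_term(element):
    """Inverse of term_to_xml, so a fact survives Prolog -> XML -> Prolog."""
    if element.tag == "atom":
        return element.text
    if element.tag == "int":
        return int(element.text)
    return (element.get("functor"), [xml_to_term(child) for child in element])

# A made-up apartment fact: apartment(a42, area(sodermalm), rooms(3)).
fact = ("apartment", ["a42", ("area", ["sodermalm"]), ("rooms", [3])])
xml_text = ET.tostring(term_to_xml(fact), encoding="unicode")
print(term_to_prolog(fact))  # apartment(a42, area(sodermalm), rooms(3))
print(xml_text)
assert xml_to_term(ET.fromstring(xml_text)) == fact
```

The tuple encoding is chosen only because it makes the round trip easy to check; a fuller converter would also need to handle quoted atoms, strings, floats and variables.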
Future
Many of the participants in the AdApt project are active in new spoken dialogue system projects. At TeliaSonera (formerly Telia Research), the AdApt researchers have completed the Pixie and Nice projects. At TMH/CTT, the Higgins project is an immediate successor of AdApt, and our research in the CHIL project draws heavily on experience from AdApt.
Comments?
Contact Jens (edlund@speech.kth.se).
Bibliography
- Beskow J, Edlund J & Nordstrand M (2005): A model for multi-modal dialogue system output applied to an animated talking head. In Minker W, Bühler D & Dybkjaer L (eds): Spoken Multimodal Human-Computer Dialogue in Mobile Environments, Text, Speech and Language Technology, Vol. 29. Dordrecht, The Netherlands: Kluwer Academic Publishers.
- Beskow J, Edlund J & Nordstrand M (2002): Specification and Realisation of Multimodal Output in Dialogue Systems. Proc. of ICSLP 2002, 181-184.
- Edlund J, Beskow J & Nordstrand M (2002): GESOM - A Model for Describing and Generating Multi-modal Output. Proc. of ISCA Workshop on Multi-Modal Dialogue in Mobile Environments.
- Edlund J & Nordstrand M (2002): Turn-taking Gestures and Hour-Glasses in a Multi-modal Dialogue System. Proc. of ISCA Workshop on Multi-Modal Dialogue in Mobile Environments.
- Gustafson J, Bell L, Boye J, Edlund J & Wirén M (2002): Constraint Manipulation and Visualization in a Multimodal Dialogue System. Proc. of ISCA Workshop on Multi-Modal Dialogue in Mobile Environments.
- Boye J & Wirén M (2003): Robust Parsing of Utterances in Negotiative Dialogue. Proc. of Eurospeech 2003.
- Boye J & Wirén M (2003): Negotiative Spoken-Dialogue Interfaces to Databases. Proc. of the 7th Workshop on the Semantics and Pragmatics of Dialogue.
- Hjalmarsson A (2002): Evaluating AdApt, a multi-modal conversational dialogue system using PARADISE. Master's thesis, KTH, Stockholm, Sweden.
- Bell L, Boye J & Gustafson J (2001): Real-time Handling of Fragmented Utterances. Proc. of the NAACL Workshop on Adaptation in Dialogue Systems, Pittsburgh, PA, June 2001.
- Bell L, Boye J, Gustafson J & Wirén M (2000): Modality Convergence in a Multimodal Dialogue System. Proc. of Götalog 2000, Fourth Workshop on the Semantics and Pragmatics of Dialogue, 29-34.
- Bell L & Gustafson J (2000): Positive and negative user feedback in a spoken dialogue corpus. Proc. of ICSLP 2000.
- Bell L, Eklund R & Gustafson J (2000): A comparison of disfluency distribution in a unimodal and a multimodal speech interface. Proc. of ICSLP 2000.
- Gustafson J, Bell L, Beskow J, Boye J, Carlson R, Edlund J, Granström B, House D & Wirén M (2000): AdApt - a multimodal conversational dialogue system in an apartment domain. Proc. of ICSLP 2000, 2:134-137.