Developing intelligent multimedia applications
Paul Mc Kevitt, Visiting Professor

Center for PersonKommunikation (CPK), Aalborg University, DENMARK &
EPSRC Advanced Fellow, Department of Computer Science, University of Sheffield, ENGLAND


 

COURSE OUTLINE:
Intelligent MultiMedia (IntelliMedia) focusses on the computer processing and understanding of signal and symbol input from at least speech, text and visual images in terms of semantic representations. I will start with some background and history of the field of IntelliMedia as seen from the viewpoints of Artificial Intelligence, Cognitive Science and Philosophy.

I will discuss a general suite of tools, in the form of a software and hardware platform called CHAMELEON, that can be tailored to conducting IntelliMedia in various application domains. CHAMELEON has an open distributed processing architecture and currently includes ten agent modules: blackboard, dialogue manager, domain model, gesture recogniser, laser system, microphone array, speech recogniser, speech synthesiser, natural language processor, and a distributed Topsy learner. Most of the modules are programmed in C and C++ and are glued together using the DACS communications system. In effect, the blackboard, dialogue manager and DACS form the kernel of CHAMELEON. Modules can communicate with each other and with the blackboard, which keeps a record of interactions over time via semantic representations in frames. Inputs to CHAMELEON can include synchronised spoken dialogue and images, and outputs include synchronised laser pointing and spoken dialogue.
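
As a rough illustration of the blackboard idea described above, the following Python sketch shows modules posting frames to a shared blackboard that keeps a time-ordered record and passes each frame on to the other modules. It is not the CHAMELEON implementation (which is mostly C and C++ over DACS); the class names, slot names and the pointing coordinates are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Frame:
    """One semantic representation posted by a module (hypothetical layout)."""
    module: str    # name of the module that produced the frame
    content: dict  # module-specific slots
    time: int      # logical timestamp assigned by the blackboard


@dataclass
class Blackboard:
    """Keeps a time-ordered record of frames and notifies subscribed modules."""
    history: List[Frame] = field(default_factory=list)
    subscribers: Dict[str, Callable[[Frame], None]] = field(default_factory=dict)

    def subscribe(self, name: str, callback: Callable[[Frame], None]) -> None:
        self.subscribers[name] = callback

    def post(self, module: str, content: dict) -> Frame:
        # The position in the record doubles as a simple logical clock.
        frame = Frame(module=module, content=content, time=len(self.history))
        self.history.append(frame)
        # Deliver the frame to every subscribed module except the sender.
        for name, callback in list(self.subscribers.items()):
            if name != module:
                callback(frame)
        return frame


board = Blackboard()
seen_by_dialogue_manager: List[Frame] = []
board.subscribe("dialogue_manager", seen_by_dialogue_manager.append)

# Two input modules report what they observed (values are made up).
board.post("speech_recogniser", {"utterance": "whose office is this"})
board.post("gesture_recogniser", {"pointing_at": (120, 45)})
```

After the two posts, `board.history` holds both frames in order, and the dialogue manager has been handed each of them. A real system would replace the in-process callbacks with inter-process messages.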

An initial prototype application of CHAMELEON is an IntelliMedia WorkBench where a user will be able to ask for information about things (e.g. 2D/3D models, pictures, objects, gadgets, people or other items) on a physical table. The current domain is a Campus Information System for 2D building plans which provides information about tenants, rooms and routes and can answer questions like ``Whose office is this?'' and ``Show me the route from Paul Mc Kevitt's office to Paul Dalsgaard's office.'' in real time. CHAMELEON and the IntelliMedia WorkBench are ideal for testing integrated signal and symbol processing of language and vision for the future of SuperinformationhighwayS.

The course will be accompanied by videotapes of the CHAMELEON and other sample systems in action.
 

COURSE DETAILS:
MiniModule-I: Introduction
Intelligent MultiMedia (IntelliMedia) focusses on the computer processing and understanding of signal and symbol input from at least speech, text and visual images in terms of semantic representations.
I will start with some background and history of the field of IntelliMedia as seen from the viewpoints of Artificial Intelligence, Cognitive Science and Philosophy.
Readings:
Rohini Srihari (1994) ``Computational models for integrating linguistic and visual information: a survey''
Mark Maybury (1995) ``Research in multimedia and multimodal parsing and generation''
Oliviero Stock (1995) ``A third modality of natural language?''
 

MiniModule-II: IntelliMedia 2000+/ CHAMELEON/ IntelliMedia WorkBench
Here I will give an introduction to the IntelliMedia 2000+ research and education programme at Aalborg University, Denmark, and introduce CHAMELEON, the IntelliMedia WorkBench application and alternative applications.
Readings:
papers on Chameleon and IntelliMedia 2000+
Other Readings:
Naoyuki Okada (1996) ``Integrating vision, motion and language through mind''
Alex Waibel et al. (1996) ``Multimodal interfaces''
 

MiniModule-III: Blackboard frame semantics
Here I will present the blackboard frame semantics for CHAMELEON including (1) input, (2) output and (3) integration frames and show complete runs of blackboard semantics for ``Point to Hanne's office'' and ``Whose office is this?''
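
To make the three frame types concrete, here is a small Python sketch of how input frames from speech and gesture might be merged into an integration frame and then turned into output frames for ``Whose office is this?''. The slot names, the time-window rule, the `tenants` lookup table and the sample values are assumptions for illustration only; the actual frame syntax is defined in the readings.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class InputFrame:
    module: str  # e.g. "speech_recogniser" or "gesture_recogniser"
    data: dict
    time: int


@dataclass
class IntegrationFrame:
    intention: str             # what the user wants, taken from the speech frame
    referent: Optional[Point]  # where the user pointed, if anywhere
    time: int


@dataclass
class OutputFrame:
    module: str  # e.g. "laser" or "speech_synthesiser"
    action: dict
    time: int


def integrate(speech: InputFrame, gestures: List[InputFrame],
              window: int = 2) -> IntegrationFrame:
    """Pair the utterance with the pointing gesture nearest to it in time."""
    near = [g for g in gestures if abs(g.time - speech.time) <= window]
    nearest = min(near, key=lambda g: abs(g.time - speech.time), default=None)
    referent = nearest.data["point"] if nearest else None
    return IntegrationFrame(speech.data["intention"], referent, speech.time)


def respond(frame: IntegrationFrame, tenants: Dict[Point, str]) -> List[OutputFrame]:
    """Plan synchronised laser and speech output for a tenant query."""
    t = frame.time + 1
    if frame.referent is None:
        return [OutputFrame("speech_synthesiser", {"say": "Which office do you mean?"}, t)]
    name = tenants.get(frame.referent, "nobody I know of")
    return [
        OutputFrame("laser", {"point_at": frame.referent}, t),
        OutputFrame("speech_synthesiser", {"say": f"This is the office of {name}."}, t),
    ]


speech = InputFrame("speech_recogniser",
                    {"utterance": "whose office is this", "intention": "query_tenant"},
                    time=10)
gestures = [InputFrame("gesture_recogniser", {"point": (3, 4)}, time=4),
            InputFrame("gesture_recogniser", {"point": (7, 2)}, time=11)]

merged = integrate(speech, gestures)
outputs = respond(merged, tenants={(7, 2): "Tenant A"})
```

Here the gesture at time 11 is chosen because it falls within the window around the utterance, while the earlier gesture at time 4 is ignored. Giving both output frames the same timestamp is one simple way to express that laser pointing and speech should be synchronised.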
Readings:
Mc Kevitt, P. and P. Dalsgaard (1997) ``A frame semantics for an IntelliMedia TourGuide'' Proceedings of the Eighth Ireland Conference on Artificial Intelligence (AI-97), Volume 1, University of Ulster, Northern Ireland, September, 104-111.
 

MiniModule-IV: CHAMELEON meets spatial cognition
Here we look at how the frame semantics handles spatial relations with endophoric reference and show complete runs of the blackboard semantics for "Who's in the office beside him?"
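
A question such as "Who's in the office beside him?" needs two steps: resolving "him" against the earlier dialogue (the endophoric reference) and then applying the spatial relation "beside" to the domain model. The Python sketch below shows one simple way to do this under strong assumptions: the floor plan, the tenant labels, the `person` slot and the "most recent mention" rule are all hypothetical and are not taken from CHAMELEON.

```python
from typing import Dict, List, Optional

# Hypothetical domain model: which rooms are next to each other, and who is in them.
ADJACENT: Dict[str, List[str]] = {"A1": ["A2"], "A2": ["A1", "A3"], "A3": ["A2"]}
TENANT: Dict[str, str] = {"A1": "tenant_x", "A2": "tenant_y", "A3": "tenant_z"}


def resolve_pronoun(history: List[dict]) -> Optional[str]:
    """Return the most recently mentioned person in the dialogue record."""
    for frame in reversed(history):
        if "person" in frame:
            return frame["person"]
    return None


def office_of(person: str) -> Optional[str]:
    """Look up the room occupied by a person, if any."""
    for room, tenant in TENANT.items():
        if tenant == person:
            return room
    return None


def who_is_beside(history: List[dict]) -> List[str]:
    """Answer 'Who's in the office beside him?' from the dialogue record."""
    person = resolve_pronoun(history)
    room = office_of(person) if person else None
    if room is None:
        return []
    return [TENANT[r] for r in ADJACENT[room]]


history = [
    {"utterance": "point to tenant_y's office", "person": "tenant_y"},
    {"utterance": "who's in the office beside him"},
]
neighbours = who_is_beside(history)
```

With this record, "him" resolves to `tenant_y` in room A2, so `neighbours` contains the tenants of A1 and A3. If no person has been mentioned, the function returns an empty list, which a dialogue manager could treat as a cue to ask for clarification.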
Readings:
Mc Kevitt, P. (1998) "CHAMELEON meets spatial cognition", Proceedings of MIND-III: The Annual Conference of the Cognitive Science Society of Ireland, Theme: Spatial Cognition, Mary Hegarty and Sean O Nuallain (Eds.), Part II, 70-87. Dublin City University (DCU), Dublin, Ireland, August.

MiniModule-V: Philosophical and alternative implications of CHAMELEON
We look at Searle's Chinese Room Problem and how it can be solved with the Irish and Danish rooms, and at its relations to Harnad's symbol grounding problem, the circular dictionary problem and Wittgenstein's duck-rabbit.
Readings:
Mc Kevitt, P. and Guo Chengming (1996) ``From Chinese rooms to Irish rooms: new words on visions for language'' In ``Two sciences of mind: readings in cognitive science and consciousness, Advances in Consciousness Research'' (AiCR 9), Mc Kevitt, P., Sean O Nuallain and Eoghan Mac Aogain (1997) (Eds.), US: John Benjamins. Also in Mc Kevitt, Paul (Ed.) (1995/1996) Integration of Natural Language and Vision Processing (Vols. I-IV), Dordrecht, The Netherlands: Kluwer-Academic Publishers.
Mc Kevitt, P. (1997) ``From Irish Rooms to Danish Rooms'' Proceedings of Workshop on Thought and Language, Naoyuki Okada (Ed.), Kyushu Institute of Technology (KIT), Iizuka, Japan, September, 54-62.
Mc Kevitt, P. (1997) ``SuperinformationhighwayS'' In Sprog og Multimedier (Speech and Multimedia), Proc. of the Sixth Annual Meeting of the Danish Linguistics Society (DAnsk DAtalingvistik Forening, DAlF), Aalborg University, Denmark, May 1996, Tom Brondsted and Inger Lytje (Eds.), 167-184, April 1997, Aalborg, Denmark: Aalborg Universitetsforlag (Aalborg University Press).
Other Readings:
Diego Marconi (1996) ``Work on the integration of language and vision at the University of Torino''
Diego Marconi (1996) ``On the referential competence of some machines''
Yorick Wilks (1996) ``Language, vision and metaphor''
Roger Schank and Andrew Fano (1996) ``Ongoing research projects at the Institute for the Learning Sciences''
Roger Schank and Andrew Fano (1996) ``Memory and expectations in learning, language, and visual understanding''
 

NOTES:
The other readings above are available in Mc Kevitt, Paul (Ed.) (1995/1996) Integration of Natural Language and Vision Processing (Vols. I-IV),
Dordrecht, The Netherlands: Kluwer-Academic Publishers.

Of particular importance is the large Technical Report on CHAMELEON:
Broendsted, T., P. Dalsgaard, L.B. Larsen, M. Manthey, P. Mc Kevitt, T.B. Moeslund and K.G. Olesen (1998) ``A platform for developing Intelligent MultiMedia applications''
Technical Report R-98-1004, Center for PersonKommunikation (CPK), Aalborg University, Denmark, May.

Students should also look at:
More readings
"Interactive Dialogue in MultiModal Systems", Kloster Irsee, Germany, June 22-25, 1999
"Language, Vision & Music", Galway, Ireland, August 9-11, 1999

BIO:
Paul Mc Kevitt is 35 and from Dún na nGall (Donegal), IRELAND, in the northwest of the EU. He is a Visiting Professor of Intelligent MultiMedia Computing at Aalborg University, Denmark and a British EPSRC (Engineering and Physical Sciences Research Council) Advanced Fellow in the Department of Computer Science at the University of Sheffield, Sheffield, England. The Fellowship commenced in 1994 and releases him from his Associate Professorship (tenured Lecturership) for 5 years to conduct full-time research on the integration of natural language, speech and vision processing. He is currently pursuing a Master's degree in Education at the University of Sheffield. He completed his Ph.D. in Computer Science at the University of Exeter, England in 1991. His Master's degree in Computer Science was obtained from New Mexico State University, New Mexico, US in 1988 and his Bachelor's degree in Computer Science from University College Dublin (UCD), Dublin, Ireland in 1985. His primary research interests are in Natural Language Processing (NLP), including the processing of pragmatics, beliefs and intentions in dialogue. He is also interested in Philosophy, MultiMedia and the general area of Artificial Intelligence.
 

CONTACT DETAILS:
Paul Mc Kevitt
Center for PersonKommunikation (CPK)
Fredrik Bajers Vej 7-A5
Institute of Electronic Systems (IES)
Aalborg University
DK-9220, Aalborg
DENMARK

E-mail: pmck@cpk.auc.dk
WWW:    http://www.kom.auc.dk/~pmck/
Phone:
        (+45) 96 35 86 56  (Office)
        (+45) 96 35 86 40  (CPK Secretary)
        (+45) 96 35 80 80  (Aalborg main switch)
Fax:    (+45) 98 15 15 83