A System for Teaching Spoken Dialogue Systems Technology
The aim of this work has been to put a fully functioning spoken dialogue system into the hands of the students as an instructional aid. They can test it themselves, examine it in detail, and are shown how to extend and develop its functionality. In this way, we hope to increase their understanding of the problems and issues involved and to spur their interest in this technology and its possibilities.

The TMH speech toolkit, including a broker system with distributed servers, has been used to create an integrated lab environment that runs on Unix machines. The system has been used in the Masters-level courses on spoken language technology given at the Royal Institute of Technology (KTH), at Linköping University and at Uppsala University in Sweden.

Figure. A screenshot showing the control window (upper left), the dialogue application (right), and the speech recognition module (bottom left).  

In this environment, students are presented with a simple spoken dialogue application for searching the web-based Yellow Pages on selected topics by voice, presently in Swedish. The system is initialized with knowledge about streets, restaurants, hotels, museums and similar services. Results are presented using combinations of speech synthesis, an interactive map and Netscape Navigator. The application is accompanied by a development environment that enables the students to interactively study and modify the innards of the system, even while it is running. The main window shows an outline of the components of the system and how they interact. For each box, the corresponding module window can be opened with a mouse click. The complete dialogue application is also launched from this window. There is no explicit build step, as all changes to the system are made incrementally. When the system runs, each box is highlighted while processing takes place in the corresponding module. Each module has its own control window, which is updated dynamically to reflect the processing as it takes place.
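The highlighting behaviour described above can be illustrated with a minimal sketch. It is not the TMH toolkit or its broker: the class `Module`, the function `run_pipeline` and the callback `on_activity` are hypothetical names, and the three stand-in processing steps run in a single process rather than as distributed servers. The sketch only shows the idea of each module reporting when it starts and stops processing, so that a control window could highlight the corresponding box.

```python
class Module:
    """One processing step that reports when it is active (hypothetical)."""

    def __init__(self, name, process, on_activity):
        self.name = name
        self.process = process          # function doing the actual work
        self.on_activity = on_activity  # callback, e.g. to highlight a box

    def run(self, data):
        self.on_activity(self.name, True)       # processing starts
        try:
            return self.process(data)
        finally:
            self.on_activity(self.name, False)  # processing has ended


def run_pipeline(modules, data):
    """Pass the data through the modules in order."""
    for module in modules:
        data = module.run(data)
    return data


# Stand-in for a control window: record the activity events in a list.
events = []


def record(name, active):
    events.append((name, active))


# Toy stand-ins for the real modules; the lambdas are placeholders only.
pipeline = [
    Module("recognizer", lambda audio: audio.lower(), record),
    Module("parser", lambda text: text.split(), record),
    Module("synthesis", lambda words: " ".join(words), record),
]

result = run_pipeline(pipeline, "Hotels On Kungsgatan")
```

After the call, `events` holds one start and one stop entry per module, in processing order, which is the information a graphical front end would need to switch the highlight of each box on and off.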

The database used in the system is the publicly available web-based version of the Yellow Pages. In the database module of the dialogue system it is possible to browse all 1384 information categories of the Yellow Pages. A search result can be edited and saved locally for faster and more reliable future access. The new category is added to the lexicon, with automatic transcription, by a single mouse click, and a new recognizer lexicon is generated by a second click. After this modification the system is ready to accept questions about the new category.

In the lexicon module students add new words and pronunciations, which can be checked by listening to the speech synthesis output. The lexicon is used together with a set of example sentences to extend the speech recognizer in the recognizer module. In this module students can also listen to the recording of their previous utterance and view the 10 most probable sentences suggested by the recognizer. Different pruning parameters can be used in the recognition search to trade recognition accuracy for speed. The recognition output can also be edited and passed on through the system to simulate recognizer output.

This output is parsed either by a statistical parser or by a simple keyword spotter. The current system uses the keyword spotter to extract the keywords that are used in the database query. The search result is presented either on an interactive map or as synthesized speech. In the speech synthesis module the students can add templates for response generation. These templates include some prosodic information, which the students add by coloring the words with colors corresponding to different stress levels. The system can choose randomly among alternative responses.
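As a rough illustration of keyword spotting followed by template-based response generation, consider the minimal sketch below. It is not the code of the lab system. The lexicon entries, the semantic tags, the template texts and the function names `spot_keywords` and `generate_response` are all assumptions made for the example. English is used for readability although the system itself is Swedish, and prosodic markup is left out.

```python
import random

# Hypothetical lexicon: surface word -> (semantic tag, canonical value).
# The real lexicon also holds transcriptions and syntactic tags.
LEXICON = {
    "restaurant": ("category", "restaurants"),
    "restaurants": ("category", "restaurants"),
    "hotel": ("category", "hotels"),
    "hotels": ("category", "hotels"),
    "kungsgatan": ("street", "Kungsgatan"),
    "drottninggatan": ("street", "Drottninggatan"),
}

# Hypothetical response templates, with alternatives to choose from.
TEMPLATES = {
    "found": [
        "I found {count} {category} on {street}.",
        "There are {count} {category} on {street}.",
    ],
    "none": ["I could not find any {category} on {street}."],
}


def spot_keywords(utterance):
    """Build a query from the known keywords; other words are ignored."""
    query = {}
    for word in utterance.lower().split():
        word = word.strip(".,?!")
        if word in LEXICON:
            tag, value = LEXICON[word]
            query[tag] = value
    return query


def generate_response(query, hits, rng=random):
    """Pick one template at random and fill in the query and result."""
    kind = "found" if hits else "none"
    template = rng.choice(TEMPLATES[kind])
    # Fallback slot values in case a keyword was not spotted.
    slots = {"category": "entries", "street": "the map"}
    slots.update(query)
    return template.format(count=len(hits), **slots)


query = spot_keywords("Are there any restaurants on Kungsgatan?")
answer = generate_response(query, hits=["place A", "place B"])
```

Because the spotter ignores every word that is not in the lexicon, it is robust to varied phrasing but cannot tell "on Kungsgatan" from "not on Kungsgatan". That trade-off is one reason a statistical parser is a natural alternative.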

The exercises 
This year (spring 1998) the instructional environment has been used in five different courses given by four different departments at three universities in Sweden. The courses were attended by a total of 150 final-year Masters students. The students worked in groups of two and were given a list of modifications to apply to the system.
  • The first task was to use the dialogue application in order to determine its capabilities and limitations.
  • The next task was to test the speech recognition module standalone, with the explicit purpose of giving them some insight into the limitations of current HMM-based speech recognition technology, for example regarding noise, speaking style and out-of-vocabulary words.
  • The main assignment was to add new fields from the Yellow Pages, new street names from the map, and new words or phrases to the system. All new words in the lexicon had to be labeled with appropriate syntactic and semantic tags and with correct transcriptions.
  • They had to extend the example-based grammar with new constructs.
  • In the text generation module, they had to insert additional response templates to handle the new facilities. This included experimenting with different prosodic patterns in the sentences.
  • Finally, the extended system had to be demonstrated to show that it worked according to the specification.
Overall, the students were very satisfied with the system, and they rated it four on a five-point scale in the course evaluation. The main criticism was that they wanted to be able to make greater changes to the system and to go deeper into some of its modules. We believe that the lab environment, together with the underlying toolkit, is an important aid in giving students an understanding of spoken language technology.

Future developments 

Our main focus for the continued development of the dialogue environment is to integrate a real dialogue manager into the system. To this end, we are engaged in a joint research project with the Natural Language Processing Laboratory (NLPLAB) at the University of Linköping, which aims at integrating a highly flexible dialogue manager. In this new module, a dialogue grammar based on speech act information is used together with a dialogue tree that handles focus structure.

In the current environment, the emphasis has been on giving students an understanding of technology integration rather than on letting them build new systems themselves. Student projects with a focus on system building will be a natural and very interesting development in our future courses.

We have already used the dialogue environment inside a Netscape browser, using the Tcl plugin enhanced with our own plugins for audio and speech recognition. A web page version exists, but so far it can only be run locally, since we have not yet released our recognition plugin as shareware.

Some work has been done to port the lab to English, mostly for demonstration purposes. One problem for the current system in the Yellow Pages domain is the unpredictable pronunciation of Swedish street names by English-speaking subjects.

The system will be developed further as part of the new Swedish research programme, Swedish Dialogue Systems, which also includes projects from the universities in Linköping, Lund and Gothenburg.


Joakim Gustafson, Patrik Elmberg, Rolf Carlson and Arne Jönsson, "An Educational Dialogue System with a User Controllable Dialogue Manager", paper submitted to ICSLP98.

Kåre Sjölander, Jonas Beskow, Joakim Gustafson, Erland Lewin, Rolf Carlson and Björn Granström, "Web-Based Educational Tools for Speech Technology", paper submitted to ICSLP98.

Joakim Gustafson and Kåre Sjölander, "Educational Tools for Speech Technology", paper to be published at the Swedish phonetics conference Fonetik 98.

Rolf Carlson, Björn Granström, Joakim Gustafson, Erland Lewin and Kåre Sjölander, "Hands-On Speech Technology on the Web", paper to be published at the conference ELSNET in Wonderland (7 pages).

Kåre Sjölander and Joakim Gustafson, "An Integrated System for Teaching Spoken Dialogue Systems Technology", paper submitted to Eurospeech '97 (4 pages).