Figure. A screenshot showing the control window (upper left), the dialogue application (right), and the speech recognition module (bottom left).

In this environment, students are presented with a simple spoken dialogue application for searching the web-based Yellow Pages on selected topics using speech, presently in Swedish. The system is initialized with knowledge about streets, restaurants, hotels, museums and similar services. Results are presented using combinations of speech synthesis, an interactive map and Netscape Navigator.

This application is accompanied by a development environment which enables the students to interactively study and modify the innards of the system even while it is running. The main window shows an outline of the components of the system and how they interact. For each box, the corresponding module window can be opened with a mouse click. The complete dialogue application is also launched from this window. There is no explicit build step involved, as all changes to the system are made incrementally. When the system runs, each box highlights when processing takes place in that module. Each module has its own control window, which dynamically updates to reflect the processing as it takes place.

The database used in the system is the publicly available web-based
version of the Yellow Pages. In
the database module of the dialog system it is possible to browse all 1384
different information categories of the Yellow Pages. The search result
can be edited and saved locally for faster and more secure future access.
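Saving an edited search result locally amounts to a simple on-disk cache. The following is a minimal sketch, not the system's actual code; the directory name, file layout and function names are all hypothetical:

```python
import json
from pathlib import Path

CACHE_DIR = Path("yp_cache")  # hypothetical local cache directory

def save_category(category, entries):
    """Save (possibly hand-edited) search results for a category locally."""
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{category}.json"
    # ensure_ascii=False keeps Swedish characters readable in the file
    path.write_text(json.dumps(entries, ensure_ascii=False, indent=2),
                    encoding="utf-8")

def load_category(category):
    """Load a locally cached category; None if it was never saved."""
    path = CACHE_DIR / f"{category}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Once a category is cached this way, later queries can read the local file instead of contacting the web server, which is both faster and independent of network availability.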
The new category is added to the lexicon with automatic transcription by
a simple mouse click and a new recognizer lexicon is generated by a second
mouse click. After this modification, the system is ready to accept questions
about the new category. In the lexicon module, students add new words and
pronunciations which can be checked by listening to the speech synthesis
output. The lexicon is used together with a set of example sentences to
expand the speech recognizer in the recognizer module. In this module students
can also listen to a recording of their previous utterance
and view the 10 most probable sentences suggested by the recognizer. It
is possible to use different pruning parameters in the recognition search
to trade recognition accuracy for speed. Recognition output can also be
edited and sent further in the system to simulate the recognizer output.
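The pruning trade-off mentioned above can be illustrated with a beam threshold over the recognizer's N-best list. This is only a toy sketch, not the recognizer's actual search code; the hypothesis sentences and scores are invented:

```python
def prune_hypotheses(nbest, beam):
    """Keep hypotheses whose log-probability is within `beam` of the best.

    A tight beam discards more hypotheses, making the search faster but
    risking loss of the correct sentence; a wide beam is slower but more
    accurate -- the trade-off exposed by the pruning parameters.
    """
    best = max(score for _, score in nbest)
    return [(sent, score) for sent, score in nbest if score >= best - beam]

# Invented 3-best output with log-probability scores (higher is better).
nbest = [
    ("visa restauranger på Kungsgatan", -12.3),
    ("visa restauranger och Kungsgatan", -14.1),
    ("lista restauranger på Kungsgatan", -17.8),
]
```

With `beam=3.0` only the top two hypotheses survive; with `beam=10.0` all three do. In a real decoder the pruning is applied during the search itself, so a tighter beam also reduces computation, not just the length of the output list.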
This output is parsed either by a statistical parser or by a simple
keyword spotter. The current system uses the keyword spotter to
extract the keywords used in the database query. The search result
is either displayed on an interactive map or as synthesized speech. In
the speech synthesis module the students can add templates for response
generation. These templates include some prosodic information, which the
students easily add by coloring the words with colors corresponding to
different stress levels. The system can randomly choose from alternative
responses.
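The keyword spotting and template-based response generation described above can be sketched as follows. This is an illustration only: the keyword table, slot names and templates are hypothetical, and focal stress is marked here with asterisks rather than the colors used in the actual interface:

```python
import random

# Hypothetical keyword lexicon: surface words mapped to query slots.
KEYWORDS = {
    "restaurang": ("category", "restaurant"),
    "hotell": ("category", "hotel"),
    "kungsgatan": ("street", "Kungsgatan"),
}

def spot_keywords(utterance):
    """Extract database-query slots by simple keyword spotting:
    every known word fills its slot, all other words are ignored."""
    slots = {}
    for word in utterance.lower().split():
        if word in KEYWORDS:
            slot, value = KEYWORDS[word]
            slots[slot] = value
    return slots

# Alternative response templates; *word* marks a stressed word,
# standing in for the color-coded stress levels of the real system.
TEMPLATES = [
    "I found *{n}* {category}s on {street}.",
    "There are *{n}* {category}s on {street}.",
]

def generate_response(n, slots):
    """Pick one of the alternative templates at random and fill it in."""
    return random.choice(TEMPLATES).format(n=n, **slots)
```

A query such as "visa restaurang på Kungsgatan" yields the slots `{"category": "restaurant", "street": "Kungsgatan"}`, which drive the database lookup; the random choice among templates gives the system some variation in its spoken responses.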
Future developments
In the current environment, the emphasis has been on giving students an understanding of technology integration rather than on letting them build actual new systems themselves. Student projects with a focus on system building will be a natural and very interesting development in our future courses. We have already used the dialogue environment inside a Netscape browser using the Tcl plugin, enhanced with our own plugins for audio and speech recognition. So far, however, this web page can only be run locally, since we have not yet released our recognition plugin as shareware. Some work has been done to port the lab to English, mostly for demonstration purposes. One problem for the current system in the Yellow Pages domain is the unpredictable pronunciation of Swedish street names by English-speaking subjects. The system will be developed further as part of the new Swedish research programme Swedish Dialogue Systems, which also includes projects from the universities in Linköping, Lund and Gothenburg.
Publications:

Joakim Gustafson, Patrik Elmberg, Rolf Carlson and Arne Jönsson: "An Educational Dialogue System with a User Controllable Dialogue Manager", paper submitted to ICSLP 98 (paper in HTML).

Kåre Sjölander, Jonas Beskow, Joakim Gustafson, Erland Lewin, Rolf Carlson and Björn Granström: "Web-Based Educational Tools for Speech Technology", paper submitted to ICSLP 98 (paper in HTML).

Joakim Gustafson and Kåre Sjölander: "Educational Tools for Speech Technology", paper to be published at the Swedish phonetics conference Fonetik 98 (paper in HTML).

Rolf Carlson, Björn Granström, Joakim Gustafson, Erland Lewin and Kåre Sjölander: "Hands-On Speech Technology on the Web", paper to be published at the conference ELSNET in Wonderland (paper, 7 pages, Postscript, 2 Mb; HTML version).

Kåre Sjölander and Joakim Gustafson (1997): "An Integrated System for Teaching Spoken Dialogue Systems Technology", paper submitted to Eurospeech '97. Abstract (paper, 4 pages, Postscript, gzip 600 kb, or HTML).