Higgins

Publications





The Publications section contains publications made within the project, as well as those made in related areas at CTT. Our ambition is to list each article with a very brief abstract, but in some cases, expect to see nothing but a bibliographic reference and a link. At present, the articles appear in a single, chronologically ordered list.

For other publications on similar subjects, we recommend searching the NEC Research Institute's CiteSeer.

Please take a moment to read the disclaimer and the copyright notice.

Conference & workshop articles

2006

Heldner M, Carlson R & Edlund J (2006):
Interruption impossible. [pdf - final draft] In Proceedings of Nordic Prosody IX.
ABSTRACT:
Most current work on spoken human-computer interaction has so far concentrated on interactions between a single user and a dialogue system. The advent of ideas of the computer or dialogue system as a conversational partner in a group of humans, for example within the CHIL project and elsewhere (e.g. Kirchhoff & Ostendorf, 2003), introduces new requirements on the capabilities of the dialogue system. Among other things, the computer as a participant in a multi-party conversation has to appreciate the human turn-taking system, in order to time its own interjections appropriately. As the role of a conversational computer is likely to be to support human collaboration, rather than to guide or control it, it is particularly important that it does not interrupt or disturb the human participants. The ultimate goal of the work presented here is to predict suitable places for turn-taking, as well as positions where it is impossible for a conversational computer to interrupt without irritating the human interlocutors.
Skantze, G., Edlund, J., & Carlson, R. (forthcoming):
Talking with Higgins: challenges in a spoken dialogue system for pedestrian city navigation. In Proceedings of the Tutorial and Research Workshop on Perception and Interactive Technologies (PIT06), Kloster Irsee, Germany.
ABSTRACT:
This paper presents the current status of the research in the Higgins project and provides background for a demonstration of the spoken dialogue system implemented within the project. The project represents the latest development in the ongoing dialogue systems research at KTH. The practical goal of the project is to build collaborative conversational dialogue systems in which research issues such as error handling techniques can be tested empirically.
Wallers, Å., Edlund, J., & Skantze, G. (forthcoming):
The effect of prosodic features on the interpretation of synthesised backchannels. In Proceedings of the Tutorial and Research Workshop on Perception and Interactive Technologies (PIT06), Kloster Irsee, Germany.
ABSTRACT:
A study of the interpretation of prosodic features in backchannels (Swedish /a/ and /m/) produced by speech synthesis is presented. The study is part of work-in-progress towards endowing conversational spoken dialogue systems with the ability to produce and use backchannels and other feedback.

2005

Edlund, J & Heldner, M (2005):
Exploring Prosody in Interaction Control. [pdf - final draft] Phonetica, 62(2-4), 215-226.
ABSTRACT:
This paper investigates prosodic aspects of turn-taking in conversation with a view to improving the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor. It examines the relationship between interaction control, the communicative function of which is to regulate the flow of information between interlocutors, and its phonetic manifestation. Specifically, the listener’s perception of such interaction control phenomena is modelled. Algorithms for automatic online extraction of prosodic phenomena liable to be relevant for interaction control, such as silent pauses and intonation patterns, are presented and evaluated in experiments using Swedish Map Task data. We show that the automatically extracted prosodic features can be used to avoid many of the places where current dialogue systems run the risk of interrupting their users, and also to identify suitable places to take the turn.
Edlund, J, Heldner, M, & Gustafson, J (2005):
Utterance segmentation and turn-taking in spoken dialogue systems. [pdf] In Fisseni, B, et al. (eds) Computer Studies in Language and Speech, Vol. 8, pp. 576-587, Frankfurt am Main, Germany: Peter Lang.
ABSTRACT:
A widely used method for finding places to take the turn in spoken dialogue systems is to assume that an utterance ends where the user ceases to speak. Such endpoint detection normally triggers on a certain amount of silence, or non-speech. However, spontaneous speech frequently contains silent pauses inside sentence-like units, for example when the speaker hesitates. This paper presents /nailon/, an on-line, real-time prosodic analysis tool, and a number of experiments in which endpoint detection has been augmented with prosodic analysis in order to segment the speech signal into what humans intuitively perceive as utterance-like units.
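The silence-plus-prosody idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the actual /nailon/ implementation: the frame format, the thresholds, and the "utterances end low in the speaker's range" heuristic are all assumptions made for the sketch.

```python
# Sketch: silence-triggered endpoint detection augmented with a prosodic
# check, in the spirit of the approach described above. NOT the actual
# /nailon/ code; frame format and thresholds are invented for illustration.

def is_endpoint(frames, silence_ms=500, frame_ms=10):
    """frames: list of (is_speech: bool, f0: float or None) tuples,
    one per analysis frame, most recent last. Returns True if the
    tail of the signal looks like a genuine end of utterance."""
    needed = silence_ms // frame_ms
    if len(frames) < needed:
        return False
    recent = frames[-needed:]
    # Plain endpointing: require an unbroken run of non-speech frames.
    if any(is_speech for is_speech, _ in recent):
        return False
    # Prosodic augmentation: inspect the pitch just before the pause.
    # Heuristic (an assumption here): a hesitation pause tends to follow
    # level, mid-range F0, while a finished utterance tends to end low
    # in the speaker's range.
    pre_pause = [f0 for is_speech, f0 in frames[:-needed] if is_speech and f0]
    if not pre_pause:
        return True
    tail = pre_pause[-5:]                 # last voiced frames before the pause
    speaker_floor = min(pre_pause)        # crude estimate of the F0 floor
    ends_low = sum(tail) / len(tail) < speaker_floor * 1.15
    return ends_low                       # silence + low F0 -> take the turn
```

With a falling contour ending near the speaker's floor followed by half a second of silence, the sketch reports an endpoint; with the same silence after a level mid-range contour, it holds back.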
Edlund, J & Hjalmarsson, A (2005):
Applications of Distributed Dialogue Systems: the KTH Connector. [pdf] In Proceedings of the ISCA Tutorial and Research Workshop on Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005), Aalborg, Denmark.
ABSTRACT:
We describe a spoken dialogue system domain: that of the personal secretary. This domain allows us to capitalise on the characteristics that make speech a unique interface; characteristics that humans use regularly, implicitly, and with remarkable ease. We present a prototype system - the KTH Connector - and highlight several dialogue research issues arising in the domain.
Edlund, J, House, D, & Skantze, G (2005):
The Effects of Prosodic Features on the Interpretation of Clarification Ellipses. [pdf] In Proceedings of Interspeech 2005, Lisbon, Portugal.
ABSTRACT:
In this paper, the effects of prosodic features on the interpretation of elliptical clarification requests in dialogue are studied. An experiment is presented where subjects were asked to listen to short human-computer dialogue fragments in Swedish, where a synthetic voice was making an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and the subjects were asked to judge what was actually intended by the computer. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said. The study can be seen as the beginnings of a tentative model for intonation of clarification ellipses in Swedish, which can be implemented and tested in spoken dialogue systems.
Edlund, J, House, D, & Skantze, G (2005):
Prosodic Features in the Perception of Clarification Ellipses. [pdf] In Proceedings of Fonetik 2005, Gothenburg, Sweden.
ABSTRACT:
We present an experiment where subjects were asked to listen to Swedish human-computer dialogue fragments where a synthetic voice makes an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and subjects were asked to judge the computer’s actual intention. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said. The study can be seen as the beginnings of a tentative model for intonation of clarification ellipses in Swedish, which can be implemented and tested in spoken dialogue systems.
Skantze, G. (2005):
Exploring Human Error Recovery Strategies: Implications for Spoken Dialogue Systems. [pdf] Speech Communication, 45(3) (pp. 325-341).
ABSTRACT:
In this study, an explorative experiment was conducted in which subjects were asked to give route directions to each other in a simulated campus (similar to Map Task). In order to elicit error handling strategies, a speech recogniser was used to corrupt the speech in one direction. This way, data could be collected on how the subjects might recover from speech recognition errors. This method for studying error handling has the advantages that the level of understanding is transparent to the analyser, and the errors that occur are similar to errors in spoken dialogue systems. The results show that when subjects face speech recognition problems, a common strategy is to ask task-related questions that confirm their hypothesis about the situation instead of signalling non-understanding. Compared to other strategies, such as asking for a repetition, this strategy leads to better understanding of subsequent utterances, whereas signalling non-understanding leads to decreased experience of task success.
Skantze, G. (2005):
Galatea: a discourse modeller supporting concept-level error handling in spoken dialogue systems. [pdf] In Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue. Lisbon, Portugal.
ABSTRACT:
In this paper, a discourse modeller for conversational spoken dialogue systems, called GALATEA, is presented. Apart from handling the resolution of ellipses and anaphora, it tracks the "grounding status" of concepts that are mentioned during the discourse, i.e. information about who said what when. This grounding information also contains concept confidence scores that are derived from the speech recogniser word confidence scores. The discourse model may then be used for concept-level error handling, i.e. grounding of concepts, fragmentary clarification requests, and detection of erroneous concepts in the model at later stages in the dialogue.

2004

Edlund J, Skantze G & Carlson R (2004):
Higgins - a spoken dialogue system for investigating error handling techniques. [PDF] [PS]
Proceedings of ICSLP 2004.
ABSTRACT:
In this paper, an overview of the Higgins project and the research within the project is presented. The project incorporates studies of error handling for spoken dialogue systems on several levels, from processing to dialogue level. A domain in which a range of different error types can be studied has been chosen: pedestrian navigation and guiding. Several data collections within Higgins have been analysed along with data from Higgins' predecessor, the AdApt system. The error handling research issues in the project are presented in light of these analyses.
Heldner M, Edlund J, & Björkenstam T (2004):
Automatically extracted F0 features as acoustic correlates of prosodic boundaries. [PDF]
Published in Proceedings of Fonetik 2004, Stockholm
ABSTRACT:
This work presents preliminary results of an investigation of various automatically extracted F0 features as acoustic correlates of prosodic boundaries. The F0 features were primarily intended to capture phenomena such as boundary tones, F0 resets across boundaries and position in the speaker's F0 range. While there were no correspondences between boundary tones and boundaries, the reset and range features appeared to separate boundaries from no boundaries fairly well.
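For illustration, two of the feature types mentioned in the abstract (F0 reset across a candidate boundary, and position in the speaker's F0 range) might be computed along these lines. This is a hedged sketch: the semitone conversion, the reference frequency, and the function names are assumptions, not the paper's actual definitions.

```python
import math

def semitones(f0, ref=100.0):
    """Convert an F0 value in Hz to semitones relative to a reference
    frequency (100 Hz is an arbitrary choice here)."""
    return 12.0 * math.log2(f0 / ref)

def f0_reset(pre_f0, post_f0):
    """F0 reset across a candidate boundary: how far the pitch jumps
    from the last voiced stretch before the boundary (pre_f0, Hz) to
    the first voiced stretch after it (post_f0, Hz), in semitones."""
    return semitones(post_f0) - semitones(pre_f0)

def range_position(f0, speaker_min, speaker_max):
    """Position of an F0 value within the speaker's range, on a
    logarithmic (semitone) scale: 0.0 = floor, 1.0 = top."""
    lo, hi = semitones(speaker_min), semitones(speaker_max)
    return (semitones(f0) - lo) / (hi - lo)
```

An upward jump of an octave corresponds to a reset of 12 semitones, and a value halfway up a two-octave range (on the log scale) gives a range position of 0.5.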
Skantze G & Edlund J (2004):
Robust interpretation in the Higgins spoken dialogue system. [PDF] [PS]
In Proceedings of Robust 2004, Norwich.
ABSTRACT:
This paper describes Pickering, the semantic interpreter developed in the Higgins project - a research project on error handling in spoken dialogue systems. In the project, the initial efforts are centred on the input side of the system. The semantic interpreter combines a rich set of robustness techniques with the production of deep semantic structures. It allows insertions and non-agreement inside phrases, and combines partial results to return a limited list of semantically distinct solutions. A preliminary evaluation shows that the interpreter performs well under error conditions, and that the built-in robustness techniques contribute to this performance.
Skantze G & Edlund J (2004):
Error detection on word level. [PDF] [PS]
In Proceedings of Robust 2004, Norwich.
ABSTRACT:
In this paper two studies are presented in which the detection of speech recognition errors on the word level was examined. In the first study, memory-based and transformation-based machine learning was used for the task, using confidence, lexical, contextual and discourse features. In the second study, we investigated which factors humans benefit from when detecting errors. Information from the speech recogniser (i.e. word confidence scores and 5-best lists) and contextual information were the factors investigated. The results show that word confidence scores are useful and that lexical and contextual (both from the utterance and from the discourse) features further improve performance.

2003

Skantze G (2003):
Exploring Human Error Handling Strategies: Implications for Spoken Dialogue Systems [PDF]
Published in Proceedings of ISCA Workshop on Error Handling in Spoken Dialogue Systems
ABSTRACT:
In this paper, an experiment with an alternative to the Wizard of Oz method is presented. To be able to study typical error-handling situations, the users had to speak through a speech recogniser, and the operators could not hear what the users said; they could only read the speech recognition result. Since we wanted to investigate natural error handling strategies, we used untrained operators who were not experienced in designing dialogue systems and had no preconceptions about how errors are traditionally handled in dialogue systems. Because of this, the operators were allowed to speak freely, and the users were told that they interacted with a human operator. However, they were fully informed about the setup, and that the speech recognition would constrain their speech. The operators' speech was distorted through a vocoder, and the user and operator were not allowed to see each other before the experiment, in order to minimise common ground. The results clearly illustrate that different knowledge sources (such as confidence scores, syntactic structure and context) can be used to detect errors in the recognition result and react to them in an appropriate way. When non-understandings occur in spoken dialogue systems, a good domain model and robust parsing techniques should be used to pose relevant questions to the user (instead of signalling non-understanding), so that errors can be efficiently resolved without the user experiencing the dialogue as problematic and dominated by error handling.
Posters & presentations

    Jens Edlund & Mattias Heldner
    Seminar held at KTH, Dept. of Speech, Music and Hearing, 2006-02-22: Prosody in interaction control
    Gabriel Skantze & David House
    Poster at Interspeech 2005 in Lisbon, Portugal, 2005: The Effects of prosodic features on the interpretation of clarification ellipses
    Gabriel Skantze
    Poster at ICSLP 2004 on Jeju Island, Korea, 2004-10-05: Higgins - A spoken dialogue system for investigating error handling techniques
    Mattias Heldner & Jens Edlund
    Seminar held at KTH, Dept. of Speech, Music and Hearing, 2004-09-21: A suitable place to speak
    Gabriel Skantze & Jens Edlund:
Poster at Robust 2004 in Norwich, UK, 2004-08-31: Pickering - Robust interpretation in the Higgins spoken dialogue system
    Gabriel Skantze:
    Presentation at Robust 2004 in Norwich, UK, 2004-08-31: Error detection on word level
    Gabriel Skantze [in Swedish]:
    Brief presentation of the Pickering semantic interpreter given at a CTT meeting 2004-02-11: Pickering
    Rolf Carlson:
    Brief presentation on domain and research issues given in late 2003: Higgins
    Jens Edlund & Gabriel Skantze:
    Poster and brief presentation given at the CTT-day 2003-10-02: Higgins
    Jens Edlund & Gabriel Skantze:
    Poster and brief presentation given at the CTT-day 2003-04-02: The contribution of speech recognition confidence and dialogue context to error detection and correction

