The Pickering semantic interpreter is the main interpreter implementation in the Higgins project (in speech technology, such an interpreter is often called a parser). It is implemented in Mozart/Oz, an open source, concurrent, object-oriented language which combines concurrent and distributed programming with logical constraint-based inference.
The primary goals for the interpreter are robustness and flexibility. The goals are achieved through a number of techniques, some old and tested, some of a more exploratory kind.
A binary beta version of Pickering for Windows is available for download, as is a beta version of the Pickering manual. Installation and usage instructions are included in the manual. We intend to add a Linux version when we have time to compile and test it. Pickering and the related protocols are in beta, and may be considered a preview.
- Pickering v. 0.9.0 (Zipped Windows binary)
- Pickering manual v. 0.1.2 (HTML)
- Pickering manual v. 0.1.2 (PDF)
Note that both the manual and the binary are intended for use with the http://www.speech.kth.se/higgins/2003/pickering/ namespace and nothing else.
Grammar rules, lexicon, input and output data
The interpreter takes (incremental) speech recognition strings as input, or, alternatively, typed text. The strings may contain XML markup carrying extra information such as confidence measures, time stamps, etc. If XML markup is used in the input strings, it may be propagated to the parse tree that is included in the interpretation result. In post-processing, this makes it significantly easier to find out why the interpreter results look the way they do.
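As a rough illustration of how such markup can be carried along, the following Python sketch reads a recogniser string in which each word is wrapped in an element with a confidence measure and a time stamp. The element and attribute names (`utterance`, `word`, `conf`, `start`) and the values are assumptions made for this example only. They are not the Pickering input format, which is described in the manual.

```python
import xml.etree.ElementTree as ET

# Hypothetical recogniser output: names and values are illustrative only.
RECOGNISER_OUTPUT = """
<utterance>
  <word conf="0.91" start="0.00">jag</word>
  <word conf="0.42" start="0.31">ser</word>
  <word conf="0.88" start="0.55">ett</word>
  <word conf="0.77" start="0.70">hus</word>
</utterance>
"""


def read_words(xml_string):
    """Return (word, attributes) pairs, keeping every markup attribute."""
    root = ET.fromstring(xml_string.strip())
    return [(w.text, dict(w.attrib)) for w in root.iter("word")]


def low_confidence(words, threshold=0.5):
    """Return the words whose confidence measure is below the threshold."""
    return [text for text, attrs in words if float(attrs["conf"]) < threshold]


words = read_words(RECOGNISER_OUTPUT)
print(low_confidence(words))
```

Because the attributes stay attached to each word, a later stage can still see which words had low confidence when it inspects an interpretation result.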
The interpreter lexicon and grammar rules are encoded in XML. The full specification will be made available in time. For now, suffice it to say that the encoding benefits from the inherent tree structure of any XML formalism, but at the same time allows for discontinuous markup. In theory, the formalism may be used to describe anything from a strict context-free grammar (CFG) to phrase or word matching rules. The format distinguishes between the semantic content of a word or a phrase and its surface representation.
Finally, the interpreter output is divided into two parts. The first is the parse tree mentioned above. It is intended for logging, training of post-parsing models, testing, and bug-catching. The second part is tightly connected to the formats used by the Higgins knowledge manager. The object parts of the interpretation results may be used directly as search terms in the Higgins database. They are, formally, underspecified Higgins world documents, and as such may be represented using any of the available transformations (world to SVG, X3D, etc.).
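The idea of using an underspecified description as a search term can be sketched as follows. This is a minimal illustration, not the Higgins implementation: Higgins world documents are XML, whereas the sketch uses plain Python dictionaries as a stand-in, and the objects, attribute names and the `matches` helper are all hypothetical.

```python
def matches(constraint, obj):
    """True if every attribute given in the constraint agrees with the object.

    Attributes that are absent from the constraint are unspecified and
    therefore match anything.
    """
    return all(obj.get(key) == value for key, value in constraint.items())


# Hypothetical database objects (fully specified).
DATABASE = [
    {"id": "b1", "type": "building", "material": "wood", "colour": "red"},
    {"id": "b2", "type": "building", "material": "brick", "colour": "red"},
    {"id": "t1", "type": "tree", "colour": "green"},
]

# An underspecified object, as an interpretation of "a wooden house" might
# yield: type and material are known, everything else is left open.
constraint = {"type": "building", "material": "wood"}

hits = [obj["id"] for obj in DATABASE if matches(constraint, obj)]
print(hits)
```

The less an utterance specifies, the more objects the constraint matches. A constraint such as `{"colour": "red"}` would match both buildings in this toy database.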
For the grammar and lexicon used in the Higgins domain, some words that are written as compounds in Swedish (i.e. without a space) are written as two separate words. This is by no means a new notion (and especially not to e.g. English speakers), but our motivation may be of some interest. Commonly, separation of compounds in Swedish is proposed as part of an effort to create morphological lexica, which would facilitate generative tasks. In our case, the generative strength of our lexicon and grammar in a general context is of little interest to us. The main interest is the robustness of the interpreter, and its ability to extract the semantics of an utterance given the domain. Note that we concentrate exclusively on semantics that are relevant to the given domain. Our motives for separating (some) Swedish compounds, then, are semantic and, to some extent, phonetic/phonological:
- It is done in spoken language (pauses occur between the parts of compounds). Domain data supporting this will be presented in time.
- The speech recogniser used in Higgins (Daytona) is not affected by the separation, since it uses triphones rather than words.
- We only separate words where one of the resulting parts, or preferably both, has domain-relevant semantics. This may improve the correlation between recogniser confidence measures and concept accuracy.
- There is some evidence in our data that users are more disfluent, or take more time to decide, when uttering attributes than when uttering head words. This seems to hold for the first and second parts of compounds as well. Combined with the pauses between the first and second words in compounds, this may constitute a phonetic motivation.
- If a compound carries two bits of semantic meaning, like trähus (Eng. wooden house), we would get either both or none right unless the compound was split. In hus av trä (Eng. house of wood), we get none, one or both meanings. By separating the compound, the two semantically equal structures behave the same.
Naturally, we'll report test results from this.
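The last point in the list above can be illustrated with a toy example. The sketch below is not taken from Pickering. The two mini-lexicons, the concept labels and the misrecognised tokens (`brähus`, `brä`) are made up for illustration, and the lookup is a plain word match rather than a real interpretation step.

```python
# Hypothetical mini-lexicons: surface word -> semantic concepts.
LEXICON_COMPOUND = {"trähus": ["material=wood", "type=house"]}
LEXICON_SPLIT = {"trä": ["material=wood"], "hus": ["type=house"]}


def concepts_recovered(recognised_tokens, lexicon):
    """Collect the concepts of every recognised token found in the lexicon."""
    found = []
    for token in recognised_tokens:
        found.extend(lexicon.get(token, []))
    return found


# Unsplit compound: one recognition error in the token loses both concepts.
print(concepts_recovered(["brähus"], LEXICON_COMPOUND))  # []

# Split compound: the same error in the first part still leaves one concept.
print(concepts_recovered(["brä", "hus"], LEXICON_SPLIT))  # ['type=house']

# With no recognition errors, both variants recover both concepts.
print(concepts_recovered(["trä", "hus"], LEXICON_SPLIT))
```

With the compound split, the outcome is none, one or both concepts, which is the same behaviour as for hus av trä.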
Much more to come...
A number of ideas used in the AdApt parser have been transferred to Pickering. For example, the output can in part be viewed as database search constraints, and some references are treated as underspecified search constraints. The AdApt parser is described in:
- Johan Boye & Mats Wirén (2003): Negotiative spoken-dialogue interfaces to databases, in Proceedings of DiaBruck 2003
- Johan Boye & Mats Wirén (2003): Robust parsing of utterances in negotiative dialogue, in Proceedings of Eurospeech 2003