GSLT: NLP-course HT02 Assignment 3

Gabriel Skantze

Centering Theory, as described in Jurafsky & Martin (2001, pp. 691-694), was applied to a passage from the Bible. The theory can be used to analyze both text coherence and reference resolution; this study focuses on the latter.

Application of the theory

To analyze the text, it must first be divided into "utterances". This turned out to be the hardest part of the analysis, since the definition of an "utterance" has a great impact on the result. In this analysis, an utterance was defined as a sentence.

Next, a set of "centers" should be identified in each utterance:
Forward-looking center, Cf, is the set of referents in the utterance, ranked in the order: subject > existential predicate nominal > object > indirect object or oblique > demarcated adverbial PP.
Backward-looking center, Cb, is the most highly ranked element of Cf(previous utterance) that is mentioned in the current utterance.
Preferred center, Cp, is the highest ranked element in Cf.
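Once the referents of each utterance are tagged with grammatical roles, the three centers can be computed mechanically. The following is a minimal Python sketch, assuming each utterance is given as a list of (referent, role) pairs with pronouns already resolved; the role labels, function names and the hand-assigned roles in the example are hypothetical, chosen for illustration only.

```python
# Ranking of grammatical roles for Cf, highest first (hypothetical labels).
ROLE_RANK = {
    "subject": 0,
    "existential_predicate_nominal": 1,
    "object": 2,
    "indirect_object_or_oblique": 3,
    "demarcated_adverbial_pp": 4,
}

def forward_centers(utterance):
    """Cf: the utterance's referents ordered by grammatical role.

    sorted() is stable, so referents with the same role keep their
    textual order. A referent mentioned twice is listed only once.
    """
    ranked = sorted(utterance, key=lambda pair: ROLE_RANK[pair[1]])
    cf = []
    for referent, _role in ranked:
        if referent not in cf:
            cf.append(referent)
    return cf

def preferred_center(cf):
    """Cp: the highest-ranked element of Cf (None if Cf is empty)."""
    return cf[0] if cf else None

def backward_center(cf_prev, cf_cur):
    """Cb: the highest-ranked element of the previous Cf that is realized
    in the current utterance, or None ("undefined") if there is none."""
    for referent in cf_prev:
        if referent in cf_cur:
            return referent
    return None

# Illustrative roles for "Herren såg med välvilja på Abel och hans gåva men
# inte på Kain och hans gåva. Då blev Kain vred, och han sänkte blicken."
cf_prev = forward_centers([
    ("herren", "subject"),
    ("abel", "indirect_object_or_oblique"),
    ("abels_gava", "indirect_object_or_oblique"),
    ("kain", "indirect_object_or_oblique"),
    ("kains_gava", "indirect_object_or_oblique"),
])
cf_cur = forward_centers([("kain", "subject"), ("kain", "subject"), ("blicken", "object")])
print(cf_cur, preferred_center(cf_cur), backward_center(cf_prev, cf_cur))
```

Here "han" has already been resolved to "kain", so Cf(current) is ["kain", "blicken"], and both Cp and Cb are "kain".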

Each utterance should also be classified according to how it relates to the previous utterance. There are four categories, defined as follows:
Continue: Cb(current utterance) = Cp(current utterance) & Cb(current utterance) = Cb(previous utterance)
Retain: Cb(current utterance) != Cp(current utterance) & Cb(current utterance) = Cb(previous utterance)
Smooth-Shift: Cb(current utterance) = Cp(current utterance) & Cb(current utterance) != Cb(previous utterance)
Rough-Shift: Cb(current utterance) != Cp(current utterance) & Cb(current utterance) != Cb(previous utterance)

Jurafsky and Martin do not say what to do if a Cb cannot be identified, i.e. if none of the forward-looking centers of the previous utterance is found in the current utterance. In these cases, Cb was set to "undefined", and the utterance was classified as a Rough-Shift.
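The four transition definitions, together with the convention adopted here for an undefined Cb, can be expressed as a small decision function. This is a sketch under the assumption that an undefined Cb is represented as None; the function name is hypothetical.

```python
def classify_transition(cb_cur, cp_cur, cb_prev):
    """Return the centering transition between two adjacent utterances.

    None stands for an undefined Cb. Following the convention used in this
    analysis (not prescribed by Jurafsky & Martin), an undefined current Cb
    is classified as Rough-Shift.
    """
    if cb_cur is None:
        return "Rough-Shift"
    same_cb = cb_cur == cb_prev
    if cb_cur == cp_cur:
        return "Continue" if same_cb else "Smooth-Shift"
    return "Retain" if same_cb else "Rough-Shift"

print(classify_transition("kain", "kain", "kain"))
```

The two tests (Cb = Cp, and Cb unchanged) are independent, which is why the four categories form a 2 x 2 table.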

Different interpretations of pronoun referents give different centering analyses of the utterance. To choose the right interpretation, Centering Theory provides two rules:
Rule 1: If any element of the Cf in the previous utterance is realized as a pronoun in the current utterance, the Cb of the current utterance must be realized as a pronoun also.
Rule 2: Continue is preferred to Retain is preferred to Smooth-Shift is preferred to Rough-Shift.
Apart from these rules, constraints that do not belong to Centering Theory, such as coreference constraints and selectional restrictions, can also be used to single out the right interpretation.
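Rule 1 works as a filter on candidate interpretations, and Rule 2 as a ranking among those that remain. A minimal sketch, assuming each candidate interpretation has already been analyzed into its Cb, the set of referents realized as pronouns, and its transition (the dictionary format and function names are hypothetical). The example uses the "Var är din bror Abel?" case from the evaluation, where the correct reading gives a Continue and the alternative a Smooth-Shift.

```python
# Rule 2: transitions from most to least preferred.
PREFERENCE = ["Continue", "Retain", "Smooth-Shift", "Rough-Shift"]

def satisfies_rule1(cf_prev, cb, pronoun_referents):
    """Rule 1: if some element of Cf(previous) is realized as a pronoun in
    the current utterance, Cb(current) must be realized as a pronoun too."""
    if any(ref in pronoun_referents for ref in cf_prev):
        return cb in pronoun_referents
    return True

def choose_interpretation(cf_prev, candidates):
    """Keep candidates that obey Rule 1, then return the one with the most
    preferred transition (Rule 2). Ties go to the first candidate listed;
    None is returned if no candidate survives Rule 1."""
    allowed = [c for c in candidates
               if satisfies_rule1(cf_prev, c["cb"], c["pronoun_referents"])]
    if not allowed:
        return None
    return min(allowed, key=lambda c: PREFERENCE.index(c["transition"]))

# "Herren sade till Kain: ... Han svarade: ..." (quotations ignored)
cf_prev = ["herren", "kain"]
candidates = [
    {"label": "han = Herren", "cb": "herren",
     "pronoun_referents": {"herren"}, "transition": "Smooth-Shift"},
    {"label": "han = Kain", "cb": "kain",
     "pronoun_referents": {"kain"}, "transition": "Continue"},
]
print(choose_interpretation(cf_prev, candidates)["label"])
```

Both candidates pass Rule 1 here, so Rule 2 alone decides in favor of "han = Kain".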

Analysis

The text was analyzed according to the theory. Where pronouns appeared, only interpretations that had already passed gender and number filtering were considered; that is, interpretations such as "he refers to Eve" were never considered.
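The gender and number filter applied before centering can be sketched as a simple agreement check. The mention representation and the gender labels below are assumptions made for illustration; a plural pronoun such as "de" is given no gender, so it matches any gender.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Mention:
    referent: str
    gender: Optional[str]  # e.g. "masc", "fem", "utr", "neut"; None = unspecified
    number: str            # "sg" or "pl"

def agrees(pronoun, candidate):
    """True if the candidate agrees with the pronoun in number and gender."""
    if pronoun.number != candidate.number:
        return False
    # A pronoun with unspecified gender (e.g. plural "de") matches any gender.
    return pronoun.gender is None or pronoun.gender == candidate.gender

def agreeing_candidates(pronoun, candidates):
    """Keep only the candidate antecedents that agree with the pronoun."""
    return [c.referent for c in candidates if agrees(pronoun, c)]

han = Mention("han", "masc", "sg")
candidates = [
    Mention("adam", "masc", "sg"),
    Mention("eva", "fem", "sg"),
    Mention("tradet", "neut", "sg"),
]
print(agreeing_candidates(han, candidates))
```

Only the candidates that survive this filter are passed on to the centering rules.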

The complete analysis is an XML document that requires a browser capable of XSLT (such as Internet Explorer) to view. Where multiple interpretations were considered, it marks the reason for selecting the correct one, and interpretations that were considered but not selected are shown in gray.

Evaluation

There were only five cases where the referent could not be resolved simply by applying basic constraints such as gender and number agreement.

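Three of these could be solved using selectional constraints:
det var en fröjd för ögat och ett härligt träd, eftersom det skänkte vishet. ("det" = "ögat") [English: "it was a delight to the eye and a lovely tree, since it gave wisdom."]
Och de fäste ihop fikonlöv och band dem kring höfterna. ("dem" = "fikonlöv") [English: "And they fastened fig leaves together and tied them around their hips."]
Herren sade till Kain: "Var är din bror Abel?" Han svarade: "Det vet jag inte. Skall jag ta hand om min bror?" ("han" = "Kain") [English: "The Lord said to Cain: 'Where is your brother Abel?' He answered: 'I do not know. Am I to look after my brother?'"]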

The last of these could also be solved using Rule 2 of Centering Theory, since the correct interpretation gives a Continue, whereas the alternative gives a Smooth-Shift.

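The remaining two cases could be solved using centering:
Herren såg med välvilja på Abel och hans gåva men inte på Kain och hans gåva. Då blev Kain vred, och han sänkte blicken. ("han" = "Kain"; using Rule 1) [English: "The Lord looked with favor on Abel and his offering, but not on Cain and his offering. Then Cain became angry, and he lowered his gaze."]
Kain sade till sin bror Abel: "Kom med ut på fälten." Där överföll han sin bror Abel och dödade honom. ("han" = "Kain"; using Rule 2) [English: "Cain said to his brother Abel: 'Come out to the fields.' There he attacked his brother Abel and killed him."]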

Thus, any evaluation metric (recall, precision, success rate or accuracy) would give 100%.

Discussion

As mentioned before, the result depends on the definition of an "utterance", and Jurafsky & Martin do not give any guidelines for this. Other things were also unclear:

Jurafsky & Martin define Cb as "the most highly ranked element of [Cf(previous utterance)] mentioned in [the current utterance]". This precise definition was used in this analysis. However, they also describe it as "the entity currently being focused on in the discourse after [the current utterance] is interpreted". It is not clear to me how these two definitions fit together, or why they should necessarily pick out the same entity; the latter description seems rather to fit Cp. As mentioned before, they also say nothing about what should be done when there is no candidate for Cb, i.e. when the Cf of the previous utterance has no elements in common with the Cf of the current utterance.

What should be done when the referent of a pronoun cannot be found in the previous utterance (see the first three utterances in the example)? The approach taken here was to backtrack further through the previous utterances. However, should Cb in the third utterance be undefined or set to the found referent?
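The backtracking approach described above can be sketched as a search through the Cf lists of earlier utterances, most recent first, returning the highest-ranked compatible referent. The history format, the compatibility predicate and the referent names in the example are hypothetical. Note that the sketch only finds a referent; it does not settle whether Cb should then be set to it or left undefined.

```python
def find_antecedent(history, is_compatible):
    """Return the first compatible referent found when searching earlier
    utterances from the most recent backwards, or None if there is none.

    history: list of Cf lists (each ranked highest first), oldest first.
    is_compatible: predicate, e.g. a gender/number agreement check.
    """
    for cf in reversed(history):
        for referent in cf:  # within an utterance, higher rank is tried first
            if is_compatible(referent):
                return referent
    return None

# Hypothetical: a feminine singular pronoun ("hon") whose referent is not
# in the immediately preceding utterance.
FEMININE_SG = {"kvinnan"}
history = [["ormen", "kvinnan"], ["tradet"]]
print(find_antecedent(history, lambda ref: ref in FEMININE_SG))
```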

How should a pronoun be resolved when its referent is both evoked and accessed within the same utterance? The Centering algorithm does not seem to answer this.

What should be done when a pronoun refers to a set of referents, such as "de" ("they") = "eva och adam" ("Eve and Adam")? Do both "eva" and "adam" belong to the Cf as separate elements, or do they form some sort of set, such as [eva, adam]?

The frequent use of quotations (direct speech) in the example text was also problematic. I chose not to deal with them, since they seem to have their own "focus space".