The place of syntax in large-scale text analysis - towards implementing a distributional radical construction grammar
Jussi Karlgren, Gavagai and newly appointed adjunct professor at KTH
Distributional models, such as collocational analyses or probabilistic language models, are based on the analysis of item distributions in linguistic data. In many data-intensive, large-scale applications, semantic similarity between words is modelled by computing the contextual agreement between words over a large collection of textual data. The idea is that if we find words that tend to occur in the same contexts, we can assume that they have related meanings. These models are usually very simple: counting white-space-delimited words that occur next to each other. They disregard "grammar" or syntactic dependencies, and they are used wherever high-recall results are necessary. Grammatical models, on the other hand, tell us whether words or entities in a clause are related to other words or entities, as seen in the structure of that clause; they are almost invariably built on hand-crafted human knowledge and are used wherever precise results are necessary.
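The simple counting model described above can be sketched in a few lines: build a co-occurrence vector for each white-space-delimited word from the words near it, then measure contextual agreement as the cosine between vectors. This is a minimal illustration with a toy corpus and an assumed window size of two; the talk's actual models would be trained on far larger collections.

```python
from collections import Counter
from math import sqrt

# Toy corpus; a real application would use a large text collection.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

WINDOW = 2  # assumed window: count neighbours within two positions

# Co-occurrence vectors: for each word, count the words seen nearby.
vectors = {}
for sentence in corpus:
    tokens = sentence.split()  # white-space-delimited words, no grammar
    for i, word in enumerate(tokens):
        context = vectors.setdefault(word, Counter())
        for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
            if j != i:
                context[tokens[j]] += 1

def cosine(u, v):
    """Contextual agreement between two words as the cosine of their vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# "cat" and "dog" occur in similar contexts, so their similarity is high.
print(cosine(vectors["cat"], vectors["dog"]))
```

Note that the model never asks how the words relate structurally: "cat" and "dog" come out as similar purely because they share neighbours, which is exactly the high-recall, grammar-free behaviour the abstract describes.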
There is little reason to keep the models separate. There is but one linguistic signal, and there is no reason not to accept the distributional model as a basis for all linguistic processing. This talk explores how grammatical analysis can be formulated in terms that might improve the results of distributional models applied to language-understanding tasks, and what requirements should be posed on a distributional grammatical analysis. The talk will not present a complete model, but the starting points towards building one.