Computing Natural-Language Meaning for The Semantic Web
- 16.4.2007, Institute of Computer Science, Academy of Sciences of the Czech Republic
- Patrick Hanks (Faculty of Informatics, Masaryk University, Brno)
Invitation text
The Semantic Web aims to enable computers to manipulate data meaningfully, not merely to do pattern matching for document retrieval. It does not process everyday natural language, only precisely defined technical terms in specific domains. Its RDF (Resource Description Framework) claims to be "a lightweight ontology ... intended to express propositions using precise formal vocabularies... [It] is not an inference system" [Hayes 2004]. RDF is thus the latest in a long line of attempts to create "the perfect language" [Eco 1997], free of vagueness and ambiguity. Such attempts have been made since the European Enlightenment, from Wilkins (1668) and Leibniz to Bertrand Russell. If RDF proves more viable than the failed attempts of those great thinkers, this is only because it is intended to be read and processed by computers rather than people.
Sooner or later, however, the Semantic Web will have to come to terms with the challenge of processing natural language as it is used by real people. The vagueness and ambiguity that Wilkins, Leibniz, and Russell perceived as imperfections are now seen by some theorists as essential design features of a flexible system. Be that as it may, it is inconceivable that "all those intelligent people on the Internet" will be persuaded to abandon natural language and start to use a language consisting of a precisely defined formal vocabulary, so we had better continue to work on the problem of meaning in natural language for computational processing.
In this talk, I present a resource currently being developed at Masaryk University, Brno, which deals with many (though not all: there are no magic bullets in lexical semantics) problems of vagueness and ambiguity in real texts. The resource is "The Pattern Dictionary of English Verbs". First, all uses of a given verb in a large sample of actual uses are analysed, distinguishing normal, conventional uses from ad-hoc metaphors, ellipses, and other exploitations of norms. Exploitations rarely account for more than about 10% of all uses, but in the past they have proved an insuperable distraction for linguists who set themselves the goal of accounting for all possible uses of a word. Our target is all normal uses. This goes a long way towards resolving the problem of vagueness. Full resolution will depend on the creation of a set of empirically well-founded exploitation rules, which have yet to be written. Our current goals are to get the norms right for the verbs and to see how many of the nouns fall into place as a result.
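The bookkeeping behind separating norms from exploitations can be sketched as follows. This is a minimal illustration under assumed conventions, not the dictionary's actual data format: the `Use` record, its `pattern` and `exploitation` fields, and the helpers `exploitation_rate` and `pattern_frequencies` are hypothetical names chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Use:
    """One annotated corpus occurrence of a target verb (hypothetical scheme)."""
    text: str
    pattern: Optional[int] = None       # number of the normal pattern, if any
    exploitation: Optional[str] = None  # e.g. "metaphor", "ellipsis"

    @property
    def is_norm(self) -> bool:
        # A use counts as normal unless it has been tagged as an exploitation.
        return self.exploitation is None


def exploitation_rate(uses: List[Use]) -> float:
    """Share of uses annotated as exploitations rather than norms."""
    if not uses:
        return 0.0
    return sum(not u.is_norm for u in uses) / len(uses)


def pattern_frequencies(uses: List[Use]) -> Counter:
    """Frequency of each normal pattern; exploitations are set aside."""
    return Counter(u.pattern for u in uses if u.is_norm)


if __name__ == "__main__":
    # Toy annotations with placeholder texts, invented for illustration.
    sample = (
        [Use("<concordance line>", pattern=1)] * 6
        + [Use("<concordance line>", pattern=2)] * 3
        + [Use("<concordance line>", exploitation="metaphor")]
    )
    print(exploitation_rate(sample), pattern_frequencies(sample))
```

With ten toy uses of which one is tagged as a metaphor, the demo prints a rate of 0.1, in line with the rough ceiling of about 10% for exploitations; the normal uses are what the patterns are built from.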
The normal uses of a verb are classified into semantically motivated syntagmatic patterns, in which the arguments (together with some subargumental cues) are correlated. Meanings are associated not with a target word in isolation, but with a pattern: a target word in any of its normal contexts. This simple reformulation ("First find the pattern, then assign the meaning") goes a very long way towards solving the word sense disambiguation problem. In most cases, the syntagmatic patterns in which a given verb is used, when correctly formulated, are mutually exclusive. For many verbs, different patterns are associated with the same meaning, but it is rare for one pattern to be associated with two or more meanings. When this does happen, the domain, the wider context, or both can provide a disambiguating clue. In the extremely rare cases where neither the pattern nor the context disambiguates, there is a genuine ambiguity in the language, not merely a product of combinatorial explosion in processing.
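The "first find the pattern, then assign the meaning" procedure can be sketched as a two-step lookup. The sketch assumes that pattern identification has already happened upstream (the semantic types of subject and object are given); the `Pattern` class, the semantic type labels, the domain names, and the glosses in `LEXICON` are illustrative inventions, not entries from the Pattern Dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Pattern:
    """A verb plus the semantic types of its arguments (hypothetical scheme)."""
    verb: str
    subject: str
    obj: Optional[str] = None  # None for intransitive patterns


# Illustrative mini-lexicon, NOT taken from the Pattern Dictionary.
# The inner key None means the pattern alone fixes the meaning;
# any other inner key is a domain label used to disambiguate.
LEXICON: Dict[Pattern, Dict[Optional[str], str]] = {
    Pattern("fire", "Human", "Firearm"): {None: "discharge a weapon"},
    Pattern("fire", "Human", "Projectile"): {None: "discharge a weapon"},
    Pattern("fire", "Human", "Human"): {None: "dismiss from employment"},
    Pattern("charge", "Human", "Human"): {
        "law": "formally accuse of an offence",
        "military": "rush forward to attack",
    },
}


def assign_meaning(pattern: Pattern, domain: Optional[str] = None) -> Optional[str]:
    """First find the pattern, then assign the meaning.

    Returns None if no normal pattern matches (perhaps an exploitation)
    or if the pattern is ambiguous and the domain does not resolve it.
    """
    senses = LEXICON.get(pattern)
    if senses is None:
        return None              # step 1 failed: no normal pattern found
    if None in senses:
        return senses[None]      # the pattern alone is unambiguous
    return senses.get(domain)    # rare case: fall back on the domain clue


if __name__ == "__main__":
    print(assign_meaning(Pattern("fire", "Human", "Human")))
    print(assign_meaning(Pattern("charge", "Human", "Human"), domain="military"))
```

The toy data mirrors the claims in the paragraph: two `fire` patterns share one meaning, the single `charge` pattern needs a domain clue, and a lookup that finds no pattern, or cannot resolve the domain, returns `None` instead of guessing.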
The talk will focus on sample entries from the Pattern Dictionary and discuss some of the issues that arise.
Conclusions
- Meanings are best associated with normal contexts, rather than words in isolation.
- The whole language system is probabilistic and preferential.
- Normal contexts correlate statistically significant collocations in different clause roles.
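One standard way to test whether a collocation in a given clause role is statistically significant is Dunning's log-likelihood ratio (G²) over a 2×2 contingency table. The sketch below assumes the corpus has already been parsed into hypothetical (verb, role, filler) triples. The talk does not say which association measure the project uses, so G² here is a swapped-in, commonly used technique, and the counts in `TOY_TRIPLES` are invented.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (verb, clause role, filler), hypothetical format


def log_likelihood(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 = 2 * sum(O * ln(O / E)) for a 2x2 contingency table."""
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i, row in enumerate(((k11, k12), (k21, k22))):
        for j, observed in enumerate(row):
            if observed > 0:  # zero cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def significant_collocates(triples: Iterable[Triple], verb: str, role: str,
                           threshold: float = 3.84) -> List[Tuple[str, float]]:
    """Fillers of `role` positively associated with `verb`, ranked by G^2.

    3.84 is the chi-square critical value for p < 0.05 with one degree of freedom.
    """
    in_role = [(v, f) for v, r, f in triples if r == role]  # one role at a time
    pair_counts = Counter(in_role)
    verb_counts = Counter(v for v, _ in in_role)
    filler_counts = Counter(f for _, f in in_role)
    total = len(in_role)
    results = []
    for filler in filler_counts:
        k11 = pair_counts[(verb, filler)]
        k12 = verb_counts[verb] - k11
        k21 = filler_counts[filler] - k11
        k22 = total - k11 - k12 - k21
        expected = verb_counts[verb] * filler_counts[filler] / total
        if k11 <= expected:
            continue  # skip absent or negatively associated fillers
        score = log_likelihood(k11, k12, k21, k22)
        if score >= threshold:
            results.append((filler, score))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Invented toy counts, for illustration only.
TOY_TRIPLES: List[Triple] = (
    [("fire", "obj", "gun")] * 8
    + [("fire", "obj", "employee")] * 6
    + [("read", "obj", "book")] * 10
    + [("read", "obj", "question")] * 5
    + [("buy", "obj", "book")] * 10
    + [("buy", "obj", "gun")] * 1
    + [("fire", "subj", "manager")] * 5
    + [("read", "subj", "student")] * 5
)

if __name__ == "__main__":
    for filler, score in significant_collocates(TOY_TRIPLES, "fire", "obj"):
        print(f"{filler}\t{score:.2f}")
```

Because counts are computed separately for each role, a strong object collocate need not be a subject collocate; this is one concrete reading of "different clause roles". Ranking by score, rather than accepting or rejecting outright, also fits the view that the system is probabilistic and preferential.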