Introduction
WordNet is basically a hierarchy of the words of the English language. It is a multidimensional hierarchy with the following dimensions:
- word is-a-kind-of word
- word is-derived-from word
- word is-a-part-of word
- word is-synonym-of word
- word is-a-member-of word
- word is-antonym-of word
- word is-substance-of word
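One way to picture these dimensions is as a labeled graph in which each relation type has its own set of edges. The following is only a minimal sketch under simplifying assumptions: the class name `WordGraph`, the `Relation` enum and the sample words are all made up for illustration, and words are treated as plain strings rather than the sense-level entries WordNet actually stores.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not WordNet's own API.
class WordGraph {
    // One constant per dimension listed above.
    enum Relation {
        KIND_OF, DERIVED_FROM, PART_OF, SYNONYM_OF, MEMBER_OF, ANTONYM_OF, SUBSTANCE_OF
    }

    // relation -> (source word -> target words)
    private final Map<Relation, Map<String, List<String>>> edges =
            new EnumMap<>(Relation.class);

    // Records "from <relation> to", e.g. dog is-a-kind-of canine.
    void add(Relation relation, String from, String to) {
        edges.computeIfAbsent(relation, r -> new HashMap<>())
             .computeIfAbsent(from, w -> new ArrayList<>())
             .add(to);
    }

    // Returns the words reachable from "word" in one step along one dimension.
    List<String> related(Relation relation, String word) {
        return edges.getOrDefault(relation, Map.of())
                    .getOrDefault(word, List.of());
    }

    public static void main(String[] args) {
        WordGraph g = new WordGraph();
        // Sample data invented for this sketch.
        g.add(Relation.KIND_OF, "dog", "canine");
        g.add(Relation.PART_OF, "paw", "dog");
        System.out.println(g.related(Relation.KIND_OF, "dog")); // prints [canine]
    }
}
```

Keeping each dimension in its own map means a query along one relation never has to filter out edges of the others.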
WordNet covers a large part of the language. Version 2.0 contains 114,648 nouns, 11,306 verbs, 21,436 adjectives and 4,669 adverbs. Pretty cool!
From a design point of view, it is interesting that this hierarchy has only a small number of roots. These roots are:
- abstraction (a general concept formed by extracting common features from specific examples)
- act, human action, human activity (something that people do or cause to happen)
- entity (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
- event (something that happens at a given place and time)
- group, grouping (any number of entities (members) considered as a unit)
- phenomenon (any state or process known through the senses rather than by intuition or reasoning)
- possession (anything owned or possessed)
- psychological feature (a feature of the mental life of a living organism)
- state (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state")
The original distribution of WordNet is a set of declarative Prolog rules, one set for each of the dimensions above.
We have written some Java code to convert these Prolog rules (text files) into a MySQL database. We also have all the queries that navigate the dimensions above.
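To give a rough idea of what such a conversion involves, here is a minimal sketch that reads facts for the is-a-kind-of dimension and walks up the hierarchy in memory. It is not the converter described above. It assumes each fact looks like `hyp(<child id>,<parent id>).`, and the class `HypernymFacts`, its methods and the ids in `main` are invented for illustration. In a database, the same pairs would presumably become rows in a two-column table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the converter described in the post.
class HypernymFacts {
    // Assumed fact shape: hyp(<child id>,<parent id>).
    private static final Pattern FACT =
            Pattern.compile("^hyp\\((\\d+),(\\d+)\\)\\.$");

    // child id -> parent ids
    private final Map<Long, List<Long>> parents = new HashMap<>();

    // Returns true if the line was a hyp fact and was stored.
    boolean load(String line) {
        Matcher m = FACT.matcher(line.trim());
        if (!m.matches()) {
            return false;
        }
        long child = Long.parseLong(m.group(1));
        long parent = Long.parseLong(m.group(2));
        parents.computeIfAbsent(child, id -> new ArrayList<>()).add(parent);
        return true;
    }

    // Follows the first recorded parent at each step until no parent is left.
    List<Long> pathToRoot(long id) {
        List<Long> path = new ArrayList<>();
        Long current = id;
        while (current != null && !path.contains(current)) { // guard against cycles
            path.add(current);
            List<Long> up = parents.get(current);
            current = (up == null) ? null : up.get(0);
        }
        return path;
    }

    public static void main(String[] args) {
        HypernymFacts facts = new HypernymFacts();
        // Made-up ids, for illustration only.
        facts.load("hyp(3,2).");
        facts.load("hyp(2,1).");
        System.out.println(facts.pathToRoot(3)); // prints [3, 2, 1]
    }
}
```

The walk ends at an entry with no recorded parent, which corresponds to one of the roots listed earlier.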
Another very cool project is Link Grammar. Link Grammar is a system for parsing English-language sentences.