Software Secret Weapons™


 
WordNet
by Pavel Simakov on 2006-03-29 01:26:35 under Text Mining
 


Introduction
WordNet is essentially a hierarchy of the words of the English language. The hierarchy is multidimensional: words are linked to each other along the following dimensions:

  • word is-a-kind-of word
  • word is-derived-from word
  • word is-a-part-of word
  • word is-synonym-of word
  • word is-a-member-of word
  • word is-antonym-of word
  • word is-substance-of word
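One way to picture these dimensions is as a graph whose edges are labelled with the relation type. The sketch below is only an illustration of that idea: the class, enum and method names are made up for this post and are not part of WordNet or of our converter, and the sample words are toy data.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the WordNet API.
final class RelationGraph {

    // One constant per dimension listed above.
    enum Relation {
        KIND_OF, DERIVED_FROM, PART_OF, SYNONYM_OF, MEMBER_OF, ANTONYM_OF, SUBSTANCE_OF
    }

    // word -> relation -> related words
    private final Map<String, Map<Relation, List<String>>> edges = new HashMap<>();

    // Records a directed, labelled link: "from <relation> to".
    void add(String from, Relation relation, String to) {
        edges.computeIfAbsent(from, k -> new EnumMap<Relation, List<String>>(Relation.class))
             .computeIfAbsent(relation, k -> new ArrayList<String>())
             .add(to);
    }

    // Returns the words linked from 'word' along one dimension (empty if none).
    List<String> related(String word, Relation relation) {
        Map<Relation, List<String>> byRelation = edges.get(word);
        if (byRelation == null) {
            return List.of();
        }
        return byRelation.getOrDefault(relation, List.of());
    }

    public static void main(String[] args) {
        RelationGraph g = new RelationGraph();
        g.add("dog", Relation.KIND_OF, "canine");   // toy data, not copied from WordNet
        g.add("dog", Relation.MEMBER_OF, "pack");
        g.add("hot", Relation.ANTONYM_OF, "cold");
        System.out.println(g.related("dog", Relation.KIND_OF));
        System.out.println(g.related("dog", Relation.PART_OF));
    }
}
```

Keeping the relation as an enum key means each dimension can be navigated independently, which is roughly what "multidimensional" means here.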

WordNet covers a large part of the language: version 2.0 contains 114648 nouns, 11306 verbs, 21436 adjectives and 4669 adverbs. Pretty cool!

From a design point of view, it is interesting that this hierarchy has only a small number of roots. These roots are:

  • abstraction (a general concept formed by extracting common features from specific examples)
  • act, human action, human activity (something that people do or cause to happen)
  • entity (that which is perceived or known or inferred to have its own distinct existence (living or nonliving))
  • event (something that happens at a given place and time)
  • group, grouping (any number of entities (members) considered as a unit)
  • phenomenon (any state or process known through the senses rather than by intuition or reasoning)
  • possession (anything owned or possessed)
  • psychological feature (a feature of the mental life of a living organism)
  • state (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state")
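To see how a word ends up under one of these roots, you can follow its is-a-kind-of links upward until nothing is left to follow. Here is a minimal sketch of that walk. It assumes each word has a single parent, which is a simplification, and the class name, method name and the sample chain are invented for illustration rather than taken from WordNet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; assumes one is-a-kind-of parent per word.
final class HypernymWalk {

    // Follows parent links from 'word' until a word with no parent (a root) is reached.
    static List<String> pathToRoot(Map<String, String> kindOf, String word) {
        List<String> path = new ArrayList<>();
        String current = word;
        // Stop at a root (no parent) or if a word repeats, to guard against cycles.
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = kindOf.get(current);
        }
        return path;
    }

    public static void main(String[] args) {
        // Toy chain for illustration only; not the actual WordNet path.
        Map<String, String> kindOf = new HashMap<>();
        kindOf.put("dog", "animal");
        kindOf.put("animal", "organism");
        kindOf.put("organism", "entity");
        System.out.println(pathToRoot(kindOf, "dog"));
    }
}
```

The last element of the returned path is the root the word belongs to.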

The original distribution of WordNet is a set of declarative Prolog rules, one set for each of the dimensions above. We have written some Java code that converts these Prolog rules (plain text files) into a MySQL database. We also have queries for navigating each of the dimensions above.
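To give a flavour of the conversion step, here is a minimal sketch of reading one Prolog fact and turning it into a database row. This is not our actual converter. It assumes two-argument facts of the shape name(id,id). with numeric ids; the sample ids, the wn_ table prefix and the column names are all made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; table and column names are assumptions, not a real schema.
final class PrologFactParser {

    // Matches two-argument numeric facts such as: hyp(100000001,100000002).
    private static final Pattern BINARY_FACT =
        Pattern.compile("\\s*(\\w+)\\((\\d+),(\\d+)\\)\\.\\s*");

    // Returns {relation, fromId, toId}, or null when the line is not such a fact.
    static String[] parse(String line) {
        Matcher m = BINARY_FACT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    // Builds a parameterized INSERT for a table named after the relation.
    // The relation only ever matches \w+, so it is safe to splice into the statement.
    static String insertSql(String relation) {
        return "INSERT INTO wn_" + relation + " (from_id, to_id) VALUES (?, ?)";
    }

    public static void main(String[] args) {
        String[] fact = parse("hyp(100000001,100000002)."); // made-up ids
        System.out.println(insertSql(fact[0]));
        System.out.println(fact[1] + " -> " + fact[2]);
    }
}
```

In a real loader the two ids would be bound to the ? placeholders of a JDBC PreparedStatement, one statement per relation file.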

Another very cool project is Link Grammar, a system for parsing English sentences.


Copyright © 2004-2010 by Pavel Simakov
Any conclusions, recommendations, ideas, thoughts or source code presented on this site are my own and do not reflect an official opinion of my current or past employers, partners or clients.