Software Secret Weapons™


 
Working With WORDNET With Java And MYSQL
by Pavel Simakov on 2007-05-07 16:12:39 under Text Mining
 


Introduction
The original WordNet database is distributed as a set of Prolog predicate files. There have been several attempts to provide access to WordNet from Java, including the Java WordNet Library (JWNL). However, I could not find a package that is simple, is written in pure Java (no JNI), and uses a database as its backend.

To address this problem, I created a parser that imports WordNet into a MySQL database. Later, I added oy-wn, a simple Java API for multidimensional WordNet navigation. Here I describe the most relevant information you need to use this package. Please contact me if you need help or have any comments. The oy-wn package can be downloaded at the bottom of the page.

How to convert WordNet 2.0 Prolog files to MySQL database
Creating a MySQL database (or any other SQL database, for that matter) from the WordNet 2.0 Prolog files is quite simple. Open com.oy.shared.wn.samples.ImportFromFiles.java in your favorite editor. Change the IMPORT_FOLDER property to point to the location of the WordNet Prolog files, for example: d:/setup/wordnet/WNprolog-2.0/prolog/. This folder must contain the *.pl files that are part of the WordNet distribution. Change the CONN_STRING property to a valid connection string for your MySQL database, for example: "jdbc:mysql://localhost:10011/mysql?user=root". Run ImportFromFiles and you should see output similar to the following:

 
Wed Mar 29 00:42:39 EST 2006	ImportFromFiles	INFO	com.oy.shared.wn.db.impl.DataAccessFactory has started
Wed Mar 29 00:42:39 EST 2006	ImportFromFiles	INFO	Added alias com.oy.shared.wn.core.WNDatabase
Wed Mar 29 00:42:39 EST 2006	ImportFromFiles	INFO	com.oy.shared.wn.core.WNDatabase has started
Wed Mar 29 00:42:39 EST 2006	ImportFromFiles	INFO	+ begin D:/SETUP/wordnet/WNprolog-2.0/prolog/
Wed Mar 29 00:42:39 EST 2006	ImportFromFiles	INFO	> begin wn_ant.pl
Wed Mar 29 00:42:44 EST 2006	ImportFromFiles	INFO	> end wn_ant.pl, 7993 items
Wed Mar 29 00:42:44 EST 2006	ImportFromFiles	INFO	> begin wn_at.pl
Wed Mar 29 00:42:45 EST 2006	ImportFromFiles	INFO	> end wn_at.pl, 1296 items
Wed Mar 29 00:42:45 EST 2006	ImportFromFiles	INFO	> begin wn_cls.pl
Wed Mar 29 00:42:49 EST 2006	ImportFromFiles	INFO	> end wn_cls.pl, 8429 items
Wed Mar 29 00:42:49 EST 2006	ImportFromFiles	INFO	> begin wn_cs.pl
Wed Mar 29 00:42:49 EST 2006	ImportFromFiles	INFO	> end wn_cs.pl, 218 items
Wed Mar 29 00:42:50 EST 2006	ImportFromFiles	INFO	> begin wn_der.pl
Wed Mar 29 00:43:12 EST 2006	ImportFromFiles	INFO	> end wn_der.pl, 42988 items
Wed Mar 29 00:43:12 EST 2006	ImportFromFiles	INFO	> begin wn_ent.pl
Wed Mar 29 00:43:13 EST 2006	ImportFromFiles	INFO	> end wn_ent.pl, 409 items
Wed Mar 29 00:43:13 EST 2006	ImportFromFiles	INFO	> begin wn_fr.pl
Wed Mar 29 00:43:23 EST 2006	ImportFromFiles	INFO	> end wn_fr.pl, 21345 items
Wed Mar 29 00:43:23 EST 2006	ImportFromFiles	INFO	> begin wn_g.pl
Wed Mar 29 00:45:00 EST 2006	ImportFromFiles	INFO	> end wn_g.pl, 115424 items
Wed Mar 29 00:45:00 EST 2006	ImportFromFiles	INFO	> begin wn_hyp.pl
Wed Mar 29 00:46:15 EST 2006	ImportFromFiles	INFO	> end wn_hyp.pl, 94842 items
Wed Mar 29 00:46:15 EST 2006	ImportFromFiles	INFO	> begin wn_mm.pl
Wed Mar 29 00:46:25 EST 2006	ImportFromFiles	INFO	> end wn_mm.pl, 12205 items
Wed Mar 29 00:46:25 EST 2006	ImportFromFiles	INFO	> begin wn_mp.pl
Wed Mar 29 00:46:32 EST 2006	ImportFromFiles	INFO	> end wn_mp.pl, 8636 items
Wed Mar 29 00:46:32 EST 2006	ImportFromFiles	INFO	> begin wn_ms.pl
Wed Mar 29 00:46:32 EST 2006	ImportFromFiles	INFO	> end wn_ms.pl, 787 items
Wed Mar 29 00:46:32 EST 2006	ImportFromFiles	INFO	> begin wn_per.pl
Wed Mar 29 00:46:39 EST 2006	ImportFromFiles	INFO	> end wn_per.pl, 7920 items
Wed Mar 29 00:46:39 EST 2006	ImportFromFiles	INFO	> begin wn_ppl.pl
Wed Mar 29 00:46:39 EST 2006	ImportFromFiles	INFO	> end wn_ppl.pl, 124 items
Wed Mar 29 00:46:39 EST 2006	ImportFromFiles	INFO	> begin wn_s.pl
Wed Mar 29 00:49:14 EST 2006	ImportFromFiles	INFO	> end wn_s.pl, 203147 items
Wed Mar 29 00:49:14 EST 2006	ImportFromFiles	INFO	> begin wn_sa.pl
Wed Mar 29 00:49:16 EST 2006	ImportFromFiles	INFO	> end wn_sa.pl, 3294 items
Wed Mar 29 00:49:16 EST 2006	ImportFromFiles	INFO	> begin wn_sim.pl
Wed Mar 29 00:49:28 EST 2006	ImportFromFiles	INFO	> end wn_sim.pl, 22196 items
Wed Mar 29 00:49:28 EST 2006	ImportFromFiles	INFO	> begin wn_vgp.pl
Wed Mar 29 00:49:29 EST 2006	ImportFromFiles	INFO	> end wn_vgp.pl, 1748 items
Wed Mar 29 00:49:29 EST 2006	ImportFromFiles	INFO	+ end D:/SETUP/wordnet/WNprolog-2.0/prolog/, done 18, skip 0
Wed Mar 29 00:49:29 EST 2006	ImportFromFiles	INFO	removed alias com.oy.shared.wn.core.WNDatabase
Wed Mar 29 00:49:29 EST 2006	ImportFromFiles	INFO	com.oy.shared.wn.db.impl.DataAccessFactory has stopped
Wed Mar 29 00:49:29 EST 2006	ImportFromFiles	INFO	com.oy.shared.wn.core.WNDatabase has stopped
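
Before starting an import like the one logged above, it can help to confirm that the folder really contains *.pl files and that the connection string at least has the expected shape. The following is only a sketch of such a check: the class ImportPreflight and its methods are hypothetical helpers, not part of oy-wn, and the prefix test does not prove that the database is reachable.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the oy-wn package.
public class ImportPreflight {

    /** Returns the sorted names of all regular *.pl files directly inside the folder. */
    public static List<String> listPrologFiles(Path folder) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "*.pl")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Loose sanity check only: the string must start with the MySQL JDBC prefix. */
    public static boolean looksLikeMySqlUrl(String connString) {
        return connString != null && connString.startsWith("jdbc:mysql://");
    }

    public static void main(String[] args) throws IOException {
        // Defaults mirror the example values from the text; pass your own as arguments.
        Path folder = Paths.get(args.length > 0 ? args[0] : "d:/setup/wordnet/WNprolog-2.0/prolog/");
        String conn = args.length > 1 ? args[1] : "jdbc:mysql://localhost:10011/mysql?user=root";

        if (!Files.isDirectory(folder)) {
            System.out.println("Not a folder: " + folder);
            return;
        }
        List<String> files = listPrologFiles(folder);
        System.out.println("Found " + files.size() + " Prolog files: " + files);
        System.out.println("Connection string looks like a MySQL URL: " + looksLikeMySqlUrl(conn));
    }
}
```

The run logged above processed 18 files, so a much smaller count from this check would suggest that the folder setting points at the wrong place.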

At the end of the import process, which took about seven minutes in the run shown above (00:42:39 to 00:49:29), you will have a complete MySQL database that contains the WordNet rules. Each source file results in one database table with the same name. The parsing occurs in the WNPrologRuleLoader class. We do not modify the source data in any way. The next section explains how to use the database from the Java API.
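
To make the parsing step more concrete, here is a minimal sketch of how a single Prolog fact line could be split into a predicate name and its arguments. It is not the WNPrologRuleLoader code; the class PrologFactSketch is hypothetical, the sample lines are illustrative, and the sketch assumes standard Prolog quoting, where a doubled single quote inside a quoted atom stands for a literal apostrophe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; the real parsing lives in WNPrologRuleLoader.
public class PrologFactSketch {

    /** A parsed fact: the predicate name plus its argument strings, quotes removed. */
    public static final class Fact {
        public final String predicate;
        public final List<String> args;

        public Fact(String predicate, List<String> args) {
            this.predicate = predicate;
            this.args = args;
        }
    }

    /** Parses one line of the form name(arg1,'quoted arg',...). */
    public static Fact parse(String line) {
        String text = line.trim();
        int open = text.indexOf('(');
        int close = text.lastIndexOf(").");
        if (open <= 0 || close < open) {
            throw new IllegalArgumentException("Not a Prolog fact: " + line);
        }
        String predicate = text.substring(0, open);
        String body = text.substring(open + 1, close);

        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inQuotes) {
                if (c != '\'') {
                    current.append(c);
                } else if (i + 1 < body.length() && body.charAt(i + 1) == '\'') {
                    current.append('\''); // doubled quote: literal apostrophe
                    i++;
                } else {
                    inQuotes = false;     // closing quote
                }
            } else if (c == '\'') {
                inQuotes = true;          // opening quote
            } else if (c == ',') {
                args.add(current.toString()); // a comma outside quotes ends the argument
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        args.add(current.toString());
        return new Fact(predicate, args);
    }

    public static void main(String[] args) {
        Fact word = parse("s(100001740,1,'entity',n,1,11).");
        System.out.println(word.predicate + " " + word.args);
    }
}
```

Tracking the quote state matters mostly for wn_g.pl, where the gloss text itself contains commas and parentheses; a plain split on commas would break those lines apart.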

WordNet Java API
The Java API to the WordNet rules stored in the MySQL database is quite simple. There are three essential interfaces you will need to learn: IWNDatabase, IWNSynset, and IWNWord. IWNDatabase represents a database connection. It has methods to get the root synsets and to look up synsets by id, by gloss, or by word. An IWNSynset, as far as I know, is a set of words that have a similar meaning. All searches and lookups in WordNet return IWNSynset, not individual words. From an IWNSynset you can either get the IWNWord entries it contains or navigate along the various dimensions (parents, derived forms, similar, part of, member of, substance of, antonyms). Each navigation in turn returns IWNSynset. An IWNWord is a normal word; it can be printed as text. The ConsoleTest application is available for demonstration.

 
public interface IWNDatabase {
	public IWNSynset  getRoots() throws SQLException;
	public IWNSynset  getSynsetByGloss(String gloss) throws SQLException;
	public IWNSynset getSynsetById(int synset_id) throws SQLException;
	public IWNSynset  getSynsetByWord(String gloss) throws SQLException;	
}

public interface IWNSynset {
	public int getSynsetId();
	public String getGloss();
	public WNWord getWords() throws SQLException;
	public WNSynset getParents() throws SQLException;
	public WNSynset getParentsReverse() throws SQLException;
	public WNSynset getDerived() throws SQLException;
	public WNSynset getDerivedReverse() throws SQLException;
	public WNSynset getSimilar() throws SQLException;
	public WNSynset getSimilarReverse() throws SQLException;
	public WNSynset getPartOf() throws SQLException;
	public WNSynset getPartOfReverse() throws SQLException;
	public WNSynset getMemberOf() throws SQLException;
	public WNSynset getMemberOfReverse() throws SQLException;
	public WNSynset getSubstanceOf() throws SQLException;
	public WNSynset getSubstanceOfReverse() throws SQLException;
	public WNSynset getAntonyms() throws SQLException;
	public WNSynset getAntonymsReverse() throws SQLException;
}

public interface IWNWord {
	public String getWord();
	public int getWordNum();
	public WNSynset getSynset();
	public int getSenseNumber();
	public String getSSType();
}
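
Navigating along one dimension can be sketched as a walk that repeatedly follows getParents(). The example below does not use the oy-wn classes: Synset and MemSynset are simplified, hypothetical stand-ins backed by memory instead of MySQL, and the sketch assumes that a navigation call yields a list of synsets. Only the traversal idea carries over.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of one navigation dimension; not the oy-wn API.
public class HypernymWalkSketch {

    /** Simplified stand-in for the part of IWNSynset used here. */
    public interface Synset {
        int getSynsetId();
        String getGloss();
        List<Synset> getParents();
    }

    /** In-memory synset, so the sketch runs without a database. */
    public static final class MemSynset implements Synset {
        private final int id;
        private final String gloss;
        private final List<Synset> parents = new ArrayList<>();

        public MemSynset(int id, String gloss) {
            this.id = id;
            this.gloss = gloss;
        }

        public void addParent(Synset parent) { parents.add(parent); }
        public int getSynsetId() { return id; }
        public String getGloss() { return gloss; }
        public List<Synset> getParents() { return parents; }
    }

    /** Breadth-first walk: ids of all ancestors reachable via getParents(), nearest first. */
    public static List<Integer> ancestorIds(Synset start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        Deque<Synset> queue = new ArrayDeque<>();
        seen.add(start.getSynsetId());
        queue.add(start);
        while (!queue.isEmpty()) {
            Synset current = queue.poll();
            for (Synset parent : current.getParents()) {
                // The seen set guards against visiting a shared ancestor twice.
                if (seen.add(parent.getSynsetId())) {
                    order.add(parent.getSynsetId());
                    queue.add(parent);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        MemSynset entity = new MemSynset(1, "entity");
        MemSynset animal = new MemSynset(2, "animal");
        MemSynset dog = new MemSynset(3, "dog");
        animal.addParent(entity);
        dog.addParent(animal);
        System.out.println(ancestorIds(dog));
    }
}
```

The same loop would work for any other dimension by swapping getParents() for the corresponding navigation method; against the real database each step would also have to handle SQLException.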

Following my general policy of adding self-monitoring to all software libraries, the oy-wn package can monitor itself. SNMP monitoring is provided by my other package, Linguine Watch.

Download
Licensed under LGPL. You can download full Java source code and samples:

Comments (7)

  • Comment by LARABI Boubakeur — February 6, 2008 @ 11:46 am

    I will be buil a engien researher. by wordnet fot my finqly project.

  • Comment by Aravindh.T — February 16, 2009 @ 12:59 pm

    Sir,
    I want to access the verb sense file to my database so that i can make use of it in my application. I was not able to do that sir could you help me out how to do? And also kindly tell me where the word net prolog files will be?

  • Comment by Aravindh.T — February 16, 2009 @ 1:00 pm

    Sir,
    I want to access the Word Net verb sense file to my database so that i can make use of it in my application. I was not able to do that sir could you help me out how to do? And also kindly tell me where the word net prolog files will be?

  • Comment by bablu — May 24, 2009 @ 2:16 pm

    i wann how to run java based word net software using class path in word net

  • Comment by bablu — May 24, 2009 @ 2:18 pm

    plzz write how to run classpath in java application

  • Comment by bablu — May 24, 2009 @ 2:25 pm

    i hav problem to run 1 java based application software of wordnet due to classpath problems

  • Comment by shanu — March 20, 2010 @ 1:13 am

    i cant load wn_s into mysql server.
    whenever i try to run the file ImportFromFiles.java , I received following error.
    Sun Mar 21 01:12:07 PDT 2010 testmysql.Main INFO > end wn_fr.pl, 21649 items
    Sun Mar 21 01:12:07 PDT 2010 testmysql.Main INFO > begin wn_g.pl
    Sun Mar 21 01:12:07 PDT 2010
    Sun Mar 21 01:12:07 PDT 2010 testmysql.Main ERROR java.sql.SQLException
    Sack trace:
    java.sql.SQLException: Data too long for column 'gloss' at row 1
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2001)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1168)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1279)
    at com.mysql.jdbc.Connection.execSQL(Connection.java:2281)




 

Copyright © 2004-2010 by Pavel Simakov
Any conclusions, recommendations, ideas, thoughts or source code presented on this site are my own and do not reflect an official opinion of my current or past employers, partners or clients.