The XC WordNet module integrates the WordNet lexical database into the XC Drupal Toolkit.

The module has two parts: the first imports the WordNet database into Apache Solr, and the second is the Drupal component.

Part 1: setting up the Solr index

We provide a Java class (inside the XC module: xc_wordnet/resources/WordnetSyns2Solr.java) that transforms the WordNet Prolog files into an XML file that Apache Solr can import. The class is a rewritten version of the Syns2Syms class available at https://gist.github.com/562776, written by Christopher Bradford (https://github.com/bradfordcp).

Specifically, it converts the Prolog files wn_s.pl, wn_sk.pl, and wn_g.pl from the WordNet Prolog download into a single XML file that can be loaded into Apache Solr.
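The conversion described above starts by reading individual Prolog facts. The following is a minimal sketch of parsing single lines of wn_s.pl and wn_g.pl in Java, not the code of WordnetSyns2Solr.java. It assumes the fact layouts of the WordNet 3.0 Prolog distribution, s(synset_id,w_num,'word',ss_type,sense_number,tag_count). and g(synset_id,'gloss')., and Prolog's convention of doubling a quote inside a quoted atom. The class and method names (PrologFacts, parseSynsetFact, parseGlossFact, unquote) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not taken from WordnetSyns2Solr.java) for single WordNet 3.0 Prolog facts.
class PrologFacts {

    // wn_s.pl (assumed layout): s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
    private static final Pattern S_FACT = Pattern.compile(
        "s\\((\\d+),(\\d+),'((?:[^']|'')*)',([a-z]),(\\d+),(\\d+)\\)\\.");

    // wn_g.pl (assumed layout): g(synset_id,'gloss').
    private static final Pattern G_FACT = Pattern.compile(
        "g\\((\\d+),'((?:[^']|'')*)'\\)\\.");

    // Prolog writes a quote inside a quoted atom as two quotes: 'Aaron''s' means Aaron's.
    static String unquote(String atom) {
        return atom.replace("''", "'");
    }

    // Returns {synsetId, word}, or null when the line is not an s/6 fact.
    static String[] parseSynsetFact(String line) {
        Matcher m = S_FACT.matcher(line.trim());
        return m.matches() ? new String[] {m.group(1), unquote(m.group(3))} : null;
    }

    // Returns {synsetId, gloss}, or null when the line is not a g/2 fact.
    static String[] parseGlossFact(String line) {
        Matcher m = G_FACT.matcher(line.trim());
        return m.matches() ? new String[] {m.group(1), unquote(m.group(2))} : null;
    }

    public static void main(String[] args) {
        String[] word = parseSynsetFact("s(100001740,1,'entity',n,1,11).");
        System.out.println(word[0] + " -> " + word[1]); // prints "100001740 -> entity"

        String[] gloss = parseGlossFact("g(100001740,'that which is perceived or known or inferred "
            + "to have its own distinct existence (living or nonliving)').");
        System.out.println(gloss[1]);
    }
}
```

Grouping the parsed s/6 facts by synset id gives the word list for each Solr document, and the g/2 fact with the same id supplies its gloss.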

This has been tested with WordNet 3.0.

Each resulting Solr document has the following structure:

<doc>
   <field name="id">100001740</field>
   <field name="gloss_t">that which is perceived or known or inferred to have its own distinct existence (living or nonliving)</field>
   <field name="lexfile_s">03</field>
   <field name="lexdict_s">noun.Tops</field>
   <field name="word_t">entity</field>
   <field name="word_s">entity</field>
</doc>
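Once the words and gloss of a synset are known, producing a document like the one above is mostly string assembly plus XML escaping, since glosses may contain characters such as & or <. This is a minimal sketch, not the converter's actual code: SolrDocWriter and toDoc are hypothetical names, and it assumes that a synset with several words is written as repeated word_t and word_s fields, which only works if those fields are multi-valued in the Solr schema.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: renders one synset as a Solr <doc> element shaped like the example above.
class SolrDocWriter {

    // Escapes the characters that are special in XML element content.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Appends one <field> line, indented like the example document.
    private static void field(StringBuilder out, String name, String value) {
        out.append("   <field name=\"").append(name).append("\">")
           .append(escapeXml(value)).append("</field>\n");
    }

    // Assumes word_t and word_s are multi-valued, so each word becomes a repeated field.
    static String toDoc(String id, String gloss, String lexfile, String lexdict, List<String> words) {
        StringBuilder out = new StringBuilder("<doc>\n");
        field(out, "id", id);
        field(out, "gloss_t", gloss);
        field(out, "lexfile_s", lexfile);
        field(out, "lexdict_s", lexdict);
        for (String w : words) {
            field(out, "word_t", w);
        }
        for (String w : words) {
            field(out, "word_s", w);
        }
        return out.append("</doc>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toDoc("100001740",
            "that which is perceived or known or inferred to have its own distinct existence (living or nonliving)",
            "03", "noun.Tops", Arrays.asList("entity")));
    }
}
```

Running main prints a document with the same fields and layout as the example. When posted to Solr's XML update handler, such <doc> elements are wrapped in an <add> element.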

The fields:

    id: the synset identifier
    gloss_t: the gloss (the definition shared by the words of the synset)
    lexfile_s: the number of the lexicographer file the synset belongs to
    lexdict_s: the machine name of that lexicographer file (e.g. noun.Tops)
    word_t: the word(s) belonging to this synset, indexed as full text
    word_s: the word(s) belonging to this synset, indexed as whole phrases (exact strings)
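The lexfile_s and lexdict_s values can plausibly be derived from the sense keys in wn_sk.pl. A WordNet sense key has the form lemma%ss_type:lex_filenum:lex_id:head_word:head_id, so entity%1:03:00:: carries lexicographer file number 03, which WordNet's lexnames list maps to noun.Tops. The sketch below assumes this derivation; the original converter may obtain the values differently, and the LexFile class and its methods are hypothetical.

```java
// Hypothetical helper: derives lexfile_s / lexdict_s from a WordNet sense key,
// assuming the layout lemma%ss_type:lex_filenum:lex_id:head_word:head_id.
class LexFile {

    // Lexicographer file names indexed by file number (WordNet 3.0 lexnames, 00 to 44).
    private static final String[] LEXNAMES = {
        "adj.all", "adj.pert", "adv.all", "noun.Tops", "noun.act", "noun.animal",
        "noun.artifact", "noun.attribute", "noun.body", "noun.cognition",
        "noun.communication", "noun.event", "noun.feeling", "noun.food",
        "noun.group", "noun.location", "noun.motive", "noun.object",
        "noun.person", "noun.phenomenon", "noun.plant", "noun.possession",
        "noun.process", "noun.quantity", "noun.relation", "noun.shape",
        "noun.state", "noun.substance", "noun.time", "verb.body", "verb.change",
        "verb.cognition", "verb.communication", "verb.competition",
        "verb.consumption", "verb.contact", "verb.creation", "verb.emotion",
        "verb.motion", "verb.perception", "verb.possession", "verb.social",
        "verb.stative", "verb.weather", "adj.ppl"
    };

    // Returns the lex_filenum field as written in the key, e.g. "03" for "entity%1:03:00::".
    static String lexFileNumber(String senseKey) {
        int pct = senseKey.indexOf('%');
        if (pct < 0) {
            throw new IllegalArgumentException("Not a sense key: " + senseKey);
        }
        String[] parts = senseKey.substring(pct + 1).split(":", -1);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a sense key: " + senseKey);
        }
        return parts[1];
    }

    // Maps a lexicographer file number such as "03" to its name, e.g. "noun.Tops".
    static String lexFileName(String lexFileNumber) {
        int n = Integer.parseInt(lexFileNumber);
        if (n < 0 || n >= LEXNAMES.length) {
            throw new IllegalArgumentException("Unknown lexicographer file: " + lexFileNumber);
        }
        return LEXNAMES[n];
    }

    public static void main(String[] args) {
        String num = lexFileNumber("entity%1:03:00::");
        System.out.println(num + " " + lexFileName(num)); // prints "03 noun.Tops"
    }
}
```

Keeping the number as the original two-digit string matches the lexfile_s value shown in the example document ("03" rather than "3").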

Usage: