Sense Dictionary File

The sense dictionary file is a Berkeley DB indexed file.

It can be created with the indexdict program provided with FreeLing, which is called with the command:

   indexdict indexed-dict-name <source-dict

See the (very simple) source code in src/main/utilities/indexdict.cc if you are interested in how the file is indexed.
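indexdict itself is a C++ program that writes a Berkeley DB file. To illustrate the general idea of turning the plain-text source into a key/value index, the Python sketch below stores each source line under its first field (e.g. W:cebolla:N) and keeps the rest of the line as the value. This key layout is an assumption made for the example, not a description of indexdict's actual on-disk format. Python's portable dbm.dumb backend stands in for Berkeley DB, and index_dict and lookup are hypothetical names.

```python
import dbm.dumb
from typing import Iterable, Optional


def index_dict(db_path: str, lines: Iterable[str]) -> int:
    """Index 'key value...' source lines into a dbm file; return entries written.

    Assumption (not indexdict's documented format): the key is the first
    whitespace-separated field and the value is the rest of the line.
    """
    count = 0
    db = dbm.dumb.open(db_path, "n")  # "n": always create a new, empty database
    try:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            fields = line.split(maxsplit=1)
            key = fields[0]
            value = fields[1] if len(fields) > 1 else ""
            db[key.encode("utf-8")] = value.encode("utf-8")
            count += 1
    finally:
        db.close()
    return count


def lookup(db_path: str, key: str) -> Optional[str]:
    """Return the value stored under key, or None if the key is not indexed."""
    db = dbm.dumb.open(db_path, "r")  # read-only; the database must already exist
    try:
        raw = db.get(key.encode("utf-8"))
    finally:
        db.close()
    return None if raw is None else raw.decode("utf-8")
```

A real indexed dictionary avoids rescanning the whole source file for every lookup; the same motivation applies to this toy version, where each lookup is a single keyed read.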

The source file (e.g. senses16.src, provided with FreeLing) must contain one entry per line, listing either the senses of a lemma-PoS pair or the synonyms of a sense.

Each line has the format type:key:PoS value1 value2 ... For example:

   W:cebolla:N 05760066 08734429 08734702
   S:07389783:N chaval chico joven mozo muchacho

The type field is either W (Word) or S (Sense). It indicates whether the rest of the line holds a lemma followed by all its sense codes, or a sense code followed by all its synonym lemmas.
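To make the two entry types concrete, here is a minimal Python parser for a single source line. It is a sketch, not FreeLing code: the Entry class and parse_line function are hypothetical names, and the parser assumes that fields are separated by whitespace and that the first field contains exactly two colons (i.e. no colons inside the lemma itself).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    kind: str          # "W" (lemma -> sense codes) or "S" (sense code -> synonyms)
    head: str          # the lemma for W entries, the sense code for S entries
    pos: str           # part-of-speech tag, e.g. "N"
    values: List[str]  # sense codes for W entries, synonym lemmas for S entries


def parse_line(line: str) -> Entry:
    """Parse one 'type:head:PoS item1 item2 ...' line of the source file."""
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    parts = fields[0].split(":")
    if len(parts) != 3:
        raise ValueError(f"malformed first field: {fields[0]!r}")
    kind, head, pos = parts
    if kind not in ("W", "S"):
        raise ValueError(f"unknown entry type: {kind!r}")
    return Entry(kind, head, pos, fields[1:])
```

Parsing the two example lines yields a W entry whose values are the three sense codes of cebolla, and an S entry whose values are the five synonyms of sense 07389783.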

For W entries, the sense annotation module assumes that the sense codes are listed from most to least frequent sense for that lemma-PoS. This ordering is used when the value mfs (most frequent sense) is selected for the SenseAnnotation option.
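Since the ranking is encoded purely by position, picking the most frequent sense amounts to taking the first code of a W entry. The Python sketch below illustrates this under stated assumptions: load_word_entries and most_frequent_sense are hypothetical helpers (not FreeLing's API), W lines are loaded into a dict keyed by (lemma, PoS), and S lines are skipped.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical in-memory view of the W entries: (lemma, PoS) -> ranked sense codes.
SenseTable = Dict[Tuple[str, str], List[str]]


def load_word_entries(lines: Iterable[str]) -> SenseTable:
    """Collect W entries, keeping each sense list in its original order."""
    table: SenseTable = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        kind, lemma, pos = fields[0].split(":")  # raises ValueError if malformed
        if kind == "W":
            table[(lemma, pos)] = fields[1:]  # most frequent sense first
    return table


def most_frequent_sense(table: SenseTable, lemma: str, pos: str) -> Optional[str]:
    """Return the first listed (most frequent) sense, or None if unknown."""
    senses = table.get((lemma, pos))
    return senses[0] if senses else None
```

Because the module relies only on line order, reordering the codes in the source file is enough to change which sense this strategy selects.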

Type S entries are used by dependency parsing rules.

Sense codes can be anything (assuming your later processes know what to do with them). The provided files contain WordNet 1.6 synset codes.

Lluís Padró 2010-09-02