This module performs word sense disambiguation (WSD) on the content words of the given sentences. If no disambiguation (or only basic most-frequent-sense disambiguation) is needed, the senses module described in section 3.13 is a lighter and faster option.
The module is a wrapper for the UKB algorithm [AS09], which is integrated in FreeLing and distributed as-is under its original GPL license.
The UKB algorithm relies on a semantic relation network (in this case, WordNet (WN) and eXtended WordNet (XWN)) and uses the PageRank algorithm to select the most likely senses for the words in a text. See [AS09] for details on the algorithm.
For the selected PoS, the module enriches every analysis of every word with a list of senses ranked by their PageRank value. The PageRank value of each sense is also provided as a result.
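The idea described above can be illustrated with a small, self-contained sketch. It is not the UKB code: the graph, the sense numbering, the helper names (make_graph, pagerank, rank_senses, example_bank_ranking) and the damping factor of 0.85 are assumptions made for illustration, and the stopping rule (total change below epsilon, or a maximum number of iterations) is only one plausible reading of the two UKB parameters. The sketch runs a personalized PageRank over an undirected sense graph and then sorts the candidate senses of a word by the resulting value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Undirected, unlabelled graph over sense ids 0..n-1 (adjacency lists).
using Graph = std::vector<std::vector<std::size_t>>;
// A sense id together with its PageRank value.
using ScoredSense = std::pair<std::size_t, double>;

Graph make_graph(std::size_t n,
                 const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    Graph g(n);
    for (const auto& e : edges) {
        assert(e.first < n && e.second < n);
        g[e.first].push_back(e.second);   // relations are not directed,
        g[e.second].push_back(e.first);   // so store both directions
    }
    return g;
}

// Personalized PageRank by power iteration. 'teleport' must sum to 1.
// Stops when the L1 change falls below 'epsilon' or after 'max_iter' rounds.
std::vector<double> pagerank(const Graph& g, const std::vector<double>& teleport,
                             double epsilon, int max_iter, double damping = 0.85) {
    const std::size_t n = g.size();
    assert(teleport.size() == n);
    std::vector<double> rank(teleport);
    for (int it = 0; it < max_iter; ++it) {
        std::vector<double> next(n, 0.0);
        double dangling = 0.0;  // mass held by nodes without neighbours
        for (std::size_t u = 0; u < n; ++u) {
            if (g[u].empty()) { dangling += rank[u]; continue; }
            const double share = rank[u] / static_cast<double>(g[u].size());
            for (std::size_t v : g[u]) next[v] += share;
        }
        double delta = 0.0;
        for (std::size_t v = 0; v < n; ++v) {
            next[v] = damping * (next[v] + dangling * teleport[v])
                    + (1.0 - damping) * teleport[v];
            delta += std::fabs(next[v] - rank[v]);
        }
        rank.swap(next);
        if (delta < epsilon) break;
    }
    return rank;
}

// Candidate senses of one word, sorted by decreasing PageRank value.
std::vector<ScoredSense> rank_senses(const std::vector<double>& rank,
                                     const std::vector<std::size_t>& candidates) {
    std::vector<ScoredSense> out;
    for (std::size_t s : candidates) out.emplace_back(s, rank[s]);
    std::stable_sort(out.begin(), out.end(),
                     [](const ScoredSense& a, const ScoredSense& b) {
                         return a.second > b.second;
                     });
    return out;
}

// Toy case (hypothetical senses): 0 = bank/finance, 1 = bank/river,
// 2 = money, 3 = loan, 4 = water. The context mentions money and loan,
// so the teleport mass is placed on senses 2 and 3.
std::vector<ScoredSense> example_bank_ranking() {
    const Graph g = make_graph(5, {{0, 2}, {0, 3}, {2, 3}, {1, 4}});
    const std::vector<double> teleport = {0.0, 0.0, 0.5, 0.5, 0.0};
    const std::vector<double> rank = pagerank(g, teleport, 1e-4, 30);
    return rank_senses(rank, {1, 0});
}
```

Calling example_bank_ranking() returns the two candidate senses of the ambiguous word with the finance sense first, because only that sense is connected to the senses found in the context. The real module works on the full WN and XWN relation sets rather than on a hand-built graph.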
The API of the class is the following:
class disambiguator {
  public:
    /// Constructor. Receives a relation file for UKB, a sense dictionary,
    /// and two UKB parameters: epsilon and max iteration number.
    disambiguator(const std::string &, const std::string &, double, int);

    /// word sense disambiguation for each word in given sentences
    void analyze(std::list<sentence> &);
};
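The constructor receives, in this order, the relation file for UKB, the sense dictionary, and the two UKB parameters (epsilon and the maximum number of iterations).
The relation file is created by the FreeLing installation scripts from the plain files data/common/wnet30-rels.txt and data/common/wnet30g-rels.txt (extracted from the UKB package). You can re-create it (or create a new one using a different relation set) with the following commands: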
compile_ukb -o $FLSHARE/common/wn30-ukb.bin $FLSHARE/common/wnet30-rels.txt
compile_ukb -o $FLSHARE/common/xwn30-ukb.bin $FLSHARE/common/wnet30-rels.txt $FLSHARE/common/wnet30g-rels.txt

($FLSHARE refers to the share/FreeLing directory in your FreeLing installation, which defaults to /usr/local/share/FreeLing if you installed from source, or /usr/share/FreeLing if you used a binary .deb package.) You can change this environment variable to use different linguistic data files.
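An application that builds the file names itself may want to follow the same convention. The sketch below is one possible way to do so and is not part of FreeLing: the helper names flshare_dir and ukb_relation_file are hypothetical, and the fallback is simply the source-install default mentioned above.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of FLSHARE if it is set and non-empty; otherwise falls
// back to the default location of an installation built from source.
std::string flshare_dir() {
    const char* env = std::getenv("FLSHARE");
    if (env != nullptr && *env != '\0') return std::string(env);
    return std::string("/usr/local/share/FreeLing");
}

// Path of the compiled relation file produced by the second command above.
std::string ukb_relation_file() {
    return flshare_dir() + "/common/xwn30-ukb.bin";
}
```

With a binary .deb installation, setting FLSHARE to /usr/share/FreeLing before running the program makes the same code pick up the packaged data files.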
You can re-create the sense dictionary file (or use any other sense dictionary with the right format) with the command (paths may differ):

convertdict <$FLSHARE/es/senses30.src >$FLSHARE/es/senses30.ukb

($FLSHARE refers to the share/FreeLing directory in your FreeLing installation, which defaults to /usr/local/share/FreeLing if you installed from source, or /usr/share/FreeLing if you used a binary .deb package. To convert the file for a language other than Spanish, use the corresponding path.)
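Putting the pieces together, the call sequence implied by the API is to build the disambiguator once and then pass it each list of sentences. The sketch below only illustrates that order. So that it compiles without FreeLing, it declares stand-ins for sentence and disambiguator that mirror the signatures shown earlier but do no real work; in a real program they would come from the FreeLing headers. The function name run_wsd and the values 0.03 and 10 for epsilon and the iteration limit are illustrative assumptions, not recommended settings.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Stand-in for FreeLing's sentence type (assumption: present only so that
// this sketch compiles on its own).
struct sentence {};

// Stand-in with the same public signatures as the class documented above.
// It performs no disambiguation; it only counts the sentences it receives.
class disambiguator {
  public:
    disambiguator(const std::string& /*relation_file*/,
                  const std::string& /*sense_dictionary*/,
                  double /*epsilon*/, int /*max_iterations*/) {}
    void analyze(std::list<sentence>& ls) { seen_ += ls.size(); }
    std::size_t seen() const { return seen_; }
  private:
    std::size_t seen_ = 0;
};

// Builds the module from the two data files under 'share_dir' (Spanish
// sense dictionary) and runs it over the given sentences.
std::size_t run_wsd(const std::string& share_dir, std::list<sentence>& ls) {
    disambiguator wsd(share_dir + "/common/xwn30-ukb.bin",  // relation file
                      share_dir + "/es/senses30.ukb",       // sense dictionary
                      0.03,                                 // epsilon (illustrative)
                      10);                                  // max iterations (illustrative)
    wsd.analyze(ls);  // the real module adds ranked senses to each word here
    return wsd.seen();
}
```

Since the constructor loads both data files, creating the object once and reusing it for every batch of sentences is likely the cheaper arrangement.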
Lluís Padró 2010-09-02