Part-of-Speech Tagger Module

Two different modules can perform PoS tagging. The application decides which method to use and instantiates the corresponding class.

The first PoS tagger is the hmm_tagger class, a classical trigram Markovian tagger following [Bra00].

The second module, named relax_tagger, is a hybrid system capable of integrating statistical and hand-coded knowledge, following [Pad98].

The hmm_tagger module is somewhat faster than relax_tagger, but the latter allows you to add manual constraints to the model. The hmm_tagger API is the following:

class hmm_tagger: public POS_tagger {
   public:
       /// Constructor
       hmm_tagger(const std::string &, const std::string &, bool, unsigned int);

       /// disambiguate given sentences 
       void analyze(std::list<sentence> &);
};

The hmm_tagger constructor receives the following parameters:

The relax_tagger module can be tuned with hand-written constraints, but is about twice as slow as hmm_tagger. Its API is the following:

class relax_tagger : public POS_tagger {
   public:
       /// Constructor, given the constraint file and config parameters
       relax_tagger(const std::string &, int, double, double, bool, unsigned int);

       /// disambiguate sentences
       void analyze(std::list<sentence> &);
};

The relax_tagger constructor receives the following parameters:

The iteration number, scale factor, and threshold parameters are specific to the relaxation labelling algorithm. Refer to [Pad98] for details.
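
Since both taggers derive from POS_tagger and expose the same analyze method, an application can pick one at run time and use it through the base class. The following is a minimal sketch of that pattern, not library code: the sentence and POS_tagger types are simplified stand-ins, the tagger bodies are stubs that only mirror the declared constructor signatures, analyze is assumed to be virtual in the base class, make_tagger is a hypothetical helper, and every constructor argument is a placeholder rather than a meaningful value.

```cpp
#include <list>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in for the library's sentence type (illustration only).
struct sentence {
    std::string tagged_by;  // records which stub processed the sentence
};

// Stand-in for the common base class; analyze is assumed to be virtual.
class POS_tagger {
public:
    virtual ~POS_tagger() = default;
    virtual void analyze(std::list<sentence> &) = 0;
};

// Stub mirroring the hmm_tagger constructor signature; no real tagging is done.
class hmm_tagger : public POS_tagger {
public:
    hmm_tagger(const std::string &, const std::string &, bool, unsigned int) {}
    void analyze(std::list<sentence> &ls) override {
        for (auto &s : ls) s.tagged_by = "hmm";
    }
};

// Stub mirroring the relax_tagger constructor signature; no real tagging is done.
class relax_tagger : public POS_tagger {
public:
    relax_tagger(const std::string &, int, double, double, bool, unsigned int) {}
    void analyze(std::list<sentence> &ls) override {
        for (auto &s : ls) s.tagged_by = "relax";
    }
};

// Hypothetical helper: builds the tagger selected by the application.
// All constructor arguments below are placeholders, not recommended values.
std::unique_ptr<POS_tagger> make_tagger(const std::string &method) {
    if (method == "hmm") {
        return std::unique_ptr<POS_tagger>(
            new hmm_tagger("placeholder", "placeholder", false, 0u));
    }
    if (method == "relax") {
        return std::unique_ptr<POS_tagger>(
            new relax_tagger("placeholder", 1, 1.0, 1.0, false, 0u));
    }
    throw std::invalid_argument("unknown tagger: " + method);
}
```

With this arrangement the rest of the application only holds a POS_tagger pointer and calls analyze on a list of sentences, so switching between the faster HMM tagger and the constraint-aware relaxation tagger becomes a configuration choice.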



Lluís Padró 2010-09-02