Punctuation Detection Module

The punctuation detection module assigns Part-of-Speech (PoS) tags to punctuation symbols. The API of the class is the following:

  
class punts {
   public:
      /// Constructor: receives data file name
      punts(const std::string &); 
 
      /// Detect punctuation in given sentence
      void annotate(sentence &);
};

The constructor takes as its parameter the name of a file listing the PoS tag to be assigned to each punctuation symbol.

Note that this module is applied after the tokenizer, so it only annotates symbols that have been separated at the tokenization step. For instance, if you include the three suspension dots (...) as a single punctuation symbol, that entry will have no effect unless the tokenizer has a rule that causes this substring to be tokenized in one piece.
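
The dependence on the tokenizer can be illustrated with a small standalone sketch. It does not use the library: Token, make_sentence, tag_punctuation and demo are hypothetical names, the lookup table stands in for the data file, and the tag strings are only illustrative. The assumption is simply that a tag is assigned when a whole token matches an entry.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a token; this is NOT the library's word class.
struct Token {
    std::string form;
    std::string tag;   // stays empty until a tag is assigned
};

// Build a sentence of untagged tokens from already-tokenized forms.
std::vector<Token> make_sentence(const std::vector<std::string> &forms) {
    std::vector<Token> sent;
    for (const auto &f : forms) sent.push_back(Token{f, ""});
    return sent;
}

// Assign a tag to each token whose WHOLE form is an entry in the table.
// Substrings are never inspected, so the result depends on tokenization.
void tag_punctuation(std::vector<Token> &sent,
                     const std::map<std::string, std::string> &table) {
    for (Token &t : sent) {
        auto it = table.find(t.form);
        if (it != table.end()) t.tag = it->second;
    }
}

// Small self-check of the behaviour described in the text.
void demo() {
    // Illustrative table: tag strings are placeholders.
    const std::map<std::string, std::string> table = {
        {".", "Fp"}, {"...", "Fs"}};

    // Tokenizer kept the dots in one piece: the "..." entry applies.
    std::vector<Token> whole = make_sentence({"Wait", "..."});
    tag_punctuation(whole, table);
    assert(whole[1].tag == "Fs");

    // Tokenizer split the dots: each token matches "." and the
    // "..." entry is never used.
    std::vector<Token> split = make_sentence({"Wait", ".", ".", "."});
    tag_punctuation(split, table);
    assert(split[1].tag == "Fp" && split[3].tag == "Fp");
}
```

In this sketch the "..." entry only takes effect in the first case, which is why the tokenizer rules and the punctuation list need to agree.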




Lluís Padró 2010-09-02