Splitter Module

The splitter module receives lists of word objects (produced either by the tokenizer or by any other means in the calling application) and buffers them until a sentence boundary is detected. It then returns a list of sentence objects.

The splitter's buffer may retain some of the tokens if the given list did not end with a clear sentence boundary. The calling application can submit further token lists to be added, or ask the splitter to flush the buffer.

The API for the splitter class is:

class splitter {
   public:
      /// Constructor. Receives a file with the desired options
      splitter(const std::string &);

      /// Add a list of words to the buffer, and return the complete
      /// sentences that can be built.
      /// The boolean states whether a buffer flush must be forced (true) or
      /// some words may remain in the buffer (false) if the splitter
      /// wants to wait to see what is coming next.
      std::list<sentence> split(const std::list<word> &, bool);
};
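
The buffering contract described above can be illustrated with a small self-contained sketch. This is not the library's implementation: the `word` and `sentence` types below are simplified stand-ins, and `toy_splitter`, its `pending()` helper, and the rule that ".", "?" and "!" always end a sentence are assumptions made only for this example. The real splitter reads its boundary rules from the options file and may wait for the next token before deciding.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Simplified stand-ins for the library's word and sentence classes
// (hypothetical, for illustration only).
struct word {
    std::string form;
};
using sentence = std::list<word>;

// Toy model of the splitter's buffering behaviour. It mirrors the
// split(list, bool) signature but uses a hard-coded boundary rule.
class toy_splitter {
  public:
    // Append words to the buffer and return every sentence completed so far.
    // If flush is true, whatever remains in the buffer is returned as a
    // final sentence; if false, it stays buffered for the next call.
    std::list<sentence> split(const std::list<word> &ws, bool flush) {
        std::list<sentence> out;
        for (const word &w : ws) {
            buffer_.push_back(w);
            if (is_boundary(w)) emit(out);
        }
        if (flush && !buffer_.empty()) emit(out);
        return out;
    }

    // Number of words still waiting in the buffer (hypothetical helper).
    std::size_t pending() const { return buffer_.size(); }

  private:
    // Assumed rule: these three tokens always close a sentence.
    static bool is_boundary(const word &w) {
        return w.form == "." || w.form == "?" || w.form == "!";
    }

    // Move the buffered words into the output as one sentence.
    void emit(std::list<sentence> &out) {
        assert(!buffer_.empty());
        out.push_back(buffer_);
        buffer_.clear();
    }

    sentence buffer_;
};
```

With this sketch, submitting the words "Hello . How are" with the boolean set to false returns one sentence and leaves two words buffered. A later call with "you ? Fine" returns the sentence "How are you ?" and keeps "Fine" pending, and a final call with an empty list and the boolean set to true flushes "Fine" as its own sentence.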



Lluís Padró 2010-09-02