Class splitter implements a sentence splitter, which accumulates lists of words until a sentence is completed, and then returns a list of sentence objects.
#include <splitter.h>
Public Member Functions
  splitter (const std::string &)
      Constructor.
  void split (const std::list<word> &, bool, std::list<sentence> &ls)
      Split sentences with default options.
  std::list<sentence> split (const std::list<word> &, bool)
      Split and return a copy of the sentences.
Private Member Functions
  bool end_of_sentence (std::list<word>::const_iterator, const std::list<word> &) const
      Check for sentence markers.
Private Attributes
  bool SPLIT_AllowBetweenMarkers
      Configuration options.
  int SPLIT_MaxLines
  std::set<std::string> starters
      Sentence delimiters.
  std::map<std::string, bool> enders
  std::map<std::string, int> markers
      Open-close marker pairs (parentheses, etc.).
  bool betweenMrk
  int no_split_count
  std::list<int> mark_type
  std::list<std::string> mark_form
  sentence buffer
      Accumulated words of the current sentence.
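The accumulate-then-emit behaviour described for this class can be pictured with a minimal sketch. It is not FreeLing's implementation: the class name MiniSplitter is hypothetical, words are reduced to plain strings, a sentence is just a list of them, any word found in a fixed end-marker set closes a sentence (no capitalization check), and the flush flag is assumed to mean "emit whatever is still buffered as a final sentence".

```cpp
#include <cassert>
#include <list>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins: a "word" is just its surface form and a
// "sentence" is the list of words it contains.
using Word = std::string;
using Sentence = std::list<Word>;

class MiniSplitter {
 public:
  // enders: forms that close a sentence (assumed configuration).
  explicit MiniSplitter(std::set<std::string> enders)
      : enders_(std::move(enders)) {}

  // Appends the sentences completed by this chunk of words to `out`.
  // Words after the last end marker stay in buffer_ until a later call
  // completes the sentence, or until flush is true (assumed meaning).
  void split(const std::list<Word>& words, bool flush,
             std::list<Sentence>& out) {
    for (const Word& w : words) {
      buffer_.push_back(w);
      if (enders_.count(w) > 0) {  // end marker: sentence is complete
        out.push_back(buffer_);
        buffer_.clear();
      }
    }
    if (flush && !buffer_.empty()) {  // emit the incomplete remainder
      out.push_back(buffer_);
      buffer_.clear();
    }
  }

 private:
  std::set<std::string> enders_;
  Sentence buffer_;  // words of the sentence still being built
};
```

Because buffer_ survives between calls, a sentence may span several input chunks: feeding {"Hello", "world", ".", "How"} and then {"are", "you", "?"} yields two sentences, the second starting with "How".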
splitter::splitter (const std::string & SplitFile)
Constructor.
Creates a sentence splitter, reading its configuration from the file named by SplitFile.
References betweenMrk, enders, ERROR_CRASH, mark_form, mark_type, markers, no_split_count, SAME, SPLIT_AllowBetweenMarkers, SPLIT_MaxLines, and starters.
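The constructor fills starters, enders and markers from SplitFile. The sketch below shows one way such tables could be loaded. The section names, line layout, the meaning of the 0/1 flag on end markers, and the +id/-id encoding of marker pairs are all assumptions made for illustration, not the documented file format; SplitConfig and load_split_config are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical holder mirroring the private members starters, enders
// and markers.
struct SplitConfig {
  std::set<std::string> starters;
  std::map<std::string, bool> enders;
  std::map<std::string, int> markers;
};

// Parses an ASSUMED section-based format:
//   <SentenceStart>  one form per line
//   <SentenceEnd>    "form flag", flag 1 = always a sentence end (assumed)
//   <Markers>        "open close" pairs
// Openers get +n and their closers -n (hypothetical encoding). Markers
// whose open and close forms are identical (e.g. quotes) would need
// extra handling; this sketch does not cover them.
SplitConfig load_split_config(std::istream& in) {
  SplitConfig cfg;
  std::string line, section;
  int next_id = 1;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    if (line.front() == '<') {  // section opening or closing tag
      section = (line.size() > 1 && line[1] == '/') ? "" : line;
      continue;
    }
    std::istringstream fields(line);
    if (section == "<SentenceStart>") {
      std::string form;
      if (fields >> form) cfg.starters.insert(form);
    } else if (section == "<SentenceEnd>") {
      std::string form;
      int flag = 0;
      if (fields >> form >> flag) cfg.enders[form] = (flag != 0);
    } else if (section == "<Markers>") {
      std::string open, close;
      if (fields >> open >> close) {
        cfg.markers[open] = next_id;
        cfg.markers[close] = -next_id;
        ++next_id;
      }
    }
  }
  return cfg;
}
```

Reading from a std::istream rather than a file name keeps the parser testable with an in-memory std::istringstream.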
bool splitter::end_of_sentence (std::list<word>::const_iterator w, const std::list<word> & v) const [private]
Check for sentence markers.
Checks whether a word is a sentence end (e.g. a dot followed by a capitalized word).
References starters.
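The heuristic in the description, together with the reference to starters, suggests a check along the following lines. This is a hedged sketch, not the library's code: words are plain strings, the bool in enders is assumed to mean "always ends a sentence", and an end marker with nothing after it is simply treated as an end, which a streaming splitter might instead defer.

```cpp
#include <cassert>
#include <cctype>
#include <iterator>
#include <list>
#include <map>
#include <set>
#include <string>

// Hypothetical check: *w ends a sentence when it is an end marker and
// either the marker is unambiguous (assumed meaning of the bool), or the
// next word is a sentence starter or begins with an uppercase letter.
bool end_of_sentence(std::list<std::string>::const_iterator w,
                     const std::list<std::string>& words,
                     const std::map<std::string, bool>& enders,
                     const std::set<std::string>& starters) {
  auto e = enders.find(*w);
  if (e == enders.end()) return false;  // not an end marker at all
  if (e->second) return true;           // unambiguous end, e.g. "?"
  auto next = std::next(w);
  if (next == words.end()) return true;  // simplification: nothing follows
  const std::string& f = *next;
  return starters.count(f) > 0 ||
         (!f.empty() && std::isupper(static_cast<unsigned char>(f[0])));
}
```

The ambiguous case is what makes the lookahead necessary: the dot in "Dr . smith" is not an end, while the one in "left . Then" is.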
std::list<sentence> splitter::split (const std::list<word> & v, bool flush)
Split and return a copy of the sentences.
References split().
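The cross-references show the value-returning overload calling the output-parameter one. The sketch below illustrates that delegation pattern in isolation; OverloadDemo and its one-word-per-sentence logic are hypothetical stand-ins, and only the shape of the two overloads mirrors the documented signatures.

```cpp
#include <cassert>
#include <list>
#include <string>

using Sentence = std::list<std::string>;  // hypothetical stand-in

class OverloadDemo {
 public:
  // Output-parameter form: appends results to `ls`.
  // Stand-in logic: every word becomes its own sentence.
  void split(const std::list<std::string>& v, bool flush,
             std::list<Sentence>& ls) {
    (void)flush;  // unused in this stand-in
    for (const auto& w : v) ls.push_back(Sentence{w});
  }

  // Value-returning form: delegates to the overload above and returns
  // the filled list (moved or elided on return in C++11 and later).
  std::list<Sentence> split(const std::list<std::string>& v, bool flush) {
    std::list<Sentence> ls;
    split(v, flush, ls);
    return ls;
  }
};
```

Keeping the real work in one overload means both entry points share a single implementation; callers that already own a result list can append to it without an extra container.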
void splitter::split (const std::list<word> &, bool, std::list<sentence> & ls)
Split sentences with default options.
Referenced by split().
bool splitter::betweenMrk [private]
Referenced by splitter().
sentence splitter::buffer [private]
Accumulated words of the current sentence.
std::map<std::string, bool> splitter::enders [private]
Referenced by splitter().
std::list<std::string> splitter::mark_form [private]
Referenced by splitter().
std::list<int> splitter::mark_type [private]
Referenced by splitter().
std::map<std::string, int> splitter::markers [private]
Open-close marker pairs (parentheses, etc.).
Referenced by splitter().
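The member types (a marker table of ints, a list of open marker types, a betweenMrk flag) suggest that the splitter tracks which open-close pairs are currently open. The sketch below shows such tracking with a stack. MarkerTracker is hypothetical, and the encoding (opener mapped to +id, its closer to -id) is an assumption, not something this page documents.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <utility>

// Hypothetical tracker for open-close marker pairs. markers_ maps an
// opener to +id and its closer to -id (assumed encoding); mark_type_
// is a stack of the ids currently open.
class MarkerTracker {
 public:
  explicit MarkerTracker(std::map<std::string, int> markers)
      : markers_(std::move(markers)) {}

  void feed(const std::string& form) {
    auto m = markers_.find(form);
    if (m == markers_.end()) return;  // ordinary word
    if (m->second > 0) {
      mark_type_.push_back(m->second);  // opener: push its id
    } else if (!mark_type_.empty() && mark_type_.back() == -m->second) {
      mark_type_.pop_back();  // matching closer: pop
    }
    // An unmatched closer is ignored in this sketch.
  }

  // True while inside at least one open pair (the kind of state a flag
  // such as betweenMrk could record).
  bool between_markers() const { return !mark_type_.empty(); }

 private:
  std::map<std::string, int> markers_;
  std::list<int> mark_type_;
};
```

A splitter configured not to split between markers could, presumably, ignore end markers while between_markers() is true, so that a dot inside parentheses does not close the enclosing sentence.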
int splitter::no_split_count [private]
Referenced by splitter().
bool splitter::SPLIT_AllowBetweenMarkers [private]
Configuration options.
Referenced by splitter().
int splitter::SPLIT_MaxLines [private]
Referenced by splitter().
std::set<std::string> splitter::starters [private]
Sentence delimiters.
Referenced by end_of_sentence(), and splitter().