Class locutions recognizes multiwords belonging to a list obtained from a configuration file.
#include <locutions.h>
Public Member Functions
locutions (const std::string &)
    Constructor.
void add_locution (const std::string &)
    Add a locution rule to the multiword recognizer.
Private Member Functions
void check (const std::string, std::set< std::string > &, bool &, bool &)
int ComputeToken (int, sentence::iterator &, sentence &)
    Compute the right token code for word j from given state.
void ResetActions ()
    Reset the current multiword accumulator.
void StateActions (int, int, int, sentence::const_iterator)
    Perform the actions needed in "state", reached from state "origin" via word j interpreted as token code "token"; essentially, update the accumulated multiword on reaching a state.
void SetMultiwordAnalysis (sentence::iterator, int)
    Set the appropriate lemma and parole for the new multiword.
bool ValidMultiWord (const word &)
    Perform last-minute validation before actually building the multiword.
Private Attributes
std::map< std::string, std::string > locut
    Store the multiword list.
std::set< std::string > prefixes
    Store the multiword prefixes.
std::set< std::string > acc_mw
    Partially built multiword.
std::set< std::string > longest_mw
std::vector< word > components
    Store the multiword components in case they need to be recovered.
int over_longest
    Count the words scanned beyond the last longest multiword found.
std::list< analysis > mw_analysis
    Analysis assigned to the multiword by the validation step.
locutions::locutions (const std::string &locFile)
Constructor.
Create a multiword recognizer, loading the multiword file locFile.
References add_locution(), ERROR_CRASH, automat::Final, automat::initialState, M, MAX_STATES, MAX_TOKENS, P, STOP, automat::stopState, TK_mw, TK_mwL, TK_mwP, TK_pref, TK_prefL, TK_prefP, TRACE, and automat::trans.
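The reference list above suggests that, besides loading locFile through add_locution(), the constructor fills in the automaton inherited from automat: its initial state, its stop state, the set of final states, and the trans table over the states P, M and STOP and the TK_* token codes. A minimal sketch of such a table follows, assuming that P means "inside a prefix", that M ("a complete multiword has been seen") is the only final state, and that every unlisted transition leads to STOP. The Automaton struct and build_locutions_automaton() are hypothetical, and FreeLing's actual transitions may differ.

```cpp
#include <initializer_list>
#include <set>

// Hypothetical state and token codes mirroring the names used on this page.
enum State { P, M, STOP, MAX_STATES };
enum Token { TK_pref, TK_prefL, TK_prefP,   // word extends a known prefix (by form, lemma, or PoS)
             TK_mw, TK_mwL, TK_mwP,         // word completes a known multiword
             TK_other, MAX_TOKENS };        // word fits no entry

// Simplified stand-in for the automat fields the constructor sets.
struct Automaton {
  int initialState;
  int stopState;
  std::set<int> Final;
  int trans[MAX_STATES][MAX_TOKENS];
};

Automaton build_locutions_automaton() {
  Automaton a;
  a.initialState = P;
  a.stopState = STOP;
  a.Final.insert(M);  // assumption: only M is an accepting state
  // By default, every transition leads to STOP.
  for (int s = 0; s < MAX_STATES; ++s)
    for (int t = 0; t < MAX_TOKENS; ++t)
      a.trans[s][t] = STOP;
  // From P or M: a prefix token keeps scanning (P),
  // a complete-multiword token reaches the accepting state (M).
  for (int s : {P, M}) {
    for (int t : {TK_pref, TK_prefL, TK_prefP}) a.trans[s][t] = P;
    for (int t : {TK_mw, TK_mwL, TK_mwP}) a.trans[s][t] = M;
  }
  return a;
}
```

In this sketch, sending prefix tokens from M back to P is what lets the recognizer keep reading past a complete multiword in search of a longer one.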
void locutions::add_locution (const std::string &line)
Add a locution rule to the multiword recognizer.
References locut, and prefixes.
Referenced by locutions(), quantities_ca::quantities_ca(), quantities_en::quantities_en(), quantities_es::quantities_es(), and quantities_gl::quantities_gl().
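add_locution() references only locut and prefixes, which suggests that each rule is stored once as a complete multiword and once per proper prefix, so the recognizer can distinguish "keep reading" from "match found". The sketch below assumes a rule line of the form <form> <data>, where the form joins the component words with '_' and prefixes are stored with a trailing '_'. Both the line format and the LocutionTable struct are assumptions for illustration, not FreeLing's code.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the private members locut and prefixes.
struct LocutionTable {
  std::map<std::string, std::string> locut;  // multiword form -> rest of the rule
  std::set<std::string> prefixes;            // proper prefixes, each ending in '_'

  // Assumed rule format: "<form> <data...>", e.g. "a_fin_de_que a_fin_de_que CS".
  void add_locution(const std::string &line) {
    std::istringstream in(line);
    std::string form;
    if (!(in >> form)) return;              // ignore blank lines
    std::string data;
    std::getline(in >> std::ws, data);      // rest of the line, leading blanks skipped
    locut.insert({form, data});             // keeps the first rule if a form repeats
    // Register every proper prefix: "a_", "a_fin_", "a_fin_de_".
    for (std::string::size_type p = form.find('_'); p != std::string::npos;
         p = form.find('_', p + 1))
      prefixes.insert(form.substr(0, p + 1));
  }
};
```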
void locutions::check (const std::string, std::set< std::string > &, bool &, bool &) [private]
Referenced by ComputeToken(), and ValidMultiWord().
int locutions::ComputeToken (int state, sentence::iterator &j, sentence &se) [private, virtual]
Compute the right token code for word j from given state.
Implements automat.
References acc_mw, check(), components, over_longest, TK_mw, TK_other, TK_pref, and TRACE.
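ComputeToken() references acc_mw (a set, so several partial multiwords can be tracked at once), check(), and the token codes TK_mw, TK_pref and TK_other. One plausible reading is that check() extends each partial multiword with the current word and reports whether any extension is a complete multiword or a known prefix. The sketch below follows that reading using word forms only (the lemma and PoS variants such as TK_mwL and TK_prefP are omitted). The extra locut/prefixes parameters of check(), the function compute_token(), and the trailing-'_' prefix convention are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>

enum Token { TK_pref, TK_mw, TK_other };

// Hypothetical variant of the private check(): the real member takes only the
// first four parameters and reads locut/prefixes from the object itself.
void check(const std::string &s, std::set<std::string> &acc, bool &mw, bool &pref,
           const std::map<std::string, std::string> &locut,
           const std::set<std::string> &prefixes) {
  std::set<std::string> next;  // partial multiwords that can still grow
  mw = pref = false;
  for (const std::string &partial : acc) {
    const std::string cand = partial + s;   // e.g. "a_fin_" + "de"
    if (locut.count(cand)) mw = true;       // complete multiword
    if (prefixes.count(cand + "_")) {       // may continue with more words
      pref = true;
      next.insert(cand + "_");
    }
  }
  acc.swap(next);
}

// Hypothetical form-only counterpart of ComputeToken(). Start with acc == {""}.
Token compute_token(const std::string &form, std::set<std::string> &acc,
                    const std::map<std::string, std::string> &locut,
                    const std::set<std::string> &prefixes) {
  bool mw, pref;
  check(form, acc, mw, pref, locut, prefixes);
  if (mw) return TK_mw;     // assumption: a complete match takes precedence
  if (pref) return TK_pref;
  return TK_other;
}
```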
void locutions::ResetActions () [private, virtual]
Reset the current multiword accumulator.
Implements automat.
References acc_mw, components, longest_mw, and mw_analysis.
void locutions::SetMultiwordAnalysis (sentence::iterator i, int fstate) [private, virtual]
Set the appropriate lemma and parole for the new multiword.
Implements automat.
References mw_analysis, and TRACE.
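Apart from TRACE, SetMultiwordAnalysis() references only mw_analysis, the list of analyses prepared during validation, so it presumably copies that list onto the new multiword token. This page does not say where the lemma and parole come from; assuming the data stored in locut for each multiword is a sequence of lemma/tag pairs, building that list could look like the sketch below. The Analysis struct and parse_mw_data() are hypothetical simplifications, not FreeLing's analysis class.

```cpp
#include <list>
#include <sstream>
#include <string>

// Hypothetical, much-reduced stand-in for FreeLing's analysis class.
struct Analysis {
  std::string lemma;
  std::string tag;  // the "parole" (PoS tag)
};

// Hypothetical: turn the data stored for a multiword, assumed to be a
// whitespace-separated sequence of lemma/tag pairs, into a list of analyses.
std::list<Analysis> parse_mw_data(const std::string &data) {
  std::list<Analysis> result;
  std::istringstream in(data);
  Analysis a;
  while (in >> a.lemma >> a.tag)  // stops at the end or at an unpaired lemma
    result.push_back(a);
  return result;
}
```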
void locutions::StateActions (int origin, int state, int token, sentence::const_iterator j) [private, virtual]
Perform the actions needed in "state", reached from state "origin" via word j interpreted as token code "token"; essentially, update the accumulated multiword on reaching a state.
Implements automat.
References longest_mw, and TRACE.
bool locutions::ValidMultiWord (const word &w) [private, virtual]
Perform last-minute validation before actually building the multiword.
Reimplemented from automat.
References check(), components, ERROR_CRASH, locut, longest_mw, mw_analysis, over_longest, and TRACE.
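ValidMultiWord() references components, longest_mw and over_longest, and over_longest counts the words scanned beyond the last longest multiword found. Together these suggest a longest-match strategy: keep reading while the words still form a known prefix, remember where the last complete multiword ended, and give back the surplus words when the scan stops. The sketch below shows that strategy on plain strings; it is not FreeLing's code, longest_match() is a hypothetical name, and locut is reduced to a set of forms.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical longest-match scan: returns how many words, starting at
// position `start`, form the longest multiword listed in `locut` (0 if none).
// `prefixes` holds proper prefixes ending in '_', e.g. "a_" and "a_fin_".
std::size_t longest_match(const std::vector<std::string> &words, std::size_t start,
                          const std::set<std::string> &locut,
                          const std::set<std::string> &prefixes) {
  std::string acc;               // partially built multiword, e.g. "a_fin_"
  std::size_t scanned = 0;       // words consumed so far (cf. components)
  std::size_t over_longest = 0;  // words consumed after the last complete match
  for (std::size_t i = start; i < words.size(); ++i) {
    const std::string cand = acc + words[i];
    const bool mw = locut.count(cand) > 0;
    const bool pref = prefixes.count(cand + "_") > 0;
    if (!mw && !pref) break;     // the word fits no entry: stop scanning
    ++scanned;
    if (mw) over_longest = 0;    // new longest complete multiword
    else ++over_longest;
    if (!pref) break;            // nothing longer can start this way
    acc = cand + "_";
  }
  // Give back the words read after the last complete multiword.
  // If no complete match was found, over_longest == scanned and this is 0.
  return scanned - over_longest;
}
```

With the entries a_fin and a_fin_de_que, the input "a fin de casa" scans three words but only a_fin is complete, so one word is given back and the function returns 2.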
std::set<std::string> locutions::acc_mw [private]
Partially built multiword.
Referenced by ComputeToken(), and ResetActions().
std::vector<word> locutions::components [private]
Store the multiword components in case they need to be recovered.
Referenced by ComputeToken(), ResetActions(), and ValidMultiWord().
std::map<std::string,std::string> locutions::locut [private]
Store the multiword list.
Referenced by add_locution(), and ValidMultiWord().
std::set<std::string> locutions::longest_mw [private]
Referenced by ResetActions(), StateActions(), and ValidMultiWord().
std::list<analysis> locutions::mw_analysis [private]
Analysis assigned to the multiword by the validation step.
Referenced by ResetActions(), SetMultiwordAnalysis(), and ValidMultiWord().
int locutions::over_longest [private]
Count the words scanned beyond the last longest multiword found.
Referenced by ComputeToken(), and ValidMultiWord().
std::set<std::string> locutions::prefixes [private]
Store the multiword prefixes.
Referenced by add_locution().