The class np implements a dummy proper noun recognizer.
#include <np.h>
Public Member Functions | |
np (const std::string &) | |
Constructor. | |
void | annotate (sentence &) |
Specify that "annotate" must be inherited from "automat" and not from "ner". | |
Private Member Functions | |
int | ComputeToken (int, sentence::iterator &, sentence &) |
Compute the right token code for word j from given state. | |
void | ResetActions () |
Reset flag about capitalized noun at sentence start. | |
void | StateActions (int, int, int, sentence::const_iterator) |
Perform the necessary actions in "state", reached from state "origin" via word j interpreted as code "token". In practice, this sets the flag for a capitalized noun at sentence start. | |
void | SetMultiwordAnalysis (sentence::iterator, int) |
Set the appropriate lemma and PAROLE tag for the new multiword. | |
bool | ValidMultiWord (const word &) |
Perform last-minute validation before actually building the multiword. | |
sentence::iterator | BuildMultiword (sentence &, sentence::iterator, sentence::iterator, int, bool &) |
Private function to re-arrange sentence when match found. | |
Private Attributes | |
std::set< std::string > | func |
set of function words | |
std::set< std::string > | punct |
set of special punctuation tags | |
std::set< std::string > | names |
set of words to be considered possible NPs at sentence beginning | |
std::map< std::string, int > | ignore_tags |
set of words/tags to be ignored as NE parts, even if they are capitalized | |
std::map< std::string, int > | ignore_words |
bool | initialNoun |
true when the word at the beginning of the sentence is a noun | |
RegEx | RE_NounAdj |
RegEx | RE_Closed |
RegEx | RE_DateNumPunct |
np::np | ( | const std::string & | npFile | ) |
Constructor.
Create a proper noun recognizer.
References ERROR_CRASH, automat::Final, FUN, func, ignore_tags, ignore_words, IN, automat::initialState, MAX_STATES, MAX_TOKENS, names, ner::NE_tag, NP, punct, RE_Closed, RE_DateNumPunct, RE_NounAdj, ner::splitNPs, STOP, automat::stopState, ner::Title_length, TK_mFun, TK_mUpper, TK_sNounUpp, TK_sUnkUpp, TRACE, and automat::trans.
void np::annotate | ( | sentence & | se | ) |
Specify that "annotate" must be inherited from "automat" and not from "ner".
Reimplemented from automat.
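The note above describes a standard C++ pattern: when two base classes both declare a member with the same signature, the derived class overrides it and forwards to the base it wants. The following is a minimal sketch of that pattern only. The types `ner_sketch`, `automat_sketch`, `np_sketch` and the simplified `sentence` alias are hypothetical stand-ins, not the FreeLing classes, and the placeholder bodies exist only to make the chosen path observable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the real sentence type.
using sentence = std::vector<std::string>;

struct ner_sketch {
  virtual ~ner_sketch() = default;
  // Generic NER entry point (placeholder behaviour for illustration).
  virtual void annotate(sentence &se) { se.push_back("<ner>"); }
};

struct automat_sketch {
  virtual ~automat_sketch() = default;
  // Automaton-driven entry point (placeholder behaviour for illustration).
  virtual void annotate(sentence &se) { se.push_back("<automat>"); }
};

// Both bases declare annotate(), so an unqualified call on np_sketch would be
// ambiguous. Overriding it and forwarding explicitly selects the automat one.
struct np_sketch : public ner_sketch, public automat_sketch {
  void annotate(sentence &se) override { automat_sketch::annotate(se); }
};
```

Calling `annotate` on an `np_sketch` object then always runs the `automat_sketch` version, whichever base pointer or reference the call goes through.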
sentence::iterator np::BuildMultiword | ( | sentence & | se, | |
sentence::iterator | start, | |||
sentence::iterator | end, | |||
int | fs, | |||
bool & | built | |||
) | [private, virtual] |
Private function to re-arrange sentence when match found.
Arrange the sentence grouping all words from start to end in a multiword.
Reimplemented from automat.
References ner::NE_tag, ResetActions(), SetMultiwordAnalysis(), ner::splitNPs, TRACE, and ValidMultiWord().
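The grouping step described above can be illustrated on a plain list of word forms. This is a sketch under stated assumptions, not the library's implementation: `sentence_sketch` and `group_words` are hypothetical names, the range is treated as inclusive of `end`, and joining the parts with `_` is an illustrative choice. Validation, analysis assignment and the `built` flag are left out.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical simplification: a sentence is a list of word forms.
using sentence_sketch = std::list<std::string>;

// Replace the words in the inclusive range [start, end] with one multiword
// whose form joins the parts with '_' (assumed separator).
// Returns an iterator to the newly inserted multiword.
sentence_sketch::iterator group_words(sentence_sketch &se,
                                      sentence_sketch::iterator start,
                                      sentence_sketch::iterator end) {
  std::string form;
  auto stop = std::next(end);  // one past the last word to merge
  for (auto it = start; it != stop; ++it) {
    if (it != start) form += '_';
    form += *it;
  }
  auto pos = se.erase(start, stop);  // remove the individual parts
  return se.insert(pos, form);       // put the multiword in their place
}
```

A `std::list` keeps iterators to the untouched words valid across the erase and insert, which is convenient when the caller continues scanning the sentence from the returned position.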
int np::ComputeToken | ( | int | state, | |
sentence::iterator & | j, | |||
sentence & | se | |||
) | [private, virtual] |
Compute the right token code for word j from given state.
Implements automat.
References ignore_tags, ignore_words, punct, and TK_other.
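To give a feel for what mapping a word to a token code can look like, here is a purely illustrative sketch. Only the name `TK_other` comes from this reference. The other token codes, the `classifier_sketch` type, the use of `std::set` for `ignore_words`, and the order of the checks are all assumptions made for the example; the actual rules, including how `state`, `punct` and `ignore_tags` are used, are defined in the library source.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical token codes; only the TK_other name appears in the reference.
enum token_sketch { TK_upper_sketch, TK_fun_sketch, TK_other };

struct classifier_sketch {
  std::set<std::string> func;          // function words (assumed lowercase)
  std::set<std::string> ignore_words;  // forms never treated as NE parts

  // Classify a single word form (assumed rule order, for illustration only).
  int compute_token(const std::string &form) const {
    // Empty or explicitly ignored forms never take part in a proper noun.
    if (form.empty() || ignore_words.count(form)) return TK_other;
    // A capitalized form is a candidate proper-noun part.
    if (std::isupper(static_cast<unsigned char>(form[0]))) return TK_upper_sketch;
    // Lowercase function words may link capitalized parts.
    if (func.count(form)) return TK_fun_sketch;
    return TK_other;
  }
};
```

Checking the ignore list before the capitalization test is what lets a capitalized form still be excluded, matching the stated purpose of the ignore sets.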
void np::ResetActions | ( | ) | [private, virtual] |
Reset flag about capitalized noun at sentence start.
Implements automat.
References initialNoun.
Referenced by BuildMultiword().
void np::SetMultiwordAnalysis | ( | sentence::iterator | i, | |
int | fstate | |||
) | [private, virtual] |
Set the appropriate lemma and PAROLE tag for the new multiword.
Implements automat.
References initialNoun, ner::NE_tag, and TRACE.
Referenced by BuildMultiword().
void np::StateActions | ( | int | origin, | |
int | state, | |||
int | token, | |||
sentence::const_iterator | j | |||
) | [private, virtual] |
Perform the necessary actions in "state", reached from state "origin" via word j interpreted as code "token". In practice, this sets the flag for a capitalized noun at sentence start.
Implements automat.
References initialNoun, NP, TK_sNounUpp, and TRACE.
bool np::ValidMultiWord | ( | const word & | w | ) | [private, virtual] |
Perform last-minute validation before actually building the multiword.
Reimplemented from automat.
References ner::Title_length.
Referenced by BuildMultiword().
std::map<std::string,int> np::ignore_tags [private] |
set of words/tags to be ignored as NE parts, even if they are capitalized
Referenced by ComputeToken(), and np().
std::map<std::string,int> np::ignore_words [private] |
Referenced by ComputeToken(), and np().
bool np::initialNoun [private] |
true when the word at the beginning of the sentence is a noun
Referenced by ResetActions(), SetMultiwordAnalysis(), and StateActions().
std::set<std::string> np::names [private] |
set of words to be considered possible NPs at sentence beginning
Referenced by np().
std::set<std::string> np::punct [private] |
set of special punctuation tags
Referenced by ComputeToken(), and np().
RegEx np::RE_Closed [private] |
Referenced by np().
RegEx np::RE_DateNumPunct [private] |
Referenced by np().
RegEx np::RE_NounAdj [private] |
Referenced by np().