There are two different modules able to perform NE recognition. The application should decide which method to use and instantiate the corresponding class.
The first NER module is the np class, which is essentially a finite-state automaton (FSA) that detects sequences of capitalized words, taking into account some functional words (e.g. "of" in Bank of England) and capitalization at the beginning of sentences.
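The capitalization heuristic described above can be illustrated with a small hand-written automaton. This is a sketch under stated assumptions, not the module's actual transition table or configuration format: the functional-word list and the `common` lexicon used to discount sentence-initial capitals are hypothetical inputs, and `detect_nes` and `Span` are illustrative names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Half-open token span [first, last) of one detected named entity.
using Span = std::pair<std::size_t, std::size_t>;

static bool capitalized(const std::string &w) {
  return !w.empty() && std::isupper(static_cast<unsigned char>(w[0]));
}

static std::string lowercase(std::string w) {
  for (char &c : w) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return w;
}

// Three-state automaton: Outside, InEntity, and PendingFunc (functional words
// after an entity, kept only if another capitalized word follows them).
std::vector<Span> detect_nes(const std::vector<std::string> &toks,
                             const std::set<std::string> &functional,
                             const std::set<std::string> &common) {
  enum class State { Outside, InEntity, PendingFunc };
  State st = State::Outside;
  std::vector<Span> out;
  std::size_t start = 0, end = 0;  // current candidate entity is [start, end)
  auto close = [&] {
    assert(start < end);  // an entity always covers at least one token
    out.emplace_back(start, end);
    st = State::Outside;
  };
  for (std::size_t i = 0; i < toks.size(); ++i) {
    const std::string &w = toks[i];
    // A sentence-initial capital is not evidence on its own: discard it
    // when the lowercased word is a known common word.
    bool cap = capitalized(w) && !(i == 0 && common.count(lowercase(w)) > 0);
    bool func = functional.count(w) > 0;
    switch (st) {
      case State::Outside:
        if (cap) { start = i; end = i + 1; st = State::InEntity; }
        break;
      case State::InEntity:
        if (cap) end = i + 1;
        else if (func) st = State::PendingFunc;
        else close();
        break;
      case State::PendingFunc:
        if (cap) { end = i + 1; st = State::InEntity; }
        else if (!func) close();  // trailing functional words are dropped
        break;
    }
  }
  if (st != State::Outside) close();
  return out;
}
```

With `functional = {"of", "the", "de"}` and `common = {"the"}`, the tokens `The Bank of England met Mary` yield the spans [1, 4) and [5, 6), i.e. Bank of England and Mary; the sentence-initial The is discarded because it is a known common word.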
The second module, named bioner, is based on machine learning algorithms from the Omlet&Fries libraries, and has to be trained on a tagged corpus.
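Both classes declared in this section derive from `ner`, so the choice left to the application can be isolated in one place. A minimal sketch, assuming `annotate` is virtual in `ner` (the declarations here do not say so); the `sentence`, `ner`, `np` and `bioner` definitions below are hypothetical stand-ins so the example compiles on its own, and `make_ner`, the method strings and the configuration file names are illustrative.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch compiles on its own; in a real
// program these types come from the library headers.
struct sentence {
  std::vector<std::string> words;
};

class ner {
 public:
  virtual ~ner() = default;
  // Assumption: annotate is virtual in the common base class.
  virtual void annotate(sentence &) = 0;
};

class np : public ner {  // the real np also derives from automat
 public:
  explicit np(const std::string &config) { assert(!config.empty()); }
  void annotate(sentence &) override { /* capitalization automaton */ }
};

class bioner : public ner {
 public:
  explicit bioner(const std::string &config) { assert(!config.empty()); }
  void annotate(sentence &) override { /* trained classifiers */ }
};

// Pick the recognizer at run time; the rest of the pipeline sees only `ner`.
std::unique_ptr<ner> make_ner(const std::string &method,
                              const std::string &config) {
  if (method == "np") return std::make_unique<np>(config);
  if (method == "bioner") return std::make_unique<bioner>(config);
  throw std::invalid_argument("unknown NER method: " + method);
}
```

A caller can then write, with hypothetical `use_ml` and `config_path` variables, `auto recognizer = make_ner(use_ml ? "bioner" : "np", config_path);` and call `recognizer->annotate(s)` on each sentence without depending on which module was chosen.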
The np module is simple, fast, and easy to adapt to new languages, provided capitalization is the basic clue for NE detection in those languages. Its estimated performance is about 85% of named entities correctly recognized. Its API is the following:
class np: public ner, public automat {
  public:
    /// Constructor, receives a configuration file.
    np(const std::string &);
    /// ("annotate" is inherited from "automat")
    void annotate(sentence &);
};
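The comment in the declaration notes that `annotate` comes from `automat`: np contributes the details of its automaton while the base class runs it. A minimal sketch of that split, assuming a template-method design; `automat_sketch`, `np_sketch` and all their members are hypothetical and are not the library's `automat` interface, and the subclass is a deliberately reduced automaton that ignores functional words and sentence-initial capitals.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's sentence type.
struct sentence {
  std::vector<std::string> words;
  std::vector<bool> in_ne;  // one flag per word, filled by annotate()
};

// Base class: owns the generic loop, so every subclass inherits annotate().
class automat_sketch {
 public:
  virtual ~automat_sketch() = default;
  void annotate(sentence &s) const {
    s.in_ne.assign(s.words.size(), false);
    int state = initial_state();
    for (std::size_t i = 0; i < s.words.size(); ++i) {
      state = transition(state, token_class(s.words[i]));
      s.in_ne[i] = accepting(state);  // mark tokens read in an accepting state
    }
  }

 protected:
  virtual int initial_state() const = 0;
  virtual int token_class(const std::string &w) const = 0;
  virtual int transition(int state, int cls) const = 0;
  virtual bool accepting(int state) const = 0;
};

// Deliberately simplistic subclass: state 1 means "inside a run of
// capitalized words", state 0 means "outside". No functional words and
// no sentence-start handling.
class np_sketch : public automat_sketch {
 protected:
  int initial_state() const override { return 0; }
  int token_class(const std::string &w) const override {
    return (!w.empty() && std::isupper(static_cast<unsigned char>(w[0]))) ? 1 : 0;
  }
  int transition(int /*state*/, int cls) const override { return cls; }
  bool accepting(int state) const override { return state == 1; }
};
```

For `yesterday Mary Smith arrived`, the flags come out as false, true, true, false. Adding functional words or sentence-start rules would only change the subclass; `annotate` itself stays in the base class.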
The bioner module achieves higher precision (over 90%), but it is much slower, and adapting it to a new language requires a training corpus and some feature engineering. Its API is the following:
class bioner: public ner {
  public:
    /// Constructor, receives the name of the configuration file.
    bioner(const std::string &);
    /// Recognize NEs in given sentence
    void annotate(sentence &);
};
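The description above does not detail bioner's internals beyond its machine-learning basis. Assuming, as the module name suggests, that it labels each token as B (beginning of an entity), I (inside an entity) or O (outside), the sketch below shows the two conversions such a module needs: gold spans from the tagged corpus into per-token training labels, and predicted labels back into entity spans during annotation. The Omlet&Fries classifiers themselves are not shown, and `to_bio`, `from_bio` and `Span` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open token span [first, last) of one named entity.
using Span = std::pair<std::size_t, std::size_t>;

// Training side: encode gold entity spans as one label per token.
// B = first token of an entity, I = inside an entity, O = outside.
std::vector<char> to_bio(std::size_t n_tokens, const std::vector<Span> &entities) {
  std::vector<char> labels(n_tokens, 'O');
  for (const Span &e : entities) {
    assert(e.first < e.second && e.second <= n_tokens);
    labels[e.first] = 'B';
    for (std::size_t i = e.first + 1; i < e.second; ++i) labels[i] = 'I';
  }
  return labels;
}

// Annotation side: turn predicted labels back into entity spans.
// A stray 'I' with no open entity is repaired by treating it as 'B'.
std::vector<Span> from_bio(const std::vector<char> &labels) {
  std::vector<Span> spans;
  bool open = false;
  std::size_t start = 0;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    char l = labels[i];
    assert(l == 'B' || l == 'I' || l == 'O');
    if (l == 'B' || (l == 'I' && !open)) {
      if (open) spans.emplace_back(start, i);  // close the previous entity
      start = i;
      open = true;
    } else if (l == 'O' && open) {
      spans.emplace_back(start, i);
      open = false;
    }
    // An 'I' inside an open entity simply extends it.
  }
  if (open) spans.emplace_back(start, labels.size());
  return spans;
}
```

Encoding `The Bank of England met Mary` with entities [1, 4) and [5, 6) gives O B I I O B, and decoding those labels returns the same two spans. Adjacent entities stay distinct because each one starts with its own B.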