Class probabilities sets lexical probabilities for each PoS tag of each word in a sentence.
#include <probabilities.h>
Public Member Functions | |
probabilities (const std::string &, const std::string &, double) | |
Constructor. | |
void | annotate (sentence &) |
Assign probabilities to tags using default options. | |
void | annotate_word (word &) |
Assign probabilities for each analysis of given word. | |
Private Member Functions | |
void | smoothing (word &) |
Smooth probabilities for the analyses of given word. | |
double | compute_probability (const std::string &, double, const std::string &) |
Compute p(tag|suffix) using recursively shorter suffixes. | |
double | guesser (word &, double) |
Guess possible tags, keeping some mass for previously assigned tags. | |
Private Attributes | |
RegEx | RE_PunctNum |
Auxiliary regexps. | |
double | ProbabilityThreshold |
Probability threshold for unknown words tags. | |
std::string | Language |
double | CheckerOverGuesser |
Weight factor to favor checker over guesser for unknown words. | |
double | ExactMatchBonus |
Weight factor to favor spell checker alternatives with exact phonetic match. | |
double | AlternativeAnalysisMass |
Mass threshold used to reduce the number of analyses for an alternative. | |
std::map< std::string, double > | single_tags |
unigram probabilities | |
std::map< std::string, std::map< std::string, double > > | class_tags |
probabilities for usual ambiguity classes | |
std::map< std::string, std::map< std::string, double > > | lexical_tags |
lexical probabilities for frequent words | |
std::map< std::string, double > | unk_tags |
list of tags and probabilities to assign to unknown words | |
std::map< std::string, std::map< std::string, double > > | unk_suffs |
list of tag frequencies for unknown word suffixes | |
double | theeta |
smoothing parameter for unknown word suffixes | |
std::string::size_type | long_suff |
length of longest suffix |
probabilities::probabilities | ( | const std::string & | Lang, | |
const std::string & | probFile, | |||
double | Threshold | |||
) |
Constructor.
Create a probability assignment module, loading the given probabilities file.
References AlternativeAnalysisMass, CheckerOverGuesser, class_tags, ERROR_CRASH, ExactMatchBonus, Language, lexical_tags, long_suff, ProbabilityThreshold, single_tags, theeta, TRACE, unk_suffs, and unk_tags.
void probabilities::annotate | ( | sentence & | se | ) |
Assign probabilities to tags using default options.
Annotate probabilities for each analysis of each word in the given sentence.
References annotate_word(), and TRACE_SENTENCE.
Referenced by maco::analyze().
void probabilities::annotate_word | ( | word & | w | ) |
Assign probabilities for each analysis of given word.
Annotate probabilities for each analysis of the given word, and move alternatives proposed by the spell checker to the analysis list, so that the tagger can take them into account.
References AlternativeAnalysisMass, CheckerOverGuesser, ExactMatchBonus, guesser(), RE_PunctNum, smoothing(), and TRACE.
Referenced by annotate().
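double probabilities::compute_probability | ( | const std::string & | tag, | |
double | prob, | |||
const std::string & | s | |||
) | [private] |
Compute p(tag|suffix) using recursively shorter suffixes.
References theeta, and unk_suffs.
Referenced by guesser().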
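The member summary describes compute_probability() as computing p(tag|suffix) using recursively shorter suffixes, with theeta as the smoothing parameter and unk_suffs as the frequency table. The sketch below shows one common way such an estimate can be built, by linear interpolation with the next shorter suffix. It is an illustration under stated assumptions, not the library's code: the function names, the SuffixTable alias, the use of the empty suffix as a prior, and the interpolation formula itself are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for unk_suffs: suffix -> (tag -> relative frequency).
// Assumption: the empty suffix "" holds the prior used when no letters are left.
using SuffixTable = std::map<std::string, std::map<std::string, double>>;

// Relative frequency of `tag` after `suffix`, or 0 when the pair was never seen.
double observed_frequency(const SuffixTable &table, const std::string &suffix,
                          const std::string &tag) {
  auto s = table.find(suffix);
  if (s == table.end()) return 0.0;
  auto t = s->second.find(tag);
  return t == s->second.end() ? 0.0 : t->second;
}

// Estimate p(tag|suffix) by interpolating the observed frequency with the
// estimate for the suffix that is one character shorter.
// `theta` plays the role of the theeta attribute (assumed formula).
double suffix_probability(const SuffixTable &table, const std::string &tag,
                          const std::string &suffix, double theta) {
  assert(theta >= 0.0);
  double observed = observed_frequency(table, suffix, tag);
  if (suffix.empty()) return observed;  // recursion bottoms out at the prior
  double shorter = suffix_probability(table, tag, suffix.substr(1), theta);
  return (observed + theta * shorter) / (1.0 + theta);
}
```

With theta set to 0 the estimate reduces to the observed frequency of the longest suffix; larger values give more weight to the shorter, better-attested suffixes.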
double probabilities::guesser | ( | word & | w, | |
double | mass | |||
) | [private] |
Guess possible tags, keeping some mass for previously assigned tags.
References compute_probability(), Language, ProbabilityThreshold, TRACE, and unk_tags.
Referenced by annotate_word().
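As a rough illustration of "keeping some mass for previously assigned tags", the sketch below adds guessed tags to a word's existing tag set only when they reach a threshold, leaves already assigned tags untouched, and returns the total mass so the caller can renormalize. This is one plausible reading, not the library's code: TagProbs, guess_tags(), and normalize() are hypothetical names, and the candidate probabilities are assumed to have been computed beforehand for the tags listed in unk_tags.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical representation of a word's analyses: tag -> probability.
using TagProbs = std::map<std::string, double>;

// Add each candidate tag that reaches `threshold` and is not already assigned.
// `mass` is the probability mass held by the previously assigned tags.
// Returns the total mass after the additions.
double guess_tags(TagProbs &assigned, const TagProbs &candidates, double mass,
                  double threshold) {
  assert(threshold >= 0.0);
  double total = mass;
  for (const auto &candidate : candidates) {
    if (candidate.second < threshold) continue;       // too unlikely, drop it
    if (assigned.count(candidate.first)) continue;    // keep the existing value
    assigned[candidate.first] = candidate.second;
    total += candidate.second;
  }
  return total;
}

// Rescale all probabilities so that they sum to 1.
void normalize(TagProbs &tags, double total) {
  if (total <= 0.0) return;
  for (auto &tag : tags) tag.second /= total;
}
```

Returning the accumulated mass rather than normalizing inside the guesser lets the caller combine several sources of analyses before a single normalization pass.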
void probabilities::smoothing | ( | word & | w | ) | [private] |
Smooth probabilities for the analyses of given word.
References class_tags, Language, lexical_tags, single_tags, TRACE, and WARNING.
Referenced by annotate_word().
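smoothing() references lexical_tags, class_tags, and single_tags, which suggests a back-off from the most specific table to the most general one. The sketch below illustrates that idea under explicit assumptions: the lookup order, the "-" separator used to build the ambiguity class key, the additive constant epsilon, and the names Table, TagProbs, and smooth() are all hypothetical and are not taken from the library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using TagProbs = std::map<std::string, double>;
using Table = std::map<std::string, TagProbs>;

// Pick the most specific distribution available for the word, then smooth
// and renormalize it over the word's own tags.
TagProbs smooth(const Table &lexical, const Table &classes,
                const TagProbs &unigrams, const std::string &form,
                const std::vector<std::string> &tags) {
  assert(!tags.empty());

  // Assumed key format for an ambiguity class: tags joined with '-'.
  std::string key;
  for (const auto &tag : tags) {
    if (!key.empty()) key += "-";
    key += tag;
  }

  // Back off: word form, then ambiguity class, then unigram probabilities.
  const TagProbs *source = &unigrams;
  auto lex = lexical.find(form);
  auto cls = classes.find(key);
  if (lex != lexical.end()) source = &lex->second;
  else if (cls != classes.end()) source = &cls->second;

  const double epsilon = 1e-3;  // hypothetical additive smoothing constant
  TagProbs result;
  double total = 0.0;
  for (const auto &tag : tags) {
    if (result.count(tag)) continue;  // ignore repeated tags
    auto it = source->find(tag);
    double p = (it == source->end() ? 0.0 : it->second) + epsilon;
    result[tag] = p;
    total += p;
  }
  for (auto &entry : result) entry.second /= total;
  return result;
}
```

The additive constant keeps every analysis of the word at a non-zero probability, so that later processing can still select a tag that the chosen table never observed.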
double probabilities::AlternativeAnalysisMass [private] |
Mass threshold used to reduce the number of analyses for an alternative.
Referenced by annotate_word(), and probabilities().
double probabilities::CheckerOverGuesser [private] |
Weight factor to favor checker over guesser for unknown words.
Referenced by annotate_word(), and probabilities().
std::map<std::string,std::map<std::string,double> > probabilities::class_tags [private] |
probabilities for usual ambiguity classes
Referenced by probabilities(), and smoothing().
double probabilities::ExactMatchBonus [private] |
Weight factor to favor spell checker alternatives with exact phonetic match.
Referenced by annotate_word(), and probabilities().
std::string probabilities::Language [private] |
Referenced by guesser(), probabilities(), and smoothing().
std::map<std::string,std::map<std::string,double> > probabilities::lexical_tags [private] |
lexical probabilities for frequent words
Referenced by probabilities(), and smoothing().
std::string::size_type probabilities::long_suff [private] |
length of longest suffix
Referenced by probabilities().
double probabilities::ProbabilityThreshold [private] |
Probability threshold for unknown words tags.
Referenced by guesser(), and probabilities().
RegEx probabilities::RE_PunctNum [private] |
Auxiliary regexps.
Referenced by annotate_word().
std::map<std::string,double> probabilities::single_tags [private] |
unigram probabilities
Referenced by probabilities(), and smoothing().
double probabilities::theeta [private] |
smoothing parameter for unknown word suffixes
Referenced by compute_probability(), and probabilities().
std::map<std::string,std::map<std::string,double> > probabilities::unk_suffs [private] |
list of tag frequencies for unknown word suffixes
Referenced by compute_probability(), and probabilities().
std::map<std::string,double> probabilities::unk_tags [private] |
list of tags and probabilities to assign to unknown words
Referenced by guesser(), and probabilities().