probabilities Class Reference

Class probabilities sets lexical probabilities for each PoS tag of each word in a sentence.

#include <probabilities.h>


Public Member Functions

 probabilities (const std::string &, const std::string &, double)
 Constructor.
void annotate (sentence &)
 Assign probabilities to tags using default options.
void annotate_word (word &)
 Assign probabilities for each analysis of given word.

Private Member Functions

void smoothing (word &)
 Smooth probabilities for the analysis of given word.
double compute_probability (const std::string &, double, const std::string &)
 Compute p(tag|suffix) using recursively shorter suffixes.
double guesser (word &, double)
 Guess possible tags, keeping some mass for previously assigned tags.

Private Attributes

RegEx RE_PunctNum
 Auxiliary regexps.
double ProbabilityThreshold
 Probability threshold for unknown words tags.
std::string Language
double CheckerOverGuesser
 Weight factor to favor checker over guesser for unknown words.
double ExactMatchBonus
 Weight factor to favor spell-checker alternatives with an exact phonetic match.
double AlternativeAnalysisMass
 Mass threshold used to reduce the number of analyses for an alternative.
std::map< std::string, double > single_tags
 unigram probabilities
std::map< std::string, std::map< std::string, double > > class_tags
 probabilities for usual ambiguity classes
std::map< std::string, std::map< std::string, double > > lexical_tags
 lexical probabilities for frequent words
std::map< std::string, double > unk_tags
 list of tags and probabilities to assign to unknown words
std::map< std::string, std::map< std::string, double > > unk_suffs
 list of tag frequencies for unknown word suffixes
double theeta
 smoothing parameter for unknown-word suffixes
std::string::size_type long_suff
 length of longest suffix

Detailed Description

Class probabilities sets lexical probabilities for each PoS tag of each word in a sentence.


Constructor & Destructor Documentation

probabilities::probabilities ( const std::string & Lang, const std::string & probFile, double Threshold )

Constructor.

Create a probability assignment module, loading the appropriate probabilities file.

References AlternativeAnalysisMass, CheckerOverGuesser, class_tags, ERROR_CRASH, ExactMatchBonus, Language, lexical_tags, long_suff, ProbabilityThreshold, single_tags, theeta, TRACE, unk_suffs, and unk_tags.


Member Function Documentation

void probabilities::annotate ( sentence &  se  ) 

Assign probabilities to tags using default options.

Annotate probabilities for each analysis of each word in the given sentence, using default options.

References annotate_word(), and TRACE_SENTENCE.

Referenced by maco::analyze().

void probabilities::annotate_word ( word &  w  ) 

Assign probabilities for each analysis of given word.

Annotate probabilities for each analysis of given word.

Alternatives proposed by the spell checker are moved to the analysis list, so that the tagger can take them into account.

References AlternativeAnalysisMass, CheckerOverGuesser, ExactMatchBonus, guesser(), RE_PunctNum, smoothing(), and TRACE.

Referenced by annotate().

double probabilities::compute_probability ( const std::string & tag, double prob, const std::string & s ) [private]

Compute p(tag|suffix) using recursively shorter suffixes.

Compute probability of a tag given a word suffix.

References theeta, and unk_suffs.

Referenced by guesser().

double probabilities::guesser ( word & w, double mass ) [private]

Guess possible tags, keeping some mass for previously assigned tags.

References compute_probability(), Language, ProbabilityThreshold, TRACE, and unk_tags.

Referenced by annotate_word().

void probabilities::smoothing ( word &  w  )  [private]

Smooth probabilities for the analysis of given word.

References class_tags, Language, lexical_tags, single_tags, TRACE, and WARNING.

Referenced by annotate_word().


Member Data Documentation

double probabilities::AlternativeAnalysisMass [private]

Mass threshold used to reduce the number of analyses for an alternative.

Referenced by annotate_word(), and probabilities().

double probabilities::CheckerOverGuesser [private]

Weight factor to favor checker over guesser for unknown words.

Referenced by annotate_word(), and probabilities().

std::map<std::string,std::map<std::string,double> > probabilities::class_tags [private]

probabilities for usual ambiguity classes

Referenced by probabilities(), and smoothing().

double probabilities::ExactMatchBonus [private]

Weight factor to favor spell-checker alternatives with an exact phonetic match.

Referenced by annotate_word(), and probabilities().

std::string probabilities::Language [private]

Referenced by guesser(), probabilities(), and smoothing().

std::map<std::string,std::map<std::string,double> > probabilities::lexical_tags [private]

lexical probabilities for frequent words

Referenced by probabilities(), and smoothing().

std::string::size_type probabilities::long_suff [private]

length of longest suffix

Referenced by probabilities().

double probabilities::ProbabilityThreshold [private]

Probability threshold for unknown words tags.

Referenced by guesser(), and probabilities().

RegEx probabilities::RE_PunctNum [private]

Auxiliary regexps.

Referenced by annotate_word().

std::map<std::string,double> probabilities::single_tags [private]

unigram probabilities

Referenced by probabilities(), and smoothing().

double probabilities::theeta [private]

smoothing parameter for unknown-word suffixes

Referenced by compute_probability(), and probabilities().

std::map<std::string,std::map<std::string,double> > probabilities::unk_suffs [private]

list of tag frequencies for unknown word suffixes

Referenced by compute_probability(), and probabilities().

std::map<std::string,double> probabilities::unk_tags [private]

list of tags and probabilities to assign to unknown words

Referenced by guesser(), and probabilities().


Generated on Tue Jul 27 16:29:33 2010 for FreeLing by  doxygen 1.6.3