HMM-Tagger Parameter File

This file contains the statistical data for the Hidden Markov Model (initial probabilities, transition probabilities, lexical probabilities, etc.), plus some additional data used to smooth missing values.

The file may be generated by your own means, or from a tagged corpus using the script src/utilities/TRAIN provided in the FreeLing package. See the comments in the script file to find out which format the corpus is expected to have.

The file has seven sections: <Tag>, <Bigram>, <Trigram>, <Initial>, <Word>, <Smoothing>, and <Forbidden>. Each section is closed by its corresponding tag: </Tag>, </Bigram>, </Trigram>, etc.
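
The section layout described above can be illustrated with a small reader that splits a parameter file into its sections. This is only a sketch, not FreeLing's own loader: the function name split_sections is hypothetical, and it assumes that each opening and closing tag sits alone on its own line. The lines inside each section are returned as raw strings, since their format is not covered here.

```python
import re

# The seven section names listed in the text above.
SECTION_NAMES = ("Tag", "Bigram", "Trigram", "Initial",
                 "Word", "Smoothing", "Forbidden")


def split_sections(text):
    """Split parameter-file text into {section name: [raw content lines]}.

    Assumption (not stated in the manual): every <Name> and </Name>
    marker appears alone on its own line.
    """
    sections = {}
    current = None  # name of the section currently open, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        marker = re.fullmatch(r"<(/?)(\w+)>", line)
        if marker and marker.group(2) in SECTION_NAMES:
            is_closing = marker.group(1) == "/"
            name = marker.group(2)
            if is_closing:
                if name != current:
                    raise ValueError(f"unexpected closing tag </{name}>")
                current = None
            else:
                if current is not None:
                    raise ValueError(f"<{name}> opened inside <{current}>")
                sections[name] = []
                current = name
        elif current is not None:
            # Content line: kept verbatim, interpretation left to the caller.
            sections[current].append(line)
    if current is not None:
        raise ValueError(f"section <{current}> is never closed")
    return sections
```

Calling split_sections(open(path).read()) on a well-formed file would then give one list of lines per section, and a mismatched or missing closing tag raises ValueError instead of being silently accepted.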

The tag (unigram), bigram, and trigram probabilities are used in Linear Interpolation smoothing by the tagger to compute state transition probabilities ($\alpha_{ij}$ parameters of the HMM).
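
As an illustration of linear interpolation smoothing, the transition probability can be written as a weighted sum of the three estimates, $\lambda_1 P(t_3) + \lambda_2 P(t_3|t_2) + \lambda_3 P(t_3|t_1,t_2)$. The sketch below is a generic instance of that technique under stated assumptions, not the tagger's actual code: the function name, the dictionary layout, the reading of the bigram and trigram values as conditional probabilities, and the example weights are all hypothetical.

```python
def interpolated_transition(t1, t2, t3, unigram, bigram, trigram, lambdas):
    """Linearly interpolated estimate of P(t3 | t1, t2).

    Assumptions for this sketch:
      unigram[t3]           ~ P(t3)
      bigram[(t2, t3)]      ~ P(t3 | t2)
      trigram[(t1, t2, t3)] ~ P(t3 | t1, t2)
      lambdas = (l1, l2, l3), non-negative weights summing to 1.
    Unseen n-grams contribute 0, so lower-order estimates fill the gap.
    """
    l1, l2, l3 = lambdas
    return (l1 * unigram.get(t3, 0.0)
            + l2 * bigram.get((t2, t3), 0.0)
            + l3 * trigram.get((t1, t2, t3), 0.0))


# Toy values, invented purely for illustration.
unigram = {"NN": 0.4}
bigram = {("DT", "NN"): 0.6}
trigram = {("IN", "DT", "NN"): 0.8}
weights = (0.1, 0.3, 0.6)

seen = interpolated_transition("IN", "DT", "NN", unigram, bigram, trigram, weights)
unseen = interpolated_transition("VB", "DT", "NN", unigram, bigram, trigram, weights)
```

With these toy numbers, the trigram that was observed receives contributions from all three terms, while the unobserved one still gets a non-zero probability from the unigram and bigram terms, which is the point of the smoothing.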

Lluís Padró 2010-09-02