Parse-tree completion rules

Section <GRPAR> contains rules to complete the partial parse produced by the chart parser. The tree is completed by combining pairs of adjacent chunks as stated by the rules. Rules are applied from highest priority (lowest value) to lowest priority (highest value), and from left to right. That is, the pair of adjacent chunks matching the highest-priority rule is found, and the rule is applied, joining both chunks into one. The process is repeated until only one chunk is left.
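The selection loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the parser's actual implementation: the Chunk and Rule classes and the select/complete functions are hypothetical, rules are matched on chunk labels only (no flags, contexts or lemma conditions), every join simply attaches the right chunk under the left one, and the loop stops early if no rule matches, which is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Chunk:
    label: str
    children: List["Chunk"] = field(default_factory=list)

@dataclass
class Rule:
    priority: int  # lower value = higher priority
    left: str      # required label of the leftmost chunk of the pair
    right: str     # required label of the rightmost chunk of the pair

def select(chunks: List[Chunk], rules: List[Rule]) -> Optional[Tuple[Rule, int]]:
    """Return the leftmost pair matched by a rule of the highest priority present."""
    candidates = [
        (rule.priority, i, rule)
        for i in range(len(chunks) - 1)
        for rule in rules
        if chunks[i].label == rule.left and chunks[i + 1].label == rule.right
    ]
    if not candidates:
        return None
    # Order by priority first, then by position (left to right).
    _, i, rule = min(candidates, key=lambda c: (c[0], c[1]))
    return rule, i

def complete(chunks: List[Chunk], rules: List[Rule]) -> List[Chunk]:
    chunks = list(chunks)
    while len(chunks) > 1:
        choice = select(chunks, rules)
        if choice is None:
            break  # assumption of this sketch: stop when no rule matches
        _, i = choice
        left, right = chunks[i], chunks[i + 1]
        left.children.append(right)  # join: the right chunk becomes a child of the left one
        chunks[i:i + 2] = [left]     # the pair is replaced by the single joined chunk
    return chunks

if __name__ == "__main__":
    rules = [Rule(30, "np", "vp"), Rule(20, "np", "pp")]
    (root,) = complete([Chunk("np"), Chunk("pp"), Chunk("vp")], rules)
    print(root.label, [c.label for c in root.children])  # np ['pp', 'vp']
```

Here the priority-20 rule fires first, reducing np pp vp to np vp; only then does the priority-30 rule join the remaining pair.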

Rules can be enabled or disabled through global flags. A rule may specify that it is enabled only when certain flags are on; if none of its enabling flags is on, the rule is not applied. A rule may also specify which flags are toggled on or off after it is applied, thus enabling or disabling other subsets of rules.
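The flag mechanism can be sketched as two small helpers. The function names are hypothetical; the syntax they handle (enabling flags separated by |, a lone - meaning "no flags" or "no operations", and +NAME / -NAME to toggle a flag on or off) is inferred from the example rules later in this section.

```python
from typing import Set

def is_enabled(enabling_flags: str, active: Set[str]) -> bool:
    """'-' means the rule needs no flag; otherwise at least one listed flag must be on."""
    if enabling_flags == "-":
        return True
    return any(flag in active for flag in enabling_flags.split("|"))

def apply_flag_ops(flag_ops: str, active: Set[str]) -> Set[str]:
    """Return the new flag set after '+NAME' (toggle on) / '-NAME' (toggle off) operations."""
    result = set(active)
    for op in flag_ops.split():
        if op == "-":  # a lone '-' means: no flag operations
            continue
        sign, name = op[0], op[1:]
        if sign == "+":
            result.add(name)
        elif sign == "-":
            result.discard(name)
        else:
            raise ValueError(f"bad flag operation: {op!r}")
    return result

if __name__ == "__main__":
    flags = {"INIT"}  # INIT is on when parsing starts
    print(is_enabled("INIT|PH1", flags))  # True
    flags = apply_flag_ops("+PH2 -INIT -PH1", flags)
    print(sorted(flags))                  # ['PH2']
    print(is_enabled("INIT|PH1", flags))  # False
```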


Each line contains a rule, with the format:

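priority flags context (ancestor,descendant) operation op-params flag-ops

where:

  priority: an integer; the lower the value, the higher the priority of the rule.
  flags: the enabling flags of the rule, separated by vertical bars (|); - means the rule does not need any flag to be on.
  context: the context in which the chunk pair must appear (see below); - means any context.
  (ancestor,descendant): the labels of the adjacent pair of chunks the rule applies to, enclosed in parentheses and separated by a comma. A label may carry a condition on the lemma of the chunk head word, e.g. pp<of>.
  operation: the tree-building operation used to join both chunks (see below).
  op-params: the parameters of the operation: RELABEL followed by the new label for the tree root (- for no relabelling), or MATCHING followed by the label of the node the other chunk is attached to.
  flag-ops: the flags toggled after the rule is applied, each preceded by + (on) or - (off); - means no change.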

For instance, the rule:

  20 - - (np,pp<of>) top_left RELABEL - -

states that if two subtrees labelled np and pp are found contiguous in the partial tree, and the head word of the second one has lemma of, then the latter (rightmost) is added as a new child of the former (leftmost), whatever the context, without any special flag needing to be active, and without relabelling the root of the new tree.
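To make the field layout concrete, the rule line can be split into its fields as sketched below. The CompletionRule class and the parse_rule/parse_label functions are hypothetical; the sketch assumes fields are whitespace-separated, that op-params is always a keyword (RELABEL or MATCHING) plus one argument, and that every remaining token is a flag operation. Only the <lemma> suffix on pair labels is handled.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# A chunk label with an optional head-lemma condition, e.g. "np" or "pp<of>".
LABEL_RE = re.compile(r"^([^<>]+)(?:<([^<>]+)>)?$")

@dataclass
class CompletionRule:
    priority: int
    flags: str
    context: str
    ancestor: Tuple[str, Optional[str]]    # (label, lemma or None)
    descendant: Tuple[str, Optional[str]]
    operation: str
    op_params: Tuple[str, ...]             # e.g. ("RELABEL", "-")
    flag_ops: Tuple[str, ...]              # e.g. ("-",) or ("+PH2", "-INIT")

def parse_label(text: str) -> Tuple[str, Optional[str]]:
    m = LABEL_RE.match(text)
    if m is None:
        raise ValueError(f"bad chunk label: {text!r}")
    return m.group(1), m.group(2)

def parse_rule(line: str) -> CompletionRule:
    tokens = line.split()
    if len(tokens) < 8:
        raise ValueError(f"too few fields: {line!r}")
    pair = tokens[3]
    if not (pair.startswith("(") and pair.endswith(")")):
        raise ValueError(f"pair must be in parentheses: {pair!r}")
    ancestor, descendant = pair[1:-1].split(",")
    return CompletionRule(
        priority=int(tokens[0]),
        flags=tokens[1],
        context=tokens[2],
        ancestor=parse_label(ancestor),
        descendant=parse_label(descendant),
        operation=tokens[4],
        op_params=tuple(tokens[5:7]),  # assumption: keyword + one argument
        flag_ops=tuple(tokens[7:]),    # everything left is a flag operation
    )

if __name__ == "__main__":
    rule = parse_rule("20 - - (np,pp<of>) top_left RELABEL - -")
    print(rule.descendant, rule.op_params, rule.flag_ops)
    # ('pp', 'of') ('RELABEL', '-') ('-',)
```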

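The supported tree-building operations include the following two, which are used in the examples of this section:

  top_left: the right chunk is added as a new child of the root of the left chunk. The root of the new tree is the root of the left chunk, relabelled if RELABEL specifies a label other than -.
  last_left: the right chunk is added as a child of the deepest rightmost node inside the left chunk that matches the label given after MATCHING. If no such node exists, the rule is not applied.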

The context may be specified as a sequence of chunk labels separated by underscores (_). One of the chunk labels must be $$, which refers to the pair of chunks the rule is being applied to.

For instance, the rule:

  20 - $$_vp (np,pp<of>) top_left RELABEL - -

would add the rightmost chunk of the pair (pp<of>) under the leftmost one (np) only if the chunk immediately to the right of the pair is labeled vp.

Other labels allowed in the context are: ? (matching exactly one chunk, with any label), * (matching zero or more chunks with any label), and OUT (matching a sentence boundary).

For instance, the context np_$$_*_vp_?_OUT would match a sentence in which the focus pair of chunks immediately follows an np chunk, and the second-to-last chunk is labeled vp.

Context conditions can be negated as a whole by preceding them with an exclamation mark (!). E.g. !np_$$_*_vp causes the rule to be applied only if that particular context is not satisfied.

Individual components of a context condition may also be negated by preceding them with a tilde (~). E.g. the context np_$$_~vp is satisfied if the preceding chunk is labeled np and the following chunk has any label but vp.
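The context syntax described above ($$, ?, *, OUT, ! and ~) can be sketched as a small backtracking matcher. This is an illustration under assumptions, not the parser's own code: the function names are hypothetical; the context is treated as a local condition anchored at the focus pair, with the part left of $$ read outwards (right to left) and neither side required to reach the sentence ends unless OUT is used; ?, ~X and plain labels must match a real chunk, never the boundary; and * never crosses the boundary.

```python
from typing import List, Optional, Sequence

BOUNDARY = None  # stands for the sentence boundary (matched by OUT)

def match_elem(elem: str, item: Optional[str]) -> bool:
    """Match one context element against one chunk label (or the boundary)."""
    if elem == "OUT":
        return item is BOUNDARY
    if item is BOUNDARY:
        return False  # ?, ~X and plain labels all require a real chunk
    if elem == "?":
        return True
    if elem.startswith("~"):
        return item != elem[1:]
    return item == elem

def match_prefix(pattern: Sequence[str], seq: Sequence[Optional[str]]) -> bool:
    """True if pattern matches a prefix of seq ('*' = zero or more chunks)."""
    if not pattern:
        return True
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        for k in range(len(seq) + 1):
            if k > 0 and seq[k - 1] is BOUNDARY:
                break  # '*' never crosses the sentence boundary
            if match_prefix(rest, seq[k:]):
                return True
        return False
    return len(seq) > 0 and match_elem(head, seq[0]) and match_prefix(rest, seq[1:])

def context_holds(context: str, labels: List[str], i: int) -> bool:
    """Check a rule context for the focus pair (labels[i], labels[i + 1])."""
    if context == "-":
        return True
    negated = context.startswith("!")
    elems = (context[1:] if negated else context).split("_")
    k = elems.index("$$")
    # The left part is read outwards from the pair, i.e. right to left.
    left_ok = match_prefix(elems[:k][::-1], labels[:i][::-1] + [BOUNDARY])
    right_ok = match_prefix(elems[k + 1:], labels[i + 2:] + [BOUNDARY])
    return (left_ok and right_ok) != negated

if __name__ == "__main__":
    sentence = ["np", "np", "pp", "adv", "vp", "punc"]
    print(context_holds("np_$$_*_vp_?_OUT", sentence, 1))  # True
```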

Enabling flags may be defined and used freely by the grammarian. For instance, the rule:

  20 INIT|PH1 $$_vp (np,pp<of>) last_left MATCHING npms[animal] +PH2 -INIT -PH1

will be applied if either the INIT or the PH1 flag is on, the chunk pair is an np followed by a pp whose head lemma is of, and the context (a vp chunk immediately following the pair) is met. Then, the deepest rightmost node matching the label npms[animal] is sought in the left chunk, and the right chunk is linked as one of its children. If no such node is found, the rule is not applied.

After applying the rule, the flag PH2 will be toggled on, and the flags INIT and PH1 will be toggled off.
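The two joining operations used in this section, top_left and last_left, can be sketched on a hypothetical Node tree type. Assumptions: "deepest rightmost" is read as a right-to-left depth-first search that prefers a matching descendant over its matching ancestors, and label matching is plain string equality, so class conditions such as [animal] in npms[animal] are not modelled. Returning None stands for "rule not applied".

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def top_left(left: Node, right: Node, new_label: str = "-") -> Node:
    """Attach right under the root of left; relabel the root unless new_label is '-'."""
    left.children.append(right)
    if new_label != "-":
        left.label = new_label
    return left

def find_last(node: Node, label: str) -> Optional[Node]:
    """Deepest rightmost node labelled `label` (right-to-left, depth-first)."""
    for child in reversed(node.children):
        hit = find_last(child, label)
        if hit is not None:
            return hit
    return node if node.label == label else None

def last_left(left: Node, right: Node, label: str) -> Optional[Node]:
    """Attach right under the deepest rightmost match in left; None if no match."""
    target = find_last(left, label)
    if target is None:
        return None  # no matching node: the rule is not applied
    target.children.append(right)
    return left

if __name__ == "__main__":
    tree = Node("np", [Node("n"), Node("pp", [Node("np")])])
    last_left(tree, Node("pp"), "np")
    print([c.label for c in tree.children[1].children[0].children])  # ['pp']
```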

The only predefined flag is INIT, which is toggled on when parsing starts. The grammarian can define any alphanumeric string as a flag simply by toggling it on in some rule.

Lluís Padró 2010-09-02