Data-oriented parsing with discontinuous constituents and function tags


Andreas van Cranenburgh, (1) Huygens ING, Royal Dutch Academy of Science; (2) Institute for Logic, Language and Computation, University of Amsterdam, Netherlands
Remko Scha, Institute for Logic, Language and Computation, University of Amsterdam, Netherlands
Rens Bod, Institute for Logic, Language and Computation, University of Amsterdam, Netherlands

Abstract


Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena but rely on extensive manual grammar development. We combine advantages of the two by building a statistical parser that produces richer analyses.

We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. The first is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS) and uses a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. The second encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing.

The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.

Keywords


discontinuous constituents, statistical parsing, tree-substitution grammar


DOI: http://dx.doi.org/10.15398/jlm.v4i1.100

ISSN of the paper edition: 2299-856X