A syntactic component for Vietnamese language processing

Authors

  • Phuong Le-Hong, Hanoi University of Science
  • Azim Roussanaly, LORIA, Nancy
  • Thi-Minh-Huyen Nguyen, Hanoi University of Science

Keywords:

language, parsing, syntactic component, tree-adjoining grammar, Vietnamese

Abstract

This paper presents the development of a syntactic component for the Vietnamese language. We first discuss the construction of a lexicalized tree-adjoining grammar using an automatic extraction approach. We then present the construction and evaluation of a deep syntactic parser based on the extracted grammar. The result is a complete system that integrates the tools needed to process Vietnamese text: it takes raw text as input and produces syntactic structures. We also propose a dependency annotation scheme for Vietnamese and an algorithm for extracting dependency structures from derivation trees. At present, this is the first Vietnamese parsing system capable of producing both constituency and dependency analyses, and its performance is encouraging: 69.33% accuracy for constituency analysis and 73.21% for dependency analysis. The parser also compares favourably to a statistical parser trained and tested on the same data sets.

DOI:

https://doi.org/10.15398/jlm.v3i1.89

Published

2015-06-29

How to Cite

Le-Hong, P., Roussanaly, A., & Nguyen, T.-M.-H. (2015). A syntactic component for Vietnamese language processing. Journal of Language Modelling, 3(1), 145–184. https://doi.org/10.15398/jlm.v3i1.89