Slovak Morphosyntactic Tagset


Radovan Garabík, Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Slovakia
Mária Šimková, Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences, Slovakia

Abstract


Morphological annotation is among the most essential, useful, and common kinds of linguistic information provided in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus was designed with several goals in mind: the tags are compact and easily human-readable, without sacrificing informational content. The tags consist of ASCII letters, digits, and several other characters. In general, they have a variable number of symbols, but the order of the symbols is fixed, and each category or specific feature is assigned a particular character, which can be shared among several parts of speech. The tagset is highly functional and pragmatic, although some allowances had to be made to accommodate the traditional analysis of Slovak morphology and part-of-speech categories. In particular, function words are classified according to their syntactic (and semantic) roles, which is one reason why the tagset is sometimes described as morphosyntactic.

Keywords


tagset; Slovak; morphology; corpus


DOI: http://dx.doi.org/10.15398/jlm.v0i1.35

ISSN of the paper edition: 2299-856X