Slovak Morphosyntactic Tagset


  • Radovan Garabík, Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences
  • Mária Šimková, Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences


Keywords: Slovak language, corpus, tagset, morphology, part of speech, grammatical categories


Morphological annotation is among the most common and useful kinds of linguistic information provided in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus was designed with several goals in mind: the tags should be compact and easily human-readable without sacrificing information content. The tags consist of ASCII letters, digits and several other characters. They have a variable number of symbols, but the order of the symbols is fixed, and each category or specific feature is assigned a particular character, which can be shared among several parts of speech. The tagset is highly functional and pragmatic, although some allowances had to be made to accommodate the traditional analysis of Slovak morphology and part-of-speech categories.
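
The tag design described above (a part-of-speech character followed by category characters in a fixed order, with one character table per category shared across parts of speech) can be illustrated with a minimal Python sketch. The character assignments, the slot order and the function name `decode_tag` are assumptions made for this illustration; they are not taken from the text above and should not be read as the tagset's actual inventory.

```python
# Illustrative sketch only. All character assignments below are assumptions
# for this example, not the tagset's documented inventory.

# First character of a tag: part of speech.
POS = {"S": "noun", "A": "adjective"}

# Category slots that follow the part-of-speech character, in fixed order.
# Different parts of speech have different numbers of slots, so tag length varies.
SLOTS = {
    "S": ["paradigm", "gender", "number", "case"],
    "A": ["paradigm", "gender", "number", "case", "degree"],
}

# One character table per category, shared by every part of speech using it.
VALUES = {
    "paradigm": {"S": "substantival", "A": "adjectival"},
    "gender": {"m": "masculine animate", "i": "masculine inanimate",
               "f": "feminine", "n": "neuter"},
    "number": {"s": "singular", "p": "plural"},
    "case": {"1": "nominative", "2": "genitive", "3": "dative",
             "4": "accusative", "5": "vocative", "6": "locative",
             "7": "instrumental"},
    "degree": {"x": "positive", "y": "comparative", "z": "superlative"},
}


def decode_tag(tag):
    """Decode a positional tag into a dict of category names and values."""
    if not tag or tag[0] not in POS:
        raise ValueError(f"unknown part of speech in tag {tag!r}")
    slots = SLOTS[tag[0]]
    body = tag[1:]
    if len(body) != len(slots):
        raise ValueError(f"tag {tag!r} has the wrong length for its part of speech")
    decoded = {"pos": POS[tag[0]]}
    # Walk the remaining characters in order; position determines the category.
    for slot, char in zip(slots, body):
        if char not in VALUES[slot]:
            raise ValueError(f"unknown {slot} character {char!r} in tag {tag!r}")
        decoded[slot] = VALUES[slot][char]
    return decoded


if __name__ == "__main__":
    print(decode_tag("SSfs1"))
    print(decode_tag("AAms4x"))
```

Because the gender, number and case tables are shared, the same character carries the same meaning in a noun tag and in an adjective tag, which is what keeps such tags compact and readable.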


How to Cite

Garabík, R., & Šimková, M. (2012). Slovak Morphosyntactic Tagset. Journal of Language Modelling, (1), 41–63.