Corpus-based measures discriminate inflection and derivation cross-linguistically
Keywords:
inflection, derivation, morphology, distributional semantics, typology
Abstract
In morphology, a distinction is commonly drawn between inflection and derivation. However, a precise definition of this distinction which reflects the way it manifests across languages remains elusive within linguistic theory, typically being based on subjective tests. In this study, we present 4 quantitative measures which use the statistics of a raw text corpus in a language to estimate to what extent a given morphological construction changes the form and distribution of lexemes. In particular, we measure both the average and the variance of this change across lexemes. Crucially, distributional information captures syntactic and semantic properties and can be operationalised by word embeddings. Based on a sample of 26 languages, we find that we can reconstruct 89±1% of the classification of constructions into inflection and derivation in UniMorph using our 4 measures, providing large-scale cross-linguistic evidence that the concepts of inflection and derivation are associated with measurable signatures in terms of form and distribution that behave consistently across a variety of languages. We also use our measures to identify in a quantitative way whether categories of inflection which have been considered noncanonical in the linguistic literature, such as inherent inflection or transpositions, appear so in terms of properties of their form and distribution. We find that while combining multiple measures reduces the amount of overlap between inflectional and derivational constructions, there are still many constructions near the model’s decision boundary between the two categories. This indicates a gradient, rather than categorical, distinction.
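As a rough illustration of the kind of measures the abstract describes, the sketch below computes, for one morphological construction, the average and the variance of the change in form and in distribution across lexemes. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the exact formulas, so Levenshtein distance stands in for change in form, Euclidean distance between word embeddings stands in for change in distribution, and variability is taken as the variance of those per-lexeme distances. The names `construction_measures`, `toy_embeddings`, and `toy_pairs` are hypothetical, and the toy vectors are invented for the example.

```python
import math
from statistics import mean, pvariance


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (assumed proxy for change in form)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def euclidean(u, v) -> float:
    """Distance between two embedding vectors (assumed proxy for change in distribution)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def construction_measures(pairs, embeddings):
    """Return four summary statistics for one construction.

    pairs: list of (base form, constructed form) tuples, one per lexeme.
    embeddings: mapping from word form to its embedding vector.
    """
    form_changes = [levenshtein(base, derived) for base, derived in pairs]
    dist_changes = [euclidean(embeddings[base], embeddings[derived])
                    for base, derived in pairs]
    return {
        "form_mean": mean(form_changes),
        "form_variance": pvariance(form_changes),
        "dist_mean": mean(dist_changes),
        "dist_variance": pvariance(dist_changes),
    }


# Hypothetical toy data: two lexemes undergoing the same construction.
toy_embeddings = {
    "walk": [0.0, 0.0], "walked": [3.0, 4.0],
    "jump": [1.0, 1.0], "jumped": [4.0, 5.0],
}
toy_pairs = [("walk", "walked"), ("jump", "jumped")]
result = construction_measures(toy_pairs, toy_embeddings)
print(result)
```

In this toy case both lexemes change by the same amount, so both variances are zero. A construction whose effect differs widely from lexeme to lexeme would show larger variance values, which is the kind of signal the abstract associates with the distinction between inflection and derivation.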
DOI:
https://doi.org/10.15398/jlm.v12i2.351
License
Copyright (c) 2024 Coleman Haley, Edoardo M. Ponti, Sharon Goldwater
This work is licensed under a Creative Commons Attribution 4.0 International License.