Corpus-based measures discriminate inflection and derivation cross-linguistically

Authors

C. Haley, E. M. Ponti, S. Goldwater

Keywords:

inflection, derivation, morphology, distributional semantics, typology

Abstract

In morphology, a distinction is commonly drawn between inflection and derivation. However, a precise definition of this distinction that reflects how it manifests across languages remains elusive in linguistic theory; existing definitions typically rest on subjective tests. In this study, we present four quantitative measures that use the statistics of a raw text corpus in a language to estimate the extent to which a given morphological construction changes the form and distribution of lexemes. In particular, we measure both the average and the variance of this change across lexemes. Crucially, distributional information captures syntactic and semantic properties and can be operationalised by word embeddings. Based on a sample of 26 languages, we find that our four measures reconstruct 89±1% of the UniMorph classification of constructions into inflection and derivation. This provides large-scale cross-linguistic evidence that the concepts of inflection and derivation are associated with measurable signatures of form and distribution that behave consistently across a variety of languages. We also use our measures to assess quantitatively whether categories of inflection that the linguistic literature has considered noncanonical, such as inherent inflection or transpositions, also appear noncanonical in their form and distribution. We find that while combining multiple measures reduces the overlap between inflectional and derivational constructions, many constructions still lie near the model’s decision boundary between the two categories. This indicates a gradient, rather than categorical, distinction.
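
To make the idea of per-construction averages and variances concrete, here is a minimal sketch. It is not the article's implementation: it assumes, purely for illustration, that change in form is measured by Levenshtein edit distance between a base and its inflected or derived word, and that change in distribution is measured by Euclidean distance between their word embeddings. The function names, the toy word pairs, and the two-dimensional vectors are all hypothetical.

```python
import math
from statistics import mean, pvariance


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (assumed form-change proxy)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def euclidean(u, v) -> float:
    """Euclidean distance between two vectors (assumed distribution-change proxy)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def construction_measures(pairs, embeddings):
    """Average and variance of form and distribution change across lexemes.

    pairs: list of (base, derived) word pairs for one construction.
    embeddings: dict mapping each word to its vector.
    """
    form = [edit_distance(base, derived) for base, derived in pairs]
    dist = [euclidean(embeddings[base], embeddings[derived])
            for base, derived in pairs]
    return {
        "form_mean": mean(form),
        "form_var": pvariance(form),
        "dist_mean": mean(dist),
        "dist_var": pvariance(dist),
    }


# Toy data for a hypothetical "past tense" construction.
pairs = [("walk", "walked"), ("jump", "jumped"), ("sing", "sang")]
embeddings = {
    "walk": [0.0, 0.0], "walked": [3.0, 4.0],
    "jump": [1.0, 1.0], "jumped": [4.0, 5.0],
    "sing": [2.0, 0.0], "sang": [5.0, 4.0],
}
measures = construction_measures(pairs, embeddings)
print(measures)
```

Under these assumptions, each construction is summarised by four numbers, which could then be passed to any standard classifier to separate inflection from derivation. In the toy data every pair shifts by the same distance in embedding space, so the distributional variance is zero, while the irregular pair (sing, sang) gives the form measure a nonzero variance.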

DOI:

https://doi.org/10.15398/jlm.v12i2.351

Published

2024-12-10

How to Cite

Haley, C., Ponti, E. M., & Goldwater, S. (2024). Corpus-based measures discriminate inflection and derivation cross-linguistically. Journal of Language Modelling, 12(2), 477–529. https://doi.org/10.15398/jlm.v12i2.351

Issue

Vol. 12, No. 2 (2024)

Section

Special issue on Computational Approaches to Morphological Typology