Inferring inflection classes with description length

Sacha Beniamine, Université Paris Diderot, Laboratoire de linguistique formelle, France
Olivier Bonami, Université Paris Diderot, Laboratoire de linguistique formelle, France
Benoît Sagot, Inria, France


We discuss the notion of an inflection class system, a traditional ingredient of the description of inflection systems of nontrivial complexity. We distinguish systems of microclasses, which partition a set of lexemes into classes with identical behavior, from systems of macroclasses, which group lexemes that are similar enough into a few larger classes. On the basis of the intuition that macroclasses should contribute to a concise description of the system, we propose an algorithmic method for inferring macroclasses from raw inflectional paradigms, based on minimisation of the description length of the system under a given strategy for identifying morphological alternations in paradigms. We then exhibit classifications produced by our implementation on French and European Portuguese conjugation data, and argue that they constitute an appropriate systematisation of traditional classifications. To arrive at such a convincing systematisation, however, it is crucial to use a local approach to class similarity (based on pairwise comparisons of paradigm cells) rather than a global approach (based on simultaneous comparison of all cells). We conclude that it is indeed possible to infer inflectional macroclasses objectively.


morphology; MDL; inflection classes

