Temporal predictive regression models for linguistic style analysis


Carmen Klaussner, Trinity College Dublin, Ireland
Carl Vogel, Trinity College Dublin, Ireland

Abstract


This paper presents work on modelling language change over time. In particular, we use different feature types (character, word stem, part-of-speech and word n-grams) to predict the publication year of texts. We do this for two different corpora. The first contains texts by two individual authors, published over a period of approximately fifty years. The second is a larger reference set containing a variety of text types and authors, intended to approximate an average language style over the same temporal span. Our linear regression models achieve good accuracy in the two-author case and very good results on the reference set.
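The following Python sketch roughly illustrates the general setup described in the abstract, with n-gram frequencies as predictors and publication year as the response. It fits a penalised linear regression on character bigram relative frequencies. It is not the authors' pipeline. The ridge (L2) penalty, the dual-form solver, the bigram size, the penalty value `lam` and the helper names (`char_ngrams`, `feature_matrix`, `solve`, `fit_year_model`, `predict_year`) are all assumptions made for illustration, and the toy texts and years are invented.

```python
from collections import Counter

# Illustrative sketch only: helper names, the ridge penalty and the bigram
# features are assumptions, not the paper's documented method.


def char_ngrams(text, n=2):
    """Count overlapping character n-grams of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def feature_matrix(texts, vocab, n=2):
    """One row of relative n-gram frequencies per text over a fixed vocabulary."""
    rows = []
    for t in texts:
        counts = char_ngrams(t, n)
        total = sum(counts.values()) or 1
        rows.append([counts[g] / total for g in vocab])
    return rows


def solve(a, b):
    """Solve a x = b for square, non-singular a (Gaussian elimination, partial pivoting)."""
    size = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, size):
            f = m[r][col] / m[col][col]
            for k in range(col, size + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (m[r][size] - sum(m[r][k] * x[k] for k in range(r + 1, size))) / m[r][r]
    return x


def fit_year_model(texts, years, n=2, lam=1e-3):
    """Ridge regression of year on n-gram frequencies, solved in dual form.

    w = Xc^T (Xc Xc^T + lam*I)^-1 yc equals the primal ridge solution but only
    needs a (texts x texts) solve; centring keeps the intercept unpenalised.
    """
    vocab = sorted(set().union(*(char_ngrams(t, n) for t in texts)))
    x = feature_matrix(texts, vocab, n)
    t, p = len(x), len(vocab)
    x_mean = [sum(row[j] for row in x) / t for j in range(p)]
    y_mean = sum(years) / t
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in x]
    yc = [y - y_mean for y in years]
    gram = [[sum(u * v for u, v in zip(xc[i], xc[k])) + (lam if i == k else 0.0)
             for k in range(t)] for i in range(t)]
    alpha = solve(gram, yc)
    w = [sum(alpha[i] * xc[i][j] for i in range(t)) for j in range(p)]
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return vocab, intercept, w, n


def predict_year(model, text):
    """Predicted year; n-grams outside the training vocabulary get no weight
    (they still count towards the text's total when normalising)."""
    vocab, intercept, w, n = model
    row = feature_matrix([text], vocab, n)[0]
    return intercept + sum(wj * xj for wj, xj in zip(w, row))


# Invented toy texts and years (not from the paper's corpora).
train = [
    ("It is a truth universally acknowledged, and so it was.", 1880),
    ("Whilst the carriage waited, she could not compose herself.", 1890),
    ("He said okay and climbed into the motor car.", 1910),
    ("They were going to drive into town later that night.", 1925),
]
model = fit_year_model([t for t, _ in train], [y for _, y in train])
print(predict_year(model, "She waited by the car for him."))
```

N-gram vocabularies are usually far larger than the number of texts, so the sketch solves the ridge problem in its dual form, which needs only a texts-by-texts linear system. With a very small `lam` the model nearly reproduces the training years. That says nothing about predictive accuracy, and any real evaluation would have to score predictions on held-out texts.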

Keywords


Natural Language Processing; text analysis; statistics



DOI: http://dx.doi.org/10.15398/jlm.v6i1.177

ISSN of the paper edition: 2299-856X