Temporal predictive regression models for linguistic style analysis
Keywords: Natural Language Processing, text analysis, statistics
This paper presents work on modelling language change over time. In particular, we use four feature types (character, word-stem, part-of-speech and word n-grams) to predict the publication year of texts. We do this for two corpora: one containing texts by two individual authors, published over a period of approximately fifty years, and a larger reference set covering the same temporal span, which contains a variety of text types and authors and approximates an average language style over time. Our linear regression models achieve good accuracy in the two-author case and very good results on the reference set.
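To make the setup concrete, the following is a minimal sketch of year prediction from character n-gram frequencies with a linear model. It is not the authors' implementation: the function names (`char_ngrams`, `fit_ridge`, `predict_year`), the small ridge penalty `l2`, the choice of bigrams, and the four toy sentences with their years are all assumptions made purely for illustration.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Relative frequencies of the character n-grams in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams) or 1
    return {g: c / total for g, c in Counter(grams).items()}


def dot(a, b):
    """Dot product of two sparse feature dictionaries."""
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(features, years, l2=1e-6):
    """Fit year = intercept + w . x; `l2` is an assumed small ridge penalty.

    The system is solved in its dual form (one unknown per document), which
    stays small when there are far more n-grams than documents.
    """
    n = len(years)
    intercept = sum(years) / n
    gram = [[dot(features[i], features[j]) + (l2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, [y - intercept for y in years])
    weights = {}
    for a, feats in zip(alpha, features):
        for g, v in feats.items():
            weights[g] = weights.get(g, 0.0) + a * v
    return intercept, weights


def predict_year(model, text, n=2):
    """Predict a publication year for `text` from its n-gram frequencies."""
    intercept, weights = model
    return intercept + dot(char_ngrams(text, n), weights)


# Invented toy data: (text, publication year). Not taken from the paper's corpora.
docs = [
    ("the wireless set crackled in the parlour", 1950),
    ("we watched the moon landing on television", 1969),
    ("she saved the file to a floppy disk", 1985),
    ("he posted the photo to his blog", 2003),
]
model = fit_ridge([char_ngrams(t) for t, _ in docs], [y for _, y in docs])
print(round(predict_year(model, "they listened to the wireless in the parlour")))
```

With so few documents the model simply interpolates the training years, so the printed estimate is only a demonstration of the mechanics. Word, word-stem or part-of-speech n-grams could be substituted for `char_ngrams` given a suitable tokeniser, stemmer or tagger.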
All content is licensed under the Creative Commons Attribution 4.0 International License.