Journal of Language Modelling https://jlm.ipipan.waw.pl/index.php/JLM <p>Journal of Language Modelling is a free (for readers and authors alike) open-access peer-reviewed journal aiming to bridge the gap between theoretical linguistics and natural language processing. Although typical articles are concerned with linguistic generalisations – either with their application in natural language processing, or with their discovery in language corpora – possible topics range from linguistic analyses which are sufficiently precise to be implementable to mathematical models of aspects of language, and further to computational systems making non-trivial use of linguistic insights.</p> <p>Papers are reviewed in under three months of receipt, and they appear as soon as they have been accepted – there are no delays typical of traditional paper journals. Accepted articles are then collected in half-yearly numbers and yearly volumes, with continuous page numbering, and are made available as hard copies via print on demand, at a nominal fee. At the same time, Journal of Language Modelling takes a fully traditional view of quality: all papers are carefully refereed by at least three reviewers (usually including at least one member of the Editorial Board) and are only accepted if they adhere to the highest scientific, typographic and stylistic standards.<br /><br />Apart from full-length articles, the journal also accepts squibs and polemics with other papers. All journal content is published under the <a href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 <span class="cc-license-title">International</span> Licence</a>. 
JLM is indexed by <a href="https://www.scopus.com/results/results.uri?src=s&amp;sot=b&amp;sdt=b&amp;origin=searchbasic&amp;rr=&amp;sl=39&amp;s=SRCTITLE(Journal%20of%20Language%20Modelling)&amp;searchterm1=Journal%20of%20Language%20Modelling&amp;searchTerms=&amp;connectors=&amp;field1=SRCTITLE&amp;fields=">SCOPUS</a>, <a href="https://dbh.nsd.uib.no/publiseringskanaler/erihplus/periodical/info?id=480322">ERIH PLUS</a>, <a href="http://dblp.uni-trier.de/db/journals/jlm/">DBLP</a>, <a href="https://doaj.org/toc/a339b4740e97425ea7e5a2a32655eba5">DOAJ</a>, <a href="https://www.ebsco.com/">EBSCO</a>, <a href="http://www.linguisticsabstracts.com/">Linguistics Abstracts Online</a>, <a href="https://www.mla.org/Publications/MLA-International-Bibliography/MLA-Directory-of-Periodicals/About-the-Directory-of-Periodicals">MLA Directory of Periodicals</a>, and the Polish <a href="http://www.nauka.gov.pl/aktualnosci-ministerstwo/juz-sa-nowe-listy-punktowanych-czasopism-na-2015-rok.html">Ministry of Science and Higher Education</a> (list B). JLM is also a member of <a href="http://oaspa.org/member/journal-of-language-modelling/">OASPA</a>.<br /><br />To submit an article, you must be registered. 
Further <strong><a href="https://jlm.ipipan.waw.pl/index.php/JLM/about/submissions">submission instructions for Authors are available here</a></strong>.</p> Institute of Computer Science of the Polish Academy of Sciences en-US Journal of Language Modelling 2299-856X <p><a href="http://creativecommons.org/licenses/by/4.0/" rel="license"><img style="border-width: 0;" src="https://i.creativecommons.org/l/by/4.0/80x15.png" alt="Creative Commons License" /></a><br />All content is licensed under the <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank" rel="license noopener">Creative Commons Attribution 4.0 International Licence</a>.</p> QRGS – Question Responses Generation via crowdsourcing https://jlm.ipipan.waw.pl/index.php/JLM/article/view/372 <p>QRGS stands for the Question Responses Generation System. It is an online game-like framework designed for gathering various types of question responses. A QRGS user is asked to read a simple story and impersonate its main character. As the story unfolds the user is confronted with four questions and (s)he is expected to answer these in the way the main character would. In this way, we obtain responses to questions of a desired type. The data gathered via QRGS is a useful supplement to the linguistic data already present in language corpora – especially for languages for which such resources are sparse. As such, it opens up the possibility of better understanding the use of questions in natural language dialogues and of analysing the response space of such questions. In this paper, we present the main idea of QRGS and the results of five studies (in Polish and in English) that test the framework. Our discussion addresses issues concerning the efficiency and accuracy of the proposed approach. 
We also discuss the availability of the QRGS and its potential future improvements.</p> Paweł Łupkowski Jonathan Ginzburg Ewelina Chmurska Adrianna Płatosz Aleksandra Kwiecień Barbara Adamska Magdalena Szkalej Copyright (c) 2024 Pawel Lupkowski, Jonathan Ginzburg, Ewelina Chmurska, Adrianna Płatosz, Aleksandra Kwiecień, Barbara Adamska, Magdalena Szkalej https://creativecommons.org/licenses/by/4.0 2024-09-03 2024-09-03 12 1 213–270 213–270 The expected sum of edge lengths in planar linearizations of trees https://jlm.ipipan.waw.pl/index.php/JLM/article/view/362 <p>Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or their variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), which can be computed in time O(n). Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an O(n)-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline narrows as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. 
Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as a random baseline.</p> Lluís Alemany-Puig Ramon Ferrer-i-Cancho Copyright (c) 2024 Lluís Alemany-Puig, Ramon Ferrer-i-Cancho https://creativecommons.org/licenses/by/4.0 2024-02-27 2024-02-27 12 1 1–42 1–42 10.15398/jlm.v12i1.362 Detecting inflectional patterns for Croatian verb stems using class activation mappings https://jlm.ipipan.waw.pl/index.php/JLM/article/view/347 <p>All verbal forms in the Croatian language can be derived from two basic forms: the infinitive and the present stems. In this paper, we present a neural computation model that takes a verb in an infinitive form and finds a mapping to a present form. The same model can be applied vice versa, i.e. to map a verb from its present form to its infinitive form. Knowing the present form of a given verb, one can deduce its inflections using grammatical rules. We experiment with our model on the Croatian language, which belongs to the Slavic group of languages. The model learns a classifier through these two classification tasks and uses class activation mapping to find the characters in verbs that contribute to the classification. The model detects patterns that follow established grammatical rules for deriving the present stem form from the infinitive stem form and vice versa. 
If mappings can be found between such slots, the rest of the slots can be deduced using a rule-based system.</p> Domagoj Ševerdija Rebeka Čorić Marko Orešković Lucian Šošić Copyright (c) 2024 Domagoj Ševerdija, Rebeka Čorić, Lucian Šošić, Marko Orešković https://creativecommons.org/licenses/by/4.0 2024-05-21 2024-05-21 12 1 43–68 43–68 10.15398/jlm.v12i1.347 Control, inner topicalisation, and focus fronting in Mandarin Chinese: modelling in a parallel constraint-based grammatical architecture https://jlm.ipipan.waw.pl/index.php/JLM/article/view/365 <p>This paper proposes a formal analysis of two displacement phenomena in Mandarin Chinese, namely inner topicalisation and focus fronting, capturing their correlational relationships with control and complementation. It examines a range of relevant data, including corpus examples, to derive empirical generalisations. Acceptability-judgment tasks, analysed with mixed-effects statistical models, were conducted to provide additional evidence. This paper presents a constraint-based lexicalist proposal that is couched in the framework of Lexical-Functional Grammar (LFG). The lexicon plays an important role in regulating the behaviour of complementation verbs as they participate in the displacement phenomena. Unlike previous analyses that cast inner topicalisation and focus fronting as restructuring phenomena, this lexicalist proposal does not rely on hypothesised clause-size differences. It captures the empirical properties more accurately and accounts for a wider range of empirical patterns. Adopting the formally explicit framework of LFG, this proposal uses constraints that have mathematical precision. 
The constraints are computationally implemented using the grammar engineering tool Xerox Linguistic Environment, safeguarding their precision.</p> Chit-Fung Lam Copyright (c) 2024 Chit-Fung Lam https://creativecommons.org/licenses/by/4.0 2024-06-14 2024-06-14 12 1 69–153 69–153 10.15398/jlm.v12i1.365 On German verb sense disambiguation: A three-part approach based on linking a sense inventory (GermaNet) to a corpus through annotation (TGVCorp) and using the corpus to train a VSD classifier (TTvSense) https://jlm.ipipan.waw.pl/index.php/JLM/article/view/356 <p>We develop a three-part approach to Verb Sense Disambiguation (VSD) in German. After considering a set of lexical resources and corpora, we arrive at a statistically motivated selection of a subset of verbs and their senses from GermaNet. This sub-inventory is then used to disambiguate the occurrences of the corresponding verbs in a corpus resulting from the union of TüBa-D/Z, Salsa, and E-VALBU. The corpus annotated in this way is called TGVCorp. It is used in the third part of the paper for training a classifier for VSD and for its comparative evaluation with a state-of-the-art approach in this research area, namely EWISER. Our simple classifier outperforms the transformer-based approach on the same data in both accuracy and speed for German, but not for English, and we discuss possible reasons.</p> Dominik Mattern Wahed Hemati Andy Lücking Alexander Mehler Copyright (c) 2024 Dominik Mattern, Wahed Hemati, Andy Lücking, Alexander Mehler https://creativecommons.org/licenses/by/4.0 2024-09-03 2024-09-03 12 1 155–212 155–212
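The abstract on planar linearizations above turns on the notion of the sum of dependency edge lengths and its expected value under random orderings of the words. The following is an illustrative sketch only, not code from that paper: the `heads` encoding, the function names, and the unconstrained (rather than planar or projective) Monte Carlo baseline are our own assumptions, chosen to make the quantity being compared concrete.

```python
import random

# 'heads[i]' is the position of word i's syntactic head in the
# linear order, with None marking the root of the dependency tree.

def sum_edge_lengths(heads):
    """Sum of |i - head(i)| over all dependency edges."""
    return sum(abs(i - h) for i, h in enumerate(heads) if h is not None)

def random_baseline(heads, samples=10_000, seed=0):
    """Monte Carlo estimate of the expected sum of edge lengths under
    uniformly random (unconstrained) linearisations of the same tree."""
    rng = random.Random(seed)
    n = len(heads)
    total = 0.0
    for _ in range(samples):
        pos = list(range(n))
        rng.shuffle(pos)  # pos[i] = new position of word i
        total += sum(abs(pos[i] - pos[h])
                     for i, h in enumerate(heads) if h is not None)
    return total / samples

heads = [1, None, 1, 2]         # a 4-word sentence rooted at word 1
print(sum_edge_lengths(heads))  # 3: three edges of length 1 each
```

For this tree the attested sum (3) sits well below the unconstrained random baseline: the expected length of a single edge under a uniform permutation of n words is (n + 1)/3, so with n = 4 and three edges the estimate approaches 5. The paper's contribution is computing such expectations exactly and in O(n) time under the planarity constraint, which sampling like this can only approximate.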