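Algoritmos de agrupamiento y lingüística de corpus: ortografía y léxico en documentos mallorquines del siglo XVIII [Clustering algorithms and corpus linguistics: orthography and lexicon in eighteenth-century Majorcan documents]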

Louf, T.; Sánchez, D.; Miguel Franco, R.
Scripta manent. Historia del español, documentación archivística y humanidades digitales (edited by Calderón Campos, M. and González Sopeña, I.), Peter Lang, 563-586 (2023)

The aim of this work is to characterise the historical variety of Spanish in contact with Catalan in Majorca by applying classification algorithms. For this purpose, a corpus of 18th-century documents from Corpus Mallorca was used, together with a control corpus from peninsular Spain. After the texts were processed, an unsupervised and a supervised algorithm were applied to the subcorpora; the resulting word lists were analysed statistically and with respect to their chronological development. The results point to important differences, especially in lexical orthography: the Majorcan documents show features that were already well established by that time, some due to contact with Catalan and others related to particular writing traditions. This interdisciplinary methodology has thus revealed characteristics of the historical variety of Majorcan Spanish that had not previously been analysed.

