How can computational techniques help us understand a centuries-old musical tradition? A new study published in ACM’s Journal on Computing and Cultural Heritage applies Natural Language Processing and machine learning to flamenco lyrics, shedding light on how different genres (or palos) are distinguished through language. The research, carried out at the Institute for Cross-Disciplinary Physics and Complex Systems (IFISC, CSIC-UIB), explores the hidden structure behind flamenco's oral tradition, with implications for cultural heritage preservation and the digital humanities.
IFISC researchers Pablo Rosillo-Rodes, Maxi San Miguel, and David Sánchez analyzed a corpus of more than 2,000 flamenco lyrics. They applied a machine learning method that predicts the category of a text from metrics based on word frequencies, and used it to distinguish the different palos solely from their lexical (i.e., word-based) content. "We found that, besides rhythm and tonality, the lexicon itself contains enough information to classify songs into their correct palo with high accuracy", explains Pablo Rosillo-Rodes, lead author of the study and PhD researcher at IFISC. "This quantitative approach not only validates traditional knowledge but also unveils new relationships between flamenco styles".
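To make the idea of classifying a lyric from word frequencies concrete, here is a minimal sketch of one common word-frequency classifier, multinomial naive Bayes with add-one smoothing. This article does not name the exact method used in the study, so the choice of algorithm is an assumption, and the function names (train, predict) and the four toy lyrics are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """Count words per label. docs is a list of (label, text) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of lyrics
    for label, text in docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # class prior
        for w in text.lower().split():
            if w not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][w] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus, NOT taken from the study's data.
TOY_LYRICS = [
    ("seguiriya", "pena muerte llanto dios"),
    ("seguiriya", "dolor pena madre muerte"),
    ("alegria", "cadiz mar barco alegria"),
    ("alegria", "cadiz playa sal fiesta"),
]

model = train(TOY_LYRICS)
print(predict(model, "pena y muerte"))
print(predict(model, "cadiz mar"))
```

A real experiment would also hold out part of the corpus to measure accuracy per palo, rather than predicting on phrases built from the training vocabulary as done here.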
Lexical patterns in flamenco
Using techniques from computational linguistics and network analysis, the researchers identified characteristic words for each genre and mapped the relationships between styles. "For example, we observed the known close historical ties between soleá and bulerías, and between tientos and tangos, purely from the lyrics", adds David Sánchez, senior researcher and full professor at IFISC.
Beyond classification, the study also reveals deep cultural patterns. Lyrics associated with seguiriyas are rich in vocabulary related to sorrow and spirituality, while alegrías highlight themes of celebration and geography, particularly references to the city of Cádiz. "The language of flamenco encodes the lived experiences, struggles, and celebrations of its communities", says IFISC emeritus professor Maxi San Miguel.
Moreover, by calculating lexical distances and applying network analysis techniques, the study builds a "relationship tree" between the main palos. In this tree, bulerías emerge as a central node connecting different stylistic branches, such as the Málaga cantes (fandangos and malagueñas) and styles of gypsy origin (seguiriyas and soleá).
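As a rough illustration of how a tree of palos can be derived from lexical distances, the sketch below compares the word-frequency distributions of each genre with the Jensen-Shannon distance and then links the genres with a minimum spanning tree (Prim's algorithm). Both choices are assumptions made for this example, since the article does not specify the distance or the tree-building method used in the study, and the toy lyrics and function names are hypothetical.

```python
import math
from collections import Counter

def word_distribution(texts):
    """Relative word frequencies over all lyrics of one palo."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_distance(p, q):
    """Jensen-Shannon distance (base 2): 0 for identical, 1 for disjoint."""
    div = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = (pw + qw) / 2
        if pw > 0:
            div += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            div += 0.5 * qw * math.log2(qw / m)
    return math.sqrt(max(div, 0.0))

def minimum_spanning_tree(labels, dist):
    """Prim's algorithm. Returns a list of (node_in_tree, new_node, distance)."""
    in_tree = {labels[0]}
    edges = []
    while len(in_tree) < len(labels):
        # Pick the cheapest edge that connects the tree to a new node.
        a, b = min(((u, v) for u in in_tree for v in labels if v not in in_tree),
                   key=lambda e: dist(e[0], e[1]))
        edges.append((a, b, dist(a, b)))
        in_tree.add(b)
    return edges

# Hypothetical toy corpus, built so that "bulerias" shares words with both
# other genres. It is NOT data from the study.
TOY = {
    "bulerias": ["pena cadiz fiesta", "madre pena baile"],
    "solea": ["pena madre dolor"],
    "alegrias": ["cadiz fiesta mar"],
}
dists = {name: word_distribution(texts) for name, texts in TOY.items()}
tree = minimum_spanning_tree(list(TOY), lambda a, b: js_distance(dists[a], dists[b]))
for a, b, d in tree:
    print(f"{a} -- {b}: {d:.3f}")
```

In this toy corpus the bulerías end up linked to both other genres simply because the invented lyrics were written that way, so the result only illustrates the mechanics and says nothing about the real corpus.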
A bridge between artificial intelligence and cultural heritage
This work represents the first large-scale computational analysis of flamenco lyrics, and it opens new doors for research into traditional music. "By applying machine learning to intangible cultural heritage, we can preserve and better understand the complex histories embedded in oral traditions", concludes Rosillo-Rodes.
This study not only complements traditional qualitative research but also contributes to the still-small field that studies the evolution of flamenco with data-driven methods. By bridging cultural heritage and artificial intelligence, it creates avenues for interdisciplinary research and offers tools to better preserve and understand one of Spain’s most iconic musical traditions, recognized by UNESCO as part of the Intangible Cultural Heritage of Humanity.
Left: painting of a bailaora (image: Pixabay). Right: accuracy of the classification of several flamenco genres. Tangos (Ta) and tientos (Ti) are confused due to their related origin.
Rosillo-Rodes, Pablo; San Miguel, Maxi; Sánchez, David. ACM J. Comput. Cult. Herit. (2025). DOI: https://doi.org/10.1145/3748729