Twitter lexical analysis reveals the existence of distinct cultural regions in the U.S.
An international team of researchers, led
by scientists from IFISC (UIB-CSIC), has mapped the different cultural regions
in the United States of America through a
lexical analysis of the content that citizens themselves post on their social
networks. The results show a clear separation between Northern and Southern
cultures, the latter influenced by the African-American population, as well as subtler
differences along the East-West axis and between urban and rural populations. To
delineate these regions, they
calculated the occurrence frequency of words within 3.3 billion geolocated
tweets, published between 2015 and 2021.
This allowed them to find the hotspots where discussions or debates on
specific topics were held. The results were recently published in
Nature’s Humanities and Social Sciences Communications.
The idea of the existence of cultural
areas in the United States of America is used as a case study in various fields
of social sciences. However, the selection of common characteristics that make
up a cultural region can be arbitrary and influenced by prejudices and biases.
Therefore, an approach is needed to identify these cultural regions in an
unbiased and more objective manner.
The enormous amount of data generated on the internet,
especially through social networks, offers a relatively new opportunity
with high potential for doing so.
The researchers decided to analyze the
case of the United States for several reasons, including having a huge set of
geolocated Twitter data. In addition, the vast majority of Americans speak the
same language (English), which is crucial for using the analysis tools. Another
relevant aspect, the authors explain, is that the history of the USA is relatively recent but rich and varied,
so the formation of different cultural regions within the same national
territory is possible.
The method presented in this paper is
based on the principle that cultural affiliation can be inferred from the
topics that people discuss with each other. The more messages about a topic are
sent from a region, the greater the interest of that area's population in that
topic. Specifically, the
authors measured regional variations in written discourse in U.S. social
networks, using frequency distributions of content words in geolocated
tweets to find those regional hotspots where certain topics appeared more
frequently than others. From there, principal components of regional variation
were extracted, and hierarchical clustering was applied to identify the
distinct cultural areas and the topics of discussion that define them.
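As a rough illustration of this kind of pipeline (word frequencies per region, a principal component of regional variation, then hierarchical clustering), the following is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the toy tweets, the whitespace tokenizer, the use of only the first principal component and the choice of average linkage are all assumptions made for brevity, and the hotspot detection step is omitted entirely.

```python
from collections import Counter
from math import sqrt


def relative_frequencies(tweets_by_region):
    """Return {region: {word: share of that region's tokens}}."""
    freqs = {}
    for region, tweets in tweets_by_region.items():
        # Assumption: whitespace tokenisation, no content-word filtering,
        # and every region has at least one token.
        counts = Counter(w for t in tweets for w in t.lower().split())
        total = sum(counts.values())
        freqs[region] = {w: c / total for w, c in counts.items()}
    return freqs


def centred_matrix(freqs):
    """Rows are regions, columns are words, each column has zero mean."""
    regions = sorted(freqs)
    vocab = sorted({w for f in freqs.values() for w in f})
    rows = [[freqs[r].get(w, 0.0) for w in vocab] for r in regions]
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return regions, [[x - m for x, m in zip(row, means)] for row in rows]


def first_component_scores(X, iters=100):
    """Project each row on the leading principal direction (power iteration)."""
    v = [1.0 / (j + 1) for j in range(len(X[0]))]  # deterministic start vector
    for _ in range(iters):
        scores = [sum(x * vj for x, vj in zip(row, v)) for row in X]
        # w = X^T (X v), i.e. one multiplication by the scatter matrix
        w = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(len(v))]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]


def agglomerate(scores, k):
    """Average-linkage agglomerative clustering of scalar scores into k groups."""
    clusters = [[i] for i in range(len(scores))]

    def linkage(a, b):
        return sum(abs(scores[i] - scores[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


def cultural_clusters(tweets_by_region, k):
    """Group regions whose word usage is similar (hypothetical helper)."""
    regions, X = centred_matrix(relative_frequencies(tweets_by_region))
    groups = agglomerate(first_component_scores(X), k)
    return [{regions[i] for i in g} for g in groups]


# Invented toy data, only to exercise the functions above.
toy_tweets = {
    "north_a": ["snow hockey snow", "hockey lake"],
    "north_b": ["snow lake hockey", "snow snow"],
    "south_a": ["beach church beach", "church sun"],
    "south_b": ["beach sun church", "beach beach"],
}
print(cultural_clusters(toy_tweets, 2))
```

On real data one would likely keep several principal components and rely on an established numerical library rather than these hand-written loops. The sketch only shows how the three steps connect.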
The study found a clear North-South
separation influenced primarily by African-American culture, as well as other
divisions that provide a complete picture of modern American cultural areas.
While the work has confirmed that factors such as ethnicity and religion are
important in defining American cultural regions, it has also found substantial
variations in the relevance of these factors across the country. In other
words, the study not only mapped cultural regions, but also identified the cultural factors that are important in defining
these regions. In addition, the analysis identified other subtler cultural
patterns such as attention to social interaction, interest in outdoor
activities, family and leisure. Identifying these patterns is new to
the analysis of U.S. society, as they are difficult to capture
through traditional sources.
The authors of the study conclude that,
although their method has only analyzed one genre of American English, it could also be applied to any big data
resource with linguistic value and provide a basis for a more complete
picture of the cultural landscape, both for the U.S. case and for different
nations.
Louf, T., Gonçalves, B., Ramasco, J.J. et al. American cultural regions mapped through the lexical analysis of social media. Humanit Soc Sci Commun 10, 133 (2023). https://doi.org/10.1057/s41599-023-01611-3
El Diari de la UIB
http://ifisc.uib-csic.es/en/news/twitter-lexical-analysis-reveals-existence…