The Twitter of Babel: Mapping World Languages through Microblogging Platforms

  • Bruno Gonçalves
  • Centre de Physique Théorique, Campus de Luminy, Aix-Marseille Université, France
  • September 5, 2013 at 12:30
  • IFISC Seminar Room
Large-scale analysis and statistics of socio-technical systems that
just a few short years ago would have required considerable
economic and human resources can nowadays be conveniently performed
by mining the enormous amount of digital data produced by human
activities. Although a characterization of several aspects of our
societies is emerging from the data revolution, a number of
questions concerning the reliability and the biases inherent to the
big data “proxies” of social life are still open. Here, we survey
worldwide linguistic indicators and trends through the analysis of a
large-scale dataset of microblogging posts. We show that available
data allow for the study of language geography at scales ranging
from country-level aggregation to specific city neighborhoods. The
high resolution and coverage of the data allow us to investigate
different indicators such as the linguistic homogeneity of different
countries, the seasonal patterns of tourism within countries and the
geographical distribution of different languages in multilingual
regions. This work highlights the potential of geolocalized studies
of open data sources to improve current analyses and develop
indicators for major social phenomena in specific communities.

