Cross-checking different sources of mobility information

Maxime Lenormand1, Miguel Picornell2, Oliva G. Cantú-Ros2, Antònia Tugores1, Thomas Louail3,4, Ricardo Herranz2, Marc Barthelemy3,5, Enrique Frías-Martínez6 and José J. Ramasco1

1Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), 07122 Palma de Mallorca, Spain.
2Nommon Solutions and Technologies, calle Cañas 8, 28043 Madrid, Spain.
3Institut de Physique Théorique, CEA-CNRS (URA 2306), F-91191, Gif-sur-Yvette, France.
4Géographie-Cités, CNRS-Paris 1-Paris 7 (UMR 8504), 13 rue du four, FR-75006 Paris, France.
5Centre d'Analyse et de Mathématique Sociales, EHESS-CNRS (UMR 8557), 190-198 avenue de France, FR-75013 Paris, France.
6Telefónica Research, 28050 Madrid, Spain.

(April 2014)

The pervasive use of new mobile devices has allowed a better characterization, in space and time, of human concentrations and of mobility in general. Besides its theoretical interest, describing mobility is of great importance for a number of practical applications, ranging from forecasting the spread of diseases to the design of new spaces in urban environments. While classical data sources, such as surveys or censuses, have a limited geographical resolution (districts, municipalities or counties are typically used) or are restricted to generic workdays or weekends, the data coming from mobile devices can be precisely located in both time and space. Most previous works have used a single data source to study human mobility patterns. Here we instead perform a cross-check analysis by comparing results obtained with data collected from three different sources: Twitter, census and cell phones. The analysis focuses on the urban areas of Barcelona and Madrid, for which all three types of data are available. We assess the correlation between the datasets on three aspects: the spatial distribution of population concentration, the temporal evolution of population density and the mobility patterns of individuals. Our results show that the three data sources provide comparable information. Even though the representativeness of Twitter geolocated data is lower than that of mobile phone and census data, the correlations between the population density profiles and mobility patterns detected by the three datasets are close to one in grids with cells of 2 x 2 km and 1 x 1 km. This level of correlation supports the feasibility of interchanging the three data sources at the spatio-temporal scales considered.
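
As an illustration of the kind of comparison summarized above, the following minimal Python sketch aggregates point records from two sources onto a square grid and computes the Pearson correlation between the resulting per-cell population shares. It is not the procedure used in the paper: the function names, the assumption that positions are already given in kilometers in a projected coordinate system, and the choice of the Pearson coefficient are all hypothetical and made only for the example.

```python
from collections import Counter
from math import sqrt


def cell_of(x_km, y_km, cell_km):
    """Return the index of the square grid cell containing a point.

    Coordinates are assumed to be in kilometers in a projected system
    (an assumption of this sketch, not something stated in the text).
    """
    return (int(x_km // cell_km), int(y_km // cell_km))


def counts_per_cell(points, cell_km):
    """Count how many records of one data source fall in each grid cell."""
    return Counter(cell_of(x, y, cell_km) for x, y in points)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def density_correlation(points_a, points_b, cell_km):
    """Correlate the per-cell shares of records from two sources.

    Counts are normalized by each source's total so that sources with
    very different sample sizes can be compared on the same footing.
    """
    counts_a = counts_per_cell(points_a, cell_km)
    counts_b = counts_per_cell(points_b, cell_km)
    # Use every cell seen by either source; a missing cell counts as zero.
    cells = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    shares_a = [counts_a[c] / total_a for c in cells]
    shares_b = [counts_b[c] / total_b for c in cells]
    return pearson(shares_a, shares_b)
```

Calling `density_correlation(points_a, points_b, 2)` and then `density_correlation(points_a, points_b, 1)` on the same two lists of `(x_km, y_km)` pairs would repeat the comparison at the two grid resolutions mentioned in the abstract.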