Is spatial information in ICT data reliable?

Maxime Lenormand1, Thomas Louail2, Marc Barthelemy3,4 and José J. Ramasco2

1Irstea, UMR TETIS, 500 rue Francois Breton, FR-34093 Montpellier, France.
2Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Palma de Mallorca, Spain.
3CNRS, UMR 8504 Géographie-Cités, 13 rue du four, FR-75006 Paris, France.
4Institut de Physique Théorique, CEA-CNRS (URA 2306), FR-91191, Gif-sur-Yvette, France.
5CAMS, EHESS-CNRS (UMR 8557), 190-198 avenue de France, FR-75013 Paris, France.

(September 2016)

An increasing number of human activities are studied using data produced by individuals' ICT devices. In particular, when ICT data contain spatial information, they represent an invaluable source for analyzing urban dynamics. However, there have been relatively few contributions investigating the robustness of such results against fluctuations in data characteristics. Here, we present a stability analysis of higher-level information extracted from mobile phone data passively produced during an entire year by 9 million individuals in Senegal. We focus on two information-retrieval tasks: (a) the identification of land use in the region of Dakar from the temporal rhythms of the communication activity; (b) the identification of home and work locations of anonymized individuals, which enables the construction of Origin-Destination (OD) matrices of commuting flows. Our analysis reveals that the uncertainty of the results depends strongly on the sample size, the scale and the period of the year at which the data were gathered. Nevertheless, the spatial distributions of land use computed for different samples are remarkably robust: on average, we observe more than 75% of shared surface area between the different spatial partitions when considering the activity of at least 100,000 users, whatever the scale. The OD matrix is less stable and depends on the scale, with at least 75% of commuters in common when considering all types of flows constructed from the home-work locations of 100,000 users. For both tasks, better results can be obtained at larger levels of aggregation or by considering more users. These results confirm that ICT data are very useful sources for the spatial analysis of urban systems, but that their reliability should in general be tested more thoroughly.