Dissemination of words in online discussion groups

  • Talk

  • Eduardo Altmann
  • Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
  • May 10, 2011 at 15:00
  • IFISC Seminar Room

Statistics of word usage are an increasingly popular tool in the analysis of human activity in large databases. In this seminar I will report on our investigation of word usage in Usenet groups. Usenet groups were the first online discussion groups on the Internet, and they contain not only data about human activity but also interesting historical data on the scale of decades (e.g., the exogenously driven rise of products and the endogenously driven rise of Internet slang). Our aim is to go beyond frequency counts and develop statistical measures able to quantify the importance of users and topics in word usage. To deal with the strong fluctuations in word frequency, we introduce a measure of word dissemination with respect to users and topics. We observe that most words are less disseminated than a random marker with the same frequency, and that dissemination is positively correlated with frequency change, meaning that words concentrated in a small "niche" are more likely to decay in frequency or go "extinct". We also show that users are more important than topics in determining the usage of words, suggesting that the heterogeneity of people is the single strongest factor in lexical diversity. Finally, I will report on ongoing work on a stochastic model able to discriminate between fluctuations in word frequency and fluctuations in population size.
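The idea of comparing a word against a random marker of the same frequency can be illustrated with a small sketch. The abstract does not give the definition used in the talk, so everything below is an assumption made for illustration: the function name `dissemination`, the `(user, tokens)` data layout, and the choice of baseline, in which the word is scattered at random over all text so that a user who wrote m words uses it at least once with probability 1 - exp(-f m), where f is the word's overall relative frequency. The returned value is the observed number of users of the word divided by the number expected under that baseline; values below 1 indicate a word concentrated in fewer users than chance would predict.

```python
import math
from collections import Counter


def dissemination(posts, word):
    """Ratio of observed to expected number of users of `word`.

    `posts` is a list of (user, tokens) pairs (a hypothetical layout).
    The expected count assumes a Poisson baseline: the word occurs at
    random positions with its overall relative frequency f, so a user
    who wrote m words uses it at least once with probability
    1 - exp(-f * m). This baseline is an assumption of this sketch,
    not a definition taken from the talk.
    """
    words_per_user = Counter()   # total number of tokens written by each user
    users_with_word = set()      # users who used `word` at least once
    occurrences = 0              # total occurrences of `word`

    for user, tokens in posts:
        words_per_user[user] += len(tokens)
        count = tokens.count(word)
        if count:
            users_with_word.add(user)
            occurrences += count

    if occurrences == 0:
        raise ValueError(f"word {word!r} does not occur in the posts")

    total_tokens = sum(words_per_user.values())
    frequency = occurrences / total_tokens

    # Expected number of distinct users under the random baseline.
    expected_users = sum(
        1.0 - math.exp(-frequency * m) for m in words_per_user.values()
    )
    return len(users_with_word) / expected_users


# Toy data: "x" is used twice by a single user, "y" by every user.
sample_posts = [
    ("a", ["x", "y", "x"]),
    ("b", ["y", "z"]),
    ("c", ["y", "y", "z", "w"]),
]
print(dissemination(sample_posts, "x"))
print(dissemination(sample_posts, "y"))
```

In this toy data the word used by a single user scores below 1 and the word used by everyone scores above 1. The same ratio could be computed over topics (for example, discussion threads) instead of users by changing the grouping key.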


Contact details:

Ernesto M. Nicola


