On the shape of semantic space - what can we infer from large-scale statistical properties of texts?

Czegel, Daniel (Supervisor: Maxi San Miguel)
Master's Thesis (2017)

The large amount of digitized linguistic data opens up the unique possibility
of using the methodology of complex systems to understand high-level human
cognitive processes. Two such issues are i) the way we categorize the
continuous space of real-world features into discrete concepts, and ii) the
way we use language to copy a line of thought from one brain to another. In
this work I address both questions by formulating a simple text generation
model which reproduces the three major characteristic large-scale statistical
laws of human language streams, namely Zipf's law, Heaps' law, and
burstiness. Furthermore, the generation itself can be described as a random
walk on a scale-free, highly clustered and low dimensional complex network,
suggesting that this class of networks is appropriate as a minimal model of
the semantic space. Disentangling the global characteristics of the semantic
space is a necessary step towards analyzing texts as trajectories in such
a space, with promising applications such as author or style identification,
personality disorder diagnosis, or tracking the evolution of cultural traits
as mirrored by text production characteristics.
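
As a rough illustration of the idea (not the thesis's actual generation model), the sketch below, assuming Python with networkx, emits a word stream by a random walk on a scale-free, clustered graph, using the Holme-Kim powerlaw_cluster_graph as a stand-in for the semantic network, and inspects the resulting rank-frequency and vocabulary-growth statistics; all parameter values are illustrative assumptions.

```python
# Minimal sketch: random-walk text generation on a scale-free, clustered
# graph, treating each visited node as one emitted "word" token.
# Assumes networkx; the Holme-Kim model is a stand-in for the semantic
# network described in the abstract, not the thesis's actual construction.
import random
from collections import Counter

import networkx as nx

N_NODES = 5000        # size of the toy semantic network (assumed value)
TEXT_LENGTH = 50000   # number of word tokens to emit (assumed value)

# Scale-free graph with tunable clustering (Holme-Kim growth model).
G = nx.powerlaw_cluster_graph(n=N_NODES, m=3, p=0.5, seed=42)

# Random walk: each visited node is appended as one word token.
node = random.choice(list(G.nodes))
tokens = []
for _ in range(TEXT_LENGTH):
    tokens.append(node)
    node = random.choice(list(G.neighbors(node)))

# Zipf's law: rank-frequency distribution of the emitted word tokens.
freqs = sorted(Counter(tokens).values(), reverse=True)
print("Top-5 word frequencies (rank 1..5):", freqs[:5])

# Heaps' law: vocabulary growth as a function of text length.
seen, vocab_growth = set(), []
for i, tok in enumerate(tokens, start=1):
    seen.add(tok)
    if i % 10000 == 0:
        vocab_growth.append((i, len(seen)))
print("Vocabulary size vs. text length:", vocab_growth)
```

Under Zipf's law the sorted frequencies decay roughly as a power of the rank, and under Heaps' law the vocabulary size grows sublinearly with text length; the printed statistics can be compared against these expectations.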

