On the power laws of language: word frequency distributions

04 Publication in conference proceedings
Chierichetti Flavio, Kumar Ravi, Pang Bo

About eight decades ago, Zipf postulated that the word frequency distribution of languages is a power law, i.e., it is a straight line on a log-log plot. Over the years, this phenomenon has been documented and studied extensively. For many corpora, however, the empirical distribution barely resembles a power law: when plotted on a log-log scale, the distribution is concave and appears to be composed of two differently sloped straight lines joined by a smooth curve. A simple generative model is proposed to capture this phenomenon. The word frequency distributions produced by this model are shown to match the observations both analytically and empirically. © 2017 Copyright held by the owner/author(s).
