Words are fundamental linguistic units that connect thoughts and things through meaning. However, words do not appear independently in a text sequence: syntactic rules induce correlations among neighboring words. Using an ordinal pattern approach, we analyze lexical statistical connections in 11 major languages. We find that the diverse ways in which languages express word relations give rise to distinctive ordinal pattern distributions. Furthermore, fluctuations of these pattern distributions within a given language can be used to determine both the historical period in which a text was written and its author. Taken together, our results emphasize the relevance of ordinal time series analysis in linguistic typology, historical linguistics, and stylometry.
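
To make the idea of an ordinal pattern analysis of text concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes that each word is encoded by its frequency rank within the text, that patterns are taken over sliding windows of length `order`, and that ties (repeated words) are broken by position. The function names `word_rank_series`, `ordinal_pattern_distribution`, and `permutation_entropy` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations
import math


def word_rank_series(words):
    """Encode each word by its frequency rank (1 = most frequent).

    Assumption: this rank encoding is illustrative; the abstract does not
    state how words are turned into numbers.
    """
    counts = Counter(words)
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    rank = {w: i + 1 for i, w in enumerate(ordered)}
    return [rank[w] for w in words]


def ordinal_pattern_distribution(series, order=3):
    """Relative frequency of every ordinal pattern of length `order`."""
    if order < 2 or len(series) < order:
        raise ValueError("need order >= 2 and at least `order` data points")
    counts = Counter()
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        # Indices that sort the window in ascending order. Python's sort is
        # stable, so equal values keep their original order (assumed tie rule).
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        counts[pattern] += 1
    total = sum(counts.values())
    return {p: counts[p] / total for p in permutations(range(order))}


def permutation_entropy(dist):
    """Shannon entropy of a pattern distribution, normalized to [0, 1]."""
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(len(dist))


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the rug".split()
    dist = ordinal_pattern_distribution(word_rank_series(text), order=3)
    for pattern, prob in sorted(dist.items()):
        print(pattern, round(prob, 3))
    print("normalized permutation entropy:", round(permutation_entropy(dist), 3))
```

Under these assumptions, two texts can be compared by computing their pattern distributions with the same `order` and measuring how far apart the distributions are. Which distance to use, and which window length suits a given corpus size, are choices this sketch leaves open.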
