Electronic Notes in Discrete Mathematics 21 (2005) 97–100 www.elsevier.com/locate/endm

Information theoretic models in language evolution 1

Rudolf Ahlswede, Erdal Arikan, Lars Bäumer, Christian Deppe
Universität Bielefeld, Fakultät für Mathematik, Postfach 100131, 33501 Bielefeld, Germany

Abstract

We study a model for language evolution which was introduced by Nowak and Krakauer ([2]). We analyze discrete distance spaces and prove a conjecture of Nowak for all metrics with a positive semidefinite associated matrix. This natural class of metrics includes all metrics studied by different authors in this connection; in particular, it includes all ultra-metric spaces. Furthermore, the role of feedback is explored and multi-user scenarios are studied. In all models we give lower and upper bounds for the fitness.

Human language is used to store and transmit information, and there is therefore significant interest in mathematical models of language development. These models aim to explain how natural selection can lead to the gradual emergence of human language. Nowak and coworkers created such a mathematical model [2], [3]. A language L in Nowak's model is a system L = (O, X^n, d, r) consisting of the following elements:

(i) O is a finite set of objects, O = {o_1, ..., o_N}.

(ii) X is a finite set of phonemes, which model the elementary sounds of the spoken language. The set X^n models the set of all possible words of length n.

(iii) Each object is mapped to a word by the function r : O → X^n. Thus the words for all objects have the same length n. The model allows several objects to be mapped to the same word. With some abuse of notation,

1 Supported in part by INTAS-00-738.

1571-0653/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.endm.2005.07.002

we use L to denote the set of all words in the language, L = {x^n : x^n = r(o_i) for some 1 ≤ i ≤ N}.

(iv) d : X × X → R_+ is a measure of distance between phonemes, i.e., a function that is symmetric, d(x, y) = d(y, x), and non-negative, d(x, y) ≥ 0, with d(x, y) = 0 if and only if x = y. The distance between two words x^n = (x_1, ..., x_n) and y^n = (y_1, ..., y_n) in X^n is defined by d_n(x^n, y^n) = Σ_{i=1}^n d(x_i, y_i).

(v) The model postulates that the conditional probability that the listener understands the word y^n ∈ L, given that the speaker utters the word x^n ∈ L, is

   p(y^n | x^n) = exp(−d_n(x^n, y^n)) / Σ_{v^n ∈ L} exp(−d_n(x^n, v^n)).

Nowak defined the fitness of a language L with words over X^n as

   F(L, X^n) = Σ_{x^n ∈ L} p(x^n | x^n).
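The two definitions above can be checked numerically. The following is a minimal stdlib-only sketch (our own, not from the paper; it assumes a binary phoneme set with the Hamming distance, and all function names are ours):

```python
import math

# A minimal numerical sketch of Nowak's model: binary phonemes with
# the Hamming distance d(x, y) = 1 if x != y and 0 otherwise, and
# words represented as tuples of phonemes.

def d(x, y):
    return 0.0 if x == y else 1.0

def dn(xw, yw):
    # Word distance: the sum of the per-phoneme distances.
    return sum(d(a, b) for a, b in zip(xw, yw))

def p_understood(y, x, L):
    # P(listener understands y | speaker utters x), for x, y in L.
    denom = sum(math.exp(-dn(x, v)) for v in L)
    return math.exp(-dn(x, y)) / denom

def fitness(L):
    # F(L, X^n): the sum of the correct-reception probabilities.
    return sum(p_understood(x, x, L) for x in L)

# Full binary language of length 1: F = 2 / (1 + e^{-1}).
L1 = [(0,), (1,)]
print(round(fitness(L1), 4))  # → 1.4621
```

For the two-word language the fitness is 2/(1 + e^{-1}) ≈ 1.462, which Theorem 1 below identifies as the optimal value F(X) for this space.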

Nowak was interested in the maximum possible fitness of languages, so he defined the fitness of the space X^n as

   F(X^n) = sup{F(L, X^n) : L is a language over X^n},

and he posed the determination of the quantity F(X^n) for general spaces (X, d) as an open problem. He conjectured that F(X^n) = (F(X))^n when (X, d) is a metric space, i.e., when the distance function d satisfies the triangle inequality d(x, y) + d(y, z) ≥ d(x, z). We show that Nowak's conjecture is true for a class of spaces defined by a certain condition on the distance function. Let us call a space (X, d) a p.s.d. space if the matrix [e^{−d(x,y)}]_{x∈X, y∈X} is positive semidefinite. The main result is the following.

Theorem 1 For any p.s.d. space (X, d), where X is a finite set, the fitness is given by

(1)   F(X^n) = F(X)^n = e^{n R_0},

where

(2)   R_0 = R_0(X, d) = − log min_λ Σ_x Σ_y λ_x λ_y e^{−d(x,y)},

and the minimum is over all probability distributions λ = (λ_1, ..., λ_{|X|}) on X.
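The following stdlib-only sketch (our own check, not from the paper) evaluates R_0 for the binary Hamming space by brute-force minimization over the probability simplex and confirms the n = 1 case of Theorem 1: e^{R_0} coincides with the fitness 2/(1 + e^{-1}) of the full two-word language.

```python
import math

# Numerical check of Theorem 1 for n = 1 on the binary Hamming space:
# R_0 = -log min_λ Σ_x Σ_y λ_x λ_y e^{-d(x,y)}.
M = [[1.0, math.exp(-1)],
     [math.exp(-1), 1.0]]  # the matrix [e^{-d(x,y)}]

def quad(lam):
    # The quadratic form Σ_x Σ_y λ_x λ_y e^{-d(x,y)}.
    return sum(lam[i] * lam[j] * M[i][j] for i in range(2) for j in range(2))

# Brute-force minimization over the probability simplex (step 1e-4);
# by symmetry the true minimizer is the uniform distribution.
best = min(quad((t / 10000, 1 - t / 10000)) for t in range(10001))
R0 = -math.log(best)

# e^{R_0} should equal F(X) = 2 / (1 + e^{-1}) for this space.
print(round(math.exp(R0), 4), round(2 / (1 + math.exp(-1)), 4))  # → 1.4621 1.4621
```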


In other words, for p.s.d. spaces Nowak's conjecture holds and the fitness is given by powers of e^{R_0}. For any p.s.d. space, there exists a "channel" W = [W(z|x)]_{x∈X, z∈Z}, for some set Z, such that (i) W(z|x) ≥ 0 for all x, z, (ii) Σ_z W(z|x) = 1 for all x, and (iii) e^{−d(x,y)} = Σ_z W(z|x)W(z|y) for all x, y. The parameter R_0 equals the cutoff rate of the channel W in the standard information-theoretic sense. This indicates a connection between Nowak's model and standard information-theoretic models. Indeed, the proof of the above result makes use of Gallager's results on reliability exponents, and specifically his "parallel channels theorem" [1, p. 149], to achieve the single-letterization demanded by Nowak's conjecture.

Examples of spaces (X, d) for which Nowak's conjecture is settled by the above result are (i) the Hamming space, where X is an arbitrary finite set and d is the Hamming metric, d(x, y) = 1 if x ≠ y and d(x, x) = 0; (ii) X a finite set of reals with d(x, y) = |x − y|; and (iii) X a finite set of reals with d(x, y) = (x − y)^2. All of these spaces are p.s.d. Some other partial results are as follows:

(i) All finite ultra-metric spaces are p.s.d. (Recall that in an ultra-metric space, d(a, b) ≤ max{d(a, c), d(c, b)} for all points a, b, c.)

(ii) All metric spaces with 3 or 4 elements are p.s.d.

(iii) There exist metric spaces with 5 elements which are not p.s.d.

(iv) For every metric space (X, d) where X is a subset of the reals, there exists a scaling d_α(x, y) = αd(x, y), for some α > 0 and all x, y ∈ X, such that the space (X, d_α) is p.s.d.

(v) Nowak's conjecture does not hold if we do not allow multiplicity of words.

We have shown that the product conjecture is true in particular for the Hamming model. The optimal fitness is attained if one uses all possible words in the language. In general, however, the memory of individuals is restricted. For this reason we look for languages which use only a fraction of all possible words but still have large fitness.
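Whether a given finite space is p.s.d. can be tested directly. The sketch below is our own; it relies on the standard fact that a symmetric matrix is positive semidefinite if and only if all of its principal minors are nonnegative, and checks the matrix [e^{−d(x,y)}] for a small set of reals with d(x, y) = |x − y|, example (ii) above.

```python
import math
from itertools import combinations

def det(m):
    # Determinant by Laplace expansion along the first row;
    # fine for the tiny matrices arising from small spaces.
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += ((-1) ** j) * m[0][j] * det(minor)
    return total

def is_psd_space(points, d, tol=1e-12):
    # (X, d) is p.s.d. iff [e^{-d(x,y)}] is positive semidefinite,
    # i.e., iff every principal minor of that matrix is nonnegative.
    M = [[math.exp(-d(a, b)) for b in points] for a in points]
    idx = range(len(points))
    for k in range(1, len(points) + 1):
        for rows in combinations(idx, k):
            sub = [[M[i][j] for j in rows] for i in rows]
            if det(sub) < -tol:
                return False
    return True

# A finite set of reals with d(x, y) = |x - y| is p.s.d.
print(is_psd_space([0.0, 1.0, 2.5], lambda a, b: abs(a - b)))  # → True
```

The all-principal-minors check is exponential in |X|, but that is harmless for the small spaces (up to 5 points) discussed here.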
We consider simple and perfect codes: the Hamming codes ([?]). Let F_H(n) denote the fitness of a Hamming code of length n.

Theorem 2 The fitness of the Hamming code approaches the optimal fitness asymptotically. Not only do

   lim_{n→∞} (1/n) log F_H(n) = lim_{n→∞} (1/n) log F(X^n)   and   lim_{n→∞} F_H(n) / F(X^n) = 1

hold, but even the stronger condition

   lim_{n→∞} (F_H(n) − F(X^n)) = 0

holds.

Next we show that, ratewise, the fitness of the Hamming space is attained if we choose the middle level as a language.


Theorem 3 Let L be the language in the Hamming space X^n that consists of all words of weight n/2. Then the fitness of the language L is ratewise optimal, i.e.,

   lim_{n→∞} ( (1/n) log F(L, X^n) − (1/n) log F(X^n) ) = 0.

These theoretical models of the fitness of a language make it possible to investigate classical information-theoretic problems in this context, in particular feedback problems, transmission problems for multiway channels, etc. In the feedback model we developed, we show that feedback increases the fitness of a language.
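Theorem 3 can be illustrated numerically. The sketch below is our own (n = 8 is an arbitrary small choice); it uses the product formula of Theorem 1 for F(X^n) in the binary Hamming space and compares the per-letter rates (1/n) log F of the middle-level language and of the full space.

```python
import math
from itertools import combinations

# Per-letter rate of the middle-level language vs. the full binary
# Hamming space, at a small fixed word length.
n = 8
F_full = (2 / (1 + math.exp(-1))) ** n  # F(X^n) = F(X)^n by Theorem 1

# All words of weight n/2, encoded by the sets of positions of their ones.
mid = list(combinations(range(n), n // 2))

def dist(a, b):
    # Hamming distance = size of the symmetric difference of 1-positions.
    return len(set(a) ^ set(b))

# By symmetry every word of the middle level has the same denominator
# in p(x | x), so the fitness is |L| times one such term.
x0 = mid[0]
denom = sum(math.exp(-dist(x0, v)) for v in mid)
F_mid = len(mid) / denom

print(round(math.log(F_mid) / n, 3), round(math.log(F_full) / n, 3))
```

Already at n = 8 the two rates differ by less than 0.02 nats per letter, consistent with the ratewise optimality asserted by the theorem.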

Acknowledgment: The authors would like to thank V. Blinovsky and E. Telatar for discussions on these problems, and P. Harremoës for drawing their attention to the counter-example in the case without multiplicity.

References

[1] R.G. Gallager, Information Theory and Reliable Communication, Wiley, New York, 1968.

[2] M.A. Nowak and D.C. Krakauer, The evolution of language, PNAS 96(14), 8028–8033, 1999.

[3] M.A. Nowak, D.C. Krakauer, and A. Dress, An error limit for the evolution of language, Proceedings of the Royal Society B: Biological Sciences, 266(1433), 2131–2136, 1999.
