Electronic Notes in Discrete Mathematics 21 (2005) 97–100 www.elsevier.com/locate/endm

Information theoretic models in language evolution 1

Rudolf Ahlswede, Erdal Arikan, Lars Bäumer, Christian Deppe
Universität Bielefeld, Fakultät für Mathematik, Postfach 100131, 33501 Bielefeld, Germany

Abstract

We study a model for language evolution which was introduced by Nowak and Krakauer ([2]). We analyze discrete distance spaces and prove a conjecture of Nowak for all metrics with a positive semidefinite associated matrix. This natural class of metrics includes all metrics studied by different authors in this connection; in particular, it includes all ultra-metric spaces. Furthermore, the role of feedback is explored and multi-user scenarios are studied. In all models we give lower and upper bounds for the fitness.

Human language is used to store and transmit information, and there is therefore significant interest in mathematical models of language development. These models aim to explain how natural selection can lead to the gradual emergence of human language. Nowak and coworkers created such a mathematical model [2], [3]. A language L in Nowak's model is a system L = (O, X^n, d, r) consisting of the following elements:

(i) O is a finite set of objects, O = {o_1, ..., o_N}.

(ii) X is a finite set of phonemes, which model the elementary sounds of the spoken language. The set X^n models the set of all possible words of length n.

(iii) Each object is mapped to a word by the function r : O → X^n. Thus the words for all objects have the same length n. The model allows several objects to be mapped to the same word. With some abuse of notation,

1 Supported in part by INTAS-00-738.

1571-0653/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.endm.2005.07.002

we use L to denote the set of all words in the language, L = {x^n : x^n = r(o_i) for some 1 ≤ i ≤ N}.

(iv) d : X × X → R_+ is a measure of distance between phonemes, i.e., a function that is symmetric, d(x, y) = d(y, x), and non-negative, d(x, y) ≥ 0, with d(x, y) = 0 if and only if x = y. The distance between two words x^n = (x_1, ..., x_n) and y^n = (y_1, ..., y_n) in X^n is defined by d_n(x^n, y^n) = Σ_{i=1}^n d(x_i, y_i).

(v) The model postulates that the conditional probability that the listener understands the word y^n ∈ L, given that the speaker utters the word x^n ∈ L, is

   p(y^n | x^n) = exp(−d_n(x^n, y^n)) / Σ_{v^n ∈ L} exp(−d_n(x^n, v^n)).

Nowak defined the fitness of a language L with words over X^n as

   F(L, X^n) = Σ_{x^n ∈ L} p(x^n | x^n).
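The two definitions above can be checked numerically. The following is a minimal stdlib-only sketch (our own, not from the paper; it assumes a binary phoneme set with the Hamming distance, and all function names are ours):

```python
import math

# A minimal numerical sketch of Nowak's model: binary phonemes with
# the Hamming distance d(x, y) = 1 if x != y and 0 otherwise, and
# words represented as tuples of phonemes.

def d(x, y):
    return 0.0 if x == y else 1.0

def dn(xw, yw):
    # Word distance: the sum of the per-phoneme distances.
    return sum(d(a, b) for a, b in zip(xw, yw))

def p_understood(y, x, L):
    # P(listener understands y | speaker utters x), for x, y in L.
    denom = sum(math.exp(-dn(x, v)) for v in L)
    return math.exp(-dn(x, y)) / denom

def fitness(L):
    # F(L, X^n): the sum of the correct-reception probabilities.
    return sum(p_understood(x, x, L) for x in L)

# Full binary language of length 1: F = 2 / (1 + e^{-1}).
L1 = [(0,), (1,)]
print(round(fitness(L1), 4))  # → 1.4621
```

For the two-word language the fitness is 2/(1 + e^{-1}) ≈ 1.462, which Theorem 1 below identifies as the optimal value F(X) for this space.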

Nowak was interested in the maximum possible fitness of languages, so he defined the fitness of the space X^n as

   F(X^n) = sup{F(L, X^n) : L is a language over X^n},

and he posed the determination of the quantity F(X^n) for general spaces (X, d) as an open problem. He conjectured that F(X^n) = (F(X))^n when (X, d) is a metric space, i.e., when the distance function d satisfies the triangle inequality d(x, y) + d(y, z) ≥ d(x, z). We show that Nowak's conjecture is true for a class of spaces defined by a certain condition on the distance function. Let us call a space (X, d) a p.s.d. space if the matrix [e^{−d(x,y)}]_{x∈X, y∈X} is positive semidefinite. The main result is the following.

Theorem 1 For any p.s.d. space (X, d), where X is a finite set, the fitness is given by

(1)   F(X^n) = F(X)^n = e^{n R_0},

where

(2)   R_0 = R_0(X, d) = − log min_λ Σ_x Σ_y λ_x λ_y e^{−d(x,y)},

and the minimum is over all probability distributions λ = (λ_1, ..., λ_{|X|}) on X.
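The following stdlib-only sketch (our own check, not from the paper) evaluates R_0 for the binary Hamming space by brute-force minimization over the probability simplex and confirms the n = 1 case of Theorem 1: e^{R_0} coincides with the fitness 2/(1 + e^{-1}) of the full two-word language.

```python
import math

# Numerical check of Theorem 1 for n = 1 on the binary Hamming space:
# R_0 = -log min_λ Σ_x Σ_y λ_x λ_y e^{-d(x,y)}.
M = [[1.0, math.exp(-1)],
     [math.exp(-1), 1.0]]  # the matrix [e^{-d(x,y)}]

def quad(lam):
    # The quadratic form Σ_x Σ_y λ_x λ_y e^{-d(x,y)}.
    return sum(lam[i] * lam[j] * M[i][j] for i in range(2) for j in range(2))

# Brute-force minimization over the probability simplex (step 1e-4);
# by symmetry the true minimizer is the uniform distribution.
best = min(quad((t / 10000, 1 - t / 10000)) for t in range(10001))
R0 = -math.log(best)

# e^{R_0} should equal F(X) = 2 / (1 + e^{-1}) for this space.
print(round(math.exp(R0), 4), round(2 / (1 + math.exp(-1)), 4))  # → 1.4621 1.4621
```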


In other words, for p.s.d. spaces Nowak's conjecture holds and the fitness is given by powers of e^{R_0}. For any p.s.d. space, there exists a "channel" W = [W(z|x)]_{x∈X, z∈Z}, for some set Z, such that (i) W(z|x) ≥ 0 for all x, z, (ii) Σ_z W(z|x) = 1 for all x, and (iii) e^{−d(x,y)} = Σ_z W(z|x)W(z|y) for all x, y. The parameter R_0 equals the cutoff rate of the channel W in the standard information-theoretic sense. This indicates a connection between Nowak's model and standard information-theoretic models. Indeed, the proof of the above result makes use of Gallager's results on reliability exponents, and specifically his "parallel channels theorem" [1, p. 149], to achieve the single-letterization demanded by Nowak's conjecture.

Examples of spaces (X, d) for which Nowak's conjecture is settled by the above result are (i) the Hamming space, where X is an arbitrary finite set and d is the Hamming metric, d(x, y) = 1 if x ≠ y and d(x, x) = 0; (ii) X a finite set of reals with d(x, y) = |x − y|; and (iii) X a finite set of reals with d(x, y) = (x − y)^2. All of these spaces are p.s.d. Some other partial results are as follows:

(i) All finite ultra-metric spaces are p.s.d. (Recall that in an ultra-metric space, d(a, b) ≤ max{d(a, c), d(c, b)} for all points a, b, c.)

(ii) All metric spaces with 3 or 4 elements are p.s.d.

(iii) There exist metric spaces with 5 elements which are not p.s.d.

(iv) For every metric space (X, d) where X is a subset of the reals, there exists a scaling d_α(x, y) = αd(x, y), for some α > 0 and all x, y ∈ X, such that the space (X, d_α) is p.s.d.

(v) Nowak's conjecture does not hold if we do not allow multiplicity of words.

We have shown that the product conjecture is true in particular for the Hamming model. The optimal fitness is attained if one uses all possible words in the language. In general, however, the memory of individuals is restricted. For this reason we look for languages which use only a fraction of all possible words but still have large fitness.
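Whether a given finite space is p.s.d. can be tested directly. The sketch below is our own; it relies on the standard fact that a symmetric matrix is positive semidefinite if and only if all of its principal minors are nonnegative, and checks the matrix [e^{−d(x,y)}] for a small set of reals with d(x, y) = |x − y|, example (ii) above.

```python
import math
from itertools import combinations

def det(m):
    # Determinant by Laplace expansion along the first row;
    # fine for the tiny matrices arising from small spaces.
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += ((-1) ** j) * m[0][j] * det(minor)
    return total

def is_psd_space(points, d, tol=1e-12):
    # (X, d) is p.s.d. iff [e^{-d(x,y)}] is positive semidefinite,
    # i.e., iff every principal minor of that matrix is nonnegative.
    M = [[math.exp(-d(a, b)) for b in points] for a in points]
    idx = range(len(points))
    for k in range(1, len(points) + 1):
        for rows in combinations(idx, k):
            sub = [[M[i][j] for j in rows] for i in rows]
            if det(sub) < -tol:
                return False
    return True

# A finite set of reals with d(x, y) = |x - y| is p.s.d.
print(is_psd_space([0.0, 1.0, 2.5], lambda a, b: abs(a - b)))  # → True
```

The all-principal-minors check is exponential in |X|, but that is harmless for the small spaces (up to 5 points) discussed here.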
We consider simple and perfect codes: the Hamming codes ([?]). Let F_H(n) denote the fitness of a Hamming code of length n.

Theorem 2 The fitness of the Hamming code approaches the optimal fitness asymptotically. Not only do

   lim_{n→∞} (1/n) log F_H(n) = lim_{n→∞} (1/n) log F(X^n)   and   lim_{n→∞} F_H(n) / F(X^n) = 1

hold, but even the stronger condition

   lim_{n→∞} (F_H(n) − F(X^n)) = 0

holds.

Next we show that, ratewise, the fitness of the Hamming space is attained if we choose the middle level as a language.


Theorem 3 Let L be the language in the Hamming space X^n that consists of all words of weight n/2. Then the fitness of the language L is ratewise optimal, i.e.,

   lim_{n→∞} ( (1/n) log F(L, X^n) − (1/n) log F(X^n) ) = 0.

These theoretical models of the fitness of a language make it possible to investigate classical information-theoretic problems in this context, in particular feedback problems, transmission problems for multiway channels, etc. In the feedback model we developed, we show that feedback increases the fitness of a language.
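Theorem 3 can be illustrated numerically. The sketch below is our own (n = 8 is an arbitrary small choice); it uses the product formula of Theorem 1 for F(X^n) in the binary Hamming space and compares the per-letter rates (1/n) log F of the middle-level language and of the full space.

```python
import math
from itertools import combinations

# Per-letter rate of the middle-level language vs. the full binary
# Hamming space, at a small fixed word length.
n = 8
F_full = (2 / (1 + math.exp(-1))) ** n  # F(X^n) = F(X)^n by Theorem 1

# All words of weight n/2, encoded by the sets of positions of their ones.
mid = list(combinations(range(n), n // 2))

def dist(a, b):
    # Hamming distance = size of the symmetric difference of 1-positions.
    return len(set(a) ^ set(b))

# By symmetry every word of the middle level has the same denominator
# in p(x | x), so the fitness is |L| times one such term.
x0 = mid[0]
denom = sum(math.exp(-dist(x0, v)) for v in mid)
F_mid = len(mid) / denom

print(round(math.log(F_mid) / n, 3), round(math.log(F_full) / n, 3))
```

Already at n = 8 the two rates differ by less than 0.02 nats per letter, consistent with the ratewise optimality asserted by the theorem.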

Acknowledgment: The authors would like to thank V. Blinovsky and E. Telatar for discussions on these problems, and P. Harremoës for drawing their attention to the counter-example in the case without multiplicity.

References

[1] R.G. Gallager, Information Theory and Reliable Communication, Wiley, New York, 1968.

[2] M.A. Nowak and D.C. Krakauer, The evolution of language, PNAS 96(14), 8028–8033, 1999.

[3] M.A. Nowak, D.C. Krakauer, and A. Dress, An error limit for the evolution of language, Proceedings of the Royal Society B: Biological Sciences, 266(1433), 2131–2136, 1999.
