Functional Load in the Lexicons of the World’s Languages: Towards a New Measure

Alexander Martin
Laboratoire de Sciences Cognitives et Psycholinguistique (ENS, EHESS, CNRS)
Département d’Études Cognitives (École Normale Supérieure – PSL Research University)

Keywords: functional load, phonological features, corpus, minimal pair

Phonological contrasts have traditionally been considered to be either distinctive or allophonic in the world’s languages. Recent work has attempted to categorize “intermediate” relationships that lie somewhere between full distinctiveness and full allophony (e.g., Hall, ). But even phonological relationships that appear fully distinctive (e.g., two sounds that are contrastive in all syllabic positions) may be more or less important than other contrasts in a given language. We can define this phenomenon, the weight that a contrast has in a given phonological system, as functional load (henceforth FL).

Going back to the Prague School, FL has been proposed as a predictor of sound change. Specifically, contrasts that are consistently used to distinguish words in a given language are predicted to be less likely to merge over time than contrasts that “do less work”. In other words, a contrast with a high FL is more resistant to sound change. A formalism for this idea, which is termed the “Functional Load Hypothesis”, was introduced in the middle of the 20th century (Hockett, ), but not tested until much later. Hockett proposed the application of information theory, and specifically information entropy (Shannon, ), as a measure of the FL of phonemic contrasts. This formalism was revisited in the early 2000s and was expanded beyond simple phonemic contrasts (Surendran & Niyogi, , ). Surendran and Niyogi laid down a theoretical basis for applying the entropy measure to many levels of phonological contrast, from the whole word down to the phonological feature. The elegance of this method is that it allows comparisons between levels.
It has, for example, been used to demonstrate the high FL of tone in languages like Mandarin relative to vowel quality (Surendran & Levow, ).

The Functional Load Hypothesis was recently put to the test and was partially supported using the formalism developed by Surendran & Niyogi (Wedel, Kaplan, & Jackson, ). Wedel and colleagues also applied the simpler measures of minimal pair (henceforth MP) count and relative phoneme probability to the corpora they examined and found them to be more accurate predictors of contrast loss, somewhat circularly claiming them to be better measures of FL than information entropy. The lack of agreement on how to properly measure FL is not surprising given the various factors that may or may not be incorporated into what is to be considered “work” done by a given contrast. It has been argued that when considering MPs, those containing words with lower frequency should have a smaller weight than those containing words with high frequency (van Severen et al., ), but it is not clear that this is necessarily the best method. Should the frequency of the words in which phonemes appear be considered part of the “work” they do? What about the frequency of the phonemes themselves?

The present study offers a consideration of some of these factors as they pertain to French and proposes a new measure of FL based on phonological features. Specifically, we propose a measure that places all phonological features on equal ground, neutralizing frequency effects that arise both from the sheer token frequency of words and from the simple frequency of phonemes in lexical entries. We also abstract phonotactics away from the equation, considering that phonotactic constraints are on a different level than phonological contrast. The method we propose is based on MP count and considers each phonological feature separately. We count the number of MPs that could theoretically exist and the number of MPs that do exist, which allows us to form a simple proportion.

As an example implementation of our measure, we considered the twelve obstruents of French and the three basic phonological features of consonants: place, manner, and voicing. We did this using the French lexicon as represented in the Lexique database (New, Pallier, Ferrand, & Matos, ).
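
The counting of possible and attested MPs can be sketched for a single feature (voicing) as follows. This is only an illustrative sketch under several assumptions that are not part of the study itself: words are written as strings with one character per phoneme, the obstruent inventory and voicing pairings are our own guess at the twelve French obstruents, the toy lexicon stands in for the real database, and `is_legal` is a hypothetical phonotactic filter that checks only voicing agreement between adjacent obstruents.

```python
from collections import defaultdict

# Assumed voicing pairings among twelve French obstruents (illustrative only).
VOICING_PAIRS = {"p": "b", "t": "d", "k": "g", "f": "v", "s": "z", "ʃ": "ʒ"}
VOICING_PAIRS.update({v: k for k, v in list(VOICING_PAIRS.items())})
VOICED = set("bdgvzʒ")
OBSTRUENTS = set(VOICING_PAIRS)


def is_legal(word):
    """Hypothetical phonotactic filter: adjacent obstruents must agree in voicing."""
    for a, b in zip(word, word[1:]):
        if a in OBSTRUENTS and b in OBSTRUENTS and (a in VOICED) != (b in VOICED):
            return False
    return True


def count_voicing_pairs(lexicon):
    """Return {phoneme: (attested, possible)} MP counts for the voicing feature."""
    lexicon = set(lexicon)
    counts = defaultdict(lambda: [0, 0])  # [attested o, possible p]
    for word in lexicon:
        for pos, seg in enumerate(word):
            if seg not in VOICING_PAIRS:
                continue
            # Flip the voicing of this one segment to build the candidate MP.
            candidate = word[:pos] + VOICING_PAIRS[seg] + word[pos + 1:]
            if not is_legal(candidate):
                continue  # not a possible MP, so neither count changes
            counts[seg][1] += 1
            if candidate in lexicon:
                counts[seg][0] += 1
    return {seg: tuple(c) for seg, c in counts.items()}


def proportion(attested, possible):
    """Simple proportion of attested over possible MPs."""
    return attested / possible


# Toy lexicon (an assumption for illustration, not the real database).
toy_lexicon = {"po", "bo", "plɥi", "psikoloʒi"}
counts = count_voicing_pairs(toy_lexicon)
o, p = counts["p"]  # 1 attested out of 2 possible for /p/
print(o, p, proportion(o, p))
```

The sketch keeps the two counts separate per phoneme so that the same loop could be repeated for place and manner with different substitution tables; how those substitutions and the phonotactic filter should be defined for real data is left open here.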

$$ r(i, j) = \log\left(\frac{o_{i,j}}{p_{i,j}}\right) \tag{1} $$

We used the formula in Equation 1, where the function r gives a score for a phoneme i and a phonological feature j: for each occurrence of i, j, we count the number of possible MPs p and the number of MPs that are actually attested o. For example, when we are looking at the voicing feature and the phoneme /p/, upon encountering the word /po/ (peau, “skin”), we know that an MP is theoretically possible (i.e., a change in voicing on the segment /p/ yields the possible word /bo/). Thus p = 1. Furthermore, we know that the word /bo/ does exist (beau, “beautiful”). Thus o = 1. If we next consider a case such as /plɥi/ (pluie, “rain”), we observe that the theoretical MP it would form with a voicing change is possible (i.e., /blɥi/ is a phonotactically legal word). Thus p = 2 now (the scores are cumulative as we iterate over the lexicon). However, this word does not actually exist in French, and o therefore remains at 1. Now if we consider the case of /psikoloʒi/ (psychologie, “psychology”), we know that the theoretical MP it would form from a voicing change is not possible (i.e., /bsikoloʒi/ does not follow French phonotactics because adjacent obstruents must agree in voicing morpheme-internally (Dell, )), thus p remains at 2. Of course, if an MP is not possible, it will not be observed, and indeed o remains at 1.

This process is continued over the entire lexicon, and the two scores (o and p) then form a proportion in our function r (the number of observed MPs over the number of possible MPs). The function r is then applied to each feature and each phoneme (for French obstruents this yields a 12 × 3 matrix). The final matrix may be submitted to an Analysis of Variance, and scores for each feature (or for each phoneme) may then be compared.

With regard to French, our results vary as a function of syntactic category, something that has not to date been discussed in the literature.
Previous results have, however, shown that nouns tend to resemble other nouns phonologically, that the same seems to be true of verbs, and that speakers seem to be sensitive to these patterns (Farmer, Christiansen, & Monaghan, ). This idea therefore warrants future investigation.

In French, we found no feature with a significantly higher FL than any other when considering the entire lexicon (F < ). When considering only nouns, however, we found a clear asymmetry (F = ., p < .). The place feature was found to have a significantly higher FL than manner (adjusted p < .) and a marginally significantly higher FL than voicing (adjusted p = .). Manner and voicing, however, were not found to differ from one another. This result is somewhat in accordance with Surendran & Niyogi’s findings for English, Dutch, German, and Mandarin. They showed, using their entropy measure, that the place feature had a higher FL in all of these languages than the two other features. It is unclear, however, how robust the effect they observed was, given that it is impossible to use inferential statistics to evaluate their scores.

A natural question that arises when considering these data is: are listeners sensitive to these differences, and can they exploit them online during speech processing? This question, along with the cross-linguistic nature of this effect, is an important target of future research.

References

Dell, F. (). Consonant clusters and phonological syllables in French. Lingua, (-), –.
Farmer, T. A., Christiansen, M. H., & Monaghan, P. (). Phonological typicality influences on-line sentence comprehension. Proceedings of the National Academy of Sciences of the United States of America, (), –.
Hall, K. C. (). A typology of intermediate phonological relationships. The Linguistic Review, (), –.
Hockett, C. (). A Manual of Phonology. International Journal of American Linguistics, ().
Hume, E., Hall, K. C., Wedel, A., Ussishkin, A., Adda-Decker, M., & Gendrot, C. (). Anti-markedness patterns in French epenthesis: An information-theoretic approach. In Proceedings of the th Annual Meeting of the Berkeley Linguistics Society (pp. –).
New, B., Pallier, C., Ferrand, L., & Matos, R. (). Une base de données lexicales du français contemporain sur internet: LEXIQUE. L’année psychologique, , –.
Shannon, C. E. (). A Mathematical Theory of Communication. The Bell System Technical Journal, , –.
Surendran, D. & Levow, G.-A. (). The Functional Load of Tone in Mandarin is as High as that of Vowels. In Speech Prosody (pp. –).
Surendran, D. & Niyogi, P. (). Measuring the Usefulness (Functional Load) of Phonological Contrasts. Department of Computer Science, University of Chicago.
Surendran, D. & Niyogi, P. (). Quantifying the Functional Load of Phonemic Oppositions, Distinctive Features, and Suprasegmentals. In O. Nedergaard Thomsen (Ed.), Competing models of linguistic change: evolution and beyond (pp. –).
van Severen, L., Gillis, J. J. M., Molemans, I., van den Berg, R., De Maeyer, S., & Gillis, S. (). The relation between order of acquisition, segmental frequency and function: the case of word-initial consonants in Dutch. Journal of Child Language, (), –.
Wedel, A., Kaplan, A., & Jackson, S. (). High functional load inhibits phonological contrast loss: a corpus study. Cognition, (), –.