Discrimination between High and Low tones in Tataltepec de Valdés Chatino: A Report from an English Pilot Study¹

Ryan Sullivant
University of Texas at Austin
May 9, 2010

Introduction

Tataltepec Chatino is a Chatino language spoken by some 2,800 residents of the municipality of Tataltepec de Valdés in Oaxaca, Mexico (INEGI 2009). This language was first identified as one of the three “dialects” of Chatino by Franz Boas (1913). These erstwhile dialects are now considered to be separate languages, which together with the Zapotec languages form the Zapotecan branch of the Otomanguean language family (Woodbury 2009).

Tataltepec Chatino is a language with a complex system of lexical tone in which words may be distinguished by movements in fundamental frequency alone. The latest analysis of the tone system of this language identifies five non-sandhi-derived tones which may appear in isolation: Relaxed, Low, High, High-High and High-Low (Sullivant and Woodbury 2009). The tones are stem-linked and each tone extends over the entire word. Figure 1 provides an example of a near-minimal quintuple across these five tones.

Figure 1. A near-minimal tonal quintuple. From left to right: [nda:] ‘bean’, ‘give.HAB’, ‘give.COMPL’, ‘give.HAB.2S’, and [ndɑ̃] ‘give.HAB.1S’.
[Figure 1 plot: pitch tracks labeled Relaxed, Low, High, High-High, and High-Low; x-axis Time (s), 0 to 3.669; y-axis Pitch (Hz), 150 to 300.]
Each tone may surface with a different shape depending on whether the stem it appears on is a short monosyllable, a long monosyllable, or a disyllable (Figure 2). Though any complete analysis of the perception of Tataltepec Chatino tones would identify the acoustic cues used to distinguish among these tones (as well as the at least two sandhi-derived tones which occur in limited contexts), this paper concerns only the discrimination of two tones in one word shape: Low and High in long monosyllables. Long monosyllables were chosen for this analysis because only one consonant² which may affect tone is present in the stimulus, and because long monosyllables provide the greatest amount of time for f0 movements to be detected across the phonation of the vowel, allowing for satisfactory manipulation and discrimination of tones. Tone perception across disyllables and short monosyllables will be examined as my tone manipulation skills increase.

The Low and High tones were chosen for this discrimination task because my impressionistic observations of these tones suggest that (in long monosyllables) each has two inflection points (as opposed to the Relaxed, High-High, and High-Low tones, which may have one, two, or three inflection points (Figure 3)³). Simple manipulation of f0 at these points, rather than adjustment of the inflection points’ locations within the word, should therefore allow for some insight into the discrimination strategy between these two tones.

Figure 2. Different word shapes (CVCV, CV:, CV) bring about differently-shaped realizations of the L tone.

¹ I gratefully acknowledge support for my work through Endangered Languages Documentation Programme grant MDP0153 to the University of Texas at Austin, offered by the Hans Rausing Endangered Languages Project at the School of Oriental and African Studies, University of London. The recordings used by this current project were created by Emiliana Cruz and Anthony Woodbury.
Based upon my own impressionistic observations and those of many others, it is hypothesized that higher values of f0 at the two inflection points within the syllable will increase the likelihood that a listener will interpret the syllable as carrying a High tone. I have also observed that High tone words often appear to rise more sharply than Low tone words, and therefore I hypothesize that the likelihood that a listener identifies a word’s tone as High will increase as the magnitude of the f0 rise across the word increases.

The perception experiment described here will be performed with Tataltepec Chatino speakers in the summer of 2010. What follows is the methodology and results of a pilot study of this experiment, which was performed on native English speakers due to the scarcity of Tataltepec Chatino speakers in and around Austin, Texas.

² Syllables in the Chatino languages may only be closed by a glottal stop, and given the marked effect such a consonant could have on the production and perception of f0, minimal pairs containing glottal stops were not selected for tone discrimination tests at this point. Likewise, words containing phonemically nasalized vowels were not considered.

³ These presumptions I have made about the salient cues speakers use to distinguish tone await experimental justification.
Figure 3. Hypothetical schematization of f0 movements across long monosyllables, with supposed inflection points circled.
Methodology

Stimuli Preparation

The stimuli were drawn from recordings of a native speaker of Tataltepec Chatino made during a word-elicitation session. Among the words recorded were the minimal pair koò ‘cloud’ and koó ‘H. hampei coffee borer beetle’. Both words are tokens pronounced in isolation, and each member of the pair is taken from a different portion of the original file. Figure 4 shows pitch tracks for both words and Table 1 summarizes the important f0 values of each; note that the quite high initial f0 is due to the release of the consonant.

Figure 4. The source of the stimuli, a minimal pair; left to right: koó ‘coffee borer beetle’ (High) and koò ‘cloud’ (Low).
[Figure 4 plot: waveform (amplitude -0.2851 to 0.44) above a pitch track (Pitch (Hz), 190 to 250); x-axis Time (s), 0.05254 to 1.983.]
Table 1. f0 data for /ko:/ minimal pair. All values in Hz.

              High koó    Low koò
Initial f0    233         245
Minimum f0    216         200
Maximum f0    238         227
Final f0      221         200
Using the Praat program (Boersma 2001), these two original files were manipulated to synthesize the stimulus materials. For each word, the f0 minimum and maximum were located, a target frequency was chosen for each of these two time points (referred to hereafter as the points of initial f0 and final f0), and the f0 values between the two points were interpolated. This process was performed on both the High and the Low tone members of the minimal pair to control for any non-tonal cue (e.g. vowel length, or the timing of the f0 inflection points relative to some other feature) which a listener may use to distinguish between tones.

The f0 value assigned to each time point was one of six values along a continuum from 195 Hz to 245 Hz in 10 Hz increments. The endpoints of this continuum were chosen to be just slightly lower than the lower minimum pitch and just higher than the higher maximum pitch in the original sounds. All f0 movement before the f0 minimum was presumed to be unimportant for tone discrimination and was left unaltered in the stimuli. Combinations of endpoint frequencies which would result in a falling pitch were not synthesized, as a falling pitch would likely implicate the Relaxed tone (and a non-word) rather than the two tones under consideration. This left 21 tokens based on each original syllable, for a total of 42 tokens. Figure 5 shows the pitch tracks of three stimulus tokens created by this process.

Figure 5. Pitch tracks of three example stimuli. Endpoints from left to right: 215-235 Hz, 245-245 Hz, and 195-205 Hz.
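As a rough illustration of the stimulus grid described above, the following Python sketch enumerates the non-falling (initial, final) endpoint pairs on the 195-245 Hz continuum and interpolates an f0 contour between two time points. The function names and the use of linear interpolation are assumptions made for illustration; the stimuli themselves were synthesized in Praat, and the type of interpolation used there is not stated.

```python
from itertools import product

# Six target values from 195 Hz to 245 Hz in 10 Hz steps, as described in the text.
F0_STEPS_HZ = list(range(195, 246, 10))


def endpoint_pairs(steps=F0_STEPS_HZ):
    """Return (initial_f0, final_f0) pairs, excluding falling contours."""
    return [(start, end) for start, end in product(steps, repeat=2) if end >= start]


def interpolate_f0(t_start, f_start, t_end, f_end, n_points=5):
    """Interpolate f0 between two time points (linear interpolation is an assumption)."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = (t_end - t_start) / (n_points - 1)
    slope = (f_end - f_start) / (t_end - t_start)
    return [(t_start + i * step, f_start + slope * i * step) for i in range(n_points)]


pairs = endpoint_pairs()
sources = ["High koó", "Low koò"]  # the two original recordings
stimuli = [(src, start, end) for src in sources for start, end in pairs]

print(len(pairs))    # number of endpoint pairs per source recording
print(len(stimuli))  # number of stimuli across both recordings
```

Level contours such as 245-245 Hz are kept because only falling combinations are excluded, which is what yields 21 pairs per source recording rather than 15.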
Subjects

For this pilot study, four English-speaking volunteers were recruited by word of mouth from among my friends and acquaintances. All subjects reported normal hearing, and none was compensated for their time.

The Experiment

Subjects were seated in front of a laptop running the Alvin stimulus presentation software (Hillenbrand and Gayvert 2005) and were asked to listen to recordings played through headphones and to click on the appropriate button on the screen. Each button was labeled with the word “High” or “Low”. As none of the subjects were speakers of Tataltepec Chatino, a short familiarization stage was designed to teach the difference between the two tones. Before the experiment itself began, the subjects were played tokens of koò and koó spoken by a different speaker of Tataltepec Chatino than the woman whose recordings were used to produce the experimental stimuli. The High and Low tones were identified as such before being played for each subject, and were played back until the subjects said that they could hear and identify the differences in the tones. The stimuli were randomized, and each stimulus was played a total of ten times. Subjects were allowed as much time as they liked to respond to each stimulus, but were not able to replay a sound.

Results

A logistic regression fitting the independent variables initial f0 and final f0 against the dependent variable response (coded as either High or Low tone) found that initial f0 and final f0 significantly affect identification of the tone of the stimulus as a Low tone (coefficient -0.039, z = -8.8, p < 0.01 and coefficient -0.063, z = -13.0, p < 0.01, respectively). Initial and final f0 were not found to interact significantly (coefficient 0.001, z = 1.5, p = 0.12). Figure 6 shows the percentage of tokens with a given initial or final f0 that were identified as having a High tone. As initial f0 rises, High tones are heard more often. Likewise, as final f0 rises, High tones are heard more often, in confirmation of the hypothesis.

The magnitude of f0 rise for each stimulus was calculated by subtracting initial f0 from final f0. Figure 7 shows the percentage of tokens identified as High by absolute f0 rise. A logistic regression found absolute f0 rise to significantly affect identification of tokens as Low tones (coefficient 0.01, z = 2.5, p < 0.05).
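To give a sense of the size of the initial and final f0 effects, the coefficients reported above can be converted into odds ratios. This is a minimal sketch, assuming the coefficients are on the log-odds scale per Hz for a Low response (as the wording of the results suggests); the helper name `odds_ratio` is hypothetical, and the 10 Hz step is simply the spacing of the stimulus continuum.

```python
import math

# Coefficients reported in the text for the pooled data of the four subjects.
COEF_INITIAL_F0 = -0.039
COEF_FINAL_F0 = -0.063


def odds_ratio(coef_per_hz, delta_hz):
    """Multiplicative change in the odds for a change of delta_hz (assumes log-odds per Hz)."""
    return math.exp(coef_per_hz * delta_hz)


STEP_HZ = 10  # spacing of the stimulus continuum
or_initial = odds_ratio(COEF_INITIAL_F0, STEP_HZ)
or_final = odds_ratio(COEF_FINAL_F0, STEP_HZ)

print(f"initial f0: odds multiplied by {or_initial:.2f} per {STEP_HZ} Hz")
print(f"final f0:   odds multiplied by {or_final:.2f} per {STEP_HZ} Hz")
```

Under these assumptions, both values fall below 1, meaning that each 10 Hz step makes a Low response less likely, and the smaller value for final f0 reflects its larger coefficient.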
Since absolute differences in frequency are not likely to be very salient for listeners, the percent rise of f0 over the syllable was also calculated, yet the percent rise of f0 was not found to be significant (coefficient 1.128, z = 1.73, p = 0.083). A series of logistic regressions was carried out for each subject, and the results of these analyses are found in Table 2. When considering only one subject’s data at a time, only Subject 4 was found to use the percentage of f0 rise to distinguish between High and Low tones.

Discussion

Given that the subjects of this study are not speakers of the target language, it would be folly to attempt to generalize the results reported above to speakers of Tataltepec Chatino. What can be mentioned are my expectations of how Tataltepec Chatino speakers will perform differently on this task and how I expect to alter my methodology as a result of performing this pilot study.

Firstly, I expect Chatino speakers to exhibit greater evidence of categorical perception of the two tones than the English speakers, who evaluated each token by some relative judgment of the pitch (likely developed during the course of the experiment) instead of comparing the sound they heard against items in their lexicon. Furthermore, whereas the Anglophone subjects appear to use initial f0, final f0 and f0 rise to distinguish between High and Low tones, Chatino speakers may use only a subset of these features, or may even utilize different discrimination strategies depending on the qualities of their input.

Figure 6. Percent of tokens of a given initial or final f0 which were identified as High tones.
[Figure 6 plot: “% Identified as High by Initial and Final F0”; series Initial F0 and Final F0; x-axis Fundamental Frequency (Hz), 195 to 245; y-axis Percent Identified as High, 0 to 100.]

As to how I will change the design of the experiment, I will no longer have subjects respond by clicking buttons on a screen with a mouse, but rather by pressing one of two marked keys on a keyboard. This will accomplish two goals: first, it will reduce the overall physical and mental complexity of the subjects’ task; second, it will facilitate the collection of response time data, which could help identify those stimuli which speakers find difficult to interpret and thereby better locate the edges of their perceptual categories. The on-screen buttons corresponding to the response keys will not contain linguistic or metalinguistic labeling (which is what the “High” and “Low” labels essentially were, even for the Anglophone subjects) but will instead be pictures of a beetle and a cloud, the referents of koó and koò.

Figure 7. Percent of tokens with a given absolute rise in f0 which were identified as High tones.
[Figure 7 plot: “% Identified as High by Rise in F0”; x-axis Rise in Fundamental Frequency (Hz), 0 to 50; y-axis Percent Identified as High, 0 to 100.]

Table 2. Results of logistic regressions for each subject. A negative coefficient means that a speaker was less likely to identify a token as High as the value of the independent variable increased.

           Initial f0                 Final f0                   Initial f0:Final f0         Percent rise in f0
Subject 1  -0.307, z=-1.61, p=0.11    -0.257, z=-1.54, p=0.12    0.001, z=1.30, p=0.19       1.864, z=1.43, p=0.15
Subject 2  -0.193, z=-1.08, p=0.28    -0.152, z=-0.97, p=0.33    0.0006, z=0.76, p=0.45      2.552, z=1.93, p=0.054
Subject 3  -0.181, z=-0.51, p=0.61    -0.273, z=-0.86, p=0.39    0.0006, z=0.43, p=0.67      -2.567, z=-1.68, p=0.093
Subject 4  0.117, z=0.32, p=0.75      -0.024, z=-0.07, p=0.94    -0.0007, z=-0.42, p=0.67    -9.121, z=-5.49, p<0.01
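The p-values in Table 2 can be checked against the reported z statistics. The sketch below assumes two-sided tests against a standard normal distribution, which the text does not state explicitly; `two_sided_p` is a hypothetical helper name.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# z statistics for percent rise in f0, copied from Table 2.
percent_rise_z = {
    "Subject 1": 1.43,
    "Subject 2": 1.93,
    "Subject 3": -1.68,
    "Subject 4": -5.49,
}

for subject, z in percent_rise_z.items():
    print(f"{subject}: z = {z:+.2f}, p = {two_sided_p(z):.3g}")
```

Under that assumption, only the z statistic for Subject 4 is large enough to give a p-value below 0.01, in line with the table.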
Works Cited

Boas, Franz. 1913. Notes on the Chatino Language of Mexico. American Anthropologist, New Series, 15(1): 78-86.

Boersma, Paul. 2001. Praat, a system for doing phonetics by computer. Glot International 5(9/10): 341-345.

Hillenbrand, James M. and Robert T. Gayvert. 2005. Open Source Software for Experimental Design and Control. Journal of Speech, Language, and Hearing Research 48: 45-60.

INEGI. 2009. Perfil sociodemográfico de la población que habla lengua indígena. Instituto Nacional de Estadística y Geografía.

Sullivant, Ryan and Anthony C. Woodbury. 2009. El tono y el sandhi del tono en el chatino de Tataltepec de Valdés. Proceedings of the Conference on Indigenous Languages of Latin America (CILLA), Teresa Lozano Long Institute of Latin American Studies at the University of Texas at Austin. To appear.

Woodbury, Anthony C. 2009. On the internal classification of Chatino. Ms. University of Texas at Austin.