Guilherme Duarte Garcia (McGill)
Stress and gradient weight in Portuguese
This paper proposes a novel analysis of stress in Portuguese based on weight gradience. Previous analyses have argued that weight effects in Portuguese are categorical and restricted to the word-final syllable. I show that weight effects in the Portuguese lexicon [7] are gradient, and can be found in all three positions in the stress domain (trisyllabic window). These effects are statistically significant whether we consider syllables or intervals [12] as the domain of weight computation. I suggest that a statistical analysis based on intervals is more economical, more empirically motivated and more accurate. Finally, I demonstrate how this analysis can be mapped into a Maximum Entropy (MaxEnt) grammar [3, 13, 6]. Crucially, the vast majority of words in the lexicon (≈ 80%) are accounted for by a single predictor, namely weight, including a portion of irregular words, where stress is unpredictable. Unlike previous analyses, no abstract notions such as extrametricality and catalexis are used.
Background Weight: In languages where stress is weight-sensitive, syllables with greater weight are more likely to be prominent, i.e., to attract stress [5, 4]. In interval theory [12], greater weight entails greater duration in a given interval, defined as the rhythmic unit that spans from a vowel up to (but not including) the following vowel. Segments preceding the leftmost vowel are not included in any interval, though such segments may have an impact on the interval if we assume the p-center model [11]. Intervals (ι) have no a priori constituency, and predict different rhythmic units when compared to syllables (σ): onset segments in a given syllable are computed as part of the preceding interval. For example, the syllable string CVC_σ CCVC_σ is equivalent to the interval string ⟨C⟩VCCC_ι VC_ι. The onset effects on stress found in the most comprehensive Portuguese lexicon [7] seem to be better captured by intervals, since onsets are negatively correlated with stress.
Portuguese: Previous analyses [1, 2, 8] have argued that weight effects on stress in Portuguese non-verbs are restricted to the word-final syllable (stress in verbs is not phonologically conditioned): stress is final if the word-final syllable is heavy. Otherwise, stress falls on the penult syllable (regardless of weight). Both final and penult stress patterns are (mostly) regular. Antepenult stress is irregular, and no pre-antepenult stress is allowed. Approximately 72% of the non-verbs in the language (N = 163,626) have regular/predictable stress. Scholars have employed different factors to account for stress regularities (e.g., foot binarity, foot type, metrical alignment) and irregularities (e.g., extrametricality, catalexis, theme vowel influence) in the language.
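The syllable-to-interval mapping described under Weight above can be illustrated with a minimal sketch. It assumes a simplified C/V skeleton as input, with "." as an optional syllable boundary and every V treated as a separate nucleus (so diphthongs are not modelled); the function names `to_intervals` and `interval_sizes` are hypothetical and not taken from the paper.

```python
def to_intervals(cv_skeleton):
    """Parse a C/V skeleton into (leading_consonants, intervals).

    An interval runs from a vowel up to, but not including, the next vowel.
    Consonants before the leftmost vowel belong to no interval.
    Syllable boundaries (".") are ignored, since intervals cut across them.
    """
    segments = cv_skeleton.replace(".", "")
    if any(seg not in "CV" for seg in segments):
        raise ValueError("skeleton may only contain 'C', 'V' and '.'")

    first_vowel = segments.find("V")
    if first_vowel == -1:
        # No vowel at all: nothing can form an interval.
        return segments, []

    leading = segments[:first_vowel]
    intervals = []
    for seg in segments[first_vowel:]:
        if seg == "V":
            intervals.append("V")   # each vowel opens a new interval
        else:
            intervals[-1] += "C"    # consonants attach to the preceding vowel
    return leading, intervals


def interval_sizes(cv_skeleton):
    """Number of segments in each interval, left to right."""
    _, intervals = to_intervals(cv_skeleton)
    return [len(interval) for interval in intervals]


print(to_intervals("CVC.CCVC"))    # -> ('C', ['VCCC', 'VC'])
print(interval_sizes("CVC.CCVC"))  # -> [4, 2]
```

In this sketch the onset cluster of the second syllable ends up in the first interval, which is why onset material can add weight to the preceding position rather than to its own syllable.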
Methodology I examine a large subset of the Houaiss corpus of Portuguese [7] (n = 137,970), and model stress likelihood with Ordinal Regressions (clm() in R [10]) based on syllables and intervals. The stress domain was treated as a 3-point scale, where the antepenult (3) and final (1) positions demarcate the end points of the scale. For comparison, two binomial Logistic Regressions (glm() in R) were also used in a separate analysis, which confirmed the results of the Ordinal Regressions used in this study. Phonotactic patterns were controlled for to avoid unnatural words. Borrowings and monosyllables were also excluded. Intervals and syllables are compared side by side, each with its own statistical model. The syllable-based model included onset, nucleus and coda as (binary) predictors for each position (×3 = 9 predictors). The interval-based model included interval size as the predictor (×3 = 3 predictors): int3 (antepenult), int2 (penult) and int1 (final).
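To make the 3-point ordinal scale concrete, the following is a minimal sketch of how a cumulative-link (ordinal logistic) model turns interval sizes into probabilities for final (1), penult (2) and antepenult (3) stress. It assumes the logit link and the common parameterization logit P(Y ≤ j) = θ_j − xβ. The slopes are the interval-model estimates reported in the Results, but the two thresholds are hypothetical placeholders (the abstract does not report them), so the resulting probabilities are purely illustrative.

```python
import math

# Slopes reported for the interval-based model.
BETAS = {"int3": 0.315, "int2": -0.216, "int1": -2.184}

# HYPOTHETICAL thresholds (theta_1|2, theta_2|3): not reported in the abstract.
THRESHOLDS = (-4.0, -1.0)


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def stress_position_probs(sizes, betas=BETAS, thresholds=THRESHOLDS):
    """Return {1: P(final), 2: P(penult), 3: P(antepenult)}.

    `sizes` maps predictor names (int3, int2, int1) to interval sizes.
    """
    theta_12, theta_23 = thresholds
    if theta_12 >= theta_23:
        raise ValueError("thresholds must be strictly increasing")

    # Linear predictor: larger values push probability towards position 3.
    eta = sum(betas[name] * sizes[name] for name in betas)

    p_le_1 = logistic(theta_12 - eta)  # P(Y <= 1)
    p_le_2 = logistic(theta_23 - eta)  # P(Y <= 2)
    return {1: p_le_1, 2: p_le_2 - p_le_1, 3: 1.0 - p_le_2}


probs = stress_position_probs({"int3": 2, "int2": 2, "int1": 1})
print(probs)
```

Under this parameterization a positive slope (int3) raises the probability of antepenult stress as the interval grows, while a negative slope (int1) shifts probability towards the final position.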
Results Both syllable- and interval-based models confirmed the gradient effect of weight on stress. Results show that weight effects are highly significant in all positions in the stress domain (p < 0.00001), and weaken as we move away from the right edge of the word. For the interval model, effect sizes were: int3 (β̂ = 0.315), int2 (β̂ = −0.216) and int1 (β̂ = −2.184); positive coefficient values indicate increased likelihood of antepenult stress. In other words, the more segments an interval contains, the more likely that interval is to attract stress. Onset effects in the syllable model confirm the negative correlation observed in the data, indicating that the interval-based model is more empirically motivated. Interval-based models also have fewer predictors, by definition, since intervals have no internal structure a priori. The interval-based model also presents (i) a lower AIC value, which indicates a better fit, and (ii) higher accuracy (79.32% vs. 76.71%) when compared to the syllable-based model. Finally, the interval-based analysis proposed in this paper can be mapped into a MaxEnt grammar, where constraints are weighted. I assume a modified version of the Weight-to-Stress Principle [9, p. 3]: WSP_n, where n represents a given position in the stress domain. WSP_n assigns one violation mark to each segment in n if and only if n is unstressed. In this grammar, inputs represent unique sequences of intervals in the lexicon, and outputs represent possible stress patterns (N = 3). Constraint weights were determined using the MaxEnt Grammar Tool [13]: WSP_3 = 0.14, WSP_2 = 1.46 and WSP_1 = 2.05. As expected, this analysis also captures the weight gradience found in the lexicon. Importantly, predicted probabilities and observed frequencies are highly correlated: r(85) = 0.871, p < 0.00001.
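The MaxEnt mapping can be sketched as follows, assuming the standard MaxEnt formulation in which a candidate's probability is proportional to exp(−Σ weight × violations). The constraint weights are those reported above; the function names and the example interval sizes are hypothetical, chosen only for illustration.

```python
import math

# Reported constraint weights, indexed by position in the stress domain:
# 3 = antepenult, 2 = penult, 1 = final.
WSP_WEIGHTS = {3: 0.14, 2: 1.46, 1: 2.05}


def wsp_violations(sizes, stressed):
    """WSP_n: one violation per segment in interval n iff n is unstressed."""
    return {n: (0 if n == stressed else sizes[n]) for n in sizes}


def stress_probabilities(sizes, weights=WSP_WEIGHTS):
    """Return {position: probability} over the three stress candidates.

    `sizes` maps each position (3, 2, 1) to its interval size in segments.
    """
    penalties = {}
    for stressed in sizes:
        violations = wsp_violations(sizes, stressed)
        # Weighted sum of violations for this candidate.
        penalties[stressed] = sum(weights[n] * violations[n] for n in sizes)

    normalizer = sum(math.exp(-penalty) for penalty in penalties.values())
    return {pos: math.exp(-penalty) / normalizer
            for pos, penalty in penalties.items()}


# Hypothetical inputs: same antepenult and penult, short vs. longer final interval.
print(stress_probabilities({3: 2, 2: 2, 1: 1}))
print(stress_probabilities({3: 2, 2: 2, 1: 2}))
```

With these weights, the first input favours penult stress and the second favours final stress, while every candidate keeps a non-zero probability. This is how a single weighted constraint family can express gradient rather than categorical weight effects.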
Therefore, this novel analysis shows that (i) weight is not constrained to the word-final position; (ii) weight is gradient; and (iii) weight alone is able to account for stress in the vast majority of words in the language. As a result, this proposal is more economical, more empirically motivated and more accurate than previous analyses, which had to employ different mechanisms to account for the regular and irregular patterns found in the language.
References
[1] Bisol, L. (1992). O Acento: Duas Alternativas de Análise. Unpublished manuscript.
[2] Bisol, L. (1994). The stress in Portuguese. Actas do Workshop sobre Fonologia.
[3] Goldwater, S. and Johnson, M. (2003). Learning OT constraint rankings using a maximum entropy model. In Proceedings of the Stockholm Workshop on Variation within Optimality Theory, pages 111–120.
[4] Gordon, M. (2013). Syllable weight: phonetics, phonology, typology. London: Routledge.
[5] Hayes, B. (1995). Metrical Stress Theory. Chicago: University of Chicago Press.
[6] Hayes, B. and Wilson, C. (2008). A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 39(3):379–440.
[7] Houaiss, A., Villar, M., and de Mello Franco, F. M. (2001). Dicionário eletrônico Houaiss da língua portuguesa. Rio de Janeiro: Objetiva.
[8] Lee, S.-H. (1994). A regra de acento do português: outra alternativa. Letras de Hoje, 98:37–42.
[9] Prince, A. (1990). Quantitative consequences of rhythmic organization. CLS, 26(2):355–398.
[10] R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[11] Ryan, K. M. (2014). Onsets contribute to syllable weight: statistical evidence from stress and meter. Language, 90(2):309–341.
[12] Steriade, D. (2012). Intervals vs. syllables as units of linguistic rhythm. Handouts, EALING, Paris.
[13] Wilson, C. (2006). Learning phonology with substantive bias: An experimental and computational study of velar palatalization. Cognitive Science, 30(5):945–982.