Discovering New Vowel Harmony Patterns Using A Pairwise Statistical Model

Document Type

Poster Session

Publication Date


Published In

20th Manchester Phonology Meeting


Vowel harmony is typically analyzed as a primarily categorical phenomenon: either a language has harmony, or it does not; either a given vowel in a given environment harmonizes, or it does not. However, a more gradient measure of harmony can reveal finer-grained information. For example, according to Harrison et al.’s (2002–2004) vowel harmony calculator (henceforth, VHC), an arbitrary word in Tuvan has a greater chance of being fully harmonic (94%) than one in Turkish (62%), even though both languages categorically have backness harmony. This fact cannot be predicted by traditional categorical analysis; it must be derived statistically. In this work, we propose a new method to statistically measure harmony based on feature agreement within pairs of tier-adjacent vowels, and we compare our results to those from VHC’s whole-word measure. While the results of the two methods correlate very well, they also differ in ways that open up new avenues for future research and pose interesting challenges for categorical phonological theories.

VHC uses the word as a categorical harmonic domain: if all of a word’s vowels have the same value for a given feature, then the word is classified as harmonic; otherwise, it is disharmonic. In particular, VHC considers Turkish words like krematoryum ‘crematorium’ and ekskavatör ‘excavator’ to be equally disharmonic for backness, even though krematoryum has three adjacent harmonic vowels. If a language’s disharmonic words are unevenly distributed between these two types, the acquisition, productivity, and/or long-term stability of the harmony pattern could be affected.

We propose a statistical measure of harmony in a corpus that looks at adjacent vowel pairs instead of entire words. Unlike VHC’s exact measure of word-based harmony, a comparable exact normalized measure of pairwise harmony is extremely complex to calculate analytically. It is much more tractable to estimate it with computer simulations, so we bootstrap 2000 randomly generated corpora using the same vowel and word-length probability distributions as a given corpus. We then compare the proportion of harmonic vowel pairs in the original corpus to the distribution of harmony proportions in the randomly generated corpora, calculating the original corpus’s z-score: how many standard deviations it lies from the mean harmony proportion of the randomly generated corpora. Crucially, because z-scores are inherently normalized quantities, they can be meaningfully compared between languages and/or features, allowing for synchronic and diachronic comparison of different harmony patterns within and across languages.

Our results for 15 languages correlate strongly with VHC (r≈0.89 for backness), which is expected: a language with many fully harmonic words should also have many harmonic vowel pairs. However, our model also reveals information that is missed by word-based statistics and traditional phonological analysis. For example, VHC finds that Estonian and Uzbek have little whole-word backness harmony, but we find that Estonian has a large z-score for pairwise harmony (z≈24), greater than that of Votic (z≈13), which is a harmonic language. This suggests statistically significant “hidden harmony” between vowels that need not extend to the entire word. We also find negative z-scores, such as for Uzbek’s backness harmony (z≈−3), which suggest “anti-harmony”: a preference for disharmony. For Uzbek, anti-harmony reflects the loss of historical harmony due to vowel merger, but such negative z-scores could also arise from other factors.

We have proposed a new statistical measure of vowel harmony that looks within pairs of tier-adjacent vowels rather than across entire words. This model can be used on any corpus and can find at least two new pairwise harmony patterns that are invisible to more traditional analyses: hidden harmony (as in Estonian) and anti-harmony (as in Uzbek). These new patterns enrich traditional categorical descriptions of harmony, opening new areas for understanding the fundamental nature of harmony and how to represent it formally in phonological theory.
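
The pairwise measure and its simulation-based z-score can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes lowercase Turkish orthography with a simple front/back vowel split, and it assumes that random corpora are built by resampling word lengths (in vowels) and individual vowels independently from the original corpus. All function names are hypothetical.

```python
import random
import statistics

# Assumption: Turkish orthographic vowels, split by backness (lowercase only).
FRONT = set("eiöü")
BACK = set("aıou")

def vowel_tier(word):
    """Return the word's vowels in order, skipping consonants."""
    return [ch for ch in word if ch in FRONT or ch in BACK]

def is_harmonic_word(word):
    """Whole-word criterion: every vowel has the same backness value."""
    return len({v in BACK for v in vowel_tier(word)}) <= 1

def pair_counts(tier):
    """Return (harmonic pairs, total pairs) over tier-adjacent vowels."""
    pairs = list(zip(tier, tier[1:]))
    harmonic = sum((a in BACK) == (b in BACK) for a, b in pairs)
    return harmonic, len(pairs)

def harmony_proportion(tiers):
    """Proportion of harmonic pairs across all vowel tiers in a corpus."""
    counts = [pair_counts(t) for t in tiers]
    total = sum(n for _, n in counts)
    if total == 0:
        raise ValueError("corpus contains no vowel pairs")
    return sum(h for h, _ in counts) / total

def random_tiers(tiers, rng):
    """Assumed null model: resample word lengths and vowels independently."""
    pool = [v for t in tiers for v in t]
    lengths = [len(t) for t in tiers]
    return [rng.choices(pool, k=n) for n in rng.choices(lengths, k=len(tiers))]

def pairwise_z(words, n_sim=2000, seed=0):
    """z-score of the observed pairwise harmony against random corpora."""
    rng = random.Random(seed)
    tiers = [vowel_tier(w) for w in words]
    observed = harmony_proportion(tiers)
    sims = [harmony_proportion(random_tiers(tiers, rng)) for _ in range(n_sim)]
    return (observed - statistics.mean(sims)) / statistics.stdev(sims)
```

Under these assumptions, krematoryum and ekskavatör both fail the whole-word test, but pair_counts gives 2 of 3 harmonic pairs for krematoryum (vowel tier e, a, o, u) and 1 of 3 for ekskavatör (vowel tier e, a, a, ö), which is the distinction the pairwise measure is meant to capture.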



Conference Dates

May 24-26, 2012

Conference Location

Manchester, England