Humans usually talk nearly as fast as possible, and the rate of speech is limited by how fast we can drive our muscles (e.g., the tongue, jaw, and vocal folds). Consequently, the muscle dynamics and the strategies that the brain uses to control the speech muscles are critical to understanding this important communication system. I describe physics-based models of pitch dynamics in Mandarin (Chinese) and Cantonese speech. These are tone languages, in which a change of pitch can switch a syllable from one word to a completely different one. The model treats speech as an optimized communication system that attempts to minimize both the communication error rate and the effort required to produce speech. Previous approaches have used ad hoc models, often taken from machine learning systems, and have been largely unsuccessful at connecting acoustic parameters to features of the language. The physics-based models, by contrast, yield a set of parameters that correspond to the linguistic concept of the "strength" or importance of a word. We have shown that strength is used consistently to mark boundaries in speech and to mark words with high information content. The strength values are among the first objective, quantitative measurements that can be compared to linguistic theories.
Professor Greg P. Kochanski is a candidate for the ATOP-funded Assistant Professor position in the Department.