Intonational speech prosody encoding
As people speak, they raise and lower the pitch of their voices to communicate meaning. For example, a person can turn a statement into a question simply by raising their pitch at the end of the sentence: “Anna likes to go camping?” Although pitch plays an important role in spoken language, it is not known exactly how a listener’s brain represents the pitch information in speech. Here, we recorded neural activity in human auditory cortex as people listened to speech with different intonation contours (changes in pitch over the course of the sentence), different words, and different speakers. We found that some neural populations responded to changes in the intonation contour. These neural populations were not sensitive to the phonetic features that make up consonants and vowels, and they responded similarly to male and female speakers. We then showed that the neural activity could be explained by relative pitch encoding. That is, the amount of activity did not correspond to absolute pitch values. Instead, the activity represented speaker-normalized values of pitch, such that male and female speakers were placed on the same scale.
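To make the idea of speaker-normalized (relative) pitch concrete, here is a minimal sketch. It assumes one common normalization, z-scoring log pitch against each speaker's own pitch values; this is an illustrative choice, not necessarily the one used in the study. The function name `relative_pitch` and the pitch values are hypothetical.

```python
import math
from statistics import mean, stdev

def relative_pitch(f0_hz, speaker_f0_hz):
    """Z-score log pitch against the speaker's own pitch distribution.

    f0_hz: pitch contour of one sentence, in Hz.
    speaker_f0_hz: reference pitch samples from the same speaker, in Hz.
    """
    log_ref = [math.log(f) for f in speaker_f0_hz]
    mu, sigma = mean(log_ref), stdev(log_ref)
    return [(math.log(f) - mu) / sigma for f in f0_hz]

# Hypothetical contours: the same rising (question-like) shape, with the
# second voice exactly one octave above the first.
low_voice = [100.0, 110.0, 120.0, 150.0]
high_voice = [2 * f for f in low_voice]

low_rel = relative_pitch(low_voice, low_voice)
high_rel = relative_pitch(high_voice, high_voice)
```

The absolute values differ by a factor of two, but after normalization the two contours coincide (up to floating-point error), which is the sense in which both speakers end up on the same scale.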
Perceptual restoration of masked speech
Social communication often takes place in noisy environments (restaurants, street corners, parties), yet we often don’t even notice when background sounds completely cover up parts of the words we’re hearing. Even though this happens all the time, it is a major mystery how our brains allow us to understand speech in these situations. While we recorded neural signals directly from the brain, participants listened to words in which specific segments were completely removed and replaced by noise. For example, when they were presented with the ambiguous word “fa*ter” (where * is a loud masking noise, like a cough), they heard either the word “faster” or the word “factor”. Importantly, participants reported hearing the word as if the “s” or “c” was actually there. We found that brain activity in a specific region of the human auditory cortex, the superior temporal gyrus, is responsible for “restoring” the missing sound. When participants heard the same physical sound (e.g., “fa*ter”) as either “factor” or “faster”, brain activity in the superior temporal gyrus acted as if the sound they perceived was actually present. Remarkably, using machine learning analyses, we could also predict which word a listener would report hearing before they actually heard it. Brain signals in another language region, the inferior frontal cortex, contained information about whether a listener would hear “factor” or “faster”. It is as if the inferior frontal cortex decides which word it will hear, and then the superior temporal gyrus creates the appropriate percept.
Decoding the English phonetic inventory
Speech is composed of elementary linguistic units (e.g. /b/, /d/, /g/, /i/, /a/…), which are called phonemes. In any given language, a limited inventory of phonemes can be combined to create a nearly infinite number of words and meanings. In contrast to most studies, which focus on a small set of selected sounds, we determined the neural encoding for the entire English phonetic inventory. By comparing neural responses to speech sounds in natural continuous speech, we determined that individual sites in the human auditory cortex are selectively tuned, not to individual phonemes, but rather to an even smaller set of distinctive phonetic features that have long been postulated by linguists and speech scientists. We discovered that neural selectivity to phonetic features is directly related to tuning for higher-order spectrotemporal auditory cues in speech, thereby linking acoustic and phonetic representations. We systematically determined the encoding for the major classes of all English consonants and vowels.
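The distinction between tuning to phonemes and tuning to phonetic features can be illustrated with a small lookup table. The sketch below is only an illustration: the table is a hand-written subset of standard textbook features for six consonants, not the feature set or data from the study, and the names `FEATURES` and `phonemes_with` are hypothetical.

```python
# Illustrative feature table: voicing, manner, and place for six consonants.
FEATURES = {
    "b": {"voiced", "plosive", "labial"},
    "d": {"voiced", "plosive", "coronal"},
    "g": {"voiced", "plosive", "dorsal"},
    "p": {"voiceless", "plosive", "labial"},
    "s": {"voiceless", "fricative", "coronal"},
    "z": {"voiced", "fricative", "coronal"},
}

def phonemes_with(feature):
    """Return the phonemes in the table that carry the given feature."""
    return sorted(p for p, feats in FEATURES.items() if feature in feats)

# A site tuned to the feature "plosive" would respond to a whole class of
# phonemes rather than to any single one of them.
plosives = phonemes_with("plosive")      # ['b', 'd', 'g', 'p']
fricatives = phonemes_with("fricative")  # ['s', 'z']
```

A handful of features is enough to distinguish every phoneme in the table, which is why a feature-based code can be more compact than one with a separate detector for each phoneme.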
Categorical speech encoding in the Human Superior Temporal Gyrus
Nat Neurosci. 2010 Nov;13(11):1428-32.
Human cortical sensorimotor network underlying feedback control of vocal pitch.
Chang EF, Niziolek CA, Nagarajan SS, Knight RT, Houde JF. Proc Natl Acad Sci U S A. 2013 Feb 12;110(7):2653-8.
The Functional Organization of Human Speech Sensorimotor Cortex
In this study, we recorded directly from the human cortical surface to determine precisely how we articulate. A major question that we address is how population neural activity gives rise to phonetic representations of speech. We first determined that the spatial organization of the speech sensorimotor cortex is laid out according to a somatotopic representation of the face and vocal tract (ABOVE). This is the first time this has been demonstrated during the act of speaking. The identification of several articulator representations by itself, however, does not address the major challenge for speech motor control: how are the speech articulators (lips, tongue, jaw, larynx) precisely orchestrated to produce even a simple syllable? To address this, we used recent methods for neural “state-space” analyses of the complex spatiotemporal neural patterns across the entire speech sensorimotor cortex. We demonstrate that the cortical state-space has two important properties: a hierarchical structure and a cyclical structure (BELOW). These properties may reflect cortical strategies to greatly simplify the complex coordination of articulators in fluent speech. Importantly, these results are consistent with an underlying organization reflecting the construct of a phoneme.