ARO 2003

Abstracts included from the following articles:

Frequency modulation detection in cochlear implants listeners.

Independent contributions of amplitude and frequency modulations to auditory perception. II. Melody, tone, and speaker identification.

Clear Speech Perception in Normal-Hearing and Cochlear-Implant Listeners

Facts and Artifacts in Auditory Chimaeras

 

 

Independent contributions of amplitude and frequency modulations to auditory perception. I. Consonant, vowel, and sentence recognition.



Frequency modulation detection in cochlear implants listeners.

Hongbin Chen, Fan-Gang Zeng
Univ. of California, Irvine
Irvine, CA 92697-1257

Although amplitude modulation detection has been extensively studied in both acoustic and electric hearing, frequency modulation detection has rarely been studied in electric hearing. Here we systematically studied cochlear implant listeners’ ability to detect three types of frequency modulation: upward sweeps, downward sweeps, and sinusoidal frequency modulation. Difference limens (i.e., the 70.7%-correct point tracked by a 3IFC, 2-down, 1-up procedure) were measured as a function of baseline frequency (from 75 to 1,000 Hz). Factors studied included electrode position (apical vs. basal), stimulation level (soft vs. comfortable), and modulation frequency (from 5 to 320 Hz, depending on the baseline frequency). Three postlingually deafened adults using the Nucleus-22 cochlear implant participated in the experiment. For comparison, similar data were also collected in normal-hearing listeners. Preliminary data showed no significant effect of electrode position or stimulation level but a significant effect of baseline frequency and modulation type on frequency modulation detection. Consistent with previous data on simple rate discrimination, difference limens for all three types of frequency modulation increased monotonically with baseline frequency. Despite large individual variability, difference limens for sinusoidal frequency modulation were about half of those for the upward and downward frequency sweeps. The present data suggest that cochlear implant listeners may be more sensitive to dynamic frequency changes than to steady-state frequency changes. We hope to explore this difference to dynamically encode the temporal fine structure in speech and music sounds for cochlear implant users.
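The 70.7% figure quoted above follows from the tracking rule itself: a 2-down, 1-up staircase settles where two correct responses in a row are as likely as not, so p^2 = 0.5 and p is about 0.707. The following is a minimal sketch of such a staircase, not the authors' software. The function names, the step size, and the deterministic toy listener are all illustrative assumptions.

```python
def target_proportion_correct(n_down: int) -> float:
    """Convergence point of an n-down, 1-up rule: p ** n_down == 0.5."""
    return 0.5 ** (1.0 / n_down)


def run_staircase(respond, start, step, n_trials, n_down=2):
    """Track a threshold with an n-down, 1-up rule.

    respond(delta) is a hypothetical callback returning True when the
    listener answers correctly at frequency difference `delta`.
    Returns the list of deltas at which the track reversed direction.
    """
    delta = start
    correct_run = 0
    direction = 0              # -1 = getting harder, +1 = getting easier
    reversals = []
    for _ in range(n_trials):
        if respond(delta):
            correct_run += 1
            if correct_run < n_down:
                continue       # need n_down correct in a row before stepping down
            correct_run = 0
            new_direction = -1
        else:
            correct_run = 0
            new_direction = +1
        if direction != 0 and new_direction != direction:
            reversals.append(delta)
        direction = new_direction
        delta += new_direction * step
    return reversals


def threshold_estimate(reversals, n_last=6):
    """Average the last n_last reversal points (an assumed, common convention)."""
    tail = reversals[-n_last:]
    return sum(tail) / len(tail)
```

With a toy listener that is always correct above a fixed threshold and always wrong below it, the track oscillates around that threshold and the reversal average lands between the two bracketing step values. A real listener responds probabilistically, which is what makes the 70.7% convergence point meaningful.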



Independent contributions of amplitude and frequency modulations to auditory perception. II. Melody, tone, and speaker identification.

Ying-Yee Kong, Michael Vongphoe and Fan-Gang Zeng
University of California, Irvine
Irvine, CA 92697-1275

In a companion paper, we showed that amplitude modulation provides sufficient information for speech recognition in quiet, but additional frequency modulation is needed in noise. Here we evaluated the relative contributions of amplitude and frequency modulations to melody, tone, and speaker identification. Twelve familiar melodies were generated with or without tempo information. Twenty-five Mandarin syllables, each having 4 tonal variations, were produced by a male and a female talker. Six vowel tokens (3 used for training and 3 used for testing) produced by 3 males, 3 females, 2 boys, and 2 girls were used for speaker identification. Stimuli were processed to extract slowly varying amplitude and frequency modulations from a number of frequency bands (1-64 bands). Melody and speaker identification were tested in both normal-hearing and cochlear-implant listeners, whereas tone identification was tested in normal-hearing listeners only. Results showed that amplitude modulation only (i.e., 1 band) produced about 80% correct performance for melody identification with tempo and also for tone identification in quiet. However, for melody identification without tempo and for tone identification in noise (0 dB S/N), performance dropped to about 40% even with 8 frequency bands. Similarly, listeners could recognize most of the vowels but could not identify the speakers. When frequency modulation was added, performance was restored to a level similar to that with the unprocessed stimuli. These results suggest that amplitude and frequency modulations contribute independently to auditory perception, with amplitude modulation contributing gross temporal information and frequency modulation contributing detailed spectral information for accurate pitch perception and signal-and-noise separation.



Clear Speech Perception in Normal-Hearing and Cochlear-Implant Listeners

Sheng Liu1, Elsa DelRio1, Ann R. Bradlow2, Fan-Gang Zeng1
1Hearing and Speech Research Laboratory, University of California, Irvine.
2Department of Linguistics, Northwestern University

Abstract
Previous studies have demonstrated that when instructed to speak clearly to people with hearing loss, talkers can produce “clear” speech, which has significantly higher intelligibility in noise than “conversational” speech. Here we measured clear and conversational speech perception at various signal-to-noise ratios covering a range over which intelligibility increased from about 0% to 100%. Stimuli consisted of ten sets of BKB sentences produced by a male and a female talker in clear and conversational speech. Speech-spectrum-shaped noise was used to produce the different signal-to-noise ratios. Real cochlear implant users and cochlear implant simulations were also tested to measure the contribution of temporal envelope cues to the clear speech advantage. A sigmoid function was fitted to the measured data, producing 2 parameters: the speech reception threshold (i.e., the signal-to-noise ratio at which 50% intelligibility was achieved) and the slope of the psychometric function. We found that the speech reception threshold was –9.1 dB for clear speech and –6.3 dB for conversational speech in normal listeners, and correspondingly –4.6 dB and 1.0 dB in implant listeners. The differences in speech reception threshold translated into about 15 percentage points in improved intelligibility scores for normal listeners and about 20 percentage points for implant listeners. The cochlear implant simulation produced results similar to those obtained in real implant listeners. The present results confirmed and extended previous findings in normal listeners. In addition, the implant and simulation data suggest a direct contribution of temporal envelope cues to the intelligibility advantage of clear speech over conversational speech. Further analysis of temporal envelope cues in clear speech should yield results that are important not only for understanding mechanisms of speech perception but also for developing novel processing algorithms in auditory prostheses.
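As an illustration of the two-parameter fit described above, the sketch below uses a logistic function parameterized by the speech reception threshold and a slope. The abstract does not give the exact functional form or fitting routine, so the logistic form, the grid search, and all names here are assumptions. The last two lines simply subtract the reported thresholds to express the clear-speech advantage in dB.

```python
import math


def sigmoid(snr_db, srt_db, slope):
    """Proportion correct at snr_db; equals 0.5 when snr_db == srt_db."""
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - srt_db)))


def fit_sigmoid(snrs_db, scores):
    """Least-squares grid search over (srt_db, slope).

    A simple stand-in for a proper optimizer; the grid ranges are
    illustrative assumptions, not values from the study.
    """
    best = None
    for i in range(-150, 51):        # SRT candidates: -15.0 to 5.0 dB in 0.1 dB steps
        srt = i / 10
        for j in range(2, 41):       # slope candidates: 0.10 to 2.00 per dB
            slope = j / 20
            err = sum((sigmoid(x, srt, slope) - y) ** 2
                      for x, y in zip(snrs_db, scores))
            if best is None or err < best[0]:
                best = (err, srt, slope)
    return best[1], best[2]


# Clear-speech advantage in dB, from the thresholds reported above
# (conversational SRT minus clear SRT).
normal_gain_db = -6.3 - (-9.1)
implant_gain_db = 1.0 - (-4.6)
```

On these numbers the threshold shift is about 2.8 dB for normal listeners and about 5.6 dB for implant listeners. How many percentage points such a shift is worth at a given signal-to-noise ratio depends on the fitted slope, which the abstract does not report.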



Independent contributions of amplitude and frequency modulations to auditory perception. I. Consonant, vowel, and sentence recognition.

Kaibao Nie, Ginger Stickney, and Fan-Gang Zeng
University of California, Irvine
Irvine, CA 92697-1275

Previous studies have demonstrated that one can understand speech with primarily either temporal or spectral cues. However, it is not clear why both cues are present in natural sounds and how they are processed in the auditory system. Here we developed a signal processing strategy that independently extracted slowly varying amplitude and frequency modulations within each frequency band, with the number of bands as an independent variable. Normal-hearing listeners were presented with original speech sounds and with processed sounds containing either amplitude modulation only or both amplitude and frequency modulations. The speech materials were vowels, consonants, and IEEE sentences, presented in quiet, in speech-shaped noise, or with a single competing talker. The addition of frequency modulation significantly improved speech recognition over amplitude modulation alone, particularly with fewer frequency bands and in challenging noise conditions. For example, the average vowel recognition score with amplitude modulation only and four frequency bands was 62%, 38%, and 32% for quiet, 0, and –5 dB signal-to-noise ratio conditions, respectively. With the addition of frequency modulation, the corresponding scores improved to 75%, 63%, and 52%, respectively. Similarly, the addition of frequency modulation improved sentence recognition by about 50 percentage points from 20% in the presence of a competing talker. These results suggest that, while amplitude modulation provides essential information for speech recognition in quiet, frequency modulation can enhance speech recognition by allowing the listener to extract signal from noise. Further results in tonal language, music, and speaker identification will be presented in a companion paper. These results are relevant to the design of cochlear implants and audio coding strategies.
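The abstract does not say how the amplitude and frequency modulations were extracted. One textbook way to separate the two within a single band is the analytic signal: its magnitude gives the amplitude modulation and the rate of change of its phase gives the instantaneous frequency. The sketch below assumes that approach, uses a naive DFT to stay dependency-free, and omits both the band-pass filter bank and the low-pass smoothing that "slowly varying" implies. All function names are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Complex analytic signal of a real sequence, via a naive O(n^2) DFT."""
    n = len(x)
    w = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]   # twiddle factors
    spectrum = [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue                 # Nyquist bin is left unchanged
        if k <= (n - 1) // 2:
            spectrum[k] *= 2         # double the positive frequencies
        else:
            spectrum[k] = 0          # discard the negative frequencies
    return [sum(spectrum[k] * w[(k * t) % n].conjugate() for k in range(n)) / n
            for t in range(n)]


def extract_am_fm(x, fs):
    """Return (am, fm): envelope per sample and instantaneous frequency in Hz.

    fm has one fewer element than am because it is a sample-to-sample
    phase difference.
    """
    z = analytic_signal(x)
    am = [abs(v) for v in z]
    fm = [cmath.phase(z[t + 1] * z[t].conjugate()) * fs / (2 * math.pi)
          for t in range(len(z) - 1)]
    return am, fm
```

For a 50 Hz carrier whose amplitude varies at 4 Hz, this recovers the 4 Hz envelope in `am` and a constant 50 Hz in `fm`. A practical implementation would use an FFT library and apply the extraction to each band of a filter bank.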



Facts and Artifacts in Auditory Chimaeras

Fan-Gang Zeng, Kai-Bao Nie, Ginger Stickney,
Sheng Liu, Elsa Del Rio, Ying-Yee Kong, Hong-Bin Chen
Hearing and Speech Research Laboratory,
University of California, Irvine, California 92697-1275, USA

Smith, Delgutte and Oxenham (Nature, 416:87–90, 2002) produced “auditory chimaeras” by systematically mixing one sound’s temporal envelope with another sound’s fine temporal structure as a function of the number of frequency bands (1-64). They found that “the envelope is most important for speech reception, and the fine structure is most important for pitch perception and sound localization.” Here we identified two technical problems that one should be aware of when interpreting results derived from auditory chimaeras. First, one should be aware of the ear’s natural ability to recover the narrow-band envelope with the broad-band processing for a small number of frequency bands (e.g., 1 and 2). Second, one should be concerned about filter artifacts with the narrow-band processing for a large number of bands (e.g., 32, 48, and 64). In addition, we conducted two experiments to challenge Smith et al.’s assertion regarding the envelope and fine structure as the acoustic basis for the “what” and “where” mechanisms. In one experiment, we used Smith et al.’s program to chimaerize two sentences that had either a 15-dB interaural level difference or realistic interaural differences through HRTF filters. Under these conditions, we found that it was the envelope, rather than the fine structure, that determined sound localization. In another experiment, we performed a classic filtering manipulation on the chimaerized sounds with 16 bands and a 700-ms delayed envelope or fine structure. With a low-pass filter having an 800-Hz cutoff frequency, we found that one could lateralize the sound to the side with the leading fine structure but could not recognize speech. Conversely, with a high-pass filter having an 800-Hz cutoff frequency, one could easily recognize speech but could not lateralize the sound. This result suggests that the dichotomy revealed by the auditory chimaeras is an epiphenomenon of classic duplex perception between low- and high-frequency pathways.
