For cochlear implant (CI) users, speech understanding often differs across talkers because of the spectrally distorted speech patterns provided by the implant device. A spectral normalization approach was used to transform the spectral characteristics of one talker to those of another talker. In Experiment 1, speech recognition with two talkers was measured in CI users, with and without spectral normalization. Results showed that the spectral normalization algorithm had a small but significant effect on performance. In Experiment 2, the effects of spectral normalization were measured in CI users and normal-hearing (NH) subjects; a pitch-stretching technique was used to simulate six talkers with different fundamental frequencies and vocal tract configurations. NH baseline performance was nearly perfect with these pitch-shift transformations. For CI subjects, although performance with the different pitch-shift transformations varied considerably across subjects, spectral normalization significantly improved the intelligibility of these simulated talkers. The results from Experiments 1 and 2 demonstrate that spectral normalization toward more-intelligible talkers significantly improved CI users’ speech understanding with less-intelligible talkers. The results suggest that spectral normalization using optimal reference patterns for individual CI patients may compensate for some of the acoustic variability across talkers.
