Individual head-related transfer functions (HRTFs) are usually measured with high spatial resolution or modeled with anthropometric parameters. This study proposes an HRTF individualization method that uses a convolutional neural network (CNN) and requires only spatially sparse measurements. The HRTFs are represented as two-dimensional images whose horizontal and vertical axes indicate direction and frequency, respectively. The CNN is trained on a prior HRTF database, with the HRTF images measured at specific sparse directions as input and the corresponding high-spatial-resolution images as output. The HRTFs of a new subject can then be recovered by feeding that subject's sparsely measured HRTFs to the trained CNN. Objective experiments showed that, when 23 directions were used to recover individual HRTFs at 1250 directions, the spectral distortion (SD) was around 4.4 dB; when 105 directions were used, the SD decreased to around 3.8 dB. Subjective experiments showed that the individualized HRTFs recovered from 105 directions had a smaller discrimination proportion than those of the baseline method and were perceptually indistinguishable from the measured HRTFs in many directions. The method combines the spectral and spatial characteristics of HRTFs for individualization and has the potential to improve virtual reality experiences.
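
The abstract reports recovery quality as spectral distortion (SD) in dB but does not state the formula. The sketch below assumes the commonly used definition, the root-mean-square difference of log-magnitude spectra over frequency bins, which may differ in detail from the one used in the study. The function name `spectral_distortion`, the array shapes, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def spectral_distortion(h_true, h_est, eps=1e-12):
    """RMS log-magnitude difference in dB, computed over the last axis.

    h_true, h_est: arrays of shape (..., n_freq) holding linear-scale
    HRTF values (real magnitudes or complex spectra).
    Assumed definition: sqrt(mean_k (20*log10(|H_k| / |H_est_k|))**2).
    """
    mag_true = np.abs(np.asarray(h_true)) + eps  # eps guards against log(0)
    mag_est = np.abs(np.asarray(h_est)) + eps
    diff_db = 20.0 * np.log10(mag_true / mag_est)
    return np.sqrt(np.mean(diff_db ** 2, axis=-1))


# Synthetic stand-in for an HRTF "image": 1250 directions x 128 frequency bins.
rng = np.random.default_rng(0)
h_measured = rng.uniform(0.5, 1.5, size=(1250, 128))

# A hypothetical estimate that is exactly 1 dB too loud in every bin.
h_recovered = h_measured * 10.0 ** (1.0 / 20.0)

sd_per_direction = spectral_distortion(h_measured, h_recovered)
mean_sd = float(sd_per_direction.mean())
print(f"mean SD over directions: {mean_sd:.3f} dB")
```

Averaging the per-direction values gives a single figure comparable in kind to the 4.4 dB and 3.8 dB results quoted above, although the exact averaging scheme used in the study is not stated in the abstract.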
